WO2017215622A1 - Object segmentation method and apparatus, and computing device - Google Patents

Object segmentation method and apparatus, and computing device (物体分割方法及装置、计算设备)

Info

Publication number: WO2017215622A1
Application number: PCT/CN2017/088380
Authority: WO — WIPO (PCT)
Prior art keywords: local candidate, candidate regions, image, local, segmentation
Other languages: English (en), French (fr)
Inventor: 石建萍
Original assignee: 北京市商汤科技开发有限公司
Application filed by 北京市商汤科技开发有限公司
Publication of WO2017215622A1
Priority to US15/857,304 (published as US10489913B2)


Classifications

    • G06T 7/11 Region-based segmentation (G06T 7/00 Image analysis; G06T 7/10 Segmentation; edge detection)
    • G06F 18/24 Classification techniques (G06F 18/00 Pattern recognition; G06F 18/20 Analysing)
    • G06T 11/60 Editing figures and text; combining figures or text (G06T 11/00 2D [two-dimensional] image generation)
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI] (G06V 10/20 Image preprocessing)
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06N 3/045 Combinations of networks (G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T 2207/20021 Dividing image into blocks, subimages or windows
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30261 Obstacle (G06T 2207/30252 Vehicle exterior; vicinity of vehicle)

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular, to an object segmentation method and apparatus, and a computing device.
  • Image segmentation is a fundamental problem in the field of image processing and has wide applications in object recognition, robot navigation, scene understanding, and other fields.
  • Image segmentation techniques can be used to separate different objects in an image from one another. Quickly segmenting the objects in an image and determining their boundaries is very important for image segmentation.
  • the present disclosure provides an object segmentation scheme.
  • an object segmentation method, comprising the following steps:
  • for an image to be processed, selecting a plurality of local candidate regions at each of two or more different preset scales;
  • performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions;
  • performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong;
  • fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
  • an object segmentation apparatus, comprising the following modules:
  • a local candidate region generation module, configured to select, for an image to be processed, a plurality of local candidate regions at each of two or more different preset scales;
  • an image segmentation module, configured to perform image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions;
  • an image classification module, configured to perform image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong;
  • an image fusion module, configured to fuse two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
  • a computing device, including a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface, and the memory communicate with one another via the communication bus;
  • the memory is configured to store at least one instruction; the instruction causes the processor to perform the following operations:
  • for an image to be processed, selecting a plurality of local candidate regions at each of two or more different preset scales;
  • performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
  • a computer storage medium for storing computer-readable instructions, the instructions including: an instruction for selecting, for an image to be processed, a plurality of local candidate regions at each of two or more different preset scales; an instruction for performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; an instruction for performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and an instruction for fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
  • the technical solution provided by the present disclosure uses a multi-scale local candidate region generation method and exploits the multi-scale features of an image, which helps improve the fault tolerance of the object segmentation technique; the present disclosure can segment an object and determine its precise boundaries while detecting it. By segmenting local candidate regions and then applying an effective local region fusion method to their segmentation results, the present disclosure helps improve the object segmentation effect, as illustrated by the sketch below.
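  • As a concrete illustration of this pipeline, the following is a minimal, hypothetical sketch of the four stages wired together; every callable named here is a placeholder for a module described in this disclosure, not a reference implementation.

```python
from typing import Callable, List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (row, col, height, width)

def object_segmentation(image,
                        generate_regions: Callable[..., List[Region]],
                        segment_region: Callable,
                        classify_region: Callable,
                        fuse: Callable,
                        scales: Sequence[int] = (48, 96, 192, 384)):
    regions = generate_regions(image, scales)               # multi-scale local candidate regions
    masks = [segment_region(image, r) for r in regions]     # binary segmentation mask per region
    labels = [classify_region(image, r) for r in regions]   # object category per region
    return fuse(regions, masks, labels)                     # fused object segmentation image
```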
  • FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of another application scenario of an embodiment of the present disclosure.
  • FIG. 3 shows a block diagram of an exemplary device that implements an embodiment of the present disclosure
  • FIG. 4 shows a block diagram of another exemplary device that implements an embodiment of the present disclosure
  • FIG. 5 is a flow chart showing an object segmentation method provided by the present disclosure
  • FIG. 6 shows another flow chart of the object segmentation method provided by the present disclosure
  • FIG. 7 is a schematic diagram showing a network model of an object segmentation method provided by the present disclosure.
  • FIG. 8 is a schematic diagram showing an overlap of local candidate regions provided by the present disclosure.
  • FIG. 9 is a flow chart showing the fusion processing of all local candidate regions provided by the present disclosure.
  • FIG. 10 is a block diagram showing a functional structure of an object segmentation apparatus provided by the present disclosure.
  • FIG. 11 is a block diagram showing another functional structure of the object segmentation apparatus provided by the present disclosure.
  • FIG. 12 shows a block diagram of a computing device for performing an object segmentation method in accordance with an embodiment of the present disclosure
  • FIG. 13 illustrates a storage unit for holding or carrying program code that implements an object segmentation method in accordance with the present disclosure.
  • FIG. 1 schematically illustrates an application scenario in which it may be implemented in accordance with the present disclosure.
  • In FIG. 1, a driving assistance system is installed in the automobile 1. The driving assistance system in the automobile 1 needs to segment objects such as a pedestrian 2, vehicles, and a traffic light 3 in the road environment presented by a captured image, so as to better recognize the road environment in the image. For example, the image features of several connected objects on the road may look very similar to the shape of a vehicle; by object segmentation, however, the connected objects can be separated from one another, which facilitates accurate recognition of the objects on the road.
  • FIG. 2 schematically illustrates another application scenario in which it may be implemented in accordance with the present disclosure.
  • In Fig. 2, four chairs 20 surround a square table 21.
  • In the process of carrying one of the chairs 20 or moving the square table 21, the robot 22 needs to perform object segmentation on the four chairs 20 and the square table 21 in the image collected by its image acquisition device, so as to accurately identify the chair 20 to be carried or the square table 21 to be moved.
  • the present disclosure proposes an object segmentation scheme.
  • the present disclosure forms local candidate regions by using a multi-scale local candidate region generation approach in an object segmentation scheme for an image, and makes full use of the multi-scale features of the image, so that the object segmentation technique of the present disclosure has a certain fault tolerance; the present disclosure performs image classification processing on the local candidate regions while performing image segmentation processing on them, thereby achieving segmentation of the individual objects in the image while detecting the objects; by obtaining the segmentation results of the local candidate regions and the object categories to which they belong, and using those results to fuse two or more local candidate regions, a technical solution for object segmentation based on multi-level local region fusion is formed.
  • the object segmentation technique of the present disclosure helps improve the object segmentation effect.
  • FIG. 3 illustrates a block diagram of an exemplary device 30 (e.g., a computer system/server) suitable for implementing the present disclosure.
  • the device 30 shown in FIG. 3 is merely an example and should not impose any limitation on the function and scope of use of the present disclosure.
  • device 30 can be embodied in the form of a general purpose computing device.
  • Components of device 30 may include, but are not limited to, one or more processing units 301 (ie, processors), system memory 302, and a bus 303 that connects different system components, including system memory 302 and processing unit 301.
  • Device 30 can include a variety of computer system readable media. These media can be any available media that can be accessed by device 30, including volatile and non-volatile media, removable and non-removable media, and the like.
  • System memory 302 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 3021 and/or cache memory 3022. Device 30 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • ROM 3023 can be used to read and write non-removable, non-volatile magnetic media (not shown in Figure 3, commonly referred to as "hard disk drives").
  • system memory 302 can provide a disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), as well as an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., CD-ROM, DVD-ROM, or other optical media).
  • each drive can be coupled to bus 303 via one or more data medium interfaces.
  • At least one program product can be included in system memory 302, the program product having a set (eg, at least one) of program modules configured to perform the functions of the present disclosure.
  • Program module 3024 typically performs the functions and/or methods described in this disclosure.
  • Device 30 may also communicate with one or more external devices 304 (e.g., a keyboard, pointing device, display, etc.). Such communication may occur through input/output (I/O) interfaces 305, and device 30 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 306. As shown in FIG. 3, the network adapter 306 communicates with the other modules of device 30 (e.g., processing unit 301) via bus 303. It should be understood that although not shown in FIG. 3, other hardware and/or software modules may be used with device 30.
  • the processing unit 301 executes various functional applications and data processing by running the computer program stored in the system memory 302, for example, executing instructions for implementing the steps in the above methods. Specifically, the processing unit 301 can execute the computer program stored in the system memory 302, and when the computer program is executed, the following steps are performed: for an image to be processed, selecting a plurality of local candidate regions at each of two or more different preset scales; performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
  • computer system 400 includes one or more processors, a communication part, and the like; the one or more processors may be one or more central processing units (CPUs) 401 and/or one or more graphics processors (GPUs) 413, etc., and the processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 402 or executable instructions loaded from the storage portion 408 into the random access memory (RAM) 403.
  • the communication part 412 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
  • the processor can communicate with the read-only memory 402 and/or the random access memory 403 to execute executable instructions, is connected to the communication part 412 via bus 404, and communicates with other target devices via the communication part 412 to complete the corresponding steps in this disclosure.
  • the steps performed by the processor include: for an image to be processed, the processor selects a plurality of local candidate regions at each of two or more different preset scales; the processor performs image segmentation processing on two or more local candidate regions and predicts binary segmentation masks of the local candidate regions; the processor performs image classification processing on two or more local candidate regions and predicts the object categories to which the local candidate regions belong; and, according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, the processor fuses two or more local candidate regions to obtain an object segmentation image.
  • the RAM 403 can also store various programs and data required for the operation of the device.
  • the CPU 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • ROM 402 is an optional module.
  • the RAM 403 stores executable instructions or writes executable instructions to the ROM 402 at runtime, the executable instructions causing the central processing unit 401 to perform the steps included in the object segmentation method described above.
  • An input/output (I/O) interface 405 is also coupled to bus 404.
  • the communication unit 412 may be integrated, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) and connected to the bus, respectively.
  • the following components are connected to the I/O interface 405: an input portion 406 including a keyboard, a mouse, etc.; an output portion 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, and the like; a storage portion 408 including a hard disk and the like; and a communication portion 409 including a network interface card such as a LAN card or a modem. The communication portion 409 performs communication processing via a network such as the Internet.
  • Driver 410 is also coupled to I/O interface 405 as needed.
  • a removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 410 as needed so that a computer program read therefrom is installed in the storage portion 408 as needed.
  • the architecture shown in FIG. 4 is only one optional implementation.
  • in practice, the number and type of the components in FIG. 4 may be selected, deleted, added, or replaced according to actual needs; the different functional components may also be arranged separately or integrated. For example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU; the communication part may be arranged separately, or may be integrated on the CPU or GPU.
  • embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for performing the steps shown in the flowcharts, and the program code may include instructions corresponding to the steps provided in this application, for example: an instruction for selecting, for an image to be processed, a plurality of local candidate regions at each of two or more different preset scales; an instruction for performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; an instruction for performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and an instruction for fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
  • the computer program can be downloaded and installed from the network via the communication portion 409, and/or installed from the removable medium 411.
  • the computer program is executed by the central processing unit (CPU) 401, the above-described instructions described in the present disclosure are executed.
  • In step S101, for an image to be processed, the processor selects a plurality of local candidate regions at each of two or more different preset scales.
  • Step S101 can be performed by the local candidate region generation module 60 that is executed by the processor.
  • the present disclosure proposes a multi-scale local candidate region generation scheme, in which an object in an image to be processed can be split into a plurality of local candidate regions for learning.
  • the present disclosure uses the selected local candidate regions simultaneously as the processing objects of subsequent image segmentation and image classification.
  • Step S102 The processor performs image segmentation processing on two or more local candidate regions, and predicts a binary segmentation mask of the local candidate region. Step S102 can be performed by an image segmentation module 61 that is executed by the processor.
  • the processor performs image segmentation processing on each local candidate region with a local candidate region as an input processing object, and predicts a binary mask of each local candidate region.
  • Step S103 The processor performs image classification processing on two or more local candidate regions, and predicts an object category to which the local candidate region belongs.
  • Step S103 can be performed by image classification module 62 that is executed by the processor.
  • the processor uses the local candidate region as an input processing object, performs image classification processing on each local candidate region, and predicts an object category to which each local candidate region belongs.
  • the processor may perform the above steps S102 and S103 simultaneously or sequentially.
  • the present disclosure does not limit the order in which the processor performs the two steps.
  • In step S104, the processor fuses two or more local candidate regions (for example, all local candidate regions) according to the object categories to which two or more local candidate regions (for example, all local candidate regions) belong and the binary segmentation masks of two or more local candidate regions, to obtain an object segmentation image, that is, the instance segmentation result of the objects.
  • Step S104 can be performed by image fusion module 63 that is executed by the processor.
  • the processor fuses the local object segmentation results and local object classification results obtained from the local candidate regions generated by the multi-scale local candidate region generation scheme, and finally obtains the object instance segmentation result of the entire image.
  • the object segmentation technique provided by the present disclosure uses a multi-scale local candidate region generation method; by exploiting the multi-scale features of the image, the object segmentation technique acquires a certain fault tolerance. The present disclosure can segment the individual objects in an image and determine their boundaries while detecting the objects in the image. By segmenting local candidate regions and, after obtaining their segmentation results, applying a local region fusion method, the present disclosure can accurately determine the individual objects in the image.
  • In step S201, the processor processes the image 3-0 to be processed with the convolutional layers 3-1 and/or pooling layers of a convolutional neural network to obtain convolutional neural network intermediate results 3-2.
  • Step S201 can be performed by a convolutional neural network computing module 64 that is executed by the processor.
  • the image 3-0 to be processed may specifically be a 384×384×3 image, where 384×384 denotes the size of the image 3-0 to be processed and 3 denotes the number of channels (for example, R, G, B).
  • the present disclosure does not limit the size of the image 3-0 to be processed.
  • a non-linear response unit is disposed after some or each of the convolutional layers.
  • the non-linear response unit adopts rectified linear units (hereinafter: ReLU).
  • by adding a ReLU after a convolutional layer, the present disclosure can make the mapping result of the convolutional layer as sparse as possible, so as to simulate human visual response, which helps improve the image processing effect. A minimal sketch of such a layer pairing follows.
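  • For instance, such a convolution/non-linear-response pairing can be written as the following generic PyTorch sketch (illustrative only; the channel counts are taken from the first layer of the example network in the Description, not mandated by the disclosure).

```python
import torch.nn as nn

# A convolutional layer followed by a non-linear response unit (ReLU);
# ReLU zeroes negative responses, sparsifying the layer's mapping result.
conv_relu = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)
```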
  • the present disclosure may set the convolution kernels of the convolutional layers in the convolutional neural network according to the actual situation. For example, considering factors such as facilitating the synthesis of local information, the present disclosure usually sets the convolution kernels of the convolutional layers in the convolutional neural network to 3×3.
  • the convolution kernels may also be set to 1×1, 2×2, 4×4, or the like.
  • the present disclosure can set the stride of the pooling layers, so that the upper-layer features can enlarge the field of view without increasing the computation amount; the stride of the pooling layers also enhances spatial invariance, that is, the same input is allowed to appear at different image positions while the output response is the same.
  • the convolutional layers of the convolutional neural network are mainly used for information summarization and fusion; max pooling mainly summarizes high-level information.
  • the structure of the convolutional neural network can be fine-tuned to accommodate different trade-offs between performance and efficiency.
  • in the layer-by-layer listing of the network (given in full in the Description), the number before '.<=' is the current layer index and the number after it is the index of its input layer; for example, '2.<=1' means the current layer is the second layer and its input is the first layer.
  • the parameters of a convolutional layer are shown in parentheses after it; for example, 3×3×64 indicates that the convolution kernel size is 3×3 and the number of channels is 64.
  • the parameters of a pooling layer are shown in parentheses after it; for example, 3×3/2 indicates that the pooling kernel size is 3×3 and the stride is 2.
  • the 24×24×512 in FIG. 7 indicates the size of the convolutional neural network intermediate result 3-2, and the size of the intermediate result 3-2 varies with the size of the image 3-0 to be processed; for example, when the image 3-0 to be processed becomes larger, the intermediate result 3-2 also becomes larger.
  • the above convolutional neural network intermediate result 3-2 is shared data of subsequent image classification processing and image segmentation processing. Using these convolutional neural network intermediate results 3-2 can greatly reduce the complexity of subsequent processing.
  • In step S202, the processor uses the convolutional neural network intermediate results 3-2 to select a local candidate region generation layer 3-3, and, by means of a sliding frame, selects a plurality of local candidate regions 3-4 at each of two or more different preset scales on the feature map corresponding to the local candidate region generation layer 3-3.
  • Step S202 can be performed by local candidate region generation module 60 that is executed by the processor.
  • the present disclosure splits an object in the image 3-0 to be processed into a plurality of local candidate regions 3-4 for learning.
  • the present disclosure may select local candidate regions 3-4 of four different preset scales, namely: 48×48 local candidate regions 3-4 (the uppermost square to the right of the brace in FIG. 7), 96×96 local candidate regions 3-4 (the middle square to the right of the brace in FIG. 7), 192×192 local candidate regions 3-4 (the lowermost square to the right of the brace in FIG. 7), and 384×384 local candidate regions (omitted and therefore not shown in FIG. 7).
  • this is merely an example, and the present disclosure is not limited to the exemplified selection.
  • the processor slides the sliding frame over the feature map corresponding to the local candidate region generation layer 3-3 and selects local candidate regions 3-4 at each of a plurality of different preset scales.
  • as the sliding frame slides over the feature map, the feature points in the feature map covered by the frame at each position form one group of feature points, and different groups do not contain exactly the same feature points.
  • the above feature map may be a feature map obtained by the processor processing the image 3-0 to be processed accordingly, for example, a feature map obtained by performing convolution operations on the image 3-0 to be processed using a VGG16 (Visual Geometry Group) network, GoogleNet (Google Network), or ResNet.
  • each local candidate region P_i (1 ≤ i ≤ N, where N is the number of local candidate regions) is denoted (r, c, h, w), where (r, c) is the coordinate of its upper-left corner and h and w are its height and width; these values usually determine uniquely the position in the image 3-0 to which the region corresponds. The processor may slide the frame with a preset stride, for example a stride of 16. On the feature map, each local candidate region P_i corresponds to a down-sampled feature grid G_i.
  • each slide of the frame over the feature map forms one local candidate region 3-4 and one feature grid, and the spatial sizes of the feature grid and the local candidate region 3-4 are determined by the sliding frame. A sketch of this selection follows.
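  • A minimal sketch of this sliding-frame selection is given below; the stride of 16 and the image-to-feature-grid mapping are assumptions consistent with the example values in this disclosure.

```python
# Slide a frame over the image at several preset scales and record each
# local candidate region P_i = (r, c, h, w) together with its corresponding
# down-sampled feature grid G_i (a feature stride of 16 is assumed).
def generate_local_candidate_regions(img_h, img_w,
                                     scales=(48, 96, 192, 384), stride=16):
    regions = []
    for s in scales:
        for r in range(0, img_h - s + 1, stride):
            for c in range(0, img_w - s + 1, stride):
                p_i = (r, c, s, s)  # region on the image
                g_i = (r // stride, c // stride, s // stride, s // stride)  # grid on the feature map
                regions.append((p_i, g_i))
    return regions

# For a 384x384 image this yields feature grids of spatial size
# 3x3, 6x6, 12x12 and 24x24, matching the example sizes above.
```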
  • the present disclosure uses the shared convolutional neural network intermediate results 3-2 to select local candidate regions 3-4 at a plurality of different preset scales on the feature map corresponding to the selected convolutional layer (the local candidate region generation layer 3-3), without increasing the computational cost.
  • by selecting a plurality of preset scales, the present disclosure can cover objects of different sizes as much as possible; each local candidate region 3-4 can cover a part of an object in the image and does not have to contain the object completely, so the information learned from each local candidate region is richer.
  • the processor of the present disclosure unifies local candidate regions of different sizes to a fixed size through deconvolution layer and/or pooling layer processing.
  • the spatial size of G_i may take several values, for example 3×3, 6×6, 12×12, and 24×24, which are unified to a fixed size, for example 12×12, 10×10, 11×11, or 13×13, by deconvolution layer or pooling layer techniques.
  • for G_i with spatial sizes of 3×3 and 6×6, up-sampling with deconvolution layer techniques unifies them to 12×12; for G_i with a spatial size of 24×24, (2×2/2) max pooling unifies the spatial size of G_i to 12×12.
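  • In PyTorch terms, this unification might look as follows; this is a sketch under the stated example sizes, and the deconvolution kernel/stride choices are assumptions made so that the output is exactly 12×12.

```python
import torch
import torch.nn as nn

# Unify feature grids G_i of different spatial sizes to a fixed 12x12:
# up-sample small grids with deconvolution (transposed convolution) and
# down-sample the 24x24 grid with 2x2/2 max pooling.
upsample_3 = nn.ConvTranspose2d(512, 512, kernel_size=4, stride=4)  # 3x3   -> 12x12
upsample_6 = nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2)  # 6x6   -> 12x12
downsample_24 = nn.MaxPool2d(kernel_size=2, stride=2)               # 24x24 -> 12x12

assert upsample_3(torch.randn(1, 512, 3, 3)).shape[-2:] == (12, 12)
assert upsample_6(torch.randn(1, 512, 6, 6)).shape[-2:] == (12, 12)
assert downsample_24(torch.randn(1, 512, 24, 24)).shape[-2:] == (12, 12)
```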
  • In step S203, the processor performs image segmentation processing on each local candidate region and predicts the binary segmentation mask 3-5 of the local candidate region.
  • Step S203 can be performed by the image segmentation module 61 that is executed by the processor.
  • the image segmentation step performed by the processor takes G_i as input and uses the convolutional neural network intermediate results 3-2 to perform image segmentation processing on each local candidate region 3-4, predicting the binary mask M_i of each local candidate region 3-4.
  • during training, if the center of a local candidate region P_i lies inside a certain calibrated object O_n, this embodiment makes the local candidate region P_i correspond to this calibrated object O_n, and thereby determines that the binary mask M_i of the local candidate region P_i should belong to a part of this calibrated object O_n.
  • the above calibrated object is usually an object that is manually calibrated in advance.
  • 1×1×2304 in Fig. 7 indicates the size of the convolution kernels of the convolutional layers involved in the image segmentation processing.
  • the reconstruction in Fig. 7 indicates that the local candidate regions processed by the convolutional layers are rearranged to form the binary mask 3-5; 48×48 is the size of the binary mask 3-5.
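  • A sketch of such a segmentation branch, matching the 1×1×2304 kernels and the 48×48 reshape described for FIG. 7, is given below; reducing the 12×12 grid to 1×1 before the 1×1 convolutions is an assumption made so that the 2304 channels can be rearranged into a 48×48 mask (48×48 = 2304).

```python
import torch
import torch.nn as nn

class SegmentationBranch(nn.Module):
    # Sketch of the mask branch: reduce the unified 12x12x512 grid to
    # 1x1x2304, apply a 1x1x2304 convolution, then rearrange (the
    # "reconstruction" in FIG. 7) into a 48x48 binary-mask logit map.
    def __init__(self, in_channels=512):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 2304, kernel_size=12)  # 12x12 -> 1x1x2304 (assumption)
        self.relu = nn.ReLU(inplace=True)
        self.seg_6_2 = nn.Conv2d(2304, 2304, kernel_size=1)         # 1x1x2304 convolution

    def forward(self, g):                            # g: (N, 512, 12, 12)
        x = self.seg_6_2(self.relu(self.reduce(g)))  # (N, 2304, 1, 1)
        return x.view(-1, 1, 48, 48)                 # reconstruction layer: reshape to 48x48

branch = SegmentationBranch()
assert branch(torch.randn(2, 512, 12, 12)).shape == (2, 1, 48, 48)
```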
  • Step S204 The processor performs image classification processing on each local candidate region, and predicts an object category to which the local candidate region belongs.
  • Step S204 can be performed by image classification module 62, which is executed by the processor.
  • the object category may be an object category from an existing data set, such as the PASCAL VOC (Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes) data set.
  • the image classification step performed by the processor also takes G_i as input; the processor uses the convolutional neural network intermediate results 3-2 to perform image classification processing on each local candidate region and predicts the object category l_i to which each local candidate region belongs.
  • in this embodiment, a local candidate region P_i is considered to belong to a calibrated object O_n if it satisfies the following three conditions simultaneously: (1) the center of the local candidate region P_i lies inside the calibrated object O_n (for example, inside its bounding box, if the calibrated object O_n has one); (2) the area of the calibrated object O_n within the local candidate region P_i accounts for more than a first threshold of the area of the calibrated object O_n (50 ≤ first threshold ≤ 75), for example more than 50%; and (3) the area of the calibrated object O_n within the local candidate region P_i accounts for more than a second threshold of the area of the local candidate region P_i (the second threshold is usually smaller than the first threshold, e.g., 10 ≤ second threshold ≤ 20), for example more than 20%.
  • the processor predicts the category through the classification branch described above (layers 38 to 44 in the listing in the Description).
  • 1×1×4096 and 1×1×21 in Fig. 7 indicate the sizes of the convolution kernels of the convolutional layers involved in the image classification processing. A sketch of such a branch follows.
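  • A corresponding sketch of the classification branch is given below; the 3×3/2 pooling and the 4096/21-channel 1×1 convolutions follow the listing in the Description (21 = 20 PASCAL VOC classes + background), while collapsing the remaining spatial extent with global average pooling is an assumption made for illustration.

```python
import torch.nn as nn

classification_branch = nn.Sequential(
    nn.MaxPool2d(kernel_size=3, stride=2),                        # pooling layer (3x3/2)
    nn.Conv2d(512, 4096, kernel_size=1), nn.ReLU(inplace=True),   # cls_6_1 (1x1x4096)
    nn.Conv2d(4096, 4096, kernel_size=1), nn.ReLU(inplace=True),  # cls_6_2 (1x1x4096)
    nn.Conv2d(4096, 21, kernel_size=1),                           # cls_7_1 (1x1x21)
    nn.AdaptiveAvgPool2d(1),                                      # collapse spatial dims (assumption)
    nn.Flatten(),                                                 # (N, 21) logits for the softmax loss
)
```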
  • the processor may perform the above steps S203 and S204 simultaneously or sequentially.
  • the present disclosure does not limit the order in which the processor performs the two steps.
  • In step S205, the processor trains the losses of image classification and image segmentation using a preset training loss function.
  • Step S205 can be performed by the training loss module 65 that is executed by the processor.
  • the present disclosure presets, for the above tasks of image classification and image segmentation, a training loss function that allows the processor to jointly judge whether the image classification and the image segmentation are accurate. The published formula is an image; reconstructed from the definitions below, it corresponds to L(w) = Σ_{i=1}^{N} [ f_c(P_i) + λ·f_s(P_i) ], where:
  • w is the network parameter;
  • f_c(P_i) is the classification loss of the local candidate region P_i, corresponding to layer 44 in the above example;
  • f_s(P_i) is the loss of the segmentation mask of the local candidate region P_i, corresponding to layer 37 in the above example;
  • λ is the weight adjusting f_c(P_i) and f_s(P_i), which can be set to 1; 1 ≤ i ≤ N, and N is the number of local candidate regions.
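  • Under those definitions, the combined loss might be implemented as the following sketch; using softmax cross-entropy for f_c and per-pixel binary cross-entropy for f_s is an assumption matching the softmax loss layers 44 and 37.

```python
import torch.nn.functional as F

def training_loss(cls_logits, cls_targets, mask_logits, mask_targets, lam=1.0):
    # L(w) = sum over the N local candidate regions of f_c(P_i) + lambda * f_s(P_i).
    f_c = F.cross_entropy(cls_logits, cls_targets, reduction='sum')      # classification loss
    f_s = F.binary_cross_entropy_with_logits(mask_logits, mask_targets,  # segmentation mask loss
                                             reduction='sum')
    return f_c + lam * f_s
```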
  • the training loss function employed by the processor of the present disclosure is not limited to the above specific form.
  • the processor employs this form of training loss function to effectively train the convolutional neural network as shown in FIG. 7 designed by the present disclosure.
  • Step S206 the processor performs fusion processing on the two or more local candidate regions according to the object class to which the two or more local candidate regions belong and the binary segmentation mask of the two or more local candidate regions, to obtain an object segmentation image.
  • Step S206 may be performed by the image fusion module 63 executed by the processor.
  • for example, the image fusion module 63 fuses all local candidate regions 3-4 according to the object category to which each local candidate region belongs and the binary segmentation mask 3-5 of each local candidate region 3-4, to obtain the object segmentation image.
  • FIG. 8 shows a schematic diagram of the overlap of local candidate regions provided by the present disclosure.
  • the overlap of the binary segmentation masks 3-5 of two local candidate regions 3-4 is measured by the parameter IoU (Intersection over Union).
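  • The IoU of two binary segmentation masks can be computed as in this short sketch (masks are assumed to be boolean arrays placed in a common image coordinate frame):

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    # Intersection over Union of two binary segmentation masks.
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0
```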
  • the processor selects a plurality of local candidate regions by using a sliding frame, and the processor determines which local candidate regions should be assigned to the same object by calculating the IoU and the object category to which the local candidate region belongs, thereby performing fusion processing on all the local candidate regions.
  • as a specific example of determining whether the overlap area between binary segmentation masks satisfies a predetermined requirement, the processor obtains the binary masks of a plurality of local candidate regions through the sliding frame, i.e., 4-1, 4-2, 4-3, 4-4, and 4-5 in FIG. 8.
  • the processor (for example, the image fusion module 63 run by the processor) performs fusion processing on at least two local candidate regions (such as all local candidate regions), including: the processor determines the overlap area of the binary segmentation masks of two adjacent local candidate regions; in response to the overlap area being greater than a preset threshold, the two adjacent local candidate regions belonging to the same object category, and neither of the two adjacent local candidate regions being assigned to an object, the processor generates a new object and assigns the two adjacent local candidate regions to that object.
  • the processor (for example, the image fusion module 63 run by the processor) performs fusion processing on all the local candidate regions, including: the processor determines the overlap area of the binary segmentation masks of two adjacent local candidate regions; if the overlap area is greater than a preset threshold, the two adjacent local candidate regions belong to the same object category, and one of the two adjacent local candidate regions is assigned to an object, the processor merges the two adjacent local candidate regions and assigns the other local candidate region to that object.
  • the processor (for example, the image fusion module 63 run by the processor) performs fusion processing on all the local candidate regions, including: the processor determines the overlap area of the binary segmentation masks of two adjacent local candidate regions; if the overlap area is greater than a preset threshold, the two adjacent local candidate regions belong to the same object category, and the two adjacent local candidate regions are assigned to two different objects, the processor merges the two objects.
  • FIG. 9 shows a flow chart of the fusion processing for all local candidate regions provided by the present disclosure.
  • the fusion process performed by the processor includes the following steps:
  • In step S2061, the processor calculates the overlap area of the binary segmentation masks of two adjacent local candidate regions.
  • the adjacent local candidate regions include adjacent local candidate regions of the row dimension and adjacent local candidate regions of the column dimension.
  • Adjacent local candidate regions of the row dimension generally refer to local candidate regions that are adjacent in the horizontal direction
  • adjacent local candidate regions of the column dimensions generally refer to local candidate regions that are adjacent in the vertical direction.
  • In step S2062, the processor determines whether the overlap area is greater than a preset threshold; if so, the processor performs step S2063; otherwise, the processor performs step S2067.
  • In step S2063, the processor determines whether the two adjacent local candidate regions belong to the same object category; if so, the processor performs step S2064; otherwise, the processor performs step S2067.
  • In step S2064, the processor determines whether neither of the two adjacent local candidate regions is assigned to an object; if so, the processor performs step S2065; otherwise, the processor performs step S2066.
  • In step S2065, the processor generates a new object and assigns the two adjacent local candidate regions to that object; the processor then performs step S2067.
  • In step S2066, if one of the two adjacent local candidate regions is assigned to an object, the processor merges the two adjacent local candidate regions and assigns the other local candidate region to that object; if the two adjacent local candidate regions are assigned to two objects, the processor merges the two objects; the processor then performs step S2067.
  • In step S2067, the processor determines whether all the local candidate regions are assigned to corresponding objects. If all the local candidate regions are assigned to corresponding objects, the process goes to step S2068 and the fusion processing of the present disclosure ends; otherwise, the process returns to step S2061. That is, the processor loops through the above steps S2061 to S2066 until all the local candidate regions are assigned to corresponding objects, finally obtaining the list of all objects; the processor thus obtains the object segmentation image. A compact sketch of this loop follows.
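  • A compact sketch of this fusion loop (steps S2061 to S2067) is given below; it reuses the mask_iou sketch above, the threshold value of 0.5 is an assumption (the disclosure only requires a preset threshold), and the row/column-adjacent index pairs are supplied by the caller.

```python
def fuse_local_candidate_regions(regions, pairs, iou_threshold=0.5):
    # regions: list of dicts with 'mask' (boolean array in image coordinates),
    # 'label' (predicted object category) and 'obj' (assigned object id, None
    # initially). pairs: index pairs of row/column-adjacent regions.
    objects = {}                                   # object id -> set of region indices
    next_id = 0
    for i, j in pairs:
        if mask_iou(regions[i]['mask'], regions[j]['mask']) <= iou_threshold:
            continue                               # S2062: overlap not large enough
        if regions[i]['label'] != regions[j]['label']:
            continue                               # S2063: different object categories
        oi, oj = regions[i]['obj'], regions[j]['obj']
        if oi is None and oj is None:              # S2065: create a new object
            objects[next_id] = {i, j}
            regions[i]['obj'] = regions[j]['obj'] = next_id
            next_id += 1
        elif oi is None or oj is None:             # S2066: attach the unassigned region
            oid = oj if oi is None else oi
            k = i if oi is None else j
            objects[oid].add(k)
            regions[k]['obj'] = oid
        elif oi != oj:                             # S2066: merge the two objects
            objects[oi] |= objects.pop(oj)
            for k in objects[oi]:
                regions[k]['obj'] = oi
    return objects                                 # final object list (S2068)
```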
  • the present disclosure generates local candidate regions of an object, and one object may be covered by a plurality of local candidate regions, so objects of different sizes can be covered; each local candidate region may cover a part of the object without completely covering it, so each local candidate region learns rich information, which helps improve the robustness of the object segmentation technique.
  • the image classification processing results and image segmentation processing results of different local candidate regions can be integrated, so that the object segmentation result combines the outputs of different classifiers, which helps improve the accuracy of the segmentation result. By jointly optimizing the local candidate regions, the present disclosure enables the final result to guide the current local candidate region selection module, which can make the results more accurate.
  • the present disclosure can perform end-to-end, complete object instance segmentation training and testing with unified deep learning.
  • modules in the devices of the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the present disclosure may be combined into one module or unit or component, and further, they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • any combination of the features disclosed in this specification (including the accompanying claims, abstract, and drawings), and of all processes or units of any method or device so disclosed, may be combined.
  • Each feature disclosed in this specification (including the accompanying claims, the abstract and the drawings) may be replaced by alternative features that provide the same, equivalent or similar purpose.
  • Various component embodiments of the present disclosure may be implemented in hardware, or in a software module running on one or more processors, or in a combination thereof.
  • a microprocessor or digital signal processor may be used in practice to implement some or all of the functionality of some or all of the components of the device for acquiring application information in accordance with embodiments of the present disclosure.
  • the present disclosure may also be implemented as a device or device program (eg, a computer program and a computer program product) for performing some or all of the methods described herein.
  • Such a program implementing the present disclosure may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • Figure 12 illustrates a computing device that can implement the object segmentation method of the present disclosure.
  • the computing device conventionally includes a processor 810 and a computer program product or computer readable medium in the form of a storage device 820, in addition to a communication interface and a communication bus.
  • Storage device 820 can be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the processor, the communication interface, and the memory communicate with each other through the communication bus.
  • the storage device 820 has a storage space 830 storing program code 831 for performing any of the method steps described above, for example storing at least one instruction that causes the processor to perform the various steps in the object segmentation method of the present disclosure.
  • storage space 830 storing program code may include various program code 831 for implementing various steps in the above methods, respectively.
  • the program code can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as a hard disk, a compact disk (CD), a memory card, or a floppy disk.
  • such a computer program product is typically a portable or fixed storage unit such as that shown in FIG. 13.
  • the storage unit may have storage segments, storage spaces, and the like arranged similarly to the storage device 820 in the computing device of FIG. 12.
  • the program code can be compressed, for example, in an appropriate form.
  • the storage unit includes computer readable code 831' for performing the steps of the method according to the present disclosure, that is, code that can be read by a processor such as the processor 810; when the code is run by a computing device, it causes the computing device to perform the various steps in the methods described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An object segmentation method and apparatus, and a computing device, belonging to the field of computer vision technology. The object segmentation method includes the following steps: for an image to be processed, selecting a plurality of local candidate regions at each of two or more different preset scales; performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and fusing two or more local candidate regions according to the object categories to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.

Description

Object segmentation method and apparatus, and computing device
This disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on June 15, 2016, with application number 201610425391.0 and invention title "Object segmentation method and apparatus based on multi-level local region fusion, and computing device", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer vision technology, and in particular to an object segmentation method and apparatus, and a computing device.
Background
Image segmentation is a fundamental problem in the field of image processing and has wide applications in object recognition, robot navigation, scene understanding, and other fields.
Image segmentation techniques can be used to separate different objects in an image from one another. Quickly segmenting the objects in an image and determining their boundaries is very important for image segmentation.
Summary
The present disclosure provides an object segmentation scheme.
According to one aspect of the present disclosure, an object segmentation method is provided, the method including the following steps:
for an image to be processed, selecting a plurality of local candidate regions at each of two or more different preset scales;
performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions;
performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong;
fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
According to another aspect of the present disclosure, an object segmentation apparatus is provided, the apparatus including the following modules:
a local candidate region generation module, configured to select, for an image to be processed, a plurality of local candidate regions at each of two or more different preset scales;
an image segmentation module, configured to perform image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions;
an image classification module, configured to perform image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong;
an image fusion module, configured to fuse two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
According to yet another aspect of the present disclosure, a computing device is provided, including a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface, and the memory communicate with one another via the communication bus;
the memory is configured to store at least one instruction; the instruction causes the processor to perform the following operations:
for an image to be processed, selecting a plurality of local candidate regions at each of two or more different preset scales;
performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions;
performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong;
fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
According to yet another aspect of the present disclosure, a computer storage medium is provided for storing computer-readable instructions, the instructions including: an instruction for selecting, for an image to be processed, a plurality of local candidate regions at each of two or more different preset scales; an instruction for performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; an instruction for performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and an instruction for fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
The technical solution provided by the present disclosure uses a multi-scale local candidate region generation method and exploits the multi-scale features of an image, which helps improve the fault tolerance of the object segmentation technique; the present disclosure can segment object instances and determine their precise boundaries while detecting the objects. By segmenting local candidate regions and then applying an effective local region fusion method to the segmentation results of the local candidate regions, the present disclosure helps improve the object segmentation effect.
The above description is only an overview of the technical solution of the present disclosure. In order to understand the technical means of the present disclosure more clearly so that it may be implemented in accordance with the contents of the specification, and in order to make the above and other objects, features, and advantages of the present disclosure more apparent, specific embodiments of the present disclosure are set forth below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present disclosure. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another application scenario of an embodiment of the present disclosure;
FIG. 3 is a block diagram of an exemplary device implementing an embodiment of the present disclosure;
FIG. 4 is a block diagram of another exemplary device implementing an embodiment of the present disclosure;
FIG. 5 is a flow chart of the object segmentation method provided by the present disclosure;
FIG. 6 is another flow chart of the object segmentation method provided by the present disclosure;
FIG. 7 is a schematic diagram of a network model of the object segmentation method provided by the present disclosure;
FIG. 8 is a schematic diagram of the overlap of local candidate regions provided by the present disclosure;
FIG. 9 is a flow chart of the fusion processing of all local candidate regions provided by the present disclosure;
FIG. 10 is a functional structural block diagram of the object segmentation apparatus provided by the present disclosure;
FIG. 11 is another functional structural block diagram of the object segmentation apparatus provided by the present disclosure;
FIG. 12 is a block diagram of a computing device for performing the object segmentation method according to an embodiment of the present disclosure;
FIG. 13 illustrates a storage unit for holding or carrying program code implementing the object segmentation method according to the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
FIG. 1 schematically illustrates an application scenario in which the present disclosure may be implemented.
In FIG. 1, a driving assistance system is installed in an automobile 1. The driving assistance system in the automobile 1 needs to segment objects such as a pedestrian 2, vehicles, and a traffic light 3 in the road environment presented by a captured image, so as to better recognize the road environment in the image. For example, the image features of several connected objects on the road may look very similar to the shape of a vehicle; by object segmentation, however, the connected objects can be separated from one another, which facilitates accurate recognition of the objects on the road.
FIG. 2 schematically illustrates another application scenario in which the present disclosure may be implemented.
In FIG. 2, four chairs 20 surround a square table 21. In the process of carrying one of the chairs 20 or moving the square table 21, a robot 22 needs to perform object segmentation on the four chairs 20 and the square table 21 in the image collected by its image acquisition device, so as to accurately identify the chair 20 to be carried or the square table 21 to be moved.
Those skilled in the art will understand that the present disclosure is also applicable to other application scenarios; that is, the application scenarios to which the present disclosure can be applied are not limited by the two scenarios exemplified above.
The present disclosure proposes an object segmentation scheme. By using a multi-scale local candidate region generation approach in the object segmentation scheme for an image to form local candidate regions, the present disclosure makes full use of the multi-scale features of the image, so that the object segmentation technique of the present disclosure has a certain fault tolerance. The present disclosure performs image classification processing on the local candidate regions while performing image segmentation processing on them, thereby achieving segmentation of the individual objects in the image while detecting the objects. By obtaining the segmentation results of the local candidate regions and the object categories to which the local candidate regions belong, and fusing two or more local candidate regions using the obtained segmentation results and object categories, the present disclosure forms a technical solution for object segmentation based on multi-level local region fusion; the object segmentation technique of the present disclosure helps improve the object segmentation effect.
The object segmentation scheme of the present disclosure is described in detail below through specific embodiments with reference to the accompanying drawings.
FIG. 3 shows a block diagram of an exemplary device 30 (e.g., a computer system/server) suitable for implementing the present disclosure. The device 30 shown in FIG. 3 is merely an example and should not impose any limitation on the function and scope of use of the present disclosure.
As shown in FIG. 3, the device 30 may take the form of a general-purpose computing device. The components of the device 30 may include, but are not limited to, one or more processing units 301 (i.e., processors), a system memory 302, and a bus 303 connecting different system components (including the system memory 302 and the processing units 301). The device 30 may include a variety of computer-system-readable media. These media may be any available media accessible by the device 30, including volatile and non-volatile media, removable and non-removable media, and the like.
The system memory 302 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 3021 and/or a cache memory 3022. The device 30 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a ROM 3023 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 3, commonly referred to as a "hard disk drive"). Although not shown in FIG. 3, the system memory 302 may provide a disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., CD-ROM, DVD-ROM, or other optical media). In such cases, each drive may be connected to the bus 303 via one or more data medium interfaces. The system memory 302 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the present disclosure.
A program/utility 3025 having a set of (at least one) program modules 3024 may be stored, for example, in the system memory 302. Such program modules 3024 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 3024 typically perform the functions and/or methods described in the present disclosure.
The device 30 may also communicate with one or more external devices 304 (e.g., a keyboard, a pointing device, a display, etc.). Such communication may take place via input/output (I/O) interfaces 305, and the device 30 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 306. As shown in FIG. 3, the network adapter 306 communicates with the other modules of the device 30 (e.g., the processing units 301) via the bus 303. It should be understood that although not shown in FIG. 3, other hardware and/or software modules may be used with the device 30.
The processing unit 301 executes various functional applications and data processing by running the computer program stored in the system memory 302, for example, executing instructions for implementing the steps in the above methods. Specifically, the processing unit 301 may execute the computer program stored in the system memory 302, and when the computer program is executed, the following steps are performed: for an image to be processed, selecting a plurality of local candidate regions at each of two or more different preset scales; performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
FIG. 4 shows an exemplary device 400 suitable for implementing the present disclosure; the device 400 may be a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. In FIG. 4, a computer system 400 includes one or more processors, a communication part, and the like. The one or more processors may be, for example, one or more central processing units (CPUs) 401 and/or one or more graphics processors (GPUs) 413; the processors may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 402 or executable instructions loaded from a storage portion 408 into a random access memory (RAM) 403. The communication part 412 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card. The processors may communicate with the read-only memory 402 and/or the random access memory 403 to execute executable instructions, are connected to the communication part 412 via a bus 404, and communicate with other target devices via the communication part 412, thereby completing the corresponding steps in the present disclosure. In a specific example, the steps performed by the processor include: for an image to be processed, the processor selects a plurality of local candidate regions at each of two or more different preset scales; the processor performs image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; the processor performs image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and the processor fuses two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
In addition, the RAM 403 may also store various programs and data required for device operation. The CPU 401, the ROM 402, and the RAM 403 are connected to one another via the bus 404. Where the RAM 403 is present, the ROM 402 is an optional module. The RAM 403 stores executable instructions, or writes executable instructions into the ROM 402 at runtime, and the executable instructions cause the central processing unit 401 to perform the steps included in the above object segmentation method. An input/output (I/O) interface 405 is also connected to the bus 404. The communication part 412 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) respectively connected to the bus.
The following components are connected to the I/O interface 405: an input portion 406 including a keyboard, a mouse, and the like; an output portion 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 408 including a hard disk and the like; and a communication portion 409 including a network interface card such as a LAN card or a modem. The communication portion 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read therefrom is installed into the storage portion 408 as needed.
It should be noted that the architecture shown in FIG. 4 is only one optional implementation. In practice, the number and types of the components in FIG. 4 may be selected, deleted, added, or replaced according to actual needs; the different functional components may also be arranged separately or integrated. For example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU; the communication part may be arranged separately, or may be integrated on the CPU or the GPU. All of these alternative implementations fall within the protection scope of the present disclosure.
In particular, according to embodiments of the present disclosure, the process described below with reference to the flow charts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for performing the steps shown in the flow charts, and the program code may include instructions corresponding to the steps provided in this application, for example: an instruction for selecting, for an image to be processed, a plurality of local candidate regions at each of two or more different preset scales; an instruction for performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions; an instruction for performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong; and an instruction for fusing two or more local candidate regions according to the object categories to which two or more of the local candidate regions belong and the binary segmentation masks of two or more of the local candidate regions, to obtain an object segmentation image.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 409, and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the above instructions described in the present disclosure are executed.
The object segmentation technical solution provided by the present disclosure is described below with reference to FIG. 5 to FIG. 13.
In FIG. 5, in step S101, for an image to be processed, the processor selects a plurality of local candidate regions at each of two or more different preset scales. Step S101 may be performed by a local candidate region generation module 60 run by the processor.
The present disclosure proposes a multi-scale local candidate region generation scheme, in which an object in the image to be processed can be split into a plurality of local candidate regions for learning. The present disclosure uses the selected local candidate regions simultaneously as the processing objects of subsequent image segmentation and image classification.
In step S102, the processor performs image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions. Step S102 may be performed by an image segmentation module 61 run by the processor.
As an example, the processor takes the local candidate regions as input processing objects, performs image segmentation processing on each local candidate region, and predicts the binary mask of each local candidate region.
In step S103, the processor performs image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong. Step S103 may be performed by an image classification module 62 run by the processor.
As an example, the processor takes the local candidate regions as input processing objects, performs image classification processing on each local candidate region, and predicts the object category to which each local candidate region belongs.
The processor may perform the above steps S102 and S103 simultaneously or sequentially; the present disclosure does not limit the order in which the processor performs the two steps.
In step S104, the processor fuses two or more local candidate regions (e.g., all local candidate regions) according to the object categories to which two or more local candidate regions (e.g., all local candidate regions) belong and the binary segmentation masks of two or more local candidate regions (e.g., all local candidate regions), to obtain an object segmentation image, that is, the instance segmentation result of the objects. Step S104 may be performed by an image fusion module 63 run by the processor.
The processor fuses the local object segmentation results and local object classification results obtained from the local candidate regions generated by the multi-scale local candidate region generation scheme, and finally obtains the object instance segmentation result of the whole image.
The object segmentation technique provided by the present disclosure uses a multi-scale local candidate region generation method; by exploiting the multi-scale features of the image, the object segmentation technique acquires a certain fault tolerance. The present disclosure can segment the individual objects in an image and determine their boundaries while detecting the objects in the image. By segmenting local candidate regions and, after obtaining their segmentation results, applying a local region fusion method, the present disclosure can accurately determine the individual objects in the image.
In FIG. 6 and FIG. 7, in step S201, the processor processes the image 3-0 to be processed with the convolutional layers 3-1 and/or pooling layers of a convolutional neural network to obtain convolutional neural network intermediate results 3-2. Step S201 may be performed by a convolutional neural network computation module 64 run by the processor.
The image 3-0 to be processed may specifically be a 384×384×3 image, where 384×384 denotes the size of the image 3-0 to be processed, and 3 denotes the number of channels (e.g., R, G, B). The present disclosure does not limit the size of the image 3-0 to be processed.
In the convolutional neural network of the present disclosure, a non-linear response unit is provided after some or each of the convolutional layers. The non-linear response unit adopts rectified linear units (hereinafter: ReLU); by adding a ReLU after a convolutional layer, the present disclosure makes the mapping result of the convolutional layer as sparse as possible, so as to simulate human visual response, which helps improve the image processing effect. The present disclosure may set the convolution kernels of the convolutional layers in the convolutional neural network according to the actual situation; for example, considering factors such as facilitating the synthesis of local information, the present disclosure usually sets the convolution kernels of the convolutional layers in the convolutional neural network to 3×3; of course, the convolution kernels may also be set to 1×1, 2×2, 4×4, or the like. Meanwhile, the present disclosure may set the stride of the pooling layers, so that the upper-layer features can enlarge the field of view without increasing the computation amount; the stride of the pooling layers also enhances spatial invariance, that is, the same input is allowed to appear at different image positions while the output response is the same. The convolutional layers of the convolutional neural network are mainly used for information summarization and fusion; the max pooling layers mainly summarize high-level information. The structure of the convolutional neural network can be fine-tuned to adapt to different trade-offs between performance and efficiency.
A specific example of the convolutional neural network used to obtain the intermediate results 3-2 is as follows:
1. Input layer
2. <=1 Convolutional layer 1_1 (3×3×64)
3. <=2 Non-linear response ReLU layer
4. <=3 Convolutional layer 1_2 (3×3×64)
5. <=4 Non-linear response ReLU layer
6. <=5 Pooling layer (3×3/2)
7. <=6 Convolutional layer 2_1 (3×3×128)
8. <=7 Non-linear response ReLU layer
9. <=8 Convolutional layer 2_2 (3×3×128)
10. <=9 Non-linear response ReLU layer
11. <=10 Pooling layer (3×3/2)
12. <=11 Convolutional layer 3_1 (3×3×256)
13. <=12 Non-linear response ReLU layer
14. <=13 Convolutional layer 3_2 (3×3×256)
15. <=14 Non-linear response ReLU layer
16. <=15 Convolutional layer 3_3 (3×3×256)
17. <=16 Non-linear response ReLU layer
18. <=17 Pooling layer (3×3/2)
19. <=18 Convolutional layer 4_1 (3×3×512)
20. <=19 Non-linear response ReLU layer
21. <=20 Convolutional layer 4_2 (3×3×512)
22. <=21 Non-linear response ReLU layer
23. <=22 Convolutional layer 4_3 (3×3×512)
24. <=23 Non-linear response ReLU layer
25. <=24 Pooling layer (3×3/2)
26. <=25 Convolutional layer 5_1 (3×3×512)
27. <=26 Non-linear response ReLU layer
28. <=27 Convolutional layer 5_2 (3×3×512)
29. <=28 Non-linear response ReLU layer
30. <=29 Convolutional layer 5_3 (3×3×512)
31. <=30 Non-linear response ReLU layer
Here the number before the symbol '.<=' is the current layer index and the number after it is the index of its input layer; for example, '2.<=1' means that the current layer is the second layer and its input is the first layer. The parameters of a convolutional layer are given in parentheses after it; for example, 3×3×64 means the convolution kernel size is 3×3 and the number of channels is 64. The parameters of a pooling layer are given in parentheses after it; for example, 3×3/2 means the pooling kernel size is 3×3 and the stride is 2.
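A minimal PyTorch sketch of the listing above (layers 1 to 31) follows. This is an illustrative reconstruction, not the patent's reference code; the padding choices are assumptions made so that a 384×384×3 input yields the 24×24×512 intermediate result 3-2 described for FIG. 7.

```python
import torch
import torch.nn as nn

def conv_relu(in_ch, out_ch):
    # Convolutional layer (3x3 kernel) followed by a non-linear response ReLU layer.
    return [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True)]

def make_backbone():
    # (in_channels, out_channels, number of conv layers, pooling layer afterwards)
    cfg = [(3, 64, 2, True), (64, 128, 2, True), (128, 256, 3, True),
           (256, 512, 3, True), (512, 512, 3, False)]
    layers = []
    for in_ch, out_ch, n_convs, pool in cfg:
        for i in range(n_convs):
            layers += conv_relu(in_ch if i == 0 else out_ch, out_ch)
        if pool:
            layers.append(nn.MaxPool2d(kernel_size=3, stride=2, padding=1))  # pooling layer (3x3/2)
    return nn.Sequential(*layers)

backbone = make_backbone()
x = torch.randn(1, 3, 384, 384)  # the image 3-0 to be processed
print(backbone(x).shape)         # torch.Size([1, 512, 24, 24]) -> intermediate result 3-2
```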
The 24×24×512 in FIG. 7 denotes the size of the convolutional neural network intermediate result 3-2, and the size of the intermediate result 3-2 varies with the size of the image 3-0 to be processed; for example, when the image 3-0 to be processed becomes larger, the intermediate result 3-2 also becomes larger.
The above convolutional neural network intermediate results 3-2 are data shared by the subsequent image classification processing and image segmentation processing. Using these intermediate results 3-2 can greatly reduce the complexity of the subsequent processing.
步骤S202,处理器利用卷积神经网络中间结果3-2,选取局部候选区域产生层3-3,处理器通过滑动框在局部候选区域产生层3-3对应的特征图上按照两个或者以上不同的预设尺度分别选取多个局部候选区域3-4。步骤S202可以由被处理器运行的局部候选区域生成模块60执行。
本公开将待处理的图像3-0中的一个物体拆成多个局部候选区域3-4来学习。本公开可以选择4个不同的预设尺度的局部候选区域3-4,分别为48×48的局部候选区域3-4(如图7中大括号右侧最上方的方块)、96×96的局部候选区域3-4(如图7中大括号右侧位于中间位置的方块)、192×192的局部候选区域3-4(如图7中大括号右侧最下方的方块)和384×384的局部候选区域(由于省略而未在图7中示出)。此处仅为举例,本公开不限于所举例的选取方式。通过选取多个不同的预设尺度(例如,48×48、96×96、192×192以及384×384等),有利于提高产生局部候选区域的完备性。
基于以上的卷积神经网络中间结果3-2,选取32.<=31为局部候选区域产生层3-3。处理器通过控制滑动框在局部候选区域产生层3-3对应的特征图(feature map)上滑动,分别以多个不同的预设尺度选取局部候选区域3-4。本公开通过滑动框在特征图上的滑动,滑动框每次所覆盖的特征图中各特征点形成一组特征点,不同组所包含的特征点不完全相同。上述特征图可以是处理器对待处理的图像3-0进行相应的处理而获得的特征图,例如,处理器利用VGG16(Visual Geometry Group,视觉组)网络、GoogleNet(谷歌网络)或者ResNet技术对待处理的图像3-0进行卷积运算而获得的特征图。
对于每个局部候选区域Pi(1≤i≤N,且N为局部候选区域的数量),表示为(r,c,h,w),其中(r,c)是局部候选区域3-4的左上角的坐标,h和w分别是局部候选区域3-4的高度值和宽度值,通过局部候选区域3-4的左上角的坐标以及局部候选区域3-4的高度值和宽度值,通常可以唯一确定出局部候选区域3-4对应于待处理的图像30中的位置。处理器可以使滑动框以预设步长滑动,例如,处理器控制滑动框以步长16滑动。在特征图上,每个局部候选区域Pi对应于经下采样的特征网格Gi,Gi可以表示为
Gi = (r/16, c/16, h/16, w/16), where 16 is the downsampling stride of the feature map relative to the input image.
As can be seen from the above description, each slide of the window over the feature map yields one local candidate region 3-4 and one feature grid, and the spatial sizes of the feature grid and the local candidate region 3-4 are determined by the sliding window.
To generate multi-scale local candidate regions 3-4 from a single-scale input image, the present disclosure reuses the shared intermediate convolutional neural network results 3-2 and selects local candidate regions 3-4 at multiple different preset scales from the feature map corresponding to the selected convolutional layer (the local candidate region generation layer 3-3), without increasing the computational cost. Moreover, by choosing multiple preset scales, objects of different sizes can be covered as much as possible; each local candidate region 3-4 may cover only part of an object in the image and need not contain the object completely, so each local candidate region learns richer information.
Further, since the local candidate regions 3-4 are selected at different preset scales, their sizes differ. To facilitate the subsequent image classification and segmentation processing, the processor of the present disclosure unifies local candidate regions of different sizes to a fixed size through deconvolution layer and/or pooling layer processing. In the above example, the spatial size of Gi may take multiple values, for example 3×3, 6×6, 12×12, and 24×24, which are unified to a fixed size, for example 12×12 (or 10×10, 11×11, 13×13, etc.), using deconvolution or pooling techniques. As a specific example, Gi of spatial sizes 3×3 and 6×6 are upsampled and unified to 12×12 using deconvolution layers, and Gi of spatial size 24×24 is unified to 12×12 using (2×2/2) max pooling.
At step S203, the processor performs image segmentation processing on each local candidate region to predict the binary segmentation mask 3-5 of the local candidate region. Step S203 may be performed by the image segmentation module 61 run by the processor.
The image segmentation step performed by the processor takes Gi as input and, using the above intermediate convolutional neural network results 3-2, performs image segmentation processing on each local candidate region 3-4 to predict the binary mask Mi of each local candidate region 3-4.
During training, if the center of a local candidate region Pi lies inside an annotated object On, this embodiment associates the local candidate region Pi with the annotated object On, and accordingly determines that the binary mask Mi of the local candidate region Pi should belong to part of the annotated object On. The annotated objects are typically objects annotated manually in advance.
In the above example, the process by which the processor predicts the binary mask 3-5 (i.e., a binary image composed of 0s and 1s) is as follows:
33.<=32 Convolutional layer seg_6_1 (1×1×2304)
34.<=33 Nonlinear ReLU layer
35.<=34 Convolutional layer seg_6_2 (1×1×2304)
36.<=35 Reshape layer, reshaping the input to 48×48
37.<=36 Softmax loss layer
Layers 36 and 37 above are output layers. In addition, 1×1×2304 in FIG. 7 indicates the kernel size of the convolutional layers involved in the image segmentation processing. The reshape operation in FIG. 7 rearranges the local candidate region after processing by the convolutional layers so as to form the binary mask 3-5, and 48×48 is the size of the binary mask 3-5.
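A minimal sketch of this mask branch follows. Note that 2304 = 48×48, which is what makes the reshape to a 48×48 mask possible. Treating the 1×1 convolutions as fully connected layers over the flattened 12×12×512 grid is an assumption (the filing lists only the layer sizes), and the class name SegHead is illustrative.

import torch
import torch.nn as nn

class SegHead(nn.Module):
    def __init__(self, grid_ch=512, grid_size=12, mask_size=48):
        super().__init__()
        flat = grid_ch * grid_size * grid_size            # 73728-dim vector per region
        self.fc1 = nn.Conv2d(flat, 2304, kernel_size=1)   # seg_6_1 (layer 33)
        self.fc2 = nn.Conv2d(2304, 2304, kernel_size=1)   # seg_6_2 (layer 35)
        self.relu = nn.ReLU(inplace=True)                 # layer 34
        self.mask_size = mask_size

    def forward(self, grid):                              # grid: (B, 512, 12, 12)
        x = grid.flatten(1)[..., None, None]              # (B, 73728, 1, 1)
        x = self.fc2(self.relu(self.fc1(x)))              # (B, 2304, 1, 1)
        return x.view(-1, self.mask_size, self.mask_size) # (B, 48, 48) mask logits

print(SegHead()(torch.randn(2, 512, 12, 12)).shape)       # torch.Size([2, 48, 48])

The softmax loss of layer 37 would then be applied to these logits against the 0/1 target mask during training.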
At step S204, the processor performs image classification processing on each local candidate region to predict the object category to which the local candidate region belongs. Step S204 may be performed by the image classification module 62 run by the processor. The object categories may be categories from an existing dataset, such as the PASCAL VOC (Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes) dataset.
The image classification step performed by the processor also takes Gi as input; using the above intermediate convolutional neural network results 3-2, the processor performs image classification processing on each local candidate region and predicts the object category li to which each local candidate region belongs.
In this embodiment, a local candidate region Pi is considered to belong to an annotated object On if the following three conditions are all satisfied (a code sketch of this rule follows the list):
(1) the center of the local candidate region Pi lies inside the annotated object On; for example, if the annotated object On has a bounding box, the center of Pi is considered to lie inside On when it lies inside the bounding box of On;
(2) the area of the annotated object On inside the local candidate region Pi, as a proportion of the area of On, is greater than a first threshold (50% ≤ first threshold ≤ 75%), for example greater than 50%;
(3) the area of the annotated object On inside the local candidate region Pi, as a proportion of the area of Pi, is greater than a second threshold (the second threshold is usually smaller than the first threshold, e.g., 10% ≤ second threshold ≤ 20%), for example greater than 20%.
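A minimal sketch of this three-condition assignment rule, assuming the annotated object is given as a non-empty full-image binary mask; the helper name region_matches_object and the example threshold values are illustrative, not from the filing.

import numpy as np

def region_matches_object(region, obj_mask, t1=0.50, t2=0.20):
    # region:   (r, c, h, w) top-left corner plus height and width of Pi.
    # obj_mask: non-empty full-image binary mask of the annotated object On.
    r, c, h, w = region
    rows, cols = np.nonzero(obj_mask)
    # Condition (1): the region center lies inside the object's bounding box.
    cy, cx = r + h / 2.0, c + w / 2.0
    if not (rows.min() <= cy <= rows.max() and cols.min() <= cx <= cols.max()):
        return False
    inter = obj_mask[r:r + h, c:c + w].sum()  # area of On inside Pi
    # Condition (2): Pi contains more than t1 of On's area.
    if inter / obj_mask.sum() <= t1:
        return False
    # Condition (3): On fills more than t2 of Pi's area.
    return inter / float(h * w) > t2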
In the above example, the process by which the processor predicts the category is as follows:
38.<=32 Pooling layer (3×3/2)
39.<=38 Convolutional layer cls_6_1 (1×1×4096)
40.<=39 Nonlinear ReLU layer
41.<=40 Convolutional layer cls_6_2 (1×1×4096)
42.<=41 Nonlinear ReLU layer
43.<=42 Convolutional layer cls_7_1 (1×1×21)
44.<=43 Softmax loss layer
In FIG. 7, 1×1×4096 and 1×1×21 indicate the kernel sizes of the convolutional layers involved in the image classification processing.
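A minimal sketch of this classification branch follows; the 21 outputs plausibly correspond to the 20 PASCAL VOC classes plus background, and, as with the mask branch, treating the 1×1 convolutions as fully connected layers over the flattened, pooled grid is an assumption (the class name ClsHead is illustrative).

import torch
import torch.nn as nn

class ClsHead(nn.Module):
    def __init__(self, grid_ch=512, num_classes=21):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # layer 38: 12 -> 6
        flat = grid_ch * 6 * 6
        self.fc1 = nn.Conv2d(flat, 4096, kernel_size=1)        # cls_6_1 (layer 39)
        self.fc2 = nn.Conv2d(4096, 4096, kernel_size=1)        # cls_6_2 (layer 41)
        self.out = nn.Conv2d(4096, num_classes, kernel_size=1) # cls_7_1 (layer 43)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, grid):                           # grid: (B, 512, 12, 12)
        x = self.pool(grid).flatten(1)[..., None, None]  # (B, 18432, 1, 1)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.out(x).flatten(1)                  # (B, 21) class logits

print(ClsHead()(torch.randn(2, 512, 12, 12)).shape)    # torch.Size([2, 21])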
The processor may perform steps S203 and S204 simultaneously or successively; the present disclosure does not limit the order in which the processor performs these two steps.
At step S205, the processor trains the image classification and image segmentation losses using a preset training loss function. Step S205 may be performed by a training loss module 65 run by the processor.
For the above image classification and image segmentation tasks, the present disclosure presets a training loss function that enables the processor to jointly judge whether the image classification and the image segmentation are accurate, as follows:
min_w Σ_{i=1}^{N} ( f_c(P_i) + λ·f_s(P_i) )
where w denotes the network parameters; f_c(P_i) is the classification loss of the local candidate region P_i, corresponding to layer 44 in the above example; f_s(P_i) is the segmentation mask loss of the local candidate region P_i, corresponding to layer 37 in the above example; λ is the weight balancing f_c(P_i) and f_s(P_i) and may be set to 1; 1≤i≤N, and N is the number of local candidate regions.
The training loss function adopted by the processor of the present disclosure is not limited to the above specific form. With a training loss function of this form, the processor can effectively train the convolutional neural network designed in the present disclosure as shown in FIG. 7.
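A minimal sketch of this joint loss, assuming cross-entropy for the classification term f_c and a two-class (binary) cross-entropy for the mask term f_s, consistent with the softmax loss layers 44 and 37 above; the function name joint_loss is illustrative.

import torch
import torch.nn.functional as F

def joint_loss(cls_logits, cls_labels, mask_logits, mask_targets, lam=1.0):
    # f_c: per-region classification loss (layer 44), summed over the N regions.
    f_c = F.cross_entropy(cls_logits, cls_labels, reduction='sum')
    # f_s: per-pixel loss of the 48x48 mask logits against the 0/1 target mask
    # (layer 37), written here as binary cross-entropy.
    f_s = F.binary_cross_entropy_with_logits(
        mask_logits, mask_targets.float(), reduction='sum')
    return f_c + lam * f_s  # lambda may be set to 1, as stated above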
At step S206, the processor performs fusion processing on two or more local candidate regions according to the object categories to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain the object segmentation image. Step S206 may be performed by the image fusion module 63 run by the processor; for example, the image fusion module 63 fuses all local candidate regions 3-4 according to the object category of each local candidate region and the binary segmentation mask 3-5 of each local candidate region 3-4, thereby obtaining the object segmentation image.
The inventor found through research that if the areas by which several local candidate regions 3-4 overlap an object satisfy a predetermined requirement (for example, exceed a preset area threshold), the overlap areas between their corresponding binary segmentation masks 3-5 also satisfy a predetermined requirement. FIG. 8 is a schematic diagram of the overlap of local candidate regions provided by the present disclosure. As shown in FIG. 8, the parameter reflecting the overlap area of the binary segmentation masks 3-5 of two local candidate regions 3-4 is defined as the IoU (Intersection over Union). The processor selects a number of local candidate regions using the sliding window, and determines which local candidate regions should be assigned to the same object by computing the IoU and the object categories to which the local candidate regions belong, thereby fusing all local candidate regions.
A specific example of judging whether the overlap area between binary segmentation masks satisfies the predetermined requirement is as follows: the processor obtains, via the sliding window, the binary masks of multiple local candidate regions, namely 4-1, 4-2, 4-3, 4-4, and 4-5 in FIG. 8, and the three boxes in the image 4-0 to be processed correspond to the binary masks of the respective local candidate regions. The processor computes the IoU between 4-2 and 4-3; suppose the result is IoU = 0.89. Since this IoU satisfies the predetermined requirement (e.g., greater than 0.8), the processor may fuse 4-2 and 4-3 into a binary mask 4-6. The processor then computes the IoU between 4-6 and 4-4; suppose the result is IoU = 0.83. This IoU also satisfies the predetermined requirement (again greater than 0.8), so the processor may fuse 4-6 and 4-4 into a binary mask 4-7, and the binary mask 4-7 corresponds to the object segmentation image after fusion processing.
The processor in the present disclosure may compute the IoU as follows: IoU = intersection area of the binary segmentation masks of the two local candidate regions / (sum of the areas of the binary segmentation masks of the two local candidate regions − intersection area).
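This formula translates directly into code; a minimal sketch over full-image binary masks (the function name mask_iou is illustrative):

import numpy as np

def mask_iou(mask_a, mask_b):
    # IoU = intersection / (area_a + area_b - intersection), as defined above.
    inter = np.logical_and(mask_a, mask_b).sum()
    union = mask_a.sum() + mask_b.sum() - inter
    return float(inter) / union if union > 0 else 0.0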
Further, the fusion processing performed by the processor (for example, by the image fusion module 63 run by the processor) on at least two local candidate regions (e.g., all local candidate regions) includes: the processor determines the overlap area of the binary segmentation masks of two adjacent local candidate regions; in response to the overlap area being greater than a preset threshold, the two adjacent local candidate regions belonging to the same object category, and neither of the two adjacent local candidate regions having been assigned to an object, the processor generates a new object and assigns the two adjacent local candidate regions to that object.
Further, the fusion processing performed by the processor (for example, by the image fusion module 63 run by the processor) on all local candidate regions includes: the processor determines the overlap area of the binary segmentation masks of two adjacent local candidate regions; in response to the overlap area being greater than the preset threshold, the two adjacent local candidate regions belonging to the same object category, and one of the two adjacent local candidate regions having been assigned to an object, the processor merges the two adjacent local candidate regions and assigns the other local candidate region to that object.
Further, the fusion processing performed by the processor (for example, by the image fusion module 63 run by the processor) on all local candidate regions includes: the processor determines the overlap area of the binary segmentation masks of two adjacent local candidate regions; in response to the overlap area being greater than the preset threshold, the two adjacent local candidate regions belonging to the same object category, and the two adjacent local candidate regions having been assigned to two objects, the processor merges the two objects.
Specifically, FIG. 9 is a flowchart of the fusion processing of all local candidate regions provided by the present disclosure. As shown in FIG. 9, the fusion process performed by the processor (for example, by the image fusion module 63 run by the processor) includes the following steps (a code sketch of this loop follows the list of steps):
At step S2061, the processor computes the overlap area of the binary segmentation masks of two adjacent local candidate regions.
Here, adjacent local candidate regions include local candidate regions adjacent in the row dimension and local candidate regions adjacent in the column dimension. Regions adjacent in the row dimension are typically those adjacent in the horizontal direction, and regions adjacent in the column dimension are typically those adjacent in the vertical direction.
At step S2062, the processor judges whether the overlap area is greater than the preset threshold; if so, the processor performs step S2063; otherwise, the processor performs step S2067.
At step S2063, the processor judges whether the two adjacent local candidate regions belong to the same object category; if so, the processor performs step S2064; otherwise, the processor performs step S2067.
At step S2064, the processor judges whether neither of the two adjacent local candidate regions has been assigned to an object; if so, the processor performs step S2065; otherwise, the processor performs step S2066.
At step S2065, the processor generates a new object and assigns the two adjacent local candidate regions to that object; the processor then performs step S2067.
At step S2066, if one of the two adjacent local candidate regions has been assigned to an object, the processor merges the two adjacent local candidate regions and assigns the other local candidate region to that object; if the two adjacent local candidate regions have been assigned to two objects, the processor merges the two objects; the processor then performs step S2067.
At step S2067, the processor judges whether all local candidate regions have been assigned to corresponding objects. If all local candidate regions have been assigned to corresponding objects, the process proceeds to step S2068, where the fusion process of the present disclosure ends; otherwise, the processor continues with step S2061. In other words, the processor performs steps S2061 to S2066 in a loop until all local candidate regions have been assigned to corresponding objects, finally obtaining the list of all objects, from which the processor obtains the object segmentation image.
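A minimal sketch of this fusion loop, assuming per-region full-image masks, predicted labels, and a precomputed list of row- or column-adjacent index pairs; the data layout and the name fuse_regions are illustrative, and mask_iou is repeated from the sketch above for self-containment.

import numpy as np

def mask_iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = a.sum() + b.sum() - inter
    return float(inter) / union if union > 0 else 0.0

def fuse_regions(regions, neighbors, thr=0.8):
    # regions:   list of dicts with 'mask' (binary np.ndarray) and 'label' (int).
    # neighbors: index pairs (i, j) of row- or column-adjacent regions.
    obj_of, next_obj = {}, 0
    for i, j in neighbors:
        a, b = regions[i], regions[j]
        if mask_iou(a['mask'], b['mask']) <= thr:  # S2062: overlap too small
            continue
        if a['label'] != b['label']:               # S2063: different categories
            continue
        oi, oj = obj_of.get(i), obj_of.get(j)
        if oi is None and oj is None:              # S2065: create a new object
            obj_of[i] = obj_of[j] = next_obj
            next_obj += 1
        elif oi is None:                           # S2066: join the existing object
            obj_of[i] = oj
        elif oj is None:
            obj_of[j] = oi
        elif oi != oj:                             # S2066: merge the two objects
            for k, o in list(obj_of.items()):
                if o == oj:
                    obj_of[k] = oi
    for k in range(len(regions)):                  # S2067: every region ends up assigned
        obj_of.setdefault(k, next_obj + k)
    objects = {}                                   # union the member masks per object
    for k, o in obj_of.items():
        m = objects.get(o, np.zeros_like(regions[k]['mask']))
        objects[o] = np.logical_or(m, regions[k]['mask'])
    return list(objects.values())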
The present disclosure generates local candidate regions of objects, and one object may be covered by multiple local candidate regions, so objects of different sizes can be covered; each local candidate region may cover only part of an object and need not cover the object completely, so rich information can be learned from each local candidate region, which helps improve the robustness of the object segmentation technique. Meanwhile, by synthesizing the object boundary from multiple local candidate regions, the object segmentation result can be combined with the results of different classifiers according to the combination of the image classification and image segmentation results of the different local candidate regions, which helps improve the accuracy of the object segmentation result. By jointly optimizing the local candidate regions, the final result can guide the current local candidate region selection module, making the result more precise. The present disclosure can accomplish complete end-to-end training and testing of instance-level object segmentation with a unified deep learning framework.
The methods and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the above description. Moreover, the present disclosure is not directed to any particular programming language. It should be understood that the content of the present disclosure described herein may be implemented in various programming languages, and the above description of a specific language is intended to disclose the best mode of the present disclosure.
The specification provided here sets forth numerous specific details. However, it can be understood that the present disclosure may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, to streamline the present disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present disclosure, various features of the present disclosure are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present disclosure.
Those skilled in the art can understand that the modules in the devices in the embodiments may be adaptively changed and arranged in one or more devices different from those of the embodiments. The modules, units, or components in the present disclosure may be combined into one module, unit, or component, and they may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except that at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art can understand that, although some embodiments described herein include certain features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the present disclosure and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present disclosure may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for acquiring application information according to the embodiments of the present disclosure. The present disclosure may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present disclosure may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 12 shows a computing device that can implement the object segmentation method of the present disclosure. The computing device conventionally includes a processor 810 and a computer program product or computer-readable medium in the form of a storage device 820, and further includes a communication interface and a communication bus. The storage device 820 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. The processor, the communication interface, and the memory communicate with one another through the communication bus. The storage device 820 has a storage space 830 storing program code 831 for performing any of the method steps described above, used to store at least one instruction that causes the processor to perform the various steps of the object segmentation method of the present disclosure. For example, the storage space 830 storing the program code may include individual program codes 831 respectively for implementing the various steps of the above methods. These program codes may be read out from, or written into, one or more computer program products. These computer program products include program code carriers such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. Such computer program products are usually portable or fixed storage units as shown, for example, in FIG. 13. The storage unit may have storage segments, storage spaces, etc., arranged similarly to the storage device 820 in the computing device of FIG. 12. The program code may, for example, be compressed in an appropriate form. Usually, the storage unit includes computer-readable code 831', i.e., code that can be read by a processor such as 810; when run by a computing device, the code causes the computing device to perform the various steps of the methods described above.
It should be noted that the above embodiments illustrate rather than limit the present disclosure, and that those skilled in the art may devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of multiple such elements. The present disclosure may be implemented by means of hardware including several distinct elements and by means of a suitably programmed computer. In a claim enumerating several modules of a device, several of these modules may be embodied by one and the same item of hardware. The use of the words first, second, third, etc., does not indicate any order; these words may be interpreted as names.

Claims (21)

  1. An object segmentation method, characterized in that the method comprises the following steps:
    selecting, for an image to be processed, multiple local candidate regions separately at two or more different preset scales;
    performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions;
    performing image classification processing on two or more local candidate regions to predict object categories to which the local candidate regions belong; and
    performing fusion processing on two or more local candidate regions according to the object categories to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.
  2. The object segmentation method according to claim 1, characterized in that, before the selecting, for an image to be processed, multiple local candidate regions separately at two or more different preset scales, the method further comprises the following step: processing the image to be processed through convolutional layers and/or pooling layers of a convolutional neural network to obtain intermediate convolutional neural network results;
    the selecting multiple local candidate regions separately at two or more different preset scales further comprises: using the intermediate convolutional neural network results, selecting a local candidate region generation layer, and selecting multiple local candidate regions separately at two or more different preset scales by sliding a window over a feature map corresponding to the local candidate region generation layer.
  3. The object segmentation method according to claim 2, characterized in that, after the method selects the multiple local candidate regions, the method further comprises the following step:
    unifying local candidate regions of different sizes to a fixed size through deconvolution layer and/or pooling layer processing.
  4. The object segmentation method according to any one of claims 1-3, characterized in that the performing image classification processing on two or more local candidate regions to predict the object categories to which the local candidate regions belong further comprises:
    if the center of a local candidate region lies inside an annotated object, the area of the annotated object inside the local candidate region as a proportion of the area of the annotated object is greater than a first threshold, and the area of the annotated object inside the local candidate region as a proportion of the area of the local candidate region is greater than a second threshold, determining that the object category to which the local candidate region belongs is the category of the annotated object.
  5. The object segmentation method according to any one of claims 1-4, characterized in that, before the performing fusion processing on two or more local candidate regions, the method further comprises the following step:
    establishing a training loss function and training the image classification and image segmentation losses.
  6. The object segmentation method according to any one of claims 1-5, characterized in that the performing fusion processing on two or more local candidate regions further comprises:
    performing fusion processing on two or more local candidate regions according to the overlap areas of the binary segmentation masks of the local candidate regions and the object categories to which the local candidate regions belong.
  7. The object segmentation method according to claim 6, characterized in that the performing fusion processing on two or more local candidate regions further comprises:
    determining the overlap area of the binary segmentation masks of two adjacent local candidate regions; and
    in response to the overlap area being greater than a preset threshold, the two adjacent local candidate regions belonging to the same object category, and neither of the two adjacent local candidate regions having been assigned to an object, generating a new object and assigning the two adjacent local candidate regions to the new object.
  8. The object segmentation method according to claim 6 or 7, characterized in that the performing fusion processing on two or more local candidate regions further comprises:
    determining the overlap area of the binary segmentation masks of two adjacent local candidate regions; and
    in response to the overlap area being greater than a preset threshold, the two adjacent local candidate regions belonging to the same object category, and one of the two adjacent local candidate regions having been assigned to an object, merging the two adjacent local candidate regions and assigning the other local candidate region to that object.
  9. The object segmentation method according to any one of claims 6-8, characterized in that the performing fusion processing on two or more local candidate regions further comprises:
    determining the overlap area of the binary segmentation masks of two adjacent local candidate regions; and
    in response to the overlap area being greater than a preset threshold, the two adjacent local candidate regions belonging to the same object category, and the two adjacent local candidate regions having been assigned to two objects, merging the two objects.
  10. An object segmentation apparatus, characterized in that the apparatus comprises the following modules:
    a local candidate region generation module, configured to select, for an image to be processed, multiple local candidate regions separately at two or more different preset scales;
    an image segmentation module, configured to perform image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions;
    an image classification module, configured to perform image classification processing on two or more local candidate regions to predict object categories to which the local candidate regions belong; and
    an image fusion module, configured to perform fusion processing on two or more local candidate regions according to the object categories to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.
  11. The object segmentation apparatus according to claim 10, characterized in that the apparatus further comprises: a convolutional neural network operation module, configured to process the image to be processed through convolutional layers and/or pooling layers of a convolutional neural network to obtain intermediate convolutional neural network results;
    the local candidate region generation module is further configured to: using the intermediate convolutional neural network results, select a local candidate region generation layer, and select multiple local candidate regions separately at two or more different preset scales by sliding a window over a feature map corresponding to the local candidate region generation layer.
  12. The object segmentation apparatus according to claim 11, characterized in that the local candidate region generation module is further configured to: unify local candidate regions of different sizes to a fixed size through deconvolution layer and/or pooling layer processing.
  13. The object segmentation apparatus according to any one of claims 10-12, characterized in that the image classification module is further configured to: if the center of a local candidate region lies inside an annotated object, the area of the annotated object inside the local candidate region as a proportion of the area of the annotated object is greater than a first threshold, and the area of the annotated object inside the local candidate region as a proportion of the area of the local candidate region is greater than a second threshold, determine that the object category to which the local candidate region belongs is the category of the annotated object.
  14. The object segmentation apparatus according to any one of claims 10-13, characterized in that the apparatus further comprises: a training loss module, configured to establish a training loss function and train the image classification and image segmentation losses.
  15. The object segmentation apparatus according to any one of claims 10-14, characterized in that the image fusion module is further configured to: perform fusion processing on two or more local candidate regions according to the overlap areas of the binary segmentation masks of the local candidate regions and the object categories to which the local candidate regions belong.
  16. The object segmentation apparatus according to claim 15, characterized in that the image fusion module is further configured to:
    determine the overlap area of the binary segmentation masks of two adjacent local candidate regions; and
    in response to the overlap area being greater than a preset threshold, the two adjacent local candidate regions belonging to the same object category, and neither of the two adjacent local candidate regions having been assigned to an object, generate a new object and assign the two adjacent local candidate regions to the new object.
  17. The object segmentation apparatus according to claim 15 or 16, characterized in that the image fusion module is further configured to:
    determine the overlap area of the binary segmentation masks of two adjacent local candidate regions; and
    in response to the overlap area being greater than a preset threshold, the two adjacent local candidate regions belonging to the same object category, and one of the two adjacent local candidate regions having been assigned to an object, merge the two adjacent local candidate regions and assign the other local candidate region to that object.
  18. The object segmentation apparatus according to any one of claims 15-17, characterized in that the image fusion module is further configured to:
    determine the overlap area of the binary segmentation masks of two adjacent local candidate regions; and
    in response to the overlap area being greater than a preset threshold, the two adjacent local candidate regions belonging to the same object category, and the two adjacent local candidate regions having been assigned to two objects, merge the two objects.
  19. A computing device, characterized by comprising: a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
    the memory is configured to store at least one instruction, and the instruction causes the processor to perform the following operations:
    selecting, for an image to be processed, multiple local candidate regions separately at two or more different preset scales;
    performing image segmentation processing on two or more local candidate regions to predict binary segmentation masks of the local candidate regions;
    performing image classification processing on two or more local candidate regions to predict object categories to which the local candidate regions belong; and
    performing fusion processing on two or more local candidate regions according to the object categories to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.
  20. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in a device, a processor in the device executes instructions for implementing the steps of the object segmentation method according to any one of claims 1-9.
  21. A computer-readable medium, configured to store the computer program according to claim 20.
PCT/CN2017/088380 2016-06-15 2017-06-15 Object segmentation method and apparatus, and computing device WO2017215622A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/857,304 US10489913B2 (en) 2016-06-15 2017-12-28 Methods and apparatuses, and computing devices for segmenting object

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610425391.0A CN106097353B (zh) 2016-06-15 2016-06-15 Object segmentation method and apparatus based on multi-level local region fusion, and computing device
CN201610425391.0 2016-06-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/857,304 Continuation US10489913B2 (en) 2016-06-15 2017-12-28 Methods and apparatuses, and computing devices for segmenting object

Publications (1)

Publication Number Publication Date
WO2017215622A1 true WO2017215622A1 (zh) 2017-12-21

Family

ID=57235471

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/088380 WO2017215622A1 (zh) 2016-06-15 2017-06-15 Object segmentation method and apparatus, and computing device

Country Status (3)

Country Link
US (1) US10489913B2 (zh)
CN (1) CN106097353B (zh)
WO (1) WO2017215622A1 (zh)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097353B (zh) 2016-06-15 2018-06-22 Beijing SenseTime Technology Development Co., Ltd. Object segmentation method and apparatus based on multi-level local region fusion, and computing device
US10621747B2 (en) * 2016-11-15 2020-04-14 Magic Leap, Inc. Deep learning system for cuboid detection
CN106845631B (zh) * 2016-12-26 2020-05-29 Shanghai Cambricon Information Technology Co., Ltd. Stream execution method and apparatus
CN106846323B (zh) * 2017-01-04 2020-07-10 Zhuhai Dahengqin Technology Development Co., Ltd. Method, apparatus and terminal for implementing interactive image segmentation
CN108303747B (zh) * 2017-01-12 2023-03-07 Tsinghua University Inspection device and method for detecting firearms
CN110838124B (zh) * 2017-09-12 2021-06-18 Shenzhen Keya Medical Technology Co., Ltd. Method, system and medium for segmenting images of objects having a sparse distribution
CN110809784B (zh) * 2017-09-27 2021-04-20 Google LLC End-to-end network model for high-resolution image segmentation
CN107833224B (zh) * 2017-10-09 2019-04-30 Southwest Jiaotong University Image segmentation method based on multi-level region synthesis
US10559080B2 (en) * 2017-12-27 2020-02-11 International Business Machines Corporation Adaptive segmentation of lesions in medical images
CN108875537B (zh) * 2018-02-28 2022-11-08 Beijing Megvii Technology Co., Ltd. Object detection method, apparatus and system, and storage medium
CN108805898B (zh) * 2018-05-31 2020-10-16 Beijing ByteDance Network Technology Co., Ltd. Video image processing method and apparatus
CN108898111B (zh) * 2018-07-02 2021-03-02 BOE Technology Group Co., Ltd. Image processing method, electronic device and computer-readable medium
CN108710875B (zh) * 2018-09-11 2019-01-08 Hunan Kunpeng Zhihui UAV Technology Co., Ltd. Deep-learning-based method and apparatus for counting road vehicles in aerial images
US10846870B2 (en) * 2018-11-29 2020-11-24 Adobe Inc. Joint training technique for depth map generation
CN111292334B (zh) * 2018-12-10 2023-06-09 Beijing Horizon Robotics Technology R&D Co., Ltd. Panoramic image segmentation method, apparatus and electronic device
CN109977997B (zh) * 2019-02-13 2021-02-02 Institute of Automation, Chinese Academy of Sciences Fast and robust image object detection and segmentation method based on convolutional neural networks
CN111582432B (zh) * 2019-02-19 2023-09-12 Canaan Bright Sight (Beijing) Technology Co., Ltd. Network parameter processing method and apparatus
CN109934153B (zh) * 2019-03-07 2023-06-20 Zhang Xinchang Building extraction method based on a gated deep residual optimization network
CN110084817B (zh) * 2019-03-21 2021-06-25 Xidian University Deep-learning-based digital elevation model production method
CN110070056B (zh) * 2019-04-25 2023-01-10 Tencent Technology (Shenzhen) Co., Ltd. Image processing method and apparatus, storage medium and device
CN110119728B (zh) * 2019-05-23 2023-12-05 Harbin Institute of Technology Remote sensing image cloud detection method based on a multi-scale fusion semantic segmentation network
CN110222829A (zh) * 2019-06-12 2019-09-10 Beijing ByteDance Network Technology Co., Ltd. Feature extraction method, apparatus, device and medium based on a convolutional neural network
CN110807361B (zh) * 2019-09-19 2023-08-08 Tencent Technology (Shenzhen) Co., Ltd. Human body recognition method and apparatus, computer device and storage medium
CN110648340B (zh) * 2019-09-29 2023-03-17 Huizhou University Method and apparatus for processing images based on binarization and level sets
CN110807779A (zh) * 2019-10-12 2020-02-18 Hubei University of Technology Region-segmentation-based compressive computational ghost imaging method and system
US11120280B2 (en) 2019-11-15 2021-09-14 Argo AI, LLC Geometry-aware instance segmentation in stereo image capture processes
EP3843038B1 (en) 2019-12-23 2023-09-20 HTC Corporation Image processing method and system
US11298195B2 (en) * 2019-12-31 2022-04-12 Auris Health, Inc. Anatomical feature identification and targeting
CN111325204B (zh) * 2020-01-21 2023-10-31 Tencent Technology (Shenzhen) Co., Ltd. Object detection method and apparatus, electronic device and storage medium
CN111339892B (zh) * 2020-02-21 2023-04-18 Qingdao Lianhe Chuangzhi Technology Co., Ltd. Swimming pool drowning detection method based on an end-to-end 3D convolutional neural network
CN111640123B (zh) * 2020-05-22 2023-08-11 Beijing Baidu Netcom Science & Technology Co., Ltd. Method, apparatus, device and medium for generating background-free images
CN111882558A (zh) * 2020-08-11 2020-11-03 Shanghai SenseTime Intelligent Technology Co., Ltd. Image processing method and apparatus, electronic device and storage medium
CN112529863B (zh) * 2020-12-04 2024-01-23 Infervision Medical Technology Co., Ltd. Method and apparatus for measuring bone density
CN112862840B (zh) * 2021-03-04 2023-07-04 Tencent Technology (Shenzhen) Co., Ltd. Image segmentation method, apparatus, device and medium
CN112991381B (zh) * 2021-03-15 2022-08-02 Shenzhen Huili Technology Co., Ltd. Image processing method and apparatus, electronic device and storage medium
CN113033504B (zh) * 2021-05-19 2021-08-27 Guangdong Zhongju Artificial Intelligence Technology Co., Ltd. Multi-scale-based video anomaly detection method
CN114511007B (zh) * 2022-01-17 2022-12-09 Shanghai Mengxiang Intelligent Technology Co., Ltd. Non-intrusive electrical fingerprint identification method based on multi-scale feature perception

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2909205B1 (fr) * 2006-11-28 2009-01-23 Commissariat Energie Atomique Method for designating an object in an image
US8751530B1 (en) * 2012-08-02 2014-06-10 Google Inc. Visual restrictions for image searches
JP2014071207A (ja) * 2012-09-28 2014-04-21 Canon Inc Image processing device, imaging system, and image processing system
CN104077577A (zh) * 2014-07-03 2014-10-01 Zhejiang University Trademark detection method based on a convolutional neural network
CN105469047B (zh) * 2015-11-23 2019-02-22 Shanghai Jiao Tong University Chinese text detection method and system based on an unsupervised deep learning network
CN105488468B (zh) * 2015-11-26 2019-10-18 Zhejiang Uniview Technologies Co., Ltd. Method and apparatus for locating a target region
CN105488534B (zh) * 2015-12-04 2018-12-07 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Traffic scene depth parsing method, apparatus and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8233726B1 (en) * 2007-11-27 2012-07-31 Google Inc. Image-domain script and language identification
CN104573744A (zh) * 2015-01-19 2015-04-29 Shanghai Jiao Tong University Fine-grained category recognition and object part localization and feature extraction method
CN104992179A (zh) * 2015-06-23 2015-10-21 Zhejiang University Clothing recommendation method based on a fine-grained convolutional neural network
CN106097353A (zh) * 2016-06-15 2016-11-09 Beijing SenseTime Technology Development Co., Ltd. Object segmentation method and apparatus based on multi-level local region fusion, and computing device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ARBELAEZ, P. ET AL.: "Boundary Extraction in Natural Images Using Ultrametric Contour Maps", Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), 31 December 2006 (2006-12-31), XP010922698 *
WU, RUOBING ET AL.: "Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification", 2015 IEEE International Conference on Computer Vision, 31 December 2015 (2015-12-31), XP032866457 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111553923A (zh) * 2019-04-01 2020-08-18 Shanghai Weisha Network Technology Co., Ltd. Image processing method, electronic device and computer-readable storage medium
CN111553923B (zh) * 2019-04-01 2024-02-23 Shanghai Weisha Network Technology Co., Ltd. Image processing method, electronic device and computer-readable storage medium
US20200074185A1 (en) * 2019-11-08 2020-03-05 Intel Corporation Fine-grain object segmentation in video with deep features and multi-level graphical models
US11763565B2 (en) * 2019-11-08 2023-09-19 Intel Corporation Fine-grain object segmentation in video with deep features and multi-level graphical models

Also Published As

Publication number Publication date
CN106097353B (zh) 2018-06-22
US10489913B2 (en) 2019-11-26
CN106097353A (zh) 2016-11-09
US20180144477A1 (en) 2018-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17812733

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17812733

Country of ref document: EP

Kind code of ref document: A1