US10489913B2 - Methods and apparatuses, and computing devices for segmenting object - Google Patents
- Publication number
- US10489913B2 (application US15/857,304)
- Authority
- US
- United States
- Prior art keywords
- local candidate
- candidate regions
- processor
- image
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G06K9/6267—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
- G06T2207/30261—Obstacle
Definitions
- Image segmentation is a basic issue in the field of image processing and is widely used in the fields of object identification, robot navigation, scene understanding, and the like.
- Different objects in an image may be separated from one another by using an image segmentation technique. Rapidly segmenting the objects in the image and determining boundaries of the objects are critical in image segmentation.
- the present disclosure relates to the technical field of computer vision, and in particular, to methods, apparatuses and computing devices for segmenting an object, and provides an object segmentation solution.
- a method for segmenting an object includes:
- an apparatus for segmenting an object includes:
- a local candidate region generation module configured to select, for an image to be processed, multiple local candidate regions according to two or more different preset scales respectively;
- an image segmentation module configured to perform image segmentation processing on two or more local candidate regions, to predict and obtain binary segmentation masks of the two or more local candidate regions;
- an image classification module configured to perform image classification processing on the two or more local candidate regions, to predict and obtain object classes to which the two or more local candidate regions belong;
- an image fusion module configured to fuse the two or more local candidate regions according to the object classes to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.
- a computing device including: a processor, a communication interface, a memory, and a communication bus; the processor, the communication interface and the memory communicate with one another via the communication bus;
- the memory is configured to store at least one instruction for causing the processor to execute the following operations:
- an apparatus for segmenting an object including:
- a memory for storing instructions executable by the processor
- processor is configured to:
- a non-transitory computer-readable medium for storing computer readable instructions.
- the instructions include: an instruction for selecting, for an image to be processed, multiple local candidate regions according to two or more different preset scales respectively; an instruction for performing image segmentation processing on two or more local candidate regions, to predict and obtain binary segmentation masks of the local candidate regions; an instruction for performing image classification processing on the two or more local candidate regions, to predict and obtain object classes to which the local candidate regions belong respectively; and an instruction for fusing the two or more local candidate regions according to the object class to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.
- the technical solution provided by the present disclosure generates local candidate regions at multiple scales and thereby uses multiscale features of the image, which helps improve the fault tolerance of the object segmentation technique.
- the present disclosure may segment each object while detecting the objects, and determine the precise boundary of each object.
- after segmenting the local candidate regions to obtain their segmentation results, the present disclosure applies an effective local region fusion approach, which helps improve the object segmentation effect.
- FIG. 1 shows a schematic diagram of an application scene according to an embodiment of the present disclosure
- FIG. 2 shows a schematic diagram of another application scene according to an embodiment of the present disclosure
- FIG. 3 shows a block diagram for implementing an exemplary device according to an embodiment of the present disclosure
- FIG. 4 shows a block diagram for implementing another exemplary device according to an embodiment of the present disclosure
- FIG. 5 shows a flow chart of a method for segmenting an object provided in the present disclosure
- FIG. 6 shows another flow chart of the object segmentation method provided in the present disclosure
- FIG. 7 shows a schematic diagram of a network model of a method for segmenting an object provided in the present disclosure
- FIG. 8 shows a schematic diagram of an overlapping situation of local candidate regions provided in the present disclosure
- FIG. 9 shows a flow chart for fusing all the local candidate regions provided in the present disclosure.
- FIG. 10 shows a block diagram of a functional structure of an object segmentation apparatus provided in the present disclosure
- FIG. 11 shows a block diagram of another functional structure of the object segmentation apparatus provided in the present disclosure.
- FIG. 12 shows a block diagram of a computing device for executing a method for segmenting an object according to an embodiment of the present disclosure.
- FIG. 13 shows a storage unit for holding or carrying program codes for implementing a method for segmenting an object according to the present disclosure.
- FIG. 1 exemplarily shows an application scenario in which the present disclosure may be implemented.
- a driving assistance system is installed in an automobile 1.
- the driving assistance system in the automobile 1 needs to segment objects, such as a pedestrian 2, a vehicle, and traffic signal lights 3, in a road environment presented by a captured image, to better identify the road environment in the image.
- the objects that are adjacent to each other may be segmented to accurately identify the objects on the road.
- FIG. 2 exemplarily shows another application scenario in which the present disclosure may be implemented.
- four chairs 20 surround a square table 21 .
- a robot 22 needs to perform object segmentation on four chairs 20 and a square table 21 in an image acquired by an image acquiring apparatus thereof, to accurately identify a chair 20 to be fetched or a square table 21 to be moved.
- the present disclosure provides an object segmentation solution.
- multiscale local candidate regions are generated so that multiscale features of the image are fully used, which gives the object segmentation technique of the present disclosure a degree of fault tolerance.
- image classification processing is performed on the local candidate regions together with image segmentation processing, so that each object in the image is segmented while the objects are detected.
- a segmentation result of each local candidate region and the object class to which the local candidate region belongs are obtained, and the segmentation results and object classes are then used to fuse the two or more local candidate regions. This forms an object segmentation solution based on multilevel local region fusion, which helps improve the object segmentation effect.
- FIG. 3 shows a block diagram of an exemplary device 30 (for example, a computer system/server) suitable for implementing the present disclosure.
- the device 30 shown in FIG. 3 is merely an example, and shall not bring any limitation to the functions and usage scopes of the present disclosure.
- the device 30 may be represented in a form of a universal computing device.
- Components of the device 30 may include, but are not limited to: one or more processing units 301 (i.e., a processor), a system memory 302, and a bus 303 for connecting different system components (including the system memory 302 and the processing unit 301).
- the device 30 may include multiple computer system readable media.
- the media may be any available medium accessible by the device 30 , including volatile or nonvolatile media, movable or immovable media, etc.
- the system memory 302 may include a computer system readable medium in a form of a volatile memory, for example, a random-access memory (RAM) 3021 and/or a cache memory 3022 .
- the device 30 may further include other movable/immovable computer system storage media or volatile/nonvolatile computer system storage media. Only as an example, a ROM 3023 may be used to read or write an immovable and nonvolatile magnetic medium (not shown in FIG. 3, generally referred to as a “hard disk drive”). Although not shown in FIG.
- the system memory 302 may provide a disk drive for reading and writing a movable and nonvolatile magnetic disk (e.g., a “floppy disk”), and an optical disk drive for reading and writing a movable and nonvolatile optical disk (e.g., CD-ROM, DVD-ROM, or other optical media). Under these circumstances, each drive may be connected to the bus 303 by one or more data medium interfaces.
- the system memory 302 may include at least one program product having a set of (e.g., at least one) program modules, which are configured to execute the functions of the present disclosure.
- a program/utility tool 3025 having a set of (at least one) program modules 3024 may be stored in the system memory 302, for example. Such program modules 3024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these, or a certain combination thereof, may be used for implementation of a network environment.
- the program modules 3024 are generally configured to execute the functions and/or methods described in the present disclosure.
- the device 30 may also communicate with one or more external devices 304 (such as a keyboard, a pointing device, a display and/or the like). The communication may be performed via an Input/Output (I/O) interface 305 . Moreover, the device 30 may further communicate with one or more networks (such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via a network adapter 306 . As shown in FIG. 3 , the network adapter 306 communicates with other modules (such as the processing unit 301 and the like) of the device 30 via the bus 303 . It should be understood that although not shown in FIG. 3 , other hardware and/or software modules may be used through the device 30 .
- the processing unit 301 executes various functional applications and data processing by operating a computer program stored in the system memory 302 .
- the processing unit 301 executes instructions for implementing the steps of the above method.
- the processing unit 301 may execute the computer program stored in the system memory 302, and when the computer program is executed, the following steps are performed: selecting, for an image to be processed, multiple local candidate regions according to two or more different preset scales respectively; performing image segmentation processing on two or more local candidate regions, to predict and obtain binary segmentation masks of the local candidate regions; performing image classification processing on the two or more local candidate regions, to predict and obtain object classes to which the two or more local candidate regions belong; and fusing the two or more local candidate regions according to the object classes to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.
- FIG. 4 shows an exemplary device 400 suitable for implementing the present disclosure.
- the device 400 may be a mobile terminal, a Personal Computer (PC), a tablet computer, a server and the like.
- the computer system 400 includes one or more processors, a communication unit and the like.
- the one or more processors may be: one or more central processing units (CPUs) 401, one or more graphics processing units (GPUs) 413, and/or the like.
- the processor may execute various appropriate actions and processing according to executable instructions stored in a read only memory (ROM) 402 or according to executable instructions loaded into a random-access memory (RAM) 403 from a storage part 408 .
- a communication unit 412 may include, but is not limited to, a network card.
- the network card may include, but is not limited to, an IB (InfiniBand) network card.
- the processor may communicate with the read only memory 402 and/or the random-access memory 403 to execute the executable instructions.
- the processor is connected to the communication unit 412 via a bus 404 , and communicates with other target devices via the communication unit 412 so as to complete corresponding steps in the present disclosure.
- the steps executed by the processor include: selecting, for an image to be processed, multiple local candidate regions according to two or more different preset scales respectively; performing image segmentation processing on two or more local candidate regions, to predict and obtain binary segmentation masks of the local candidate regions; performing image classification processing on the two or more local candidate regions, to predict and obtain object classes to which the two or more local candidate regions belong; and fusing the two or more local candidate regions according to the object classes to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.
- various programs and data required during operating an apparatus may further be stored in the RAM 403 .
- the CPU 401 , the ROM 402 and the RAM 403 are connected with one another via the bus 404 .
- the ROM 402 is an optional module.
- the RAM 403 stores executable instructions, or the executable instructions are written into the ROM 402 during operation; the executable instructions cause the central processing unit 401 to execute the steps included in the method for segmenting an object.
- An input/output (I/O) interface 405 is also connected to the bus 404 .
- the communication unit 412 may be configured integrally, or may be configured to have multiple sub-modules (for example, multiple IB network cards) separately connected to the bus.
- the following members are connected to the I/O interface 405 : an input part 406 including a keyboard, a mouse and the like; an output part 407 including a Cathode-Ray Tube (CRT), a Liquid Crystal Display (LCD), a loudspeaker and the like; a storage part 408 including a hard disk and so on; and a communication part 409 including a network interface card such as a LAN card and a modem, etc.
- the communication part 409 performs communication processing via a network such as the Internet.
- the drive 410 is also connected to the I/O interface 405 as required.
- a detachable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory and the like, is installed on the drive 410 as required, so that a computer program read therefrom is installed on the storage part 408 as required.
- the framework shown in FIG. 4 is only an optional implementation approach.
- the number and types of the members in FIG. 4 may be selected, deleted, increased or replaced.
- implementation approaches such as separate configuration or integrated configuration may also be adopted.
- a GPU may be separated from a CPU.
- the GPU may be integrated on the CPU.
- the communication part may be configured separately, or may be integrated on the CPU or GPU.
- a process described with reference to a flow chart below may be implemented as a computer software program.
- the embodiment of the present disclosure includes a computer program product, which includes a computer program tangibly included on a machine readable medium.
- the computer program includes program codes for executing the steps shown in the flow chart.
- the program codes may include instructions for correspondingly executing the steps provided in the present disclosure, for example, an instruction for selecting, for an image to be processed, multiple local candidate regions according to two or more different preset scales respectively; an instruction for performing image segmentation processing on two or more local candidate regions, to predict and obtain binary segmentation masks of the local candidate regions; an instruction for performing image classification processing on the two or more local candidate regions, to predict and obtain object classes to which the two or more local candidate regions belong; and an instruction for fusing the two or more local candidate regions according to the object classes to which the two or more local candidate regions belong and the binary segmentation masks of the two or more local candidate regions, to obtain an object segmentation image.
- the computer program may be downloaded and installed from the network via the communication part 409 , and/or the computer program may be installed from the detachable medium 411 .
- when the computer program is executed by the central processing unit (CPU) 401, the instructions recited in the present disclosure are executed.
- In step S101, for an image to be processed, the processor selects multiple local candidate regions according to two or more different preset scales respectively.
- the step S101 may be executed by a local candidate region generation module 60, shown in FIGS. 10 and 11, operated by the processor. FIG. 10 further shows an image segmentation module 61, an image classification module 62, and an image fusion module 63, which are also shown in FIG. 11.
- the present disclosure provides a solution for generating multiscale local candidate regions, in which one object in the image to be processed may be segmented into multiple local candidate regions for study.
- the selected local candidate regions are used as an object for subsequent image segmentation processing and image classification processing.
- In step S102, the processor performs image segmentation processing on two or more local candidate regions, to predict and obtain binary segmentation masks of the local candidate regions.
- the step S102 may be executed by the image segmentation module 61 operated by the processor.
- the processor performs image segmentation processing on each local candidate region by taking the local candidate regions as objects to be input for processing, to predict the binary mask of each local candidate region.
- In step S103, the processor performs image classification processing on the two or more local candidate regions, to predict and obtain object classes to which the two or more local candidate regions belong.
- the step S103 may be executed by the image classification module 62 operated by the processor.
- the processor takes the local candidate regions as objects to be input for processing, and performs image classification processing on each local candidate region to predict the object class to which each local candidate region belongs.
- the processor may execute step S102 and step S103 simultaneously or in sequence; the order in which the processor executes the two steps is not limited in the present disclosure.
- In step S104, the processor fuses the two or more local candidate regions (e.g., all local candidate regions) according to the object classes to which the two or more local candidate regions (e.g., all local candidate regions) belong and the binary segmentation masks of the two or more local candidate regions (e.g., all local candidate regions), to obtain an object segmentation image, i.e., an object individual segmentation result.
- the step S104 may be executed by the image fusion module 63 operated by the processor.
- the processor fuses an object local segmentation result and an object local classification result that are obtained from the local candidate regions generated by the solution for generating multiscale local candidate regions to finally obtain the object individual segmentation result of the entire image.
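- The data flow of steps S101 to S104 can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `segment_objects`, `RegionPrediction` and the `toy_*` functions are hypothetical, the four components are supplied by the caller, and `toy_fuse` is a placeholder that merely paints class labels; it is not the fusion approach of the present disclosure.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np

Region = Tuple[int, int, int, int]  # (r, c, h, w): top-left corner, height, width


@dataclass
class RegionPrediction:
    region: Region
    mask: np.ndarray    # binary mask of the region, shape (h, w)
    object_class: int   # predicted class; 0 is assumed to mean background


def segment_objects(image, select_regions, predict_mask, predict_class, fuse):
    """Run S101 to S104 with caller-supplied (hypothetical) components."""
    regions = select_regions(image)                    # S101
    predictions = []
    for region in regions:
        mask = predict_mask(image, region)             # S102
        object_class = predict_class(image, region)    # S103
        predictions.append(RegionPrediction(region, mask, object_class))
    return fuse(image.shape[:2], predictions)          # S104


# Toy stand-ins, present only so that the sketch can be executed.
def toy_select(image):
    return [(0, 0, 4, 4), (2, 2, 4, 4)]


def toy_mask(image, region):
    _, _, h, w = region
    return np.ones((h, w), dtype=np.uint8)


def toy_class(image, region):
    return 1


def toy_fuse(shape, predictions):
    # Placeholder only: paints class labels, NOT the disclosed fusion rule.
    label_map = np.zeros(shape, dtype=np.int32)
    for p in predictions:
        r, c, h, w = p.region
        window = label_map[r:r + h, c:c + w]   # view into label_map
        window[p.mask == 1] = p.object_class
    return label_map


label_map = segment_objects(
    np.zeros((8, 8, 3)), toy_select, toy_mask, toy_class, toy_fuse
)
```

- Because S102 and S103 only depend on the selected regions, the two calls inside the loop could equally be executed in the opposite order or in parallel.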
- the object segmentation technique provided in the present disclosure uses a method for generating multiscale local candidate regions, and the object segmentation technique is enabled to have a certain fault-tolerant ability by utilizing the multiscale features of the image.
- the present disclosure can segment each of objects in the image while detecting the objects in the image, and determine the boundaries thereof.
- the local candidate regions are first segmented to obtain their segmentation results, and a local region fusion approach is then used to accurately determine the objects in the image.
- In step S201, the processor performs convolution layer processing 3-1 and/or pooling layer processing on an image to be processed 3-0 by a convolutional neural network, to obtain an intermediate result 3-2 of the convolutional neural network.
- the step S201 may be executed by a convolutional neural network calculation module 64 in FIG. 11 operated by the processor; FIG. 11 further shows a loss training module 65.
- the image to be processed 3-0 may be an image of 384×384×3, where 384×384 represents a size of the image to be processed 3-0, and 3 represents the number of channels (for example, R, G, and B).
- the size of the image to be processed 3-0 is not limited in the present disclosure.
- one nonlinear response unit is provided after some or each of the convolutional layers.
- the nonlinear response unit refers to a Rectified Linear Unit (hereinafter referred to as ReLU).
- the ReLUs are provided after the convolutional layers so that the mapping result of the convolutional layers is as sparse as possible, which simulates human visual responses and thereby helps improve the image processing effect.
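- The sparsifying effect of a ReLU can be illustrated with a few numbers. The sketch below is an assumption-level example using NumPy; the response values are made up for illustration and `relu` is a helper name introduced here.

```python
import numpy as np


def relu(x):
    """Rectified Linear Unit: negative responses are set to zero."""
    return np.maximum(x, 0.0)


# Made-up convolution responses, for illustration only.
responses = np.array([[-1.5, 0.2],
                      [0.0, -0.3]])
activated = relu(responses)

# Fraction of entries that are exactly zero after the ReLU.
sparsity = float(np.mean(activated == 0))
```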
- the present disclosure may configure a convolution kernel of the convolutional layer in the convolutional neural network according to actual situations. For example, to facilitate factors such as the synthesis of local information, the convolution kernel of the convolutional layer in the convolutional neural network is generally set to 3×3 in the present disclosure.
- the convolution kernel may also be set to 1×1, 2×2, or 4×4.
- a step length (stride) of the pooling layer may be set so that upper-layer features obtain a larger field of view without increasing the calculation amount.
- the step length (stride) of the pooling layer also enhances spatial invariance, i.e., when the same input appears at different image positions, the same output response is obtained.
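- The spatial invariance provided by strided pooling can be illustrated with a small example. This is a minimal sketch, assuming a 2×2 max pooling window with a stride of 2 (the helper `max_pool2d` is introduced here for illustration); it shows that an activation shifted by one pixel within the same pooling window yields the same pooled output. Shifts that cross a window boundary are not covered by this toy case.

```python
import numpy as np


def max_pool2d(x, k=2, stride=2):
    """Plain max pooling over a 2D array, without padding."""
    h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = window.max()
    return out


# The same activation at two nearby positions inside one pooling window.
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[1, 1] = 1.0

pooled_a = max_pool2d(a)
pooled_b = max_pool2d(b)
```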
- the convolutional layer of the convolutional neural network is mainly used to summarize and fuse information.
- the maximum pooling layer (max pooling) is mainly used to summarize high-level information.
- the structure of the convolutional neural network may be fine-tuned to trade off performance against efficiency.
- an intermediate result 3-2 of the convolutional neural network is obtained as follows:
- Parameters for the convolutional layer are given in the parentheses after “convolutional layer”; for example, 3×3×64 indicates that the size of the convolution kernel is 3×3 and the number of channels is 64.
- Parameters for the pooling layer are given in the parentheses after “pooling layer”; for example, 3×3/2 indicates that the size of the pooling kernel is 3×3 and the stride is 2.
- 24×24×512 indicates the size of the intermediate result 3-2 of the convolutional neural network.
- the size of the intermediate result 3-2 of the convolutional neural network varies with the size of the image to be processed 3-0.
- for example, when the size of the image to be processed 3-0 increases, the size of the intermediate result 3-2 of the convolutional neural network also increases correspondingly.
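- The relation between the image size and the size of the intermediate result can be checked with a short calculation. The sketch below assumes four 3×3/2 pooling layers with a padding of 1 and size-preserving convolutions; the layer listing does not appear in this text, so these values and the helper names `pooled_size` and `feature_map_size` are illustrative assumptions only. Under these assumptions a 384×384 image yields a 24×24 map, consistent with the 24×24×512 size stated above.

```python
def pooled_size(n, kernel=3, stride=2, padding=1):
    """Spatial size after one pooling layer (standard floor formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def feature_map_size(image_size, num_pools=4):
    """Spatial size after `num_pools` pooling layers.

    Assumption: convolutions keep the spatial size, so only pooling shrinks it.
    """
    size = image_size
    for _ in range(num_pools):
        size = pooled_size(size)
    return size


size_384 = feature_map_size(384)  # 384 -> 192 -> 96 -> 48 -> 24
size_768 = feature_map_size(768)  # a larger image gives a larger map
```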
- the intermediate result 3-2 of the convolutional neural network is data shared by the subsequent image classification processing and image segmentation processing. Using the intermediate result 3-2 of the convolutional neural network may reduce the complexity of the subsequent processing to a great extent.
- In step S202, the processor selects a local candidate region generation layer 3-3 by utilizing the intermediate result 3-2 of the convolutional neural network.
- the processor selects, according to two or more different preset scales respectively, multiple local candidate regions 3-4 on a feature map corresponding to the local candidate region generation layer 3-3 through a sliding window.
- the step S202 may be executed by the local candidate region generation module 60 operated by the processor.
- the present disclosure segments one object in the image to be processed 3-0 into multiple local candidate regions 3-4 for study.
- the present disclosure may select four local candidate regions 3-4 with different preset scales, i.e., a local candidate region 3-4 with a preset scale of 48×48 (i.e., a block at the top of the right side of a brace in FIG. 7), a local candidate region 3-4 with a preset scale of 96×96 (i.e., a block at the middle of the right side of the brace in FIG. 7), a local candidate region 3-4 with a preset scale of 192×192 (i.e., a block at the bottom of the right side of the brace in FIG.
- the processor respectively selects the local candidate regions 3-4 according to the multiple different preset scales by controlling the sliding window to slide on the feature map corresponding to the local candidate region generation layer 3-3.
- respective feature points in the feature map covered by the sliding window during each sliding form a set of feature points, and feature points included in different sets are not completely identical.
- the feature map may be a feature map obtained by performing, by the processor, corresponding processing on the image to be processed 3-0, for example, a feature map obtained by performing, by the processor, a convolution calculation on the image to be processed 3-0 using the VGG16 (Visual Geometry Group) network, GoogleNet (Google Network) or ResNet technique.
- Each local candidate region P_i (1 ≤ i ≤ N, where N is the number of the local candidate regions) is represented in the form (r, c, h, w), where (r, c) is the coordinate of the top left corner of the local candidate region 3-4, and h and w are respectively the height and the width of the local candidate region 3-4.
- the processor may enable the sliding window to slide at a preset stride. For example, the processor controls the sliding window to slide at a stride of 16.
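- The sliding-window selection of (r, c, h, w) regions at several preset scales can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `select_candidate_regions` is hypothetical, the windows are expressed in image coordinates, the stride of 16 is taken to be in image pixels, windows are kept fully inside the image, and only the three scales 48, 96 and 192 are used.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (r, c, h, w)


def select_candidate_regions(
    image_h: int,
    image_w: int,
    scales: Tuple[int, ...] = (48, 96, 192),
    stride: int = 16,
) -> List[Region]:
    """Slide a square window of each preset scale over the image."""
    regions: List[Region] = []
    for s in scales:
        # Only windows that fit entirely inside the image are kept (assumption).
        for r in range(0, image_h - s + 1, stride):
            for c in range(0, image_w - s + 1, stride):
                regions.append((r, c, s, s))
    return regions


regions = select_candidate_regions(384, 384)
per_scale = {s: sum(1 for p in regions if p[2] == s) for s in (48, 96, 192)}
```

- With these assumptions, neighbouring windows overlap heavily, so one object is typically covered by many candidate regions of different scales.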
- each local candidate region P_i corresponds to a down-sampled feature grid G_i
- G_i may be represented in a form of
- the present disclosure uses the shared intermediate result 3-2 of the convolutional neural network to select the local candidate regions 3-4, at multiple different preset scales respectively, on the feature map corresponding to the selected convolutional layer (the local candidate region generation layer 3-3), without increasing the computing cost.
- since multiple preset scales are selected, objects of different sizes may be covered as far as possible.
- Each local candidate region 3-4 may cover a part of the objects in the image and does not need to completely contain the objects, and therefore, information learnt from each local candidate region is richer.
- the processor in the present disclosure performs deconvolution layer and/or unpooling layer processing to unify the local candidate regions of different sizes to a fixed size.
- deconvolution layer and/or unpooling layer processing may be utilized to unify the spatial sizes to a fixed size, for example, 12 × 12, 10 × 10, 11 × 11, or 13 × 13.
- the deconvolution technique is adopted for up-sampling processing to make each G i have a spatial size of 12 × 12.
- the (2 × 2/2) maximum pooling technique is utilized to make G i have a spatial size of 12 × 12.
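A minimal sketch of unifying feature grids to 12 × 12 is given below. It assumes, purely for illustration, that the grids arrive as 24 × 24, 12 × 12 or 6 × 6 arrays; the text does not give the input sizes. Nearest-neighbour up-sampling is used here as a simple stand-in for the learned deconvolution layer, and all function names are hypothetical.

```python
import numpy as np


def max_pool_2x2(grid: np.ndarray) -> np.ndarray:
    """2 x 2 maximum pooling with stride 2 on a 2-D grid with even sides."""
    h, w = grid.shape
    if h % 2 or w % 2:
        raise ValueError("grid sides must be even")
    return grid.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample_nearest_2x(grid: np.ndarray) -> np.ndarray:
    """Double both sides by repeating values (stand-in for deconvolution)."""
    return grid.repeat(2, axis=0).repeat(2, axis=1)


def unify_to_12(grid: np.ndarray) -> np.ndarray:
    """Bring an assumed 24x24, 12x12 or 6x6 grid to the fixed 12x12 size."""
    if grid.shape == (24, 24):
        return max_pool_2x2(grid)
    if grid.shape == (6, 6):
        return upsample_nearest_2x(grid)
    if grid.shape == (12, 12):
        return grid
    raise ValueError(f"unsupported grid size {grid.shape}")
```

In a trained network the up-sampling weights would be learned; the fixed repetition above only shows how the spatial sizes line up.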
- step S 203 the processor performs image segmentation processing on each local candidate region, to predict and obtain the binary segmentation mask 3 - 5 of the local candidate region.
- the step S 203 may be executed by the image segmentation module 61 operated by the processor.
- G i is taken as an input and meanwhile the intermediate result 3 - 2 of the convolutional neural network is used to perform image segmentation processing on each local candidate region 3 - 4 , to predict the binary mask M i of each local candidate region 3 - 4 .
- the local candidate region P i corresponds to the calibrated object O n in the present embodiment.
- a binary mask M i of the local candidate region P i necessarily should belong to a part of the calibrated object O n .
- the above-mentioned calibrated object is generally an object that is manually calibrated in advance.
- a process for predicting the binary mask 3 - 5 (i.e., a binary image consisting of 0 and 1) by the processor is as follows:
- 1 × 1 × 2304 represents a size of a convolution kernel of the convolutional layer involved in the image segmentation process.
- Reconstruction indicates that the results obtained for the local candidate regions after the respective convolutions are rearranged, so as to form a binary mask 3 - 5 with a size of 48 × 48 (48 × 48 = 2304).
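The reconstruction step can be illustrated as a reshape of the 2304 outputs into a 48 × 48 grid. In the sketch below, the 0.5 threshold used to binarise the scores and the name `reconstruct_mask` are assumptions; the text only states that the result is a binary image of 0 and 1.

```python
import numpy as np


def reconstruct_mask(scores, size: int = 48, threshold: float = 0.5) -> np.ndarray:
    """Rearrange size*size per-pixel scores into a size x size binary mask."""
    scores = np.asarray(scores, dtype=float)
    if scores.size != size * size:
        raise ValueError(f"expected {size * size} scores, got {scores.size}")
    # Row-major reshape, then binarise with the assumed threshold.
    return (scores.reshape(size, size) > threshold).astype(np.uint8)
```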
- step S 204 the processor performs image classification processing on each local candidate region, to predict and obtain an object class to which the local candidate region belongs.
- Step S 204 may be executed by the image classification module 62 operated by the processor.
- the above-mentioned object class may be an object class in an existing data set such as a PASCAL VOC (Pattern Analysis, Statistical modeling and Computational Learning Visual Object Classes) data set, etc.
- G i is further taken as an input.
- the processor uses the intermediate result 3 - 2 of the convolutional neural network to perform image classification processing on each local candidate region to predict an object class l i to which the each local candidate region belongs.
- if the local candidate region P i meets the following three conditions at the same time, it is considered that the local candidate region P i belongs to the calibrated object O n :
- a center of the local candidate region P i is located within the calibrated object O n ; for example, if the calibrated object O n has an external frame and the center of the local candidate region P i is located within the external frame of the calibrated object O n , it is determined that the center of the local candidate region P i is located within the calibrated object O n ;
- a proportion of an area of the calibrated object O n in the local candidate region P i to an area of the calibrated object O n is greater than a first threshold (50% ≤ the first threshold ≤ 75%), for example, greater than 50%;
- a proportion of an area of the calibrated object O n in the local candidate region P i to an area of the local candidate region P i is greater than a second threshold (the second threshold is generally smaller than the first threshold, for example, 10% ≤ the second threshold ≤ 20%), for example, greater than 20%.
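The three conditions above can be checked as in the following sketch. It assumes the calibrated object is given as a binary mask over the whole image and that the "external frame" is the tight bounding box of that mask; the function name and the default thresholds (50% and 20%, taken from the examples above) are illustrative only.

```python
import numpy as np


def belongs_to_object(region, object_mask,
                      first_threshold: float = 0.5,
                      second_threshold: float = 0.2) -> bool:
    """Check the three assignment conditions for region (r, c, h, w)."""
    r, c, h, w = region
    object_mask = np.asarray(object_mask, dtype=bool)
    ys, xs = np.nonzero(object_mask)
    if ys.size == 0:
        return False
    # Condition 1: region center lies inside the object's external frame.
    cy, cx = r + h / 2.0, c + w / 2.0
    center_inside = (ys.min() <= cy <= ys.max() + 1 and
                     xs.min() <= cx <= xs.max() + 1)
    # Object area that falls inside the region.
    inside = object_mask[r:r + h, c:c + w].sum()
    # Condition 2: share of the object covered by the region.
    covers_object = inside / object_mask.sum() > first_threshold
    # Condition 3: share of the region occupied by the object.
    fills_region = inside / float(h * w) > second_threshold
    return bool(center_inside and covers_object and fills_region)
```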
- a process for predicting the class by the processor is as follows:
- 1 × 1 × 4096 and 1 × 1 × 21 represent the size of the convolution kernel of the convolutional layer involved in the image classification process.
- the processor may execute the step S 203 and step S 204 simultaneously or in sequence; the sequence for executing the above two steps by the processor is not limited in the present disclosure.
- step S 205 the processor trains a loss of the image classification and image segmentation by using a preset loss training function.
- Step S 205 may be executed by the loss training module 65 operated by the processor.
- a following loss training function is preset, which enables the processor to determine whether the image classification and image segmentation are accurately combined:
- W is a network parameter
- f c (P i ) is a classification loss of the local candidate region P i , and corresponds to layer 44 in the above-mentioned example
- f s (P i ) is a loss of the segmentation mask of the local candidate region P i , and corresponds to layer 37 in the above-mentioned example
- λ is a weight for adjusting f c (P i ) and f s (P i ), and can be set as 1; and 1 ≤ i ≤ N, where N is the number of the local candidate regions.
- the loss training function adopted by the processor of the present disclosure is not limited to the above specific form.
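Since the exact form of the loss training function is not reproduced here, the following sketch only shows one plausible combination of the terms defined above: a sum over the local candidate regions of the classification loss plus the weighted segmentation loss. The weighted-sum form, the name `joint_loss` and the placement of the weight on the segmentation term are all assumptions.

```python
from typing import Sequence


def joint_loss(classification_losses: Sequence[float],
               segmentation_losses: Sequence[float],
               weight: float = 1.0) -> float:
    """Assumed joint loss: sum over regions of f_c(P_i) + weight * f_s(P_i)."""
    if len(classification_losses) != len(segmentation_losses):
        raise ValueError("one loss of each kind is needed per region")
    return sum(fc + weight * fs
               for fc, fs in zip(classification_losses, segmentation_losses))
```

With the weight set to 1, as suggested above, both tasks contribute equally to the value being minimised over the network parameter W.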
- the processor is capable of effectively training the convolutional neural network as shown in FIG. 7 that is designed by the present disclosure.
- step S 206 according to the object classes to which the two or more local candidate regions belong and to the binary segmentation masks of the two or more local candidate regions, the processor fuses the two or more local candidate regions to obtain an object segmentation image.
- Step S 206 may be executed by an image fusion module 63 operated by the processor, for example, the image fusion module 63 fuses all local candidate regions 3 - 4 according to the object classes to which the local candidate regions belong and to the binary segmentation masks 3 - 5 of the local candidate regions 3 - 4 , so as to obtain an object segmentation image.
- FIG. 8 shows a schematic diagram of an overlapping situation of local candidate regions provided by the present disclosure.
- a parameter that reflects an overlapped area of the binary segmentation masks 3 - 5 of the two local candidate regions 3 - 4 is defined as IoU (Intersection over Union).
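Intersection over Union for two binary masks can be computed as sketched below. The sketch assumes both masks have already been placed in the same coordinate frame (for example, the full image), which the text does not spell out; the name `mask_iou` is illustrative.

```python
import numpy as np


def mask_iou(mask_a, mask_b) -> float:
    """IoU of two binary masks of identical shape; 0.0 if both are empty."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must share one coordinate frame and shape")
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)
```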
- the processor uses the sliding window to select several local candidate regions, and the processor determines which local candidate regions shall be assigned as a same object by computing the IoU and the object classes to which the local candidate regions belong, so as to fuse all local candidate regions.
- An example for determining whether the overlapped area between the binary segmentation masks meets predetermined requirements is that: the processor obtains binary masks of multiple local candidate regions through the sliding window, i.e., 4 - 1 , 4 - 2 , 4 - 3 , 4 - 4 and 4 - 5 in FIG. 8 , while three blocks in the image to be processed 4 - 0 correspond to the corresponding binary masks of the local candidate regions.
- the operation that the processor (for example, the image fusion module 63 operated by the processor) fuses at least two local candidate regions (for example, all the local candidate regions) includes: determining, by the processor, an overlapped area between binary segmentation masks of two adjacent local candidate regions; and in response to the overlapped area being greater than a preset threshold, the two adjacent local candidate regions belonging to a same object class, and neither of the two adjacent local candidate regions being assigned as an object, the processor generates a new object and assigns the two adjacent local candidate regions as the object.
- an operation that the processor (for example, the image fusion module 63 operated by the processor) fuses all of the local candidate regions includes: determining, by the processor, an overlapped area between the binary segmentation masks of two adjacent local candidate regions; and in response to the overlapped area being greater than a preset threshold, the two adjacent local candidate regions belonging to a same object class, and one of the two adjacent local candidate regions being assigned as one object, merging, by the processor, the two adjacent local candidate regions and assigning the other local candidate region as the object.
- an operation that the processor (for example, the image fusion module 63 operated by the processor) fuses all of the local candidate regions includes: determining, by the processor, an overlapped area between the binary segmentation masks of two adjacent local candidate regions; and in response to the overlapped area being greater than a preset threshold, the two adjacent local candidate regions belonging to a same object class, and the two adjacent local candidate regions being assigned as two objects, merging, by the processor, the two objects.
- FIG. 9 shows a flow chart of fusing all local candidate regions provided by the present disclosure.
- the fusion process executed by the processor includes the following steps.
- step S 2061 the processor computes an overlapped area of the binary segmentation masks of two adjacent local candidate regions.
- the adjacent local candidate regions include adjacent local candidate regions in the row dimension and adjacent local candidate regions in the column dimension.
- the adjacent local candidate regions in the row dimension generally refer to adjacent local candidate regions in a horizontal direction
- the adjacent local candidate regions in the column dimension generally refer to adjacent local candidate regions in a vertical direction.
- step S 2062 the processor determines whether the overlapped area is greater than a preset threshold; if the overlapped area is greater than the preset threshold, the processor executes step S 2063 ; and otherwise, the processor executes step S 2067 .
- step S 2063 the processor determines whether the two adjacent local candidate regions belong to a same object class; if the two adjacent local candidate regions belong to the same object class, the processor executes step S 2064 ; and otherwise, the processor executes step S 2067 .
- step S 2064 the processor determines whether neither of the two adjacent local candidate regions is assigned as an object; if neither of the two adjacent local candidate regions is assigned as the object, the processor executes step S 2065 ; and otherwise, the processor executes step S 2066 ;
- step S 2065 the processor generates a new object, and assigns the two adjacent local candidate regions as the object, and the processor executes step S 2067 .
- step S 2066 if one of the two adjacent local candidate regions is assigned as an object, the processor merges the two adjacent local candidate regions, and the processor assigns the other local candidate region as the object; and if the two adjacent local candidate regions are assigned as two objects, the processor merges the two objects, and the processor executes step S 2067 .
- step S 2067 the processor determines whether all the local candidate regions are assigned as corresponding objects, and if all the local candidate regions are assigned as corresponding objects, go to step S 2068 , and the fusion process of the present disclosure ends; and otherwise, the processor continues to execute step S 2061 , that is, the processor executes steps S 2061 to S 2066 repeatedly, until all the local candidate regions are assigned as corresponding objects, and a list of all the objects is finally obtained, so that the processor obtains the object segmentation image.
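The decision flow of steps S 2061 to S 2067 can be sketched as a single pass over the adjacent pairs. The inputs here are simplified: `classes` holds one predicted class per region, `adjacent_pairs` lists index pairs of adjacent regions, and `overlaps` gives a precomputed overlap measure per pair. The text does not say how a region that never passes the tests gets its object, so the sketch assumes it becomes an object of its own; that choice and all names are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def fuse_regions(classes: Sequence[Hashable],
                 adjacent_pairs: Sequence[Tuple[int, int]],
                 overlaps: Dict[Tuple[int, int], float],
                 threshold: float = 0.5) -> List[int]:
    """Return one object id per local candidate region."""
    assignment: List[Optional[int]] = [None] * len(classes)
    next_id = 0
    for i, j in adjacent_pairs:
        if overlaps[(i, j)] <= threshold:        # S2062: overlap too small
            continue
        if classes[i] != classes[j]:             # S2063: different classes
            continue
        if assignment[i] is None and assignment[j] is None:
            assignment[i] = assignment[j] = next_id   # S2065: new object
            next_id += 1
        elif assignment[i] is None:              # S2066: join existing object
            assignment[i] = assignment[j]
        elif assignment[j] is None:
            assignment[j] = assignment[i]
        elif assignment[i] != assignment[j]:     # S2066: merge two objects
            old, new = assignment[j], assignment[i]
            assignment = [new if a == old else a for a in assignment]
    # Assumption: regions left unassigned each form their own object.
    for k, a in enumerate(assignment):
        if a is None:
            assignment[k] = next_id
            next_id += 1
    return [int(a) for a in assignment]
```

Regions that share an id in the returned list would then have their binary masks combined to give the mask of that object in the object segmentation image.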
- local candidate regions for an object are generated. It is possible that one object may be covered by multiple local candidate regions, so that objects of different sizes may be covered. Moreover, each local candidate region may cover a part of the object, but does not need to completely cover the object, so that richer information may be learnt from each local candidate region, thus facilitating improvement of robustness of the object segmentation technique. Meanwhile, by synthesizing the object boundary using multiple local candidate regions, object segmentation results and results of different classifiers can be combined according to the synthesis of the image classification result and image segmentation result of different local candidate regions, thus facilitating improvement of the accuracy of the object segmentation result. The present disclosure can enable a final result to guide the current local candidate region in selecting module by jointly optimizing the local candidate regions, and enable the result to be more accurate. The present disclosure may use unified deep learning to achieve an end-to-end entire object individual segmentation training and test.
- modules or units or components in the present disclosure may be combined into one module or unit or component, and besides, the modules or units or components in the present disclosure may also be segmented into multiple sub-modules or sub-units or sub-components. Except that at least some of such features and/or processes or units are mutually exclusive, all features disclosed in the specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or apparatus disclosed in this way may be combined by employing any combination. Unless otherwise explicitly stated, each feature disclosed in the specification (including accompanying claims, abstract and drawings) may be replaced by an alternative feature that provides identical, equivalent or similar objective.
- Each embodiment regarding members in the present disclosure may be implemented with hardware, or may be implemented with a software module operating on one or more processors, or may be implemented with a combination thereof.
- a microprocessor or Digital Signal Processor may be used in practice to achieve some or all functions of some or all members in the device that obtains application information according to the embodiments of the present disclosure.
- the present disclosure may also be implemented in a device or apparatus program (for example, a computer program and a computer program product) for executing some or all of the methods described herein.
- the programs implementing the present disclosure may be stored on a computer readable medium, and may be in the form of having one or more signals.
- the signals may be obtained by downloading from an Internet website, or provided on a carrier signal, or provided in any other forms.
- FIG. 12 shows a computing device that may implement the object segmentation method in the present disclosure.
- the computing device conventionally includes a processor 810 and a computer program product or a computer readable medium in a form of a storage device 820 .
- the computing device further includes a communication interface and a communication bus.
- the storage device 820 may be, for example, a flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, hard disk, ROM, or other electronic memories.
- the processor, the communication interface and the memory communicate with one another via the communication bus.
- the storage device 820 has a storage space 830 that stores program codes 831 for executing the steps in the method above, and is configured to store at least one instruction for causing the processor to execute various steps in the object segmentation method in the present disclosure.
- the storage space 830 for storing program codes may include each program code 831 for implementing each step of the method above.
- These program codes may be read from one or more computer program products or written into the one or more computer program products.
- These computer program products include a program code carrier such as, for example, a hard disk, a Compact Disk (CD), a memory card, or a floppy disk.
- Such computer program product generally is a portable or fixed storage unit as shown in FIG. 13 , for example.
- the storage unit may have a storage section, a storage space and the like arranged in a similar way as the storage device 820 in the computing device in FIG. 12 .
- the program code may, for example, be compressed in an appropriate form.
- the storage unit includes a computer readable code 831 ′ for executing the steps of the method according to the present disclosure, i.e., code readable by the processor such as 810 . When these codes are operated by the computing device, the computing device is caused to execute each step in the method described above.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
Description
It can be known from the description that, after each sliding of the sliding window on the feature map, one local candidate region 3-4 and one feature grid are formed, and spatial sizes of the feature grid and the local candidate region 3-4 are determined by the sliding window.
Claims (18)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610425391.0 | 2016-06-15 | ||
CN201610425391.0A CN106097353B (en) | 2016-06-15 | 2016-06-15 | Method for segmenting objects and device, computing device based on the fusion of multi-level regional area |
CN201610425391 | 2016-06-15 | ||
PCT/CN2017/088380 WO2017215622A1 (en) | 2016-06-15 | 2017-06-15 | Object segmentation method and apparatus and computing device |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/088380 Continuation WO2017215622A1 (en) | 2016-06-15 | 2017-06-15 | Object segmentation method and apparatus and computing device |
Publications (2)
Publication Number | Publication Date |
---|---|
US20180144477A1 (en) | 2018-05-24 |
US10489913B2 (en) | 2019-11-26 |
Family
ID=57235471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/857,304 Active 2037-10-09 US10489913B2 (en) | 2016-06-15 | 2017-12-28 | Methods and apparatuses, and computing devices for segmenting object |
Country Status (3)
Country | Link |
---|---|
US (1) | US10489913B2 (en) |
CN (1) | CN106097353B (en) |
WO (1) | WO2017215622A1 (en) |
Families Citing this family (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106097353B (en) | 2016-06-15 | 2018-06-22 | 北京市商汤科技开发有限公司 | Method for segmenting objects and device, computing device based on the fusion of multi-level regional area |
IL297846B2 (en) | 2016-11-15 | 2023-12-01 | Magic Leap Inc | Deep learning system for cuboid detection |
CN106845631B (en) * | 2016-12-26 | 2020-05-29 | 上海寒武纪信息科技有限公司 | Stream execution method and device |
CN106846323B (en) * | 2017-01-04 | 2020-07-10 | 珠海大横琴科技发展有限公司 | Method, device and terminal for realizing interactive image segmentation |
CN108303747B (en) * | 2017-01-12 | 2023-03-07 | 清华大学 | Inspection apparatus and method of detecting a gun |
CN110838124B (en) * | 2017-09-12 | 2021-06-18 | 深圳科亚医疗科技有限公司 | Method, system, and medium for segmenting images of objects having sparse distribution |
EP3625767B1 (en) | 2017-09-27 | 2021-03-31 | Google LLC | End to end network model for high resolution image segmentation |
CN107833224B (en) * | 2017-10-09 | 2019-04-30 | 西南交通大学 | A kind of image partition method based on the synthesis of multilayer sub-region |
US10559080B2 (en) * | 2017-12-27 | 2020-02-11 | International Business Machines Corporation | Adaptive segmentation of lesions in medical images |
CN108875537B (en) * | 2018-02-28 | 2022-11-08 | 北京旷视科技有限公司 | Object detection method, device and system and storage medium |
CN108805898B (en) * | 2018-05-31 | 2020-10-16 | 北京字节跳动网络技术有限公司 | Video image processing method and device |
CN108898111B (en) * | 2018-07-02 | 2021-03-02 | 京东方科技集团股份有限公司 | Image processing method, electronic equipment and computer readable medium |
CN108710875B (en) * | 2018-09-11 | 2019-01-08 | 湖南鲲鹏智汇无人机技术有限公司 | A kind of take photo by plane road vehicle method of counting and device based on deep learning |
US10846870B2 (en) * | 2018-11-29 | 2020-11-24 | Adobe Inc. | Joint training technique for depth map generation |
CN111292335B (en) * | 2018-12-10 | 2023-06-13 | 北京地平线机器人技术研发有限公司 | Method and device for determining foreground mask feature map and electronic equipment |
CN109977997B (en) * | 2019-02-13 | 2021-02-02 | 中国科学院自动化研究所 | Image target detection and segmentation method based on convolutional neural network rapid robustness |
CN111582432B (en) * | 2019-02-19 | 2023-09-12 | 嘉楠明芯(北京)科技有限公司 | Network parameter processing method and device |
CN109934153B (en) * | 2019-03-07 | 2023-06-20 | 张新长 | Building extraction method based on gating depth residual error optimization network |
CN110084817B (en) * | 2019-03-21 | 2021-06-25 | 西安电子科技大学 | Digital elevation model production method based on deep learning |
CN111553923B (en) * | 2019-04-01 | 2024-02-23 | 上海卫莎网络科技有限公司 | Image processing method, electronic equipment and computer readable storage medium |
CN110070056B (en) * | 2019-04-25 | 2023-01-10 | 腾讯科技(深圳)有限公司 | Image processing method, image processing apparatus, storage medium, and device |
CN110119728B (en) * | 2019-05-23 | 2023-12-05 | 哈尔滨工业大学 | Remote sensing image cloud detection method based on multi-scale fusion semantic segmentation network |
CN110222829A (en) * | 2019-06-12 | 2019-09-10 | 北京字节跳动网络技术有限公司 | Feature extracting method, device, equipment and medium based on convolutional neural networks |
CN110807361B (en) * | 2019-09-19 | 2023-08-08 | 腾讯科技(深圳)有限公司 | Human body identification method, device, computer equipment and storage medium |
CN110648340B (en) * | 2019-09-29 | 2023-03-17 | 惠州学院 | Method and device for processing image based on binary system and level set |
CN110807779A (en) * | 2019-10-12 | 2020-02-18 | 湖北工业大学 | Compression calculation ghost imaging method and system based on region segmentation |
US11763565B2 (en) * | 2019-11-08 | 2023-09-19 | Intel Corporation | Fine-grain object segmentation in video with deep features and multi-level graphical models |
US11120280B2 (en) * | 2019-11-15 | 2021-09-14 | Argo AI, LLC | Geometry-aware instance segmentation in stereo image capture processes |
EP3843038B1 (en) | 2019-12-23 | 2023-09-20 | HTC Corporation | Image processing method and system |
EP4084721A4 (en) * | 2019-12-31 | 2024-01-03 | Auris Health Inc | Anatomical feature identification and targeting |
CN111325204B (en) * | 2020-01-21 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Target detection method, target detection device, electronic equipment and storage medium |
CN111339892B (en) * | 2020-02-21 | 2023-04-18 | 青岛联合创智科技有限公司 | Swimming pool drowning detection method based on end-to-end 3D convolutional neural network |
CN111640123B (en) * | 2020-05-22 | 2023-08-11 | 北京百度网讯科技有限公司 | Method, device, equipment and medium for generating background-free image |
CN111882558A (en) * | 2020-08-11 | 2020-11-03 | 上海商汤智能科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112529863B (en) * | 2020-12-04 | 2024-01-23 | 推想医疗科技股份有限公司 | Method and device for measuring bone mineral density |
CN112862840B (en) * | 2021-03-04 | 2023-07-04 | 腾讯科技(深圳)有限公司 | Image segmentation method, device, equipment and medium |
CN112991381B (en) * | 2021-03-15 | 2022-08-02 | 深圳市慧鲤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113033504B (en) * | 2021-05-19 | 2021-08-27 | 广东众聚人工智能科技有限公司 | Multi-scale video anomaly detection method |
CN114511007B (en) * | 2022-01-17 | 2022-12-09 | 上海梦象智能科技有限公司 | Non-invasive electrical fingerprint identification method based on multi-scale feature perception |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100066761A1 (en) * | 2006-11-28 | 2010-03-18 | Commissariat A L'energie Atomique | Method of designating an object in an image |
US8233726B1 (en) | 2007-11-27 | 2012-07-31 | Googe Inc. | Image-domain script and language identification |
US8751530B1 (en) * | 2012-08-02 | 2014-06-10 | Google Inc. | Visual restrictions for image searches |
CN104077577A (en) | 2014-07-03 | 2014-10-01 | 浙江大学 | Trademark detection method based on convolutional neural network |
CN104573744A (en) | 2015-01-19 | 2015-04-29 | 上海交通大学 | Fine granularity classification recognition method and object part location and feature extraction method thereof |
US20150153559A1 (en) * | 2012-09-28 | 2015-06-04 | Canon Kabushiki Kaisha | Image processing apparatus, imaging system, and image processing system |
CN104992179A (en) | 2015-06-23 | 2015-10-21 | 浙江大学 | Fine-grained convolutional neural network-based clothes recommendation method |
CN105469047A (en) * | 2015-11-23 | 2016-04-06 | 上海交通大学 | Chinese detection method based on unsupervised learning and deep learning network and system thereof |
CN105488534A (en) | 2015-12-04 | 2016-04-13 | 中国科学院深圳先进技术研究院 | Method, device and system for deeply analyzing traffic scene |
CN105488468A (en) | 2015-11-26 | 2016-04-13 | 浙江宇视科技有限公司 | Method and device for positioning target area |
CN106097353A (en) | 2016-06-15 | 2016-11-09 | 北京市商汤科技开发有限公司 | The method for segmenting objects merged based on multi-level regional area and device, calculating equipment |
Non-Patent Citations (13)
Title |
---|
Arbelaez, P. et al., "Boundary Extraction in Natural Images Using Ultrametric Contour Maps", Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06), Dec. 31, 2006, the whole document. |
B. Hariharan, P. A. Arbeláez, R. B. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, pp. 447-456, 2015. |
B. Hariharan, P. A. Arbeláez, R. B. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, pp. 297-312, 2014. |
English Translation of International Search Report in international application No. PCT/CN2017/088380, dated Aug. 28, 2017. |
J. Dai, K. He, and J. Sun. Convolutional feature masking for joint object and stuff segmentation. In CVPR, pp. 3992-4000, 2015. |
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, pp. 3431-3440, 2015. |
Meng Ning, "The Research of License Plate Characters Recognition Method in Natural Scenes", Software Engineering School of Information Engineering, pp. 1-69, May 2015. |
R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, pp. 580-587, 2014. |
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015. |
Wu et al, (Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification, 2015 IEEE International Conference on Computer Vision, pp. 1287-1295) (Year: 2015). * |
Wu, Ruobing et al., "Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification", 2015 IEEE International Conference on Computer Vision, Dec. 31, 2015, chapters 2.1-2.5 and 3.4. |
Y. Chen, X. Liu, and M. Yang. Multi-instance object segmentation with occlusion handling. In CVPR, pp. 3470-3478, 2015. |
Z. Liu, X. Li, P. Luo, C. C. Loy, and X. Tang. Semantic image segmentation via deep parsing network. In ICCV, 2015. |
Also Published As
Publication number | Publication date |
---|---|
CN106097353B (en) | 2018-06-22 |
US20180144477A1 (en) | 2018-05-24 |
CN106097353A (en) | 2016-11-09 |
WO2017215622A1 (en) | 2017-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10489913B2 (en) | Methods and apparatuses, and computing devices for segmenting object | |
CN111886603B (en) | Neural network for target detection and characterization | |
CN109508580B (en) | Traffic signal lamp identification method and device | |
JP7350878B2 (en) | Image analysis method, device, program | |
JP2022505762A (en) | Image Semantic Segmentation Network training methods, equipment, equipment and computer programs | |
US10650283B2 (en) | Electronic apparatus and control method thereof | |
CN110659664B (en) | SSD-based high-precision small object identification method | |
US9826166B2 (en) | Vehicular surrounding-monitoring control apparatus | |
CN108428248B (en) | Vehicle window positioning method, system, equipment and storage medium | |
CN109255181B (en) | Obstacle distribution simulation method and device based on multiple models and terminal | |
CN110675407A (en) | Image instance segmentation method and device, electronic equipment and storage medium | |
US11074671B2 (en) | Electronic apparatus and control method thereof | |
CN110910445B (en) | Object size detection method, device, detection equipment and storage medium | |
CN111461145A (en) | Method for detecting target based on convolutional neural network | |
CN111768415A (en) | Image instance segmentation method without quantization pooling | |
CN116433903A (en) | Instance segmentation model construction method, system, electronic equipment and storage medium | |
CN110796130A (en) | Method, device and computer storage medium for character recognition | |
CN110874170A (en) | Image area correction method, image segmentation method and device | |
Somasundaram et al. | Straightening of highly curved human chromosome for cytogenetic analysis | |
US20220327811A1 (en) | System and method for composite training in machine learning architectures | |
CN113119996B (en) | Trajectory prediction method and apparatus, electronic device and storage medium | |
CN113033593A (en) | Text detection training method and device based on deep learning | |
CN112435293B (en) | Method and device for determining structural parameter representation of lane line | |
CN116051925B (en) | Training sample acquisition method, device, equipment and storage medium | |
JP2019125128A (en) | Information processing device, control method and program | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
 | AS | Assignment | Owner name: BEIJING SENSETIME TECHNOLOGY DEVELOPMENT CO., LTD, Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SHI, JIANPING;REEL/FRAME:046330/0824 Effective date: 20171218 |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
 | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | FEPP | Fee payment procedure | Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PTGR); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
 | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |