WO2018100668A1 - Image processing device, image processing method, and image processing program - Google Patents

Image processing device, image processing method, and image processing program

Info

Publication number
WO2018100668A1
Authority
WO
WIPO (PCT)
Prior art keywords
features
image processing
image
scaled
score
Prior art date
Application number
PCT/JP2016/085537
Other languages
English (en)
Inventor
Karan Rampal
Original Assignee
Nec Corporation
Priority date
Filing date
Publication date
Application filed by Nec Corporation filed Critical Nec Corporation
Priority to US16/464,711 priority Critical patent/US11138464B2/en
Priority to PCT/JP2016/085537 priority patent/WO2018100668A1/fr
Priority to JP2019527275A priority patent/JP6756406B2/ja
Publication of WO2018100668A1 publication Critical patent/WO2018100668A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2113Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757Matching configurations of points or features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space

Definitions

  • The present invention relates to an image processing device, an image processing method, and an image processing program, and more particularly to an image processing device, an image processing method, and an image processing program for removing background features for the purpose of object recognition.
  • Object recognition tasks have many practical uses such as in surveillance, biometrics etc.
  • the goal of these tasks is to output a label or a score indicating the level of similarity between a pair of input images containing the object of interest.
  • The object here can be a person, a vehicle, an animal, etc.
  • Metric learning is one of the most effective techniques to get the similarity scores.
  • the objective of this technique is to compute distance between the inputs by first projecting them into a feature space, which itself can be learnt or handcrafted. Next, a metric or a function is learnt which can compute distance in the new feature space by effectively separating similar features and dissimilar features by a given margin.
  • One of the methods for object recognition combines multiple metrics (see PTL 1).
  • In PTL 1, multiple hand-crafted features are extracted from the images, and a number of similarity functions such as the Bhattacharyya coefficient and cosine similarity are used. Finally, a RankBoost algorithm is used to combine them. This gives high accuracy and combines the advantages of many metrics.
  • Another method for scale estimation uses triangulated graphs (see PTL 2).
  • In PTL 2, triangulated graphs are fitted inside the object (a person, for example) by minimizing an energy function using dynamic programming. This describes the shape of the person.
  • This method also combines color information by using the HSV color space to increase robustness.
  • A brightness transfer function (BTF) is also used in related work to compensate for appearance changes between cameras.
  • In NPL 1, handcrafted features called the Local Maximal Occurrence (LOMO) representation are computed for each image.
  • NPL 2 discloses computing the similarity between a pair of images in an end-to-end manner. This means that the entire pipeline of feature generation, extraction and metric learning is learnt jointly as a deep neural network. A contrastive loss function is also proposed, which helps improve the discrimination ability.
  • Object recognition involves extracting features from the input images for the purpose of representing the object in a more descriptive space.
  • In this space or its subspace, a metric function or a distance function is learnt. This function can be used to compare the input images.
  • During feature extraction, however, we also obtain features which belong to the background and not just to the object of interest. These background features can cause mismatches in the recognition result.
  • The metric that is learnt from such features is not robust enough; hence, a technique is needed that can remove these features early in the learning process.
  • NPL 1 requires handcrafted features, which are designed with a particular application in mind. Such features perform very well for that application; however, they do not generalize well to other application areas.
  • The device disclosed in PTL 4 removes unnecessary features by using depth information. It needs dedicated hardware or camera calibration information to obtain the depth information in the image.
  • The method disclosed in PTL 5 uses the dispersion of pixels to remove unnecessary (background) features: if the dispersion of a pixel is high, it is considered background and removed; otherwise it is treated as foreground. This method is not suitable for scenes with illumination variation, since it assumes that low-dispersion pixels are necessarily foreground.
  • One of the objects of the present invention is to provide an image processing device, an image processing method, and an image processing program that is capable of reducing the effect of the background on the similarity score or the similarity label of the object recognition task.
  • An image processing device includes: feature extraction means which obtains features in each of scaled samples of the region of interest in a probe image; saliency generation means which computes the probabilities of the pixels in the scaled samples that contribute to the score or the label of the object of interest in the region; and dropout processing means which removes the features from the scaled samples which are not essential for computing the score or the label of the object, using the computed probabilities.
  • An image processing method includes the steps of: obtaining features in each of scaled samples of the region of interest in a probe image; computing the probabilities of the pixels in the scaled samples that contribute to the score or the label of the object of interest in the region; and removing the features from the scaled samples which are not essential for computing the score or the label of the object, using the computed probabilities.
  • A non-transitory computer-readable recording medium having recorded therein an image processing program according to the present invention that, when executed by a computer, obtains features in each of scaled samples of the region of interest in a probe image, computes the probabilities of the pixels in the scaled samples that contribute to the score or the label of the object of interest in the region, and removes the features from the scaled samples which are not essential for computing the score or the label of the object, using the computed probabilities.
  • the present invention is able to reduce the effect of the background on the similarity score or the similarity label of the object recognition task.
  • Fig. 1 is a block diagram showing an example of a structure of an image processing device 100 according to a first exemplary embodiment of the present invention.
  • Fig. 2 is a flowchart showing an example of an operation of the image processing device 100 according to the first exemplary embodiment of the present invention.
  • Fig. 3 is a flowchart showing the estimation process of the image processing device 100 according to the first exemplary embodiment of the present invention.
  • Fig. 4 is a flowchart showing the dropout process of the image processing device 100 according to the first exemplary embodiment of the present invention.
  • Fig. 5 is a flowchart showing the saliency process of the image processing device 100 according to the first exemplary embodiment of the present invention.
  • Fig. 6 is a block diagram showing an example of a structure of an image processing device 10 according to a second exemplary embodiment of the present invention.
  • Fig. 7 is a block diagram showing an example of a hardware structure of a computer 1000 which is able to realize the image processing device according to the exemplary embodiments of the present invention.
  • The object recognition performance is affected by background features, especially in the case of complex background scenes; hence, these features need to be suppressed.
  • Given the location of the object of interest in the image a number of scaled samples are generated. From these scaled samples features are extracted.
  • Using the object detector a saliency map is generated by taking the backpropagation of the detector output with respect to the input images.
  • With the help of the saliency map, the probabilities of the pixels belonging to the object or to the background are computed.
  • This dropout is performed by removing neurons belonging to features whose pixels are classified as background.
  • A score can be obtained for each target image, and the one with the highest score can be selected as the output.
  • Fig. 1 is a block diagram showing an example of a structure of an image processing device 100 according to the first exemplary embodiment of the present invention.
  • the image processing device 100 includes an input unit 101, an object detection unit 102, a feature extraction unit 103, a learning unit 104, a model storage unit 105, a saliency generation unit 106, a dropout processing unit 107, a feature matching unit 108, a parameter update unit 109, an output unit 110 and a training dataset storage unit 111.
  • The input unit 101 receives a series of frames, i.e. images (for example, frames of a video, still images or the like), in the tracking phase.
  • The input unit 101 may receive a series of training frames, for example, in the learning phase or before the learning phase.
  • the frames and a frame in the frames may be referred to as "images” and an "image” respectively.
  • the training frames and a training frame in the training frames are referred to as "training images” and a "training image” respectively.
  • the object detection unit 102 detects a region of interest i.e. an object, such as a face or one of other objects which may include several parts, in the frames.
  • The object detection unit 102 detects a person in the frame. It provides the location of the person in the frame, i.e. the x and y coordinates of the upper-left and lower-right corners of the bounding box.
  • the object detection unit 102 may be referred to as "the object detector”.
  • the feature extraction unit 103 is used to extract the features from the region of interest that are provided to it by the object detection unit 102. Using the location provided by the object detection unit 102, the feature extraction unit 103 generates scaled samples. These samples are then normalized to lie in the same co-ordinate system. The coordinates are defined in a coordinate system set in advance in the frames. Finally, the features are extracted from these sample images. These features can be a combination of edge, texture, color, temporal, spatial and/or other higher level information or lower level information from the sample images.
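  • As a minimal illustrative sketch of this step (not the patented implementation), the following Python function generates scaled crops around a detected bounding box, normalizes them to a common size, and computes a simple per-channel color histogram as a stand-in for the richer edge/texture/color features mentioned above; the function name, scale factors and output size are assumptions.

```python
import numpy as np
import cv2  # used only for resizing the crops


def extract_scaled_sample_features(frame, box, scales=(0.9, 1.0, 1.1),
                                   out_size=(64, 128), bins=16):
    """Crop scaled samples around a detected box and compute simple color-histogram features.

    frame    : HxWx3 uint8 image
    box      : (x1, y1, x2, y2) upper-left and lower-right corners from the object detector
    scales   : relative scale factors used to generate the scaled samples
    out_size : (width, height) every sample is resized to, i.e. a common coordinate system
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    features = []
    for s in scales:
        # Scaled crop centred on the detection, clipped to the frame borders.
        nx1 = int(max(cx - s * w / 2, 0)); ny1 = int(max(cy - s * h / 2, 0))
        nx2 = int(min(cx + s * w / 2, frame.shape[1])); ny2 = int(min(cy + s * h / 2, frame.shape[0]))
        crop = cv2.resize(frame[ny1:ny2, nx1:nx2], out_size)
        # Per-channel color histogram as a stand-in for HOG/LBP/color features.
        hist = [np.histogram(crop[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
        features.append(np.concatenate(hist).astype(np.float64))
    return np.stack(features)  # shape: (num_scales, feature_dim)
```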
  • the learning unit 104 learns the model by one or more series of training frames. More specifically, the learning unit 104 learns the model which will be used for computing the saliency map of the samples, by features extracted from training frames.
  • the learning unit 104 may compute the mean vector and the covariance matrix from the features of the samples as part of the parameter learning for the model. It may also compute the gradient of the object detector output with respect to the input image.
  • the model essentially captures the distribution of the features of the scaled samples. More specifically it captures the likelihood of an image pixel to belong to a particular label, which has been outputted by the object detector.
  • While the object detector maximizes its output score so that the given input image matches the desired label, in our case the opposite is needed: given a label, we need to generate an image that matches the label.
  • the model storage unit 105 is used to store the model’s parameters which are used for inference purpose and to evaluate the model on a given input.
  • the saliency generation unit 106 derives the probability of a pixel to belong to a particular label using the model parameters stored in the model storage unit 105.
  • The probability is computed by obtaining the gradient of the object detector output with respect to a random image. This random image is then iteratively updated until its pixels depict the probabilities. This procedure produces the required saliency map.
  • the saliency map produced in the saliency generation unit 106 is the input of the dropout processing unit 107.
  • Each of the features of the samples is directly associated with its probability from the saliency map. If the probability of a feature is low, that is, it belongs to the background class, the feature is removed or dropped out. If the feature belongs to the object, it is rescaled using the probability. This produces the final features which will be used for matching.
  • the feature matching unit 108 selects the sample with the highest score by comparing the features of the target image and the features of the probe image. The features of the probe image at this scale are matched with the features of the enrolled target images. For each of the target images a score is generated by the feature matching unit 108. The model parameters are updated by the parameter update unit 109.
  • A set of the feature extraction unit 103, the saliency generation unit 106, the dropout processing unit 107, and the feature matching unit 108 may be referred to as “the estimation processing unit”.
  • the output unit 110 outputs the final target image or the ID.
  • The output unit 110 may plot predetermined marks representing the ID on the frame, at positions given by the x, y coordinates and the scale (width, height) of the object bounding box; the output is then the frame with the plotted marks.
  • the training dataset storage unit 111 stores one or more series of training samples which contain target image and probe image pairs and a label indicating whether they are the same object or not.
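  • A minimal sketch of one such training sample, assuming a simple in-memory representation (the field names are illustrative, not the patent's notation):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TrainingSample:
    """One entry of the training dataset: a target/probe image pair and a same-object label."""
    target_image: np.ndarray  # enrolled (gallery) image
    probe_image: np.ndarray   # query image containing the object of interest
    same_object: bool         # True if both images show the same object
```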
  • The input unit 101 is not necessarily used to realize the training dataset storage unit 111.
  • Fig. 2 is a flowchart showing an example of the operation of the image processing device 100 according to the first exemplary embodiment of the present invention.
  • the operation of the image processing device 100 according to the first exemplary embodiment of the present invention can be broadly divided into training phase and evaluation phase.
  • the recognition of the object begins by detection of object in an image or a frame.
  • the input to the system (Step S101) is an image of the object called the probe image.
  • the object detection unit 102 performs a check to find out if there exists a target image (Step S102).
  • The target image is also called an enrolled image in a gallery. If no target image has been selected (NO in Step S102), then an image which has been enrolled previously is selected from the gallery (Step S103).
  • the object detection unit 102 may be a specific implementation of a general object detector.
  • the detected object and the selected target image are now provided to the estimation processing unit (Step S104).
  • When a target image exists (YES in Step S102), it is directly given to Step S104. After that, the output score is generated (Step S105) by the estimation processing unit. Finally, if all the target images have been compared, the processing is finished.
  • The estimation processing unit scores each of the samples generated from the current frame against the target image, and the output is the sample which has the maximum score.
  • the output unit 110 outputs the estimated ID or the estimated label and score i.e. the final output described above (Step S105).
  • the input unit 101 receives a next frame (Step S101).
  • When the processing is finished by an instruction from a user of the image processing device 100 via an input device (not illustrated) (YES in Step S106), or when all the target images have been processed, the image processing device 100 stops the processing shown in Fig. 2.
  • Fig. 3 is a flowchart showing an example of an operation of the image processing device 100 according to the first exemplary embodiment in the estimation processing phase.
  • Given the target image, the samples, which are scaled versions of the probe image (or query image), are generated in Step S201. These samples are extracted around the region given by the object location and the scale provided by the object detector. Next, the features are extracted from these samples (Step S202). Extracted features refer to features such as HOG (Histogram of Oriented Gradients), LBP (Local Binary Patterns), normalized gradients, color histograms, etc.
  • In Step S203, we perform the dropout processing. This will be explained in detail later using Fig. 4.
  • If we are in the training phase (YES in Step S204), we need to remove the background features using the mask from Step S203 (Step S206).
  • The mask is the output of the dropout processing in Step S203. The removal is done using elementwise multiplication of the mask with the feature map generated in the feature extraction (Step S202). Using the reduced features from the feature map, we get the score by feature matching (Step S207). Finally, we select the maximum score over all the scales as the final output (Step S208). In the case of NO in Step S204, there is no need for the masking operation; all we need to do is perform the forward pass (the classifier forward pass), that is, send the data through undisturbed (Step S205). This procedure is explained in more detail below:
  • The left member of Equation (1) is the mean, or average, of the samples; it is one of the parameters used for normalizing the features before the actual dropout procedure. It is computed as $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$.
  • Here, $x_i$ is the feature vector of the $i$-th sample and $N$ is the total number of scaled samples.
  • In Equation (2), $V$ is the variance of the feature vectors, $V = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$. Using these two equations we can normalize the features to have zero mean and unit variance. The normalization is done using Equation (3): $\hat{x}_i = (x_i - \mu)/\sqrt{V}$.
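  • As a small sketch of this normalization (assuming the standard zero-mean, unit-variance form described above, with a small epsilon added for numerical safety):

```python
import numpy as np


def normalize_features(features, eps=1e-8):
    """Normalize scaled-sample features to zero mean and unit variance (Equations (1)-(3))."""
    mu = features.mean(axis=0)                    # Equation (1): sample mean over the N scaled samples
    var = features.var(axis=0)                    # Equation (2): sample variance
    return (features - mu) / np.sqrt(var + eps)   # Equation (3): zero mean, unit variance
```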
  • After the features have been normalized, they are passed to the dropout processing (Step S203). This procedure will be explained in more detail here using Fig. 4 as a reference.
  • Fig. 4 is a flowchart of the dropout processing step of the image processing device 100 according to the first exemplary embodiment of the present invention.
  • the first step in dropout processing is saliency processing (Step S301). This is used to produce a saliency map and will be explained in detail later with the help of Fig. 5.
  • the saliency map is used to get the pixel probabilities.
  • In Step S302, we get the feature map. It consists of the features which were extracted in Step S202, resized to form a 3-dimensional map.
  • In Step S303, the entries of the saliency map are checked at each pixel location, i.e. along the x and y axes, against a threshold. This threshold is selected beforehand. If the probability in the saliency map is greater than the threshold ‘T’ (YES in Step S303), the corresponding features are renormalized (Step S306) using Equation (4).
  • If the probability is not greater than the threshold ‘T’ (NO in Step S303), the corresponding feature is removed by setting it to zero (Step S304). Next, the feature map is updated by reshaping it back to its original dimensions instead of the 3 dimensions used in Step S302 (Step S305).
  • The saliency map generated in Step S301 is stored in the model storage unit 105 in Step S307.
  • the image processing device 100 stops the dropout processing shown in Fig. 4.
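  • The dropout processing of Fig. 4 can be sketched as follows; this is an illustrative Python version in which the threshold value and the renormalization by the keep probability are assumptions, since Equation (4) is not reproduced here:

```python
import numpy as np


def dropout_background_features(feature_map, saliency_map, threshold=0.5):
    """Zero out features whose saliency probability is at or below a threshold and rescale the rest.

    feature_map  : (H, W, C) feature map reshaped from the extracted features (Step S302)
    saliency_map : (H, W) per-pixel probabilities from the saliency processing (Step S301)
    threshold    : the pre-selected threshold 'T' (value chosen here for illustration)
    Returns the flattened masked features (Step S305) and the binary mask.
    """
    keep = saliency_map > threshold                             # Step S303: compare against 'T'
    masked = feature_map.astype(np.float64) * keep[..., None]   # Step S304: drop background features
    # Step S306: rescale kept features; dividing by the keep probability mirrors standard
    # dropout renormalization and stands in for Equation (4).
    masked[keep] = masked[keep] / np.clip(saliency_map[keep][:, None], 1e-6, None)
    return masked.reshape(-1), keep                             # Step S305: back to a 1-D feature vector
```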
  • the saliency processing step will be explained using Fig. 5.
  • Fig. 5 is a flowchart showing the saliency process of the image processing device 100 according to the first exemplary embodiment of the present invention.
  • The scaled samples are used as input, as shown in Step S401.
  • In Step S402, a saliency map is initialized with random values drawn from a Gaussian distribution with zero mean and unit variance, as expressed by Equation (5).
  • Equation (6) represents the classifier forward pass, i.e. computing the class label when given an input image, which here is the randomly initialized saliency map.
  • In Equation (6), ‘L’ is the classifier function which takes the image ‘I’ as input, and ‘c’ is a constant used for regularization of the maximization.
  • The next step is the classifier backward pass (Step S404). In this step, we obtain the gradients of Equation (6) with respect to the input saliency map image. This provides the direction in which the saliency map image should be updated so as to maximize Equation (6).
  • Step S404 is implemented using Equation (7).
  • In Equation (7), ‘∇L’ is the gradient of the classifier function with respect to the saliency map image and ‘a’ is a constant which controls the step size.
  • ‘I’ is the updated saliency map of Step S405.
  • If the loss is sufficiently low (YES in Step S407), the algorithm has converged and the saliency processing can be stopped. However, if the loss is not yet sufficiently low (NO in Step S407), we again perform the steps from Step S403. These steps are repeated until the saliency map image has a low loss and the algorithm has converged.
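  • A hedged sketch of this saliency processing is shown below: it starts from a random image and follows the classifier gradient, assuming a callable that returns the detector score and its gradient with respect to the image; the regularization constant, step size and stopping rule are illustrative assumptions, and the linear scorer at the end is only a toy stand-in for a real object detector.

```python
import numpy as np


def generate_saliency_map(score_and_grad, shape, steps=200, step_size=0.1, c=1e-3, tol=1e-3, seed=0):
    """Produce a saliency map by gradient ascent on the (regularized) classifier score (Fig. 5)."""
    rng = np.random.default_rng(seed)
    I = rng.standard_normal(shape)                 # Step S402: Gaussian init, zero mean, unit variance
    for _ in range(steps):
        score, grad = score_and_grad(I)            # Steps S403/S404: forward and backward pass
        loss = -(score - c * np.sum(I * I))        # negative of the regularized objective (cf. Equation (6))
        I = I + step_size * (grad - 2 * c * I)     # Step S405: gradient-ascent update (cf. Equation (7))
        if loss < tol:                             # Step S407: stop once the loss is low enough
            break
    # Rescale pixel values to [0, 1] so they can be read as probabilities.
    return (I - I.min()) / (I.max() - I.min() + 1e-8)


# Toy linear "detector" with analytic gradient, used only to make the sketch runnable.
w = np.random.default_rng(1).standard_normal((32, 32))
saliency = generate_saliency_map(lambda I: (float(np.sum(w * I)), w), shape=(32, 32))
```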
  • the features are re-normalized once again.
  • the feature matching step can be performed.
  • the matching can be done using the kernel methods such as intersection kernel, Gaussian kernel, polynomial kernel etc.
  • Equation (8) gives the matching score ‘r’ between the features of the target image ‘I’ and the features of the probe image ‘x’.
  • ‘d’ is the dimension length of the features.
  • ‘j’ is the dimension index. The target image with the highest score is selected.
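  • Since Equation (8) itself is not reproduced here, the sketch below uses the histogram-intersection kernel, one of the kernels listed above, to compute the matching score ‘r’ over the d feature dimensions and to pick the best target; the function names are assumptions.

```python
import numpy as np


def intersection_kernel_score(target_feat, probe_feat):
    """Matching score r = sum_j min(I_j, x_j) over the d feature dimensions (intersection kernel)."""
    return float(np.minimum(target_feat, probe_feat).sum())


def best_target(target_feats, probe_feat):
    """Return the index of the target image whose features give the highest matching score."""
    scores = [intersection_kernel_score(t, probe_feat) for t in target_feats]
    return int(np.argmax(scores)), scores
```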
  • One of the objects of the present invention is to provide an image processing device that is capable of accurate object recognition and of reducing the effect of the background on the similarity score or the distance score.
  • the first advantageous effect of the present exemplary embodiment is that it is able to estimate the object accurately and reduce the effect of the background on the recognition score.
  • Another advantage of the present exemplary embodiment is that multiple metrics can still be used with this method, as in PTL 1, which combines many metrics together.
  • This image processing device can be used to reduce the background effect which will improve the performance of each metric.
  • Unlike NPL 1 and PTL 2, the model parameters do not require handcrafted features.
  • Handcrafted features limit the applicability of the technique and decrease generalizability.
  • This image processing device can be utilized with any technique which requires background removal.
  • An additional advantageous effect of the present exemplary embodiment is that there is no need to compute the projection matrix and hence no need for the camera calibration information unlike PTL 3.
  • An additional advantageous effect of the present exemplary embodiment is that, similarly to NPL 2, the learning is end to end: given an input image pair, the similarity score is output directly.
  • the distance function for this image processing device is not limited to Euclidean distance.
  • the device disclosed in PTL 4 and the method disclosed in PTL 5 are deterministic and not probabilistic.
  • the present exemplary embodiment is a probabilistic method and needs a probability map which is provided by the saliency generation unit 106.
  • The present exemplary embodiment does not need any additional hardware or calibration information, and does not rely on the assumption made in PTL 5.
  • Fig. 6 is a block diagram showing an example of a structure of an image processing device 10 according to the second exemplary embodiment of the present invention.
  • The image processing device 10 includes: a feature extraction unit 11 (the feature extraction unit 103, for example) which obtains features in each of scaled samples of the region of interest in a probe image; a saliency generation unit 12 (the saliency generation unit 106, for example) which computes the probabilities of the pixels in the scaled samples that contribute to the score or the label of the object of interest in the region; and a dropout processing unit 13 (the dropout processing unit 107, for example) which removes the features from the scaled samples which are not essential for computing the score or the label of the object, using the computed probabilities.
  • the image processing device can reduce the effect of the background on the similarity score or the similarity label of the object recognition task.
  • the second exemplary embodiment has the same advantageous effect as the first advantageous effect of the first exemplary embodiment.
  • The reason that the advantageous effect is the same as the first advantageous effect of the first exemplary embodiment is that the fundamental principle is the same in both embodiments.
  • the image processing device 10 may include a feature matching unit (the feature matching unit 108, for example) which obtains the similarity between a given target image and a scaled sample of the probe image and selects the scaled sample with the maximum similarity as the final output.
  • the image processing device can output the scaled sample with the maximum similarity.
  • The dropout processing unit 13 may generate the mask for removing the features which are not essential for computing the score or the label of the object, using the computed probabilities, and remove the features from the scaled samples using the generated mask.
  • Neurons which belong to the background pixels are identified, and a mask is generated which can be thresholded to remove the features belonging to such pixels.
  • the image processing device can remove the features from the scaled samples using the generated mask.
  • The image processing device 10 may include a learning unit (the learning unit 104, for example) which learns the model's parameters from one or more series of training samples which contain target image and probe image pairs and a label indicating whether they are the same object or not.
  • the image processing device can learn a relation between target image and probe image.
  • The image processing device 10 may include a feature map updating unit (the dropout processing unit 107, for example) which updates the feature map by applying the mask generated by the dropout processing unit 13, thereby removing the features whose pixels have low probability in the saliency map.
  • The image processing device can update the feature map using the mask.
  • the image processing device 10 may include a feature normalization unit (the dropout processing unit 107, for example) which normalizes the remaining features again after removing the features by the dropout processing unit 13.
  • the image processing device can perform the feature matching step using the kernel methods.
  • Each of the image processing device 100 and the image processing device 10 can be implemented using a computer and a program controlling the computer, dedicated hardware, or a set of a computer and a program controlling the computer and a dedicated hardware.
  • Fig. 7 is a block diagram showing an example of a hardware structure of a computer 1000 which is able to realize the image processing device 100 and the image processing device 10, which are described above.
  • the computer 1000 includes a processor 1001, a memory 1002, a storage device 1003 and an interface 1004, which are communicably connected via a bus 1005.
  • the computer 1000 can access storage medium 2000.
  • Each of the memory 1002 and the storage device 1003 may be a storage device, such as a RAM (Random Access Memory), a hard disk drive or the like.
  • the storage medium 2000 may be a RAM, a storage device such as a hard disk drive or the like, a ROM (Read Only Memory), or a portable storage medium.
  • the storage device 1003 may operate as the storage medium 2000.
  • the processor 1001 can read data and a program from the memory 1002 and the storage device 1003, and can write data and a program in the memory 1002 and the storage device 1003.
  • the processor 1001 can communicate with a server (not illustrated) which provides frames for the processor 1001, a terminal (not illustrated) to output the final output shape, and the like over the interface 1004.
  • the processor 1001 can access the storage medium 2000.
  • The storage medium 2000 stores a program that causes the computer 1000 to operate as the image processing device 100 or the image processing device 10.
  • The processor 1001 loads the program, which causes the computer 1000 to operate as the image processing device 100 or the image processing device 10 and is stored in the storage medium 2000, into the memory 1002.
  • the processor 1001 operates as the image processing device 100 or the image processing device 10 by executing the program loaded in the memory 1002.
  • the input unit 101, the object detection unit 102, the feature extraction unit 103, the learning unit 104, the saliency generation unit 106, the feature matching unit 108, the dropout processing unit 107 and the output unit 110 can be realized by a dedicated program that is loaded in the memory 1002 from the storage medium 2000 and can realize each of the above-described units, and the processor 1001 which executes the dedicated program.
  • the model storage unit 105, the parameter update unit 109 and the training dataset storage unit 111 can be realized by the memory 1002 and/or the storage device 1003 such as a hard disk device or the like.
  • a part of or all of the input unit 101, the object detection unit 102, the feature extraction unit 103, the learning unit 104, the model storage unit 105, the saliency generation unit 106, the dropout processing unit 107, the feature matching unit 108, the parameter update unit 109, the output unit 110 and the training dataset storage unit 111 can be realized by a dedicated circuit that realizes the functions of the above-described units.
  • Supplementary note 1 An image processing method comprising the steps of: obtaining features in each of scaled samples of the region of interest in a probe image; computing the probabilities of the pixels in the scaled samples that contribute to the score or the label of the object of interest in the region; and removing the features from the scaled samples which are not essential for computing the score or the label of the object, using the computed probabilities.
  • Supplementary note 2 The image processing method according to Supplementary note 1, comprising the steps of: generating the mask for removing the features which are not essential for computing the label or the score of the object, and applying the mask to remove the features whose pixels have low probability in the saliency map.
  • The neurons which belong to the background pixels are identified, and a mask is generated which can be thresholded to remove the features belonging to such pixels.
  • Supplementary note 3 The image processing method according to Supplementary note 1 or 2, comprising the step of: learning the model's parameters from one or more series of training samples which contain target image and probe image pairs and a label indicating whether they are the same object or not.
  • Supplementary note 4 The image processing method according to any one of Supplementary note 1 to 3, comprising the steps of: obtaining scaled samples of the image from the given region of interest.
  • Supplementary note 5 The image processing method according to any one of Supplementary note 1 to 4, comprising the steps of: normalizing the remaining features again after removing the features.
  • Supplementary note 6 A non-transitory computer-readable recording medium having recorded therein an image processing program that, when executed by a computer, obtains features in each of scaled samples of the region of interest in a probe image, computes the probabilities of the pixels in the scaled samples that contribute to the score or the label of the object of interest in the region, and removes the features from the scaled samples which are not essential for computing the score or the label of the object, using the computed probabilities.
  • Supplementary note 7 A non-transitory computer-readable recording medium according to Supplementary note 6, wherein the image processing program, when executed by the computer, generates the mask for removing the features which are not essential for computing the label or the score of the object, and applies the mask to remove the features whose pixels have low probability in the saliency map.
  • The neurons which belong to the background pixels are identified, and a mask is generated which can be thresholded to remove the features belonging to such pixels.
  • Supplementary note 8 A non-transitory computer-readable recording medium according to Supplementary note 6 or 7, wherein the image processing program, when executed by the computer, learns the model's parameters from one or more series of training samples which contain target image and probe image pairs and a label indicating whether they are the same object or not.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image processing device (10) comprising: a feature extraction unit (11) which obtains features in each of the scaled samples of the region of interest in a probe image; a saliency generation unit (12) which computes the probabilities of the pixels in the scaled samples that contribute to the score or the label of the object of interest in the region; and a dropout processing unit (13) which removes the features from the scaled samples that are not essential for computing the score or the label of the object, using the computed probabilities.
PCT/JP2016/085537 2016-11-30 2016-11-30 Dispositif de traitement d'image, procédé de traitement d'image et programme de traitement d'image WO2018100668A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US16/464,711 US11138464B2 (en) 2016-11-30 2016-11-30 Image processing device, image processing method, and image processing program
PCT/JP2016/085537 WO2018100668A1 (fr) 2016-11-30 2016-11-30 Dispositif de traitement d'image, procédé de traitement d'image et programme de traitement d'image
JP2019527275A JP6756406B2 (ja) 2016-11-30 2016-11-30 画像処理装置、画像処理方法および画像処理プログラム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/085537 WO2018100668A1 (fr) 2016-11-30 2016-11-30 Dispositif de traitement d'image, procédé de traitement d'image et programme de traitement d'image

Publications (1)

Publication Number Publication Date
WO2018100668A1 true WO2018100668A1 (fr) 2018-06-07

Family

ID=62242396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/085537 WO2018100668A1 (fr) 2016-11-30 2016-11-30 Dispositif de traitement d'image, procédé de traitement d'image et programme de traitement d'image

Country Status (3)

Country Link
US (1) US11138464B2 (fr)
JP (1) JP6756406B2 (fr)
WO (1) WO2018100668A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165673A (zh) * 2018-07-18 2019-01-08 广东工业大学 基于度量学习和多示例支持向量机的图像分类方法
WO2022102534A1 (fr) * 2020-11-12 2022-05-19 日東電工株式会社 Dispositif de détermination, procédé de détermination, et programme de détermination

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6756406B2 (ja) * 2016-11-30 2020-09-16 日本電気株式会社 画像処理装置、画像処理方法および画像処理プログラム
CN111666952B (zh) * 2020-05-22 2023-10-24 北京腾信软创科技股份有限公司 一种基于标签上下文的显著区域提取方法及系统
CN112037167B (zh) * 2020-07-21 2023-11-24 苏州动影信息科技有限公司 一种基于影像组学和遗传算法的目标区域确定系统
CN112926598B (zh) * 2021-03-08 2021-12-07 南京信息工程大学 基于残差域深度学习特征的图像拷贝检测方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090202124A1 (en) * 2006-10-11 2009-08-13 Olympus Corporation Image processing apparatus, image processing method, and computer program product
US20120301015A1 (en) * 2011-05-23 2012-11-29 Ntt Docomo, Inc. Image identification device, image identification method and recording medium
US20140205206A1 (en) * 2013-01-24 2014-07-24 Mayur Datar Systems and methods for resizing an image

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2375908B (en) * 2001-05-23 2003-10-29 Motorola Inc Image transmission system image transmission unit and method for describing texture or a texture-like region
US7362920B2 (en) * 2003-09-22 2008-04-22 Siemens Medical Solutions Usa, Inc. Method and system for hybrid rigid registration based on joint correspondences between scale-invariant salient region features
US7711146B2 (en) 2006-03-09 2010-05-04 General Electric Company Method and system for performing image re-identification
WO2009101153A2 (fr) * 2008-02-13 2009-08-20 Ubisoft Entertainment S.A. Capture d'image en prises réelles
JP2011253354A (ja) 2010-06-02 2011-12-15 Sony Corp 画像処理装置および方法、並びにプログラム
US9501714B2 (en) 2010-10-29 2016-11-22 Qualcomm Incorporated Systems and methods to improve feature generation in object recognition
JP6103243B2 (ja) * 2011-11-18 2017-03-29 日本電気株式会社 局所特徴量抽出装置、局所特徴量抽出方法、及びプログラム
US9396412B2 (en) 2012-06-21 2016-07-19 Siemens Aktiengesellschaft Machine-learnt person re-identification
US9633263B2 (en) 2012-10-09 2017-04-25 International Business Machines Corporation Appearance modeling for object re-identification using weighted brightness transfer functions
US9514536B2 (en) * 2012-10-10 2016-12-06 Broadbandtv, Corp. Intelligent video thumbnail selection and generation
JP6541363B2 (ja) * 2015-02-13 2019-07-10 キヤノン株式会社 画像処理装置、画像処理方法およびプログラム
MY191808A (en) * 2015-03-27 2022-07-16 Nissan Motor Shared vehicle management apparatus and shared vehicle management method
CN106296638A (zh) * 2015-06-04 2017-01-04 欧姆龙株式会社 显著性信息取得装置以及显著性信息取得方法
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection
US9830529B2 (en) * 2016-04-26 2017-11-28 Xerox Corporation End-to-end saliency mapping via probability distribution prediction
JP6756406B2 (ja) * 2016-11-30 2020-09-16 日本電気株式会社 画像処理装置、画像処理方法および画像処理プログラム
US10037601B1 (en) * 2017-02-02 2018-07-31 International Business Machines Corporation Systems and methods for automatic detection of architectural distortion in two dimensional mammographic images

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090202124A1 (en) * 2006-10-11 2009-08-13 Olympus Corporation Image processing apparatus, image processing method, and computer program product
US20120301015A1 (en) * 2011-05-23 2012-11-29 Ntt Docomo, Inc. Image identification device, image identification method and recording medium
US20140205206A1 (en) * 2013-01-24 2014-07-24 Mayur Datar Systems and methods for resizing an image

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165673A (zh) * 2018-07-18 2019-01-08 广东工业大学 基于度量学习和多示例支持向量机的图像分类方法
CN109165673B (zh) * 2018-07-18 2021-08-31 广东工业大学 基于度量学习和多示例支持向量机的图像分类方法
WO2022102534A1 (fr) * 2020-11-12 2022-05-19 日東電工株式会社 Dispositif de détermination, procédé de détermination, et programme de détermination

Also Published As

Publication number Publication date
JP6756406B2 (ja) 2020-09-16
JP2019536164A (ja) 2019-12-12
US20190311216A1 (en) 2019-10-10
US11138464B2 (en) 2021-10-05

Similar Documents

Publication Publication Date Title
US11138464B2 (en) Image processing device, image processing method, and image processing program
CN110020592B (zh) 物体检测模型训练方法、装置、计算机设备及存储介质
US10467459B2 (en) Object detection based on joint feature extraction
US9928405B2 (en) System and method for detecting and tracking facial features in images
KR100647322B1 (ko) 객체의 모양모델 생성장치 및 방법과 이를 이용한 객체의특징점 자동탐색장치 및 방법
US20210264144A1 (en) Human pose analysis system and method
CN110211157B (zh) 一种基于相关滤波的目标长时跟踪方法
Zhao et al. Closely coupled object detection and segmentation
US11380010B2 (en) Image processing device, image processing method, and image processing program
Xia et al. Loop closure detection for visual SLAM using PCANet features
JP5591360B2 (ja) 分類及び対象物検出の方法及び装置、撮像装置及び画像処理装置
US10657625B2 (en) Image processing device, an image processing method, and computer-readable recording medium
Demirkus et al. Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos
WO2016038647A1 (fr) Dispositif de traitement d'image, procédé de traitement d'image, et support de stockage stockant un programme de celui-ci
CN109685830B (zh) 目标跟踪方法、装置和设备及计算机存储介质
US11176455B2 (en) Learning data generation apparatus and learning data generation method
CN115063526A (zh) 二维图像的三维重建方法、系统、终端设备及存储介质
Scharfenberger et al. Salient region detection using self-guided statistical non-redundancy in natural images
CN110751163A (zh) 目标定位方法及其装置、计算机可读存储介质和电子设备
Zhang et al. Augmented visual feature modeling for matching in low-visibility based on cycle-labeling of Superpixel Flow
Hassan et al. Salient object detection based on CNN fusion of two types of saliency models
Martinez et al. Facial landmarking for in-the-wild images with local inference based on global appearance
Jacques et al. Improved head-shoulder human contour estimation through clusters of learned shape models
Gao et al. Real-time multi-view face detection based on optical flow segmentation for guiding the robot
Martins et al. Texture collinearity foreground segmentation for night videos

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16923119

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019527275

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16923119

Country of ref document: EP

Kind code of ref document: A1