WO2019205729A1 - Method, device and computer-readable storage medium for identifying an object - Google Patents
Method, device and computer-readable storage medium for identifying an object
- Publication number
- WO2019205729A1 (PCT/CN2019/070207, CN2019070207W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neural network
- classification
- feature
- candidate
- determining
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/87—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using selection of the recognition techniques, e.g. of a classifier in a multiple classifier system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Definitions
- the present disclosure relates to the field of artificial intelligence, and more particularly to methods, devices, and computer readable storage media for identifying objects.
- a method for identifying an object includes: determining candidate classifications of the object using a first neural network; and, in response to determining the candidate classifications of the object, determining the classification of the object using second neural networks respectively corresponding to the candidate classifications.
- the step of determining the candidate classifications of the object using the first neural network comprises: using the first neural network to determine first feature similarities between the object and reference data of the classifications; and determining, as candidate classifications of the object, the classifications corresponding to those first feature similarities that are greater than or equal to a predetermined threshold.
- the step of using the first neural network to determine the first feature similarities between the object and the reference data of the classifications comprises: determining a first object feature vector of the object using the first neural network; and respectively calculating the first feature similarities between the first object feature vector and first reference feature vectors, wherein the first reference feature vectors are respectively determined by the first neural network based on the reference data of the classifications.
- the predetermined threshold is a threshold set uniformly for all first feature similarities, or the predetermined threshold consists of thresholds set separately for the respective first feature similarities, and these thresholds can be set independently of one another.
- the method further comprises determining the one candidate classification as a classification of the object in response to determining only one candidate classification of the object.
- the method further comprises providing an output indicating that the object cannot be identified in response to not determining any candidate classification of the object.
- the step of determining the classification of the object using the second neural networks respectively corresponding to the candidate classifications comprises: for each of the candidate classifications, determining, based on the first object feature vector and using the corresponding second neural network, a second object feature vector of the object associated with that candidate classification; respectively calculating second feature similarities between the second object feature vectors of the object associated with the candidate classifications and the corresponding second reference feature vectors, wherein the second reference feature vectors are respectively determined by the second neural networks based on the first reference feature vectors; and determining, as the classification of the object, the classification corresponding to the largest of the second feature similarities.
- the second neural network is trained by: using two samples belonging to the classification corresponding to the second neural network as a positive sample pair, whose expected output value is a positive reference value; using one sample belonging to the classification corresponding to the second neural network and one sample not belonging to that classification as a negative sample pair, whose expected output value is a negative reference value; and using the squared error between the calculated value of the corresponding second feature similarity and the expected output value as the loss function.
- the first neural network is a convolutional neural network with the fully connected layer used for final classification removed, and the second neural network is a single-layer fully connected neural network.
- an apparatus for identifying an object includes: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to: determine candidate classifications of the object using a first neural network; and, in response to determining the candidate classifications of the object, determine the classification of the object using second neural networks respectively corresponding to the candidate classifications.
- the instructions, when executed by the processor, further cause the processor to: determine, using the first neural network, first feature similarities between the object and reference data of the classifications; and determine, as candidate classifications of the object, the classifications corresponding to those first feature similarities that are greater than or equal to a predetermined threshold.
- the instructions, when executed by the processor, further cause the processor to: determine a first object feature vector of the object using the first neural network; and respectively calculate the first feature similarities between the first object feature vector and first reference feature vectors, wherein the first reference feature vectors are respectively determined by the first neural network based on the reference data of the classifications.
- the predetermined threshold is a threshold set uniformly for all first feature similarities, or the predetermined threshold consists of thresholds set separately for the respective first feature similarities, and these thresholds can be set independently of one another.
- the instructions when executed by the processor, further cause the processor to determine the one candidate classification as a classification of the object in response to determining only one candidate classification of the object.
- the instructions when executed by the processor, further cause the processor to output a message indicating that the object cannot be identified in response to not determining any candidate classification of the object.
- the instructions, when executed by the processor, further cause the processor to: for each of the candidate classifications, determine, based on the first object feature vector and using the corresponding second neural network, a second object feature vector of the object associated with that candidate classification; respectively calculate second feature similarities between the second object feature vectors of the object associated with the candidate classifications and the corresponding second reference feature vectors, wherein the second reference feature vectors are respectively determined by the second neural networks based on the first reference feature vectors; and determine, as the classification of the object, the classification corresponding to the largest of the second feature similarities.
- the second neural network is trained by: using two samples belonging to the classification corresponding to the second neural network as a positive sample pair, whose expected output value is a positive reference value; using one sample belonging to the classification corresponding to the second neural network and one sample not belonging to that classification as a negative sample pair, whose expected output value is a negative reference value; and using the squared error between the calculated value of the corresponding second feature similarity and the expected output value as the loss function.
- the first neural network is a convolutional neural network with the fully connected layer used for final classification removed, and the second neural network is a single-layer fully connected neural network.
- a computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method according to the first aspect of the present disclosure.
- FIG. 1 is a flow chart illustrating an example method for identifying an object in accordance with an embodiment of the present disclosure.
- FIG. 2 is a flow chart showing an example method for determining candidate classifications for an object, in accordance with an embodiment of the disclosure.
- FIG. 3 illustrates a block diagram of an example of determining candidate classifications for an object, in accordance with an embodiment of the present disclosure.
- FIG. 4 is a flow diagram showing an example method for determining a classification of an object using a second neural network based on candidate classifications of objects, in accordance with an embodiment of the disclosure.
- FIG. 5 is a block diagram illustrating an example of determining the classification of an object using second neural networks according to the candidate classifications of the object, according to an embodiment of the present disclosure.
- FIG. 6 is a block diagram showing an example hardware arrangement of an example device for identifying an object, in accordance with an embodiment of the present disclosure.
- a commonly used object recognition technique is to construct an object classifier, for example, using a convolutional neural network (CNN).
- Different types of goods can be identified by the object classifier.
- this method has high recognition accuracy but poor scalability. For example, if a new product category is added, a large amount of time must be spent redesigning and retraining the entire classifier, which cannot meet supermarkets' need to introduce new products quickly.
- a two-level object recognition method capable of online expansion is proposed. It generally involves two steps. First, a first neural network is used to determine a feature vector of the image of the object, and feature matching is used to perform coarse recognition of the object classification. When multiple easily confused candidate object classifications are identified during coarse recognition, second neural networks for those specific categories can be used to further discriminate among the candidate classifications, thereby obtaining a more accurate recognition result, as sketched below. With such a scheme, high recognition accuracy and high scalability can be achieved at the same time.
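- as a rough illustration only, the following Python sketch shows the shape of this two-level flow; the data layout (a per-class dictionary of reference vectors) and all names are illustrative assumptions, not part of the disclosed embodiments:

```python
import numpy as np

# Hypothetical sketch of the two-level recognition flow described above.
# first_net maps an image to the feature vector F1; second_nets[c] maps F1 to
# a class-specific vector F2; refs[c] = (F1_ref, F2_ref) holds per-class
# reference vectors computed in advance from reference images.
def recognize(image, first_net, second_nets, refs, th=0.8):
    f1 = first_net(image)
    # coarse stage: cosine similarity of F1 against each class's reference
    sims = {c: float(np.dot(f1, r1) / (np.linalg.norm(f1) * np.linalg.norm(r1)))
            for c, (r1, _) in refs.items()}
    candidates = [c for c, s in sims.items() if s >= th]
    if not candidates:
        return None                 # object cannot be identified
    if len(candidates) == 1:
        return candidates[0]        # a single, unambiguous candidate
    # fine stage: consult each candidate's dedicated second network and pick
    # the candidate whose F2 lies closest to its class reference
    return min(candidates,
               key=lambda c: np.linalg.norm(second_nets[c](f1) - refs[c][1]))
```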
- although embodiments of the present disclosure are described in the context of object image recognition, embodiments of the present disclosure are not limited thereto.
- the concepts described in the embodiments of the present disclosure can be employed in any neural-network-based scenario that requires distinguishing between multiple categories while also requiring high scalability.
- for example, besides images of objects, user behavior characteristics may be targeted (e.g., using user behavior to help a website or application developer distinguish its user categories, such as malicious users, loyal users, infrequent users, frequent users, etc.).
- the mode of connection between the neurons of the neural network is inspired by the animal visual cortex.
- a single neuron responds to a stimulus in a limited area of space, which is the aforementioned receptive field.
- the respective receptive fields of different neurons partially overlap each other so that they form the entire field of view.
- the response of a single neuron to stimuli within its receptive field can be mathematically approximated by a convolution operation. Therefore, convolutional neural networks have a wide range of applications in image and video recognition, recommendation (for example, product recommendation on shopping websites), and natural language processing.
- the convolutional neural network may generally include multiple functional layers, such as convolutional layers and fully connected layers, as described in detail below; by stacking multiple convolutional/fully connected layers, it gradually shifts from capturing local features to capturing global features, and finally obtains a recognition/classification result.
- as an intuitive example in face recognition, the first convolutional layer of the convolutional neural network may learn fine (or very local) features such as eye color, eye contour, eyelashes, nose contour, nose shading, mouth contour, and mouth color; the second convolutional layer, operating on the output of the first, may learn the slightly larger features of facial organs such as the eyes (identified from, e.g., eye color, eye contour, and eyelashes), the nose (determined from, e.g., nose contour and nose shading), and the mouth (determined from, e.g., mouth contour and mouth color), features that are more global than those learned by the first convolutional layer.
- the third convolutional layer or a fully connected layer can then learn even more global features, such as the face itself (determined from the eyes, nose, mouth, etc.), from the output of the second convolutional layer, and finally determine the position of a face or of facial feature points in the image. Of course, the present disclosure is not limited thereto. A detailed description of the fully connected layer is given later and is not discussed in detail here.
- the features learned by a CNN are usually not semantic features as understood by humans, but abstract features that are usually completely incomprehensible to humans. By combining these features together, however, the computer can determine that an image contains a face and the various parts of that face.
- for intuition, one person's criterion for judging a face may be whether the image contains human eyes, a nose, or a mouth; another person may instead pick features such as whether the image contains human eyebrows and a chin; yet another person may, more oddly, check whether the image contains glasses, a mask, or earrings to decide whether a face is present.
- the convolutional neural network is perhaps the oddest such "person": it may use a series of features that humans cannot describe in words, for example certain specific pixel combinations, to determine whether something is a face or a part of a face.
- the convolutional layer is the core building block of CNN.
- the parameters of this layer consist of a collection of learnable convolution kernels (or simply convolution kernels), each with a small receptive field, but extending over the entire depth of the input data.
- in the forward pass, each convolution kernel is convolved along the width and height of the input data, the dot product between the elements of the kernel and the input data is calculated, and a two-dimensional activation map of that kernel is generated.
- as a result, the network is able to learn convolution kernels that activate when a particular type of feature is seen at some spatial location of the input.
- for example, assuming the input data and the convolution kernel are, respectively, the 4 x 4 and 2 x 2 matrices on the left side of the equation, the convolution is computed as in equation (1):
- if this kernel is one for recognizing a particular object (e.g., an eye), it can be seen that, in the resulting output on the right side of the equation, the object is more likely to appear toward the upper right than toward the lower left. As described above, by stacking multiple convolutional layers, feature recognition gradually evolves from local to global.
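- since the matrices of equation (1) are not reproduced above, the following minimal sketch (with placeholder matrices, not the values from the original equation) shows the sliding-window dot product that the text describes; note that, as is usual in CNN practice, the "convolution" is computed here as a cross-correlation:

```python
import numpy as np

def conv2d_valid(x, k):
    # Slide the kernel over the input and take dot products at each position
    # ('valid' placement: the kernel never leaves the input).
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # placeholder 4 x 4 input
k = np.array([[1.0, 0.0], [0.0, 1.0]])         # placeholder 2 x 2 kernel
print(conv2d_valid(x, k))                      # 3 x 3 activation map
```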
- typically, after multiple convolutional layers, global feature capture in a convolutional neural network can be achieved via a fully connected layer.
- the fully connected layer is actually a special convolutional layer whose convolution kernel has a full connection to all elements of the previous layer output, which is the same as a conventional neural network. Therefore, matrix multiplication can be used directly for it.
- the output of the fully connected layer can be a one-dimensional array, where each element can represent a likelihood indicator that the image is classified into a certain category.
- the output can be used, for example, to determine whether a face exists in the image, the gender of the face, the race, age, etc., and the disclosure is not limited thereto.
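- as a small illustration of the "matrix multiplication" remark above, a fully connected layer reduces to a weight matrix applied to the flattened previous-layer output; the sizes below (a 4096-dimensional input and 10 categories) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)         # flattened output of the previous layer
W = rng.standard_normal((10, 4096))   # one row of weights per output category
b = np.zeros(10)

scores = W @ x + b                    # fully connected layer as one matmul
print(scores.shape)                   # (10,): one likelihood indicator per category
```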
- FIG. 1 is a flow chart showing an example method 100 for identifying an object in accordance with an embodiment of the present disclosure.
- method 100 can include steps S110 and S120.
- however, method 100 may also include other steps, sub-steps, etc., or steps S110 and/or S120 may be replaced with steps or sub-steps that implement the same or similar functions.
- in step S110, a first neural network may be used to determine candidate classifications of the object.
- in this step, the first neural network is mainly used as a universal object recognizer to perform a preliminary or coarse classification of the object.
- the first neural network can be a Convolutional Neural Network (CNN).
- the first neural network may be a convolutional neural network that does not have a fully connected layer for final classification.
- the first CNN can employ different network structures depending on design requirements.
- an object recognition network well known in the field of deep learning, such as MobileNet, a VGG network, or a ResNet network, can be used, but a proprietary or dedicated CNN can also be built.
- the fully connected layers of such an object recognition network that are used for outputting the classification categories may be removed to form a corresponding CNN feature extraction system.
- the VGG-19 network includes 16 convolutional layers and 3 fully connected layers (as well as various auxiliary layers, such as pooling layers and activation layers), where the last of the three fully connected layers is responsible for performing the classification and can output the final classification result based on the computation results of the preceding 18 layers.
- when VGG-19 is used as the first neural network, its last fully connected layer, used for final classification, may be removed, and only the preceding layers are used to determine the candidate classifications of the object; a concrete implementation is described later.
- similarly, when MobileNet or ResNet networks are used, their fully connected layers for final classification can likewise be removed.
- the features extracted by the first neural network from which the fully connected layer for final classification has been removed may be referred to as the first object feature vector, or F1.
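- as a concrete sketch of this truncation, assuming PyTorch/torchvision's VGG-19 layout (where `model.classifier` is a `Sequential` whose final module is the classification `Linear` layer), the final fully connected layer can be dropped as follows; this is one possible realization, not necessarily the implementation of the disclosed embodiments:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg19()  # pass weights=... to start from pretrained parameters
# drop the last fully connected layer, which performs the final classification
model.classifier = nn.Sequential(*list(model.classifier.children())[:-1])
model.eval()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)   # placeholder input batch
    f1 = model(image)                     # first object feature vector F1
print(f1.shape)                           # torch.Size([1, 4096])
```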
- various parameters of the first neural network can be obtained through training.
- for training, a fully connected layer for classification (for example, the aforementioned fully connected layer of the VGG-19 network or of a ResNet network) may be appended after the first neural network.
- the output dimension of this fully connected layer may be equal to the number of object classifications.
- the output of the fully connected layer can then be converted into the recognized object classification probabilities using the Softmax function, which has the following form:
- σ(z_j) = e^{z_j} / Σ_{k=1}^{K} e^{z_k}, for j = 1, ..., K
- where K is the dimension of the output vector z of the fully connected layer, i.e., the number of object classifications, z_j is the j-th element of the output vector z, σ(z_j) is the classification probability for z_j, and e is the base of the natural logarithm.
- the role of Softmax is to map each element of, for example, a K-dimensional real vector (for example, the output of the aforementioned fully connected layer) into the interval (0, 1) such that the elements sum to 1, thereby forming a classification probability distribution.
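- a minimal sketch of this mapping (the subtraction of the maximum is a standard numerical-stability detail, not part of the formula above):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # stabilize the exponentials before normalizing
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # example fully connected layer output, K = 3
p = softmax(z)
print(p, p.sum())               # every element lies in (0, 1); the sum is 1
```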
- the first neural network may be trained using a large number of object sample images labeled with their classifications as training samples, using cross entropy as the loss (cost) function of the training process, with the optimal network obtained by minimizing the cost function.
- when new object classifications are introduced, the first neural network may or may not be retrained. Without retraining the first neural network, the work required for retraining is reduced at the cost of some recognition accuracy. However, considering the subsequent use of second neural networks trained for specific classifications, this cost is acceptable. In other words, in some embodiments, the first neural network need not be retrained when introducing new object classifications.
- in step S120, in response to determining multiple candidate classifications of the object, multiple second neural networks respectively corresponding to the multiple candidate classifications may be used to determine the classification of the object.
- at least one of the plurality of second neural networks may be a single layer fully connected neural network trained separately for respective classifications.
- all of the plurality of second neural networks may be a single layer fully connected neural network trained separately for respective classifications.
- step S120 is mainly used to further and precisely determine the actual classification of the object among multiple similar candidate classifications. Since a corresponding second neural network is set up and trained for each classification, this step has very good scalability. In other words, second neural networks already trained for existing classifications need not be retrained when a new object classification is added, but can be applied directly.
- the method 100 may further include step S122 and/or step S124.
- in some embodiments, if only one candidate classification is determined, step S122 may be performed: in response to determining only one candidate classification of the object, that candidate classification may be determined as the classification of the object.
- in some embodiments, if no candidate classification is determined, step S124 may be performed: in response to determining no candidate classification of the object, a message indicating that the object cannot be identified may be output.
- note, however, that steps S122 and S124 are not essential steps of method 100.
- for example, in other embodiments, even when it is determined that only one candidate classification exists, the second neural network may still be used, as in step S120, to further determine the classification, instead of directly determining it as the classification of the object as in step S122 of FIG. 1.
- as another example, when it is determined that there is no candidate classification, step S110 can be re-executed with the corresponding thresholds (described below in connection with FIG. 2) lowered, until at least one candidate classification is determined; similarly, when only one candidate classification is determined, the method may return to step S110 and lower the thresholds to increase the number of candidate classifications.
- in short, steps S122 and S124 are both optional steps.
- FIG. 2 is a flow diagram showing an example method 200 for determining candidate classifications of objects in accordance with an embodiment of the disclosure.
- FIG. 3 illustrates a block diagram of an example of determining candidate classifications of an object, in accordance with an embodiment of the present disclosure.
- method 200 can include steps S210, S220, and S230.
- however, method 200 may also include other steps, sub-steps, etc., or steps S210, S220 and/or S230 may be replaced with steps or sub-steps that implement the same or similar functions.
- step S110 shown in FIG. 1 may employ the specific steps of method 200 shown in FIG. 2, although the present disclosure is not limited thereto.
- in step S210, the first object feature vector of the object (e.g., the aforementioned F1) may be determined using the aforementioned first neural network.
- the first neural network may be a convolutional neural network with a fully connected layer for final classification removed.
- one or more first feature similarities between the first object feature vector and one or more first reference feature vectors may be respectively calculated, wherein the one or more first reference feature vectors are respectively determined by the first neural network based on the reference data of one or more classifications.
- the one or more first reference feature vectors may be reference data determined by using the aforementioned first neural network to extract features from reference images of objects of the various categories. The features of the reference images can be computed in advance and stored.
- various distance metrics can be used to determine a first feature similarity between a first object feature vector and one or more first reference feature vectors.
- a cosine distance or a Euclidean distance may be used as the measure of the first feature similarity. For example, if the Euclidean distance is used, the first feature similarity can be calculated as:
- S1_ref(i) = ||F1_ref(i) - F1||_2
- where S1_ref(i) is the first feature similarity between the first object feature vector of the object and the first reference feature vector of the i-th reference object, F1_ref(i) denotes the first reference feature vector of the i-th reference object, and ||·||_2 is the Euclidean distance.
- a similarity based on the cosine distance ranges, for example, from -1 to 1, with larger values indicating greater similarity.
- a classification corresponding to a first feature similarity, among the one or more first feature similarities, that is greater than or equal to a predetermined threshold may be determined as a candidate classification of the object. For example, after the first feature similarities S1_ref(i) between the object and one or more reference objects are determined, a threshold Th1 may be set; if the similarity S1_ref(i) is greater than or equal to Th1, the match may be considered successful, and the classification to which S1_ref(i) corresponds is taken as a candidate classification of the object. Conversely, if the similarity S1_ref(i) is smaller than Th1, the match may be considered unsuccessful, and it can be determined that the object does not belong to the corresponding classification. A minimal sketch of this matching step follows.
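- the sketch below uses the cosine similarity (so that "greater than or equal to the threshold" directly means "more similar"; with a Euclidean distance the comparison would have to be adapted accordingly); the per-class dictionary layout is an assumption made for illustration:

```python
import numpy as np

def coarse_candidates(f1, first_refs, th1):
    # first_refs: {classification: F1_ref vector}
    # th1: a single scalar threshold, or {classification: threshold}
    candidates = []
    for cls, ref in first_refs.items():
        sim = float(np.dot(f1, ref) /
                    (np.linalg.norm(f1) * np.linalg.norm(ref)))  # in [-1, 1]
        th = th1[cls] if isinstance(th1, dict) else th1
        if sim >= th:           # match succeeded: cls is a candidate classification
            candidates.append(cls)
    return candidates
```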
- different similarity thresholds Th1 may be set for different classifications.
- for example, the similarity threshold Th1_1 may be set for a first classification (e.g., beverage), and the similarity threshold Th1_2 may be set for a second classification (e.g., bread).
- setting different similarity thresholds for different classifications reflects the characteristics of the classifications: in some classifications, objects are more similar to objects of neighboring classifications than in others. For example, several classifications of objects with roughly the same shape may require a higher similarity threshold to be distinguished, whereas for other classifications an overly high threshold could prevent correct recognition of objects of the same classification that differ considerably in shape. In other words, different thresholds may be set, independently of each other, for the first feature similarities corresponding to different classifications, to reflect more accurately the differences between classifications and the commonalities within a classification.
- thus, through the embodiment shown in FIG. 2, the first neural network can be used to determine the candidate classifications of the object.
- FIG. 4 is a flow diagram showing an example method 400 for determining a classification of an object using a second neural network based on candidate classifications of objects, in accordance with an embodiment of the disclosure.
- FIG. 5 is a block diagram illustrating an example of determining the classification of an object using second neural networks according to the candidate classifications of the object, according to an embodiment of the present disclosure.
- method 400 can include steps S410, S420, and S430.
- however, method 400 may also include other steps, sub-steps, etc., or steps S410, S420 and/or S430 may be replaced with steps or sub-steps that implement the same or similar functions.
- step S120 shown in FIG. 1 may employ the specific steps of method 400 shown in FIG. 4, although the present disclosure is not limited thereto.
- in step S410, for each of the candidate classifications, a second object feature vector F2 of the object associated with that candidate classification may be determined, based on the first object feature vector F1, using the corresponding second neural network.
- each second neural network may be a single-layer fully connected neural network trained for the corresponding classification, used to accurately recognize objects of that classification.
- the coefficients of a second neural network may be trained, for example, as follows: two samples belonging to the classification corresponding to the second neural network are used as a positive sample pair, whose expected output value is a positive reference value; one sample belonging to the classification corresponding to the second neural network and one sample not belonging to that classification are used as a negative sample pair, whose expected output value is a negative reference value; and the squared error between the calculated value of the corresponding second feature similarity (i.e., the similarity given by the output of the second neural network, as explained below) and the expected output value is used as the loss function, with the optimal parameters of the second neural network obtained by training that minimizes this loss function.
- by using a large number of sample pairs constructed in this way, the second neural network for a particular classification can be trained to specialize in identifying objects of that classification; that is, it can distinguish with high accuracy objects of the classification from objects outside the classification. In this way, even if new classifications are added later, the second neural networks already trained for existing classifications need not be retrained.
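- a minimal training sketch under stated assumptions (feature dimension 4096, cosine similarity as the second feature similarity, targets +1/-1 as the positive/negative reference values); the pair construction and hyperparameters are illustrative, not taken from the disclosure:

```python
import torch
import torch.nn as nn

def train_second_net(pairs, in_dim=4096, out_dim=256, lr=1e-3):
    # pairs: iterable of (f1_a, f1_b, target) batches, where target is +1 for
    # positive sample pairs and -1 for negative sample pairs.
    net = nn.Linear(in_dim, out_dim)          # single-layer fully connected net
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for f1_a, f1_b, target in pairs:
        sim = nn.functional.cosine_similarity(net(f1_a), net(f1_b), dim=-1)
        loss = ((sim - target) ** 2).mean()   # squared error vs. reference value
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```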
- a plurality of second feature similarities between the plurality of second object feature vectors respectively associated with the plurality of candidate classifications and the corresponding plurality of second reference feature vectors may be respectively calculated, wherein the plurality of second reference feature vectors are respectively determined by the second neural networks based on the plurality of first reference feature vectors.
- the second neural network may be used in advance on the reference object of each of the foregoing classifications to determine the second reference feature vector F2_ref(i) of that reference object.
- then, analogously to the calculation of the first feature similarity, the second feature similarity S2_ref(i) between the second object feature vector F2 of the object and each second reference feature vector F2_ref(i) is respectively calculated.
- the second feature similarity S2_ref(i) may be determined using a cosine distance or a Euclidean distance.
- the second feature similarity S2_ref(i) can be calculated using the following formula:
- S2_ref(i) = ||F2_ref(i) - F2||_2
- where S2_ref(i) is the second feature similarity between the second object feature vector of the object and the second reference feature vector of the i-th reference object, F2_ref(i) denotes the second reference feature vector of the i-th reference object, and ||·||_2 is the Euclidean distance.
- in step S430, the classification corresponding to the largest of the plurality of second feature similarities S2_ref(i) may be determined as the classification of the object.
- however, the specific value of the second feature similarity need not be the only consideration; a weighted or relative value may be considered instead.
- for example, the second feature similarities of certain classifications may be given higher weights, so that the weighted second feature similarity values better reflect the differences between classifications, e.g., for classifications whose objects differ greatly from those of other classifications; conversely, the second feature similarities of certain classifications may be given lower weights, imposing a stricter requirement for judging an object to belong to that classification, e.g., for classifications whose objects differ little from those of other classifications.
- in other words, different weights can be set for different classifications, so that the differences between classifications can be reflected; a sketch of this decision step follows.
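- in the sketch below, the Euclidean distance is mapped to a positive similarity by an assumed monotone transform so that the largest (optionally weighted) value wins; the per-class dictionary layout and the transform are illustrative assumptions:

```python
import numpy as np

def fine_classify(f2_by_class, second_refs, weights=None):
    # f2_by_class: {classification: F2 from that classification's second network}
    # second_refs: {classification: F2_ref}; weights: optional {classification: w}
    best_cls, best_score = None, -np.inf
    for cls, f2 in f2_by_class.items():
        dist = np.linalg.norm(f2 - second_refs[cls])   # Euclidean, as in the text
        sim = 1.0 / (1.0 + dist)      # assumed monotone mapping: larger = closer
        w = 1.0 if weights is None else weights.get(cls, 1.0)
        if w * sim > best_score:
            best_cls, best_score = cls, w * sim
    return best_cls
```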
- with the object recognition method described above, the first neural network may first be used to extract the feature vector of the object image, and feature matching is used to perform coarse recognition of the object category; when feature matching identifies multiple easily confused candidate object classifications, second neural networks for the specific object classifications may be used to further discriminate among the multiple candidate classifications and obtain a more accurate recognition result.
- since there is no need to update the universal first neural network and different second neural networks can be trained for particular classifications, the object recognition method is also easy to extend and maintain when expanding the set of recognized object classifications.
- FIG. 6 is a block diagram showing an example hardware arrangement of an example device 600 for identifying an object in accordance with an embodiment of the present disclosure.
- Apparatus 600 can include a processor 606 (eg, a digital signal processor (DSP), a central processing unit (CPU), a microcontroller, a microprocessor, or any processing device).
- Processor 606 can be a single processing unit or a plurality of processing units for performing different acts of the flows described herein.
- Apparatus 600 may also include an input unit 602 for receiving signals from other entities, and an output unit 604 for providing signals to other entities.
- Input unit 602 and output unit 604 can be arranged as a single entity or as separate entities.
- device 600 can include at least one readable storage medium 608 in the form of a non-volatile or volatile memory, such as an electrically erasable programmable read only memory (EEPROM), flash memory, and/or a hard disk drive.
- the readable storage medium 608 includes a computer program 610, which includes code/computer-readable instructions that, when executed by the processor 606 in the arrangement 600, cause the hardware arrangement 600 and/or the device 600 including the hardware arrangement 600 to perform, for example, the flows described above in connection with FIGS. 1, 2 and 4 and any variations thereof.
- computer program 610 can be configured as computer program code having, for example, an architecture of computer program blocks 610A-610B. Accordingly, in an example embodiment in which the hardware arrangement is used in device 600, the code in the computer program of the arrangement includes: a program block 610A for determining candidate classifications of the object using the first neural network; and a program block 610B for determining the classification of the object, in response to determining the candidate classifications of the object, using second neural networks respectively corresponding to the candidate classifications.
- the computer program blocks can essentially perform the various actions in the flows illustrated in FIGS. 1, 2 and 4 to emulate any dedicated hardware device.
- in other words, when different computer program modules are executed in processor 606, they may correspond to the respective hardware units of a dedicated hardware device.
- although the code in the embodiment disclosed above in connection with FIG. 6 is implemented as a computer program that, when executed in processor 606, causes device 600 to perform the actions described above in connection with FIGS. 1, 2 and 4, in alternative embodiments at least part of that code may be implemented at least partially as a hardware circuit.
- the processor may be a single CPU (Central Processing Unit), but may also include two or more processing units.
- a processor can include a general purpose microprocessor, an instruction set processor, and/or a related chipset and/or a special purpose microprocessor (eg, an application specific integrated circuit (ASIC)).
- the processor may also include an onboard memory for caching purposes.
- the computer program can be carried by a computer program product connected to the processor.
- the computer program product can comprise a computer readable medium having stored thereon a computer program.
- the computer program product can be flash memory, random access memory (RAM), read-only memory (ROM), or EEPROM, and in alternative embodiments the computer program modules described above can be distributed among different computer program products in the form of memory within the UE.
- functions described herein as being implemented by pure hardware, pure software and/or firmware may also be implemented by dedicated hardware, by a combination of general-purpose hardware and software, and so on.
- for example, functions described as being implemented by dedicated hardware (e.g., a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.) may be implemented by a combination of general-purpose hardware (e.g., a central processing unit (CPU), a digital signal processor (DSP)) and software, and vice versa.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
According to embodiments of the present disclosure, a method, a device and a computer-readable storage medium for identifying an object are provided. The method includes: determining candidate classifications of the object using a first neural network; and, in response to determining the candidate classifications of the object, determining the classification of the object using second neural networks respectively corresponding to the candidate classifications. The device includes: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to: determine candidate classifications of the object using a first neural network; and, in response to determining the candidate classifications of the object, determine the classification of the object using second neural networks respectively corresponding to the candidate classifications.
Description
Cross-Reference to Related Applications
This application claims priority to Chinese patent application CN201810389080.2, filed on April 26, 2018, the entire disclosure of which is incorporated herein by reference.
The present disclosure relates to the field of artificial intelligence, and more particularly to methods, devices and computer-readable storage media for identifying objects.
With the gradual rise of unmanned supermarkets, in order to realize automatic management of goods and automatic settlement of shopping carts in a supermarket, it is necessary to recognize the supermarket's goods through artificial intelligence techniques such as image recognition, so as to automatically identify the category of goods, thereby improving the user experience and reducing management costs.
Summary of the Invention
According to a first aspect of the present disclosure, a method for identifying an object is provided. The method includes: determining candidate classifications of the object using a first neural network; and, in response to determining the candidate classifications of the object, determining the classification of the object using second neural networks respectively corresponding to the candidate classifications.
In some embodiments, the step of determining the candidate classifications of the object using the first neural network includes: using the first neural network to determine first feature similarities between the object and reference data of the classifications; and determining, as candidate classifications of the object, the classifications corresponding to those of the first feature similarities that are greater than or equal to a predetermined threshold.
In some embodiments, the step of using the first neural network to determine the first feature similarities between the object and the reference data of the classifications includes: determining a first object feature vector of the object using the first neural network; and respectively calculating the first feature similarities between the first object feature vector and first reference feature vectors, where the first reference feature vectors are respectively determined by the first neural network based on the reference data of the classifications.
In some embodiments, the predetermined threshold is a threshold set uniformly for all first feature similarities, or the predetermined threshold consists of thresholds set separately for the respective first feature similarities, and these thresholds can be set independently of one another.
In some embodiments, the method further includes: in response to determining only one candidate classification of the object, determining that candidate classification as the classification of the object.
In some embodiments, the method further includes: in response to determining no candidate classification of the object, providing an output indicating that the object cannot be identified.
In some embodiments, the step of determining the classification of the object using the second neural networks respectively corresponding to the candidate classifications, in response to determining the candidate classifications of the object, includes: for each of the candidate classifications, determining, based on the first object feature vector and using the corresponding second neural network, a second object feature vector of the object associated with that candidate classification; respectively calculating second feature similarities between the second object feature vectors of the object respectively associated with the candidate classifications and the corresponding second reference feature vectors, where the second reference feature vectors are respectively determined by the second neural networks based on the first reference feature vectors; and determining, as the classification of the object, the classification corresponding to the largest of the second feature similarities.
In some embodiments, the second neural network is trained by: using two samples belonging to the classification corresponding to the second neural network as a positive sample pair, whose expected output value is a positive reference value; using one sample belonging to the classification corresponding to the second neural network and one sample not belonging to that classification as a negative sample pair, whose expected output value is a negative reference value; and using the squared error between the calculated value of the corresponding second feature similarity and the expected output value as the loss function.
In some embodiments, the first neural network is a convolutional neural network with the fully connected layer used for final classification removed, and the second neural network is a single-layer fully connected neural network.
According to a second aspect of the present disclosure, a device for identifying an object is provided. The device includes: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to: determine candidate classifications of the object using a first neural network; and, in response to determining the candidate classifications of the object, determine the classification of the object using second neural networks respectively corresponding to the candidate classifications.
In some embodiments, the instructions, when executed by the processor, further cause the processor to: determine, using the first neural network, first feature similarities between the object and reference data of the classifications; and determine, as candidate classifications of the object, the classifications corresponding to those of the first feature similarities that are greater than or equal to a predetermined threshold.
In some embodiments, the instructions, when executed by the processor, further cause the processor to: determine a first object feature vector of the object using the first neural network; and respectively calculate the first feature similarities between the first object feature vector and first reference feature vectors, where the first reference feature vectors are respectively determined by the first neural network based on the reference data of the classifications.
In some embodiments, the predetermined threshold is a threshold set uniformly for all first feature similarities, or the predetermined threshold consists of thresholds set separately for the respective first feature similarities, and these thresholds can be set independently of one another.
In some embodiments, the instructions, when executed by the processor, further cause the processor to: in response to determining only one candidate classification of the object, determine that candidate classification as the classification of the object.
In some embodiments, the instructions, when executed by the processor, further cause the processor to: in response to determining no candidate classification of the object, output a message indicating that the object cannot be identified.
In some embodiments, the instructions, when executed by the processor, further cause the processor to: for each of the candidate classifications, determine, based on the first object feature vector and using the corresponding second neural network, a second object feature vector of the object associated with that candidate classification; respectively calculate second feature similarities between the second object feature vectors of the object respectively associated with the candidate classifications and the corresponding second reference feature vectors, where the second reference feature vectors are respectively determined by the second neural networks based on the first reference feature vectors; and determine, as the classification of the object, the classification corresponding to the largest of the second feature similarities.
In some embodiments, the second neural network is trained by: using two samples belonging to the classification corresponding to the second neural network as a positive sample pair, whose expected output value is a positive reference value; using one sample belonging to the classification corresponding to the second neural network and one sample not belonging to that classification as a negative sample pair, whose expected output value is a negative reference value; and using the squared error between the calculated value of the corresponding second feature similarity and the expected output value as the loss function.
In some embodiments, the first neural network is a convolutional neural network with the fully connected layer used for final classification removed, and the second neural network is a single-layer fully connected neural network.
According to a third aspect of the present disclosure, a computer-readable storage medium storing instructions is provided, where the instructions, when executed by a processor, cause the processor to perform the method according to the first aspect of the present disclosure.
The above and other objects, features and advantages of the present disclosure will become clearer from the following description of preferred embodiments of the present disclosure in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart illustrating an example method for identifying an object according to an embodiment of the present disclosure.
FIG. 2 is a flowchart illustrating an example method for determining candidate classifications of an object according to an embodiment of the present disclosure.
FIG. 3 is a block diagram illustrating an example of determining candidate classifications of an object according to an embodiment of the present disclosure.
FIG. 4 is a flowchart illustrating an example method for determining the classification of an object using second neural networks according to the candidate classifications of the object, according to an embodiment of the present disclosure.
FIG. 5 is a block diagram illustrating an example of determining the classification of an object using second neural networks according to the candidate classifications of the object, according to an embodiment of the present disclosure.
FIG. 6 is a block diagram illustrating an example hardware arrangement of an example device for identifying an object according to an embodiment of the present disclosure.
To make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. It should be noted that the following description is for illustration only and is not intended to limit the present disclosure. In the following description, numerous specific details are set forth to provide a thorough understanding of the present disclosure. However, it will be apparent to those of ordinary skill in the art that these specific details need not be employed to practice the present disclosure. In other instances, well-known circuits, materials or methods are not described in detail in order to avoid obscuring the present disclosure.
Throughout the specification, references to "one embodiment", "an embodiment", "one example" or "an example" mean that a particular feature, structure or characteristic described in connection with that embodiment or example is included in at least one embodiment of the present disclosure. Thus, appearances of the phrases "in one embodiment", "in an embodiment", "one example" or "an example" in various places throughout the specification do not necessarily all refer to the same embodiment or example. Furthermore, particular features, structures or characteristics may be combined in any suitable combination and/or sub-combination in one or more embodiments or examples. Moreover, those of ordinary skill in the art will appreciate that the drawings provided herein are for purposes of illustration and are not necessarily drawn to scale. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
A commonly used object recognition technique is to build an object classifier using, for example, a convolutional neural network (CNN). Different categories of goods can be recognized with such an object classifier. However, although this approach achieves high recognition accuracy, its scalability is poor. For example, if a new category of goods is added, a large amount of time must be spent redesigning and retraining the entire classifier, which cannot meet a supermarket's need to introduce new goods quickly.
In addition, to achieve online extension to new categories, some goods recognizers first extract features and then perform feature comparison to recognize goods, avoiding retraining of the feature extractor. However, the feature-comparison accuracy of this approach is lower, and when a new category is rather similar to an existing one, confusion easily leads to incorrect recognition.
As described above, in order to at least partially solve or alleviate the problem that commonly used object recognition schemes cannot achieve both a high recognition rate and high scalability, a two-level object recognition method capable of online expansion is proposed according to embodiments of the present disclosure. It generally involves two steps: first, a first neural network is used to determine a feature vector of an image of the object, and feature matching is used to perform coarse recognition of the object classification. When multiple easily confused candidate object classifications are identified during coarse recognition, second neural networks for those specific categories can be used to further discriminate among the multiple candidate object classifications, thereby obtaining a more accurate recognition result. With such a scheme, high recognition accuracy and high scalability can be achieved at the same time.
It should be noted that although embodiments of the present disclosure are described in the context of object image recognition, embodiments of the present disclosure are not limited thereto. In fact, the concepts described in the embodiments of the present disclosure can be employed in any neural-network-based scenario that requires distinguishing between multiple categories while also requiring high scalability. For example, besides recognizing images of objects, they may be applied to user behavior characteristics (for example, using user behavior to help a website or application developer distinguish its user categories, such as malicious users, loyal users, infrequent users, frequent users, and so on).
Before describing embodiments of the present disclosure, some technical terms used herein will first be briefly introduced.
Convolutional Neural Network
Research by Hubel and Wiesel and others in the 1950s and 1960s showed that the visual cortices of cats and monkeys contain neurons that respond individually to small regions of the visual field. If the eyes do not move, the region of visual space in which visual stimuli affect a single neuron is called that neuron's receptive field. Neighboring neurons have similar and overlapping receptive fields. The size and position of receptive fields vary systematically across the cortex to form a complete map of visual space.
Inspired by this research, the convolutional neural network (abbreviated CNN or ConvNet), a class of feed-forward artificial neural networks, was proposed in the field of machine learning. Specifically, the connection pattern between the neurons of such a network is inspired by the animal visual cortex. A single neuron responds to stimuli within a limited region of space, namely the aforementioned receptive field. The receptive fields of different neurons partially overlap one another, so that together they cover the entire visual field. A single neuron's response to stimuli within its receptive field can be mathematically approximated by a convolution operation. Consequently, convolutional neural networks have wide applications in image and video recognition, recommendation (for example, product recommendation on shopping websites), and natural language processing.
A convolutional neural network may generally include multiple functional layers, such as the convolutional layers and fully connected layers described in detail below; by stacking multiple convolutional/fully connected layers, it gradually shifts from capturing local features to capturing global features, and can finally obtain a recognition/classification result. As an intuitive example, in the field of face recognition, the features learned by the first convolutional layer of a convolutional neural network may be fine (or very local) features such as eye color, eye contour, eyelashes, nose contour, nose shading, mouth contour, and mouth color, while what the second convolutional layer learns from the output of the first may be the slightly larger features of facial organs such as the eyes (identified from, e.g., eye color, eye contour, and eyelashes), the nose (determined from, e.g., nose contour and nose shading), and the mouth (determined from, e.g., mouth contour and mouth color), features that are more global than those learned by the first convolutional layer. A third convolutional layer or a fully connected layer can then learn even more global features, such as the face itself (determined from, e.g., the eyes, nose and mouth), from the output of the second convolutional layer, and finally determine the position of a face or of facial feature points in the image. Of course, the present disclosure is not limited thereto. A detailed description of the fully connected layer is given later and is not discussed in detail here.
However, although the above example is given in a way that humans can understand, the features learned by a CNN are usually not semantic features as understood by humans, but abstract features that are usually completely incomprehensible to humans. By combining these features together, however, the computer can determine that an image contains a face and the various parts of that face. For ease of understanding, one person's criterion for judging a face may be whether the image contains human eyes, a nose and a mouth; the features another person picks may be whether the image contains human eyebrows and a chin; yet another person may, more oddly, check whether the image contains glasses, a mask or earrings to judge whether a face is present. The convolutional neural network may be the oddest such "person" of all: it may use a series of features that humans cannot describe in words at all, for example certain specific pixel combinations, to judge whether something is a face and the various parts of a face.
Convolutional Layer
The convolutional layer is the core building block of a CNN. The parameters of this layer consist of a set of learnable convolution kernels (or simply kernels); each kernel has a small receptive field but extends through the entire depth of the input data. In the forward pass, each kernel is convolved along the width and height of the input data, the dot product between the elements of the kernel and the input data is computed, and a two-dimensional activation map of that kernel is produced. As a result, the network is able to learn kernels that activate when a specific type of feature is seen at some spatial position of the input.
For example, assuming the input data and the convolution kernel are, respectively, the 4 x 4 and 2 x 2 matrices on the left side of the equation, the convolution is computed as in equation (1):
If this kernel is one for recognizing a particular object (for example, an eye), it can be seen that, in the resulting output on the right side of the equation, the object is more likely to appear toward the upper right than toward the lower left. As mentioned above, by stacking multiple convolutional layers, the process of feature recognition gradually evolves from local to global.
Fully Connected Layer
Typically, after multiple convolutional layers, global feature capture in a convolutional neural network can be achieved via fully connected layers. A fully connected layer is in fact a special convolutional layer whose kernel has full connections to all elements of the previous layer's output, just as in a conventional neural network. Therefore, matrix multiplication can be applied to it directly.
Specifically, the output of a fully connected layer can be a one-dimensional array in which each element can represent a likelihood indicator that the image is classified into a certain category. In the context of facial feature recognition, this output can be used, for example, to determine whether a face is present in an image and that face's gender, ethnicity, age, and so on; the present disclosure is not limited thereto.
Next, the flow of a method for identifying an object according to an embodiment of the present disclosure will be described in detail with reference to FIG. 1.
FIG. 1 is a flowchart illustrating an example method 100 for identifying an object according to an embodiment of the present disclosure. As shown in FIG. 1, method 100 may include steps S110 and S120. However, embodiments of the present disclosure are not limited thereto; method 100 may in fact also include other steps, sub-steps, etc., or steps S110 and/or S120 may be replaced with steps or sub-steps that implement the same or similar functions.
As shown in FIG. 1, method 100 may begin at step S110. In step S110, a first neural network may be used to determine candidate classifications of the object. In step S110, the first neural network is mainly used as a universal object recognizer to perform a preliminary or coarse classification of the object. In some embodiments, the first neural network may be a convolutional neural network (CNN). Further, in some embodiments, the first neural network may be a convolutional neural network without a fully connected layer for final classification. In specific embodiments, this first CNN may adopt different network structures depending on design requirements.
For example, object recognition networks well known in the field of deep learning, such as MobileNet, VGG networks and ResNet networks, may be used, but a proprietary or dedicated CNN may also be built. Furthermore, for object recognition, the fully connected layers of these object recognition networks used for outputting the classification categories may be removed to form a corresponding CNN feature extraction system. Taking the VGG-19 network as an example, it includes 16 convolutional layers and 3 fully connected layers (as well as various auxiliary layers, such as pooling layers and activation layers), where the last of the three fully connected layers is responsible for performing the classification and can output the final classification result based on the computation results of the preceding 18 layers. When the VGG-19 network is used as the first neural network for recognition in this embodiment, its last fully connected layer, used for final classification, may be removed, and only the preceding layers are used to determine the candidate classifications of the object; a concrete implementation is described below. As further examples, when MobileNet or ResNet networks are used for recognition, their fully connected layers for final classification can likewise be removed. In the context of this specification, the features extracted by the first neural network from which the fully connected layer for final classification has been removed may be referred to as the first object feature vector, or F1. An example implementation of step S110 will be described in detail below in connection with FIG. 2.
In some embodiments, the parameters of the first neural network can be obtained through training. For example, a fully connected layer for classification (for example, the aforementioned fully connected layer of the VGG-19 network or of a ResNet network) may be appended after the first neural network, with an output dimension equal to the number of object classifications. The output of this fully connected layer can then be converted into the recognized object classification probabilities using the Softmax function, which has the following form:

σ(z_j) = e^{z_j} / Σ_{k=1}^{K} e^{z_k}, for j = 1, ..., K

where K is the dimension of the output vector z of the fully connected layer, i.e., the number of object classifications, z_j is the j-th element of the output vector z, σ(z_j) is the classification probability for z_j, and e is the base of the natural logarithm. The role of Softmax is to map each element of, for example, a K-dimensional real vector (for example, the output of the aforementioned fully connected layer) into the interval (0, 1) such that the elements sum to 1, thereby forming a classification probability distribution.
In addition, in some embodiments, the first neural network may be trained using a large number of object sample images labeled with their classifications as training samples, using cross entropy as the loss (cost) function of the training process, with the optimal first neural network obtained by minimizing this cost function. It should be noted that, when new object classifications are introduced, the first neural network may or may not be retrained. Without retraining the first neural network, the work required for retraining is reduced at the cost of some recognition accuracy. However, considering that second neural networks trained for specific classifications are employed subsequently, this cost is acceptable. In other words, in some embodiments, the first neural network need not be retrained when new object classifications are introduced.
Next, returning to FIG. 1, in step S120, in response to determining multiple candidate classifications of the object, multiple second neural networks respectively corresponding to the multiple candidate classifications may be used to determine the classification of the object. In some embodiments, at least one of the multiple second neural networks may be a single-layer fully connected neural network trained separately for the corresponding classification. In some embodiments, all of the multiple second neural networks may be single-layer fully connected neural networks trained separately for the corresponding classifications. In other words, step S120 may be performed when candidate classifications of the object have been determined in step S110 and there are multiple candidates. An example implementation of step S120 will be described in detail below in connection with FIG. 4. Step S120 is mainly used to further and precisely determine the actual classification of the object among multiple similar candidate classifications. Since a corresponding second neural network is set up and trained for each classification, this step has very good scalability. In other words, second neural networks already trained for existing classifications need not be retrained when a new object classification is added, but can be applied directly.
In addition, as shown in FIG. 1, method 100 may further include step S122 and/or step S124. In some embodiments, if candidate classifications of the object are determined in step S110 and there is only one candidate classification, step S122 may be performed: in response to determining only one candidate classification of the object, that candidate classification may be determined as the classification of the object. In some embodiments, if no candidate classification of the object is determined in step S110, step S124 may be performed: in response to determining no candidate classification of the object, a message indicating that the object cannot be identified may be output.
However, it should be noted that steps S122 and S124 are not essential steps of method 100. For example, in other embodiments, even when it is determined that only one candidate classification exists, the second neural network may still be used, as in step S120, to further determine its classification, instead of directly determining it as the classification of the object as in step S122 of FIG. 1. As another example, in other embodiments, when it is determined that there is no candidate classification, step S110 may be re-executed with the corresponding thresholds described below in connection with FIG. 2 lowered, until at least one candidate classification is determined. Similarly, when only one candidate classification is determined, the method may also return to step S110 and lower the thresholds to increase the number of candidate classifications. In short, steps S122 and S124 are both optional steps.
FIG. 2 is a flowchart illustrating an example method 200 for determining candidate classifications of an object according to an embodiment of the present disclosure. FIG. 3 shows a block diagram of determining candidate classifications of an object according to an embodiment of the present disclosure. As shown in FIG. 2, method 200 may include steps S210, S220 and S230. However, embodiments of the present disclosure are not limited thereto; method 200 may in fact also include other steps, sub-steps, etc., or steps S210, S220 and/or S230 may be replaced with steps or sub-steps that implement the same or similar functions. As mentioned above, step S110 shown in FIG. 1 may employ the specific steps of method 200 shown in FIG. 2, although the present disclosure is not limited thereto.
As shown in FIG. 2, method 200 may begin at step S210. In step S210, the aforementioned first neural network may be used to determine the first object feature vector of the object (e.g., the aforementioned F1). As mentioned above, the first neural network may be a convolutional neural network with the fully connected layer used for final classification removed.
Next, in step S220, one or more first feature similarities between the first object feature vector and one or more first reference feature vectors may be respectively calculated, where the one or more first reference feature vectors are respectively determined by the first neural network based on the reference data of one or more classifications. For example, in some embodiments, the one or more first reference feature vectors may be reference data determined by using the aforementioned first neural network to extract features from reference images of objects of the various categories. The features of the reference images can be computed in advance and stored.
Referring to FIG. 3, in some embodiments, various distance metrics can be used to determine the first feature similarities between the first object feature vector and the one or more first reference feature vectors. In some embodiments, a cosine distance or a Euclidean distance may be used as the measure of the first feature similarity. For example, if the Euclidean distance is used, the first feature similarity can be calculated as:

S1_ref(i) = ||F1_ref(i) - F1||_2

where S1_ref(i) is the first feature similarity between the first object feature vector of the object and the first reference feature vector of the i-th reference object, F1_ref(i) denotes the first reference feature vector of the i-th reference object, and ||·||_2 is the Euclidean distance. In addition, a similarity based on the cosine distance ranges, for example, from -1 to 1, with larger values indicating greater similarity.
Then, in step S230, the classifications corresponding to those of the one or more first feature similarities that are greater than or equal to a predetermined threshold may be determined as candidate classifications of the object. For example, after the first feature similarities S1_ref(i) between the object and one or more reference objects are determined, a threshold Th1 may be set; if the similarity S1_ref(i) is greater than or equal to Th1, the match may be considered successful, and the classification to which S1_ref(i) corresponds is taken as a candidate classification of the object. Conversely, if the similarity S1_ref(i) is smaller than Th1, the match may be considered unsuccessful, and it can be determined that the object does not belong to the corresponding classification.
In addition, in other embodiments, different similarity thresholds Th1 may be set for different classifications. For example, a similarity threshold Th1_1 may be set for a first classification (e.g., beverage), and a similarity threshold Th1_2 for a second classification (e.g., bread). Setting different similarity thresholds for different classifications can thus reflect the characteristics of the different classifications. This is because the similarity between objects in some classifications and objects in classifications similar to them is higher than in other classifications. For example, for several classifications of objects with roughly the same shape, a higher similarity threshold is needed to distinguish the different object classifications, while for other object classifications an overly high similarity threshold could make it impossible to correctly recognize objects of the same classification that differ considerably in shape. In other words, in other embodiments, different similarity thresholds may be set for the first feature similarities corresponding to different classifications, so as to reflect more accurately the differences between classifications and the commonalities within a classification.
Thus, through the embodiment shown in FIG. 2, the first neural network can be used to determine the candidate classifications of the object.
FIG. 4 is a flowchart illustrating an example method 400 for determining the classification of an object using second neural networks according to the candidate classifications of the object, according to an embodiment of the present disclosure. FIG. 5 is a block diagram illustrating an example of determining the classification of an object using second neural networks according to the candidate classifications of the object, according to an embodiment of the present disclosure. As shown in FIG. 4, method 400 may include steps S410, S420 and S430. However, embodiments of the present disclosure are not limited thereto; method 400 may in fact also include other steps, sub-steps, etc., or steps S410, S420 and/or S430 may be replaced with steps or sub-steps that implement the same or similar functions. As mentioned above, step S120 shown in FIG. 1 may employ the specific steps of method 400 shown in FIG. 4, although the present disclosure is not limited thereto.
As shown in FIG. 4, method 400 may begin at step S410. In step S410, for each of the candidate classifications, a second object feature vector F2 of the object associated with that candidate classification may be determined, based on the first object feature vector F1, using the corresponding second neural network. In some embodiments, any of the second neural networks may be a single-layer fully connected neural network trained for the corresponding classification, used to accurately recognize objects of that classification. In some embodiments, the coefficients of a second neural network may be trained, for example, as follows: two samples belonging to the classification corresponding to the second neural network are used as a positive sample pair, whose expected output value is a positive reference value; one sample belonging to the classification corresponding to the second neural network and one sample not belonging to that classification are used as a negative sample pair, whose expected output value is a negative reference value; and the squared error between the calculated value of the corresponding second feature similarity (i.e., the similarity given by the output of the second neural network, as explained below) and the expected output value is used as the loss function, with the optimal parameters of the second neural network obtained by training that minimizes the loss function. By using a large number of sample pairs constructed in this way, the second neural network for a specific classification can be trained to specialize in recognizing objects of that classification; that is, it can distinguish with high accuracy objects of the classification from objects outside the classification. In this way, even if new classifications are added later, the second neural networks of existing classifications need not be retrained for the new classifications.
Next, in step S420, multiple second feature similarities between the multiple second object feature vectors of the object respectively associated with the multiple candidate classifications and the corresponding multiple second reference feature vectors may be respectively calculated, where the multiple second reference feature vectors are respectively determined by the second neural networks based on the multiple first reference feature vectors. Referring to FIG. 5, the second neural network may be used in advance on the reference object of each of the aforementioned classifications to determine the second reference feature vector F2_ref(i) of that reference object. Then, analogously to the calculation of the first feature similarity, the second feature similarity S2_ref(i) between the second object feature vector F2 of the object and each second reference feature vector F2_ref(i) is respectively calculated. For example, in some embodiments, a cosine distance or a Euclidean distance may be used to determine the second feature similarity S2_ref(i). For example, the second feature similarity can be calculated using the following formula:

S2_ref(i) = ||F2_ref(i) - F2||_2

where S2_ref(i) is the second feature similarity between the second object feature vector of the object and the second reference feature vector of the i-th reference object, F2_ref(i) denotes the second reference feature vector of the i-th reference object, and ||·||_2 is the Euclidean distance.
Then, in step S430, the classification corresponding to the largest of the multiple second feature similarities S2_ref(i) may be determined as the classification of the object. However, in other embodiments, similarly to the treatment of the first feature similarity S1_ref(i), it is not necessarily only the specific value of the second feature similarity that is considered; a weighted or relative value may be considered instead. For example, in other embodiments, the second feature similarities of some classifications may be given higher weights so that the weighted second feature similarity values better reflect the differences between classifications, e.g., for classifications whose objects differ greatly across classifications; conversely, the second feature similarities of some classifications may be given lower weights, so as to impose a stricter requirement for judging an object to belong to that classification, e.g., for classifications whose objects differ little across classifications. In other words, different weights can be set for different classifications, so that the differences between classifications can be reflected.
Through the object recognition method described above in connection with FIGS. 1, 2 and 4, the first neural network may first be used to extract the feature vector of the object image, and feature matching is used for coarse recognition of the object category; when feature matching identifies multiple easily confused candidate object classifications, second neural networks for the specific object classifications may be used to further discriminate among the multiple candidate classifications, yielding a more accurate recognition result. Moreover, since the universal first neural network need not be updated and different second neural networks can be trained for specific classifications, this object recognition method is also easy to extend and maintain when expanding the set of recognized object classifications.
FIG. 6 is a block diagram illustrating an example hardware arrangement of an example device 600 for identifying an object according to an embodiment of the present disclosure. Device 600 may include a processor 606 (e.g., a digital signal processor (DSP), a central processing unit (CPU), a microcontroller, a microprocessor, or any processing device). Processor 606 may be a single processing unit or multiple processing units for performing the different actions of the flows described herein. Device 600 may also include an input unit 602 for receiving signals from other entities, and an output unit 604 for providing signals to other entities. Input unit 602 and output unit 604 may be arranged as a single entity or as separate entities.
In addition, device 600 may include at least one readable storage medium 608 in the form of non-volatile or volatile memory, such as an electrically erasable programmable read-only memory (EEPROM), flash memory, and/or a hard disk drive. Readable storage medium 608 includes a computer program 610, which includes code/computer-readable instructions that, when executed by processor 606 in arrangement 600, cause hardware arrangement 600 and/or device 600 including hardware arrangement 600 to perform, for example, the flows described above in connection with FIGS. 1, 2 and 4 and any variations thereof.
Computer program 610 may be configured as computer program code having, for example, an architecture of computer program blocks 610A-610B. Accordingly, in an example embodiment in which the hardware arrangement is used in device 600, the code in the computer program of the arrangement includes: a program block 610A for determining candidate classifications of the object using the first neural network; and a program block 610B for determining the classification of the object, in response to determining the candidate classifications of the object, using second neural networks respectively corresponding to the candidate classifications.
The computer program blocks can essentially perform the various actions in the flows illustrated in FIGS. 1, 2 and 4 to emulate any dedicated hardware device. In other words, when different computer program modules are executed in processor 606, they may correspond to the respective hardware units of a dedicated hardware device.
Although the code in the embodiment disclosed above in connection with FIG. 6 is implemented as a computer program that, when executed in processor 606, causes device 600 to perform the actions described above in connection with FIGS. 1, 2 and 4, in alternative embodiments at least part of that code may be implemented at least partially as a hardware circuit.
The processor may be a single CPU (central processing unit), but may also include two or more processing units. For example, the processor may include a general-purpose microprocessor, an instruction-set processor and/or a related chipset and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)). The processor may also include onboard memory for caching purposes. The computer program may be carried by a computer program product connected to the processor. The computer program product may include a computer-readable medium having the computer program stored thereon. For example, the computer program product may be flash memory, random access memory (RAM), read-only memory (ROM), or EEPROM, and in alternative embodiments the computer program modules described above may be distributed among different computer program products in the form of memory within the UE.
The present disclosure has thus far been described with reference to preferred embodiments. It should be understood that various other changes, substitutions and additions can be made by those skilled in the art without departing from the spirit and scope of the present disclosure. Therefore, the scope of the present disclosure is not limited to the particular embodiments described above, but is defined by the appended claims.
In addition, functions described herein as being implemented by pure hardware, pure software and/or firmware may also be implemented by dedicated hardware, by a combination of general-purpose hardware and software, and so on. For example, functions described as being implemented by dedicated hardware (e.g., a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.) may be implemented by a combination of general-purpose hardware (e.g., a central processing unit (CPU), a digital signal processor (DSP)) and software, and vice versa.
Claims (19)
- A method for identifying an object, comprising: determining candidate classifications of the object using a first neural network; and, in response to determining the candidate classifications of the object, determining the classification of the object using second neural networks respectively corresponding to the candidate classifications.
- The method according to claim 1, wherein the step of determining the candidate classifications of the object using the first neural network comprises: using the first neural network to determine first feature similarities between the object and reference data of the classifications; and determining, as candidate classifications of the object, the classifications corresponding to those of the first feature similarities that are greater than or equal to a predetermined threshold.
- The method according to claim 2, wherein the step of using the first neural network to determine the first feature similarities between the object and the reference data of the classifications comprises: determining a first object feature vector of the object using the first neural network; and respectively calculating the first feature similarities between the first object feature vector and first reference feature vectors, wherein the first reference feature vectors are respectively determined by the first neural network based on the reference data of the classifications.
- The method according to claim 2, wherein the predetermined threshold is a threshold set uniformly for all first feature similarities, or the predetermined threshold consists of thresholds set separately for the respective first feature similarities, and the thresholds can be set independently of one another.
- The method according to claim 1, further comprising: in response to determining only one candidate classification of the object, determining that candidate classification as the classification of the object.
- The method according to claim 1, further comprising: in response to determining no candidate classification of the object, outputting a message indicating that the object cannot be identified.
- The method according to claim 3, wherein, in response to determining the candidate classifications of the object, the step of determining the classification of the object using the second neural networks respectively corresponding to the candidate classifications comprises: for each of the candidate classifications, determining, based on the first object feature vector and using the corresponding second neural network, a second object feature vector of the object associated with that candidate classification; respectively calculating second feature similarities between the second object feature vectors of the object respectively associated with the candidate classifications and the corresponding second reference feature vectors, wherein the second reference feature vectors are respectively determined by the second neural networks based on the first reference feature vectors; and determining, as the classification of the object, the classification corresponding to the largest of the second feature similarities.
- The method according to claim 1, wherein the second neural network is trained by: using two samples belonging to the classification corresponding to the second neural network as a positive sample pair, whose expected output value is a positive reference value; using one sample belonging to the classification corresponding to the second neural network and one sample not belonging to that classification as a negative sample pair, whose expected output value is a negative reference value; and using the squared error between the calculated value of the corresponding second feature similarity and the expected output value as the loss function.
- The method according to claim 1, wherein the first fully convolutional neural network is a neural network containing only convolutional layers, or a neural network containing only convolutional layers and pooling layers, and the second neural network includes a fully connected layer, a single-layer neural network and a classifier.
- A device for identifying an object, comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to: determine candidate classifications of the object using a first neural network; and, in response to determining the candidate classifications of the object, determine the classification of the object using second neural networks respectively corresponding to the candidate classifications.
- The device according to claim 10, wherein the instructions, when executed by the processor, further cause the processor to: determine, using the first neural network, first feature similarities between the object and reference data of the classifications; and determine, as candidate classifications of the object, the classifications corresponding to those of the first feature similarities that are greater than or equal to a predetermined threshold.
- The device according to claim 11, wherein the instructions, when executed by the processor, further cause the processor to: determine a first object feature vector of the object using the first neural network; and respectively calculate the first feature similarities between the first object feature vector and first reference feature vectors, wherein the first reference feature vectors are respectively determined by the first neural network based on the reference data of the classifications.
- The device according to claim 11, wherein the predetermined threshold is a threshold set uniformly for all first feature similarities, or the predetermined threshold consists of thresholds set separately for the respective first feature similarities, and the thresholds can be set independently of one another.
- The device according to claim 10, wherein the instructions, when executed by the processor, further cause the processor to: in response to determining only one candidate classification of the object, determine that candidate classification as the classification of the object.
- The device according to claim 10, wherein the instructions, when executed by the processor, further cause the processor to: in response to determining no candidate classification of the object, output a message indicating that the object cannot be identified.
- The device according to claim 12, wherein the instructions, when executed by the processor, further cause the processor to: for each of the candidate classifications, determine, based on the first object feature vector and using the corresponding second neural network, a second object feature vector of the object associated with that candidate classification; respectively calculate second feature similarities between the second object feature vectors of the object respectively associated with the candidate classifications and the corresponding second reference feature vectors, wherein the second reference feature vectors are respectively determined by the second neural networks based on the first reference feature vectors; and determine, as the classification of the object, the classification corresponding to the largest of the second feature similarities.
- The device according to claim 10, wherein the second neural network is trained by: using two samples belonging to the classification corresponding to the second neural network as a positive sample pair, whose expected output value is a positive reference value; using one sample belonging to the classification corresponding to the second neural network and one sample not belonging to that classification as a negative sample pair, whose expected output value is a negative reference value; and using the squared error between the calculated value of the corresponding second feature similarity and the expected output value as the loss function.
- The device according to claim 10, wherein the first neural network is a convolutional neural network with the fully connected layer used for final classification removed, and the second neural network is a single-layer fully connected neural network.
- A computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method according to claim 1.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19792150.5A EP3786846B1 (en) | 2018-04-26 | 2019-01-03 | Method used for identifying object, device and computer readable storage medium |
US16/473,007 US11093800B2 (en) | 2018-04-26 | 2019-01-03 | Method and device for identifying object and computer readable storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810389080.2 | 2018-04-26 | ||
CN201810389080.2A CN110414541B (zh) | 2018-04-26 | 2018-04-26 | Method, device and computer-readable storage medium for identifying an object
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019205729A1 true WO2019205729A1 (zh) | 2019-10-31 |
Family
ID=68293750
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/070207 WO2019205729A1 (zh) Method, device and computer readable storage medium for identifying an object | 2018-04-26 | 2019-01-03 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11093800B2 (zh) |
EP (1) | EP3786846B1 (zh) |
CN (1) | CN110414541B (zh) |
WO (1) | WO2019205729A1 (zh) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111126384A (zh) * | 2019-12-12 | 2020-05-08 | 创新奇智(青岛)科技有限公司 | Commodity classification system and classification method based on feature fusion |
CN115004191A (zh) * | 2020-03-20 | 2022-09-02 | 深圳市欢太数字科技有限公司 | Information identification method and apparatus, storage medium and electronic device |
DE102020203707A1 (de) * | 2020-03-23 | 2021-09-23 | Robert Bosch Gesellschaft mit beschränkter Haftung | Plausibility checking of the output of neural classifier networks |
CN111918137B (zh) * | 2020-06-29 | 2021-07-20 | 北京大学 | Push method and apparatus based on video features, storage medium and terminal |
CN114693957A (zh) * | 2020-12-30 | 2022-07-01 | 顺丰科技有限公司 | Parcel classification method and apparatus, electronic device and storage medium |
CN113934929A (zh) * | 2021-09-30 | 2022-01-14 | 联想(北京)有限公司 | Recommendation method and apparatus, and electronic device |
US20230186623A1 (en) * | 2021-12-14 | 2023-06-15 | Ping An Technology (Shenzhen) Co., Ltd. | Systems and methods for crop disease diagnosis |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101406390A (zh) * | 2007-10-10 | 2009-04-15 | 三星电子株式会社 | Method and apparatus for detecting human body parts and persons, and object detection method and apparatus |
CN102867172A (zh) * | 2012-08-27 | 2013-01-09 | Tcl集团股份有限公司 | Human-eye positioning method, system and electronic device |
CN106535134A (zh) * | 2016-11-22 | 2017-03-22 | 上海斐讯数据通信技术有限公司 | WiFi-based multi-room positioning method and server |
CN106557778A (zh) * | 2016-06-17 | 2017-04-05 | 北京市商汤科技开发有限公司 | Generic object detection method and apparatus, data processing apparatus and terminal device |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2661265B1 (fr) * | 1990-04-24 | 1994-07-29 | Thomson Csf | Neural classification system and classification method using such a system. |
DE4100500A1 (de) * | 1991-01-10 | 1992-07-16 | Bodenseewerk Geraetetech | Signal processing arrangement for classifying objects on the basis of sensor signals |
US6823323B2 (en) * | 2001-04-26 | 2004-11-23 | Hewlett-Packard Development Company, L.P. | Automatic classification method and apparatus |
US7587064B2 (en) * | 2004-02-03 | 2009-09-08 | Hrl Laboratories, Llc | Active learning system for object fingerprinting |
US8447100B2 (en) | 2007-10-10 | 2013-05-21 | Samsung Electronics Co., Ltd. | Detecting apparatus of human component and method thereof |
CN101853389A (zh) | 2009-04-01 | 2010-10-06 | 索尼株式会社 | Detection apparatus and detection method for multi-class targets |
CN101996326A (zh) * | 2009-08-26 | 2011-03-30 | 索尼株式会社 | Detection apparatus and detection method for multi-class targets |
JP5747014B2 (ja) | 2012-11-05 | 2015-07-08 | 東芝テック株式会社 | Product recognition apparatus and product recognition program |
CN104346620B (zh) * | 2013-07-25 | 2017-12-29 | 佳能株式会社 | Method and apparatus for classifying pixels in an input image, and image processing system |
WO2015054666A1 (en) * | 2013-10-10 | 2015-04-16 | Board Of Regents, The University Of Texas System | Systems and methods for quantitative analysis of histopathology images using multi-classifier ensemble schemes |
JP2015099550A (ja) | 2013-11-20 | 2015-05-28 | 東芝テック株式会社 | Product recognition apparatus and product recognition program |
US9299010B2 (en) * | 2014-06-03 | 2016-03-29 | Raytheon Company | Data fusion analysis for maritime automatic target recognition |
US10410138B2 (en) * | 2015-07-16 | 2019-09-10 | SparkBeyond Ltd. | System and method for automatic generation of features from datasets for use in an automated machine learning process |
US9858496B2 (en) * | 2016-01-20 | 2018-01-02 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
CN106250921A (zh) * | 2016-07-26 | 2016-12-21 | 北京小米移动软件有限公司 | Picture processing method and apparatus |
US20180039853A1 (en) * | 2016-08-02 | 2018-02-08 | Mitsubishi Electric Research Laboratories, Inc. | Object Detection System and Object Detection Method |
US10318827B2 (en) * | 2016-12-19 | 2019-06-11 | Waymo Llc | Object detection neural networks |
CN107229942B (zh) * | 2017-04-16 | 2021-03-30 | 北京工业大学 | Convolutional neural network classification method based on multiple classifiers |
US10546197B2 (en) * | 2017-09-26 | 2020-01-28 | Ambient AI, Inc. | Systems and methods for intelligent and interpretive analysis of video image data using machine learning |
US11030458B2 (en) * | 2018-09-14 | 2021-06-08 | Microsoft Technology Licensing, Llc | Generating synthetic digital assets for a virtual scene including a model of a real-world object |
US20200394458A1 (en) * | 2019-06-17 | 2020-12-17 | Nvidia Corporation | Weakly-supervised object detection using one or more neural networks |
- 2018-04-26 CN CN201810389080.2A patent/CN110414541B/zh active Active
- 2019-01-03 EP EP19792150.5A patent/EP3786846B1/en active Active
- 2019-01-03 WO PCT/CN2019/070207 patent/WO2019205729A1/zh active Application Filing
- 2019-01-03 US US16/473,007 patent/US11093800B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
EP3786846B1 (en) | 2024-07-24 |
US20200380292A1 (en) | 2020-12-03 |
US11093800B2 (en) | 2021-08-17 |
EP3786846A1 (en) | 2021-03-03 |
CN110414541A (zh) | 2019-11-05 |
CN110414541B (zh) | 2022-09-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019205729A1 (zh) | Method, device and computer readable storage medium for identifying an object | |
US10755128B2 (en) | Scene and user-input context aided visual search | |
Peng et al. | Self-paced joint sparse representation for the classification of hyperspectral images | |
Kao et al. | Visual aesthetic quality assessment with a regression model | |
Zhang et al. | Quantifying facial age by posterior of age comparisons | |
US11023806B2 (en) | Learning apparatus, identifying apparatus, learning and identifying system, and recording medium | |
Kae et al. | Augmenting CRFs with Boltzmann machine shape priors for image labeling | |
US9552510B2 (en) | Facial expression capture for character animation | |
WO2020114118A1 (zh) | Facial attribute recognition method and apparatus, storage medium and processor | |
WO2018108129A1 (zh) | Method and apparatus for identifying object category, and electronic device | |
Guo et al. | A probabilistic fusion approach to human age prediction | |
CN110909618B (zh) | Pet identity recognition method and apparatus | |
JP6029041B2 (ja) | Facial impression estimation method, apparatus, and program | |
Yankelevsky et al. | Structure-aware classification using supervised dictionary learning | |
Demirkus et al. | Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos | |
Wang et al. | Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion | |
Anvar et al. | Multiview face detection and registration requiring minimal manual intervention | |
Zhang et al. | Second-and high-order graph matching for correspondence problems | |
CN111382410B (zh) | Face-scan verification method and system | |
JP2019204505A (ja) | Object detection apparatus and method, and storage medium | |
CN112749737A (zh) | Image classification method and apparatus, electronic device and storage medium | |
Sun et al. | Perceptual multi-channel visual feature fusion for scene categorization | |
Kumar et al. | Predictive analytics on gender classification using machine learning | |
Lin et al. | Face recognition for video surveillance with aligned facial landmarks learning | |
Huan et al. | Human action recognition based on HOIRM feature fusion and AP clustering BOW |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19792150; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| WWE | Wipo information: entry into national phase | Ref document number: 2019792150; Country of ref document: EP
| ENP | Entry into the national phase | Ref document number: 2019792150; Country of ref document: EP; Effective date: 20201126