CN110874602A - Image identification method and device

Publication number: CN110874602A
Application number: CN201811003602.7A
Authority: CN (China)
Prior art keywords: feature, sample pair, detected, feature vector, image
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张修宝, 李剑, 沈海峰
Assignee: Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd

Classifications

    • G06F18/24 Classification techniques (G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F18/00 Pattern recognition; G06F18/20 Analysing)
    • G06V40/168 Feature extraction; Face representation (G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data; G06V40/10 Human or animal bodies; G06V40/16 Human faces, e.g. facial parts, sketches or expressions)
    • G06V40/172 Classification, e.g. identification (G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING; G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data; G06V40/10 Human or animal bodies; G06V40/16 Human faces, e.g. facial parts, sketches or expressions)

Abstract

The application provides an image recognition method comprising the following steps: acquiring different images to be detected; extracting features from each of the different images to be detected to obtain a feature vector corresponding to each image; merging the feature vectors corresponding to the different images to obtain a merged feature vector; and inputting the merged feature vector into a pre-trained verification classification model to identify whether the different images to be detected are images of the same object to be detected. In this way, recognition and verification of the images to be detected can be achieved without calculating a similarity or comparing it with a preset threshold. This effectively avoids the need to update an optimal threshold with test sets corresponding to different application scenarios, makes the verification classification model robust across application scenarios, and improves the portability of the verification classification model between scenarios.

Description

Image identification method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image recognition method and apparatus.
Background
With the continuous development of artificial intelligence technology, intelligent recognition of objects to be detected can be realized based on image processing technology in many application scenarios. For example, in scenarios such as banking transactions and ride-hailing software, face recognition of users is often required.
At present, a commonly used face recognition method calculates the similarity between an acquired face image and a pre-stored face image, and then compares the computed similarity with a preset threshold to determine whether the two face images belong to the same person. In this approach, one of the factors affecting recognition accuracy is the value of the threshold. Existing schemes mainly use the image samples in a test set for testing and verification to obtain an optimal threshold, so that face recognition accuracy remains high.
However, the inventors have found that the optimal threshold tends to vary from test set to test set; for example, the optimal threshold obtained on test set A may be 0.39, while that obtained on test set B may be 0.29. A trained model therefore requires an optimal threshold set for its specific test scenario, and when the scenario changes, recognition accuracy may drop if the threshold is not updated. As a result, this face recognition approach has poor portability across application scenarios.
Disclosure of Invention
In view of this, embodiments of the present application provide an image recognition method and an image recognition apparatus, so as to effectively avoid the need to update an optimal threshold with test sets corresponding to different application scenarios.
In a first aspect, an embodiment of the present application provides an image recognition method, including:
acquiring different images to be detected;
respectively extracting the features of the different images to be detected to obtain a feature vector corresponding to each image to be detected;
combining the characteristic vectors respectively corresponding to the different images to be detected to obtain combined characteristic vectors;
and inputting the combined feature vectors into a pre-trained verification classification model, and identifying whether the different images to be detected are images of the same object to be detected.
Specifically, extracting features from each of the different images to be detected to obtain a feature vector corresponding to each image to be detected includes:
and inputting the different images to be detected into a pre-trained feature extraction model for feature extraction, and then obtaining a feature vector corresponding to each image to be detected.
In a possible embodiment, the method further comprises:
acquiring a first sample training set, wherein the first sample training set comprises a plurality of image samples of different objects to be detected;
training to obtain the feature extraction model according to the following modes:
selecting a first preset number of image samples from the first sample training set and inputting the image samples into a feature extraction model to be trained;
for each input image sample, performing feature extraction on each image sample to obtain a feature vector, classifying the feature vector of each image sample, and determining a first classification result for indicating an object to be detected represented by the image sample;
calculating a first loss value of the training process of the round based on a first classification result corresponding to each image sample and a first preset result corresponding to each image sample;
and when the first loss value is greater than a first set value, adjusting model parameters of the feature extraction model to be trained, and performing the next round of training process by using the adjusted feature extraction model to be trained until the calculated first loss value is less than or equal to the first set value, and determining that the training is finished.
Merging the feature vectors corresponding to the different images to be detected to obtain a merged feature vector includes:
combining the feature vectors corresponding to the different images to be detected in parallel to obtain a merged feature vector, where the merged feature vector has a plurality of feature channels and each feature channel maps to the feature vector of a corresponding image to be detected.
Further, after the determined first loss value is less than or equal to the first set value, the method further comprises:
and determining a second sample training set used for training the verification classification model to be trained according to the feature vector of each image sample.
In a possible implementation, the determining a second sample training set according to the feature vector of each image sample includes:
randomly pairing the feature vectors of different image samples to determine a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair comprises the feature vectors of different image samples for representing the same object to be detected, and each negative sample pair comprises the feature vectors of different image samples for representing different objects to be detected; wherein the second training set of samples includes the determined plurality of pairs of positive samples and a plurality of pairs of negative samples.
Specifically, the verification classification model may be obtained by training in the following manner:
selecting a second preset number of positive sample pairs and a third preset number of negative sample pairs from the second sample training set;
inputting the selected positive sample pairs and the selected negative sample pairs into the verification classification model to be trained to obtain second classification results corresponding to each positive sample pair and each negative sample pair respectively, wherein the second classification results are used for indicating whether the feature vectors of different image samples represent the same object to be detected;
calculating a second loss value of the training process of the round based on a second classification result and a second preset result corresponding to each positive sample pair and a second classification result and a third preset result corresponding to each negative sample pair;
and when the second loss value is greater than a second set value, adjusting the model parameters of the verification classification model to be trained, and performing the next round of training process by using the adjusted verification classification model to be trained until the calculated second loss value is less than or equal to the second set value, and determining that the training is finished.
In a possible embodiment, the inputting the selected positive sample pair and the selected negative sample pair into the verification classification model to be trained includes:
combining the feature vectors of different image samples in each positive sample pair to obtain a positive sample pair feature vector corresponding to each positive sample pair, and inputting the positive sample pair feature vector corresponding to each positive sample pair into the verification classification model to be trained; and
combining the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair, and inputting the negative sample pair feature vector corresponding to each negative sample pair into the verification classification model to be trained.
For each positive sample pair, the feature vectors of the different image samples in the pair are combined in parallel to obtain a positive sample pair feature vector corresponding to that pair, where each positive sample pair feature vector has a plurality of feature channels and each feature channel maps to the feature vector of one image sample in the corresponding positive sample pair;
for each negative sample pair, the feature vectors of the different image samples in the pair are combined in parallel to obtain a negative sample pair feature vector corresponding to that pair, where each negative sample pair feature vector has a plurality of feature channels and each feature channel maps to the feature vector of one image sample in the corresponding negative sample pair.
In a possible embodiment, the inputting the selected positive sample pairs and negative sample pairs into the verification classification model to be trained to obtain second classification results corresponding to each positive sample pair and each negative sample pair respectively includes:
for each input positive sample pair feature vector, performing feature fusion on the positive sample pair feature vector, classifying the fused positive sample pair feature vector, and calculating a second classification result corresponding to each positive sample pair; and
for each input negative sample pair feature vector, performing feature fusion on the negative sample pair feature vector, classifying the fused negative sample pair feature vector, and calculating the second classification result corresponding to each negative sample pair.
In one possible design, the feature extraction model includes at least one first convolutional layer and at least one first fully-connected layer; the number of neurons in the last of the first fully-connected layers is the same as the number of classes of objects to be detected; the penultimate first fully-connected layer is connected with a feature merging unit, which merges the feature vectors, output by the penultimate layer, corresponding to the different images to be detected.
In one possible design, the verification classification model includes at least one second convolutional layer, at least one second fully-connected layer, and a classifier; the first of the second convolutional layers is connected with the feature merging unit and performs feature fusion on the merged feature vector; the last of the second fully-connected layers includes two neurons, and the classifier outputs a classification result indicating whether the different images to be detected are images of the same object to be detected.
In a second aspect, an embodiment of the present application further provides an image recognition apparatus, including:
the acquisition module is used for acquiring different images to be detected;
the first processing module is used for respectively extracting the characteristics of the different images to be detected to obtain a characteristic vector corresponding to each image to be detected;
the second processing module is used for merging the characteristic vectors respectively corresponding to the different images to be detected to obtain merged characteristic vectors;
and the third processing module is used for inputting the combined feature vectors into a pre-trained verification classification model and identifying whether the different images to be detected are images of the same object to be detected.
In a possible implementation manner, the first processing module is specifically configured to:
and inputting the different images to be detected into a pre-trained feature extraction model for feature extraction, and then obtaining a feature vector corresponding to each image to be detected.
In one possible design, the obtaining module is further configured to: acquiring a first sample training set, wherein the first sample training set comprises a plurality of image samples of different objects to be detected;
the device further comprises:
the first model training module is used for training to obtain the feature extraction model according to the following modes:
selecting a first preset number of image samples from the first sample training set and inputting the image samples into a feature extraction model to be trained;
for each input image sample, performing feature extraction on each image sample to obtain a feature vector, classifying the feature vector of each image sample, and determining a first classification result for indicating an object to be detected represented by the image sample;
calculating a first loss value of the training process of the round based on a first classification result corresponding to each image sample and a first preset result corresponding to each image sample;
and when the first loss value is greater than a first set value, adjusting model parameters of the feature extraction model to be trained, and performing the next round of training process by using the adjusted feature extraction model to be trained until the calculated first loss value is less than or equal to the first set value, and determining that the training is finished.
In a possible implementation manner, the second processing module is specifically configured to:
combining the feature vectors corresponding to the different images to be detected in parallel to obtain a merged feature vector, where the merged feature vector has a plurality of feature channels and each feature channel maps to the feature vector of a corresponding image to be detected.
In one possible design, the apparatus further includes:
and the sample determining module is used for determining a second sample training set used for training the verification classification model to be trained according to the feature vector of each image sample.
The sample determination module is specifically configured to:
randomly pairing the feature vectors of different image samples to determine a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair comprises the feature vectors of different image samples for representing the same object to be detected, and each negative sample pair comprises the feature vectors of different image samples for representing different objects to be detected;
wherein the second training set of samples includes the determined plurality of pairs of positive samples and a plurality of pairs of negative samples.
In one possible design, the apparatus further includes:
the second model training module is used for training to obtain the verification classification model according to the following modes:
selecting a second preset number of positive sample pairs and a third preset number of negative sample pairs from the second sample training set;
inputting the selected positive sample pairs and the selected negative sample pairs into the verification classification model to be trained to obtain second classification results corresponding to each positive sample pair and each negative sample pair respectively, wherein the second classification results are used for indicating whether the feature vectors of different image samples represent the same object to be detected;
calculating a second loss value of the training process of the round based on a second classification result and a second preset result corresponding to each positive sample pair and a second classification result and a third preset result corresponding to each negative sample pair;
and when the second loss value is greater than a second set value, adjusting the model parameters of the verification classification model to be trained, and performing the next round of training process by using the adjusted verification classification model to be trained until the calculated second loss value is less than or equal to the second set value, and determining that the training is finished.
In a possible implementation manner, the second model training module, when inputting the selected positive sample pair and the selected negative sample pair into the verification classification model to be trained, is specifically configured to:
combining the feature vectors of different image samples in each positive sample pair to obtain a positive sample pair feature vector corresponding to each positive sample pair, and inputting the positive sample pair feature vector corresponding to each positive sample pair into the verification classification model to be trained; and
combining the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair, and inputting the negative sample pair feature vector corresponding to each negative sample pair into the verification classification model to be trained.
In a possible implementation manner, the second model training module, when merging the feature vectors of different image samples in each positive sample pair to obtain a positive sample pair feature vector corresponding to each positive sample pair, is specifically configured to:
combining the feature vectors of different image samples in each positive sample pair in parallel to obtain a positive sample pair feature vector corresponding to each positive sample pair, where each positive sample pair feature vector has a plurality of feature channels and each feature channel maps to the feature vector of one image sample in the corresponding positive sample pair;
the second model training module is specifically configured to, when merging the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair:
combining the feature vectors of different image samples in each negative sample pair in parallel to obtain a negative sample pair feature vector corresponding to each negative sample pair, where each negative sample pair feature vector has a plurality of feature channels and each feature channel maps to the feature vector of one image sample in the corresponding negative sample pair.
In a possible implementation manner, when the selected positive sample pair and the selected negative sample pair are input into the verification classification model to be trained to obtain the second classification result corresponding to each positive sample pair and each negative sample pair, the second model training module is specifically configured to:
for each input positive sample pair feature vector, performing feature fusion on the positive sample pair feature vector, classifying the fused positive sample pair feature vector, and calculating a second classification result corresponding to each positive sample pair; and
for each input negative sample pair feature vector, performing feature fusion on the negative sample pair feature vector, classifying the fused negative sample pair feature vector, and calculating the second classification result corresponding to each negative sample pair.
In one possible design, the feature extraction model includes at least one first convolutional layer and at least one first fully-connected layer; the number of neurons in the last of the first fully-connected layers is the same as the number of classes of objects to be detected; the penultimate first fully-connected layer is connected with a feature merging unit, which merges the feature vectors, output by the penultimate layer, corresponding to the different images to be detected.
In one possible design, the verification classification model includes at least one second convolutional layer, at least one second fully-connected layer, and a classifier; the first of the second convolutional layers is connected with the feature merging unit and performs feature fusion on the merged feature vector; the last of the second fully-connected layers includes two neurons, and the classifier outputs a classification result indicating whether the different images to be detected are images of the same object to be detected.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the image recognition method according to the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the image recognition method according to the first aspect.
In the embodiment of the application, when image recognition is performed, different images to be detected can be obtained and the feature vectors corresponding to them extracted; after the feature vectors are merged, the merged feature vector can be input into a pre-trained verification classification model for recognition and verification, detecting whether the different images to be detected are images of the same object to be detected. Compared with the prior art, the method provided by the application requires neither calculating a similarity nor comparing it with a preset threshold. This effectively avoids the need to update an optimal threshold with test sets corresponding to different application scenarios, gives the verification classification model higher robustness across scenarios, and improves its portability between application scenarios.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; those skilled in the art can obtain other related drawings from them without inventive effort.
FIG. 1 illustrates a schematic diagram of a system architecture to which embodiments of the present application are applicable;
fig. 2 is a schematic flowchart illustrating an image recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a possible model structure according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a training process of a feature extraction model provided by an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a training process of a verification classification model provided by an embodiment of the present application;
FIG. 6 is a specific flowchart of a model training process provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram illustrating an image recognition apparatus according to an embodiment of the present application;
fig. 8 shows a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
First, an application scenario to which the present application is applicable will be described. The present application is applicable to scenarios such as face recognition and object authenticity verification that require analyzing and recognizing multiple images of an object to be detected. For example, in current scenarios such as banking and ride-hailing services, face recognition is often performed for identity verification. Specifically, in a ride-hailing service, before a driver starts taking orders, the client used by the driver may collect a face image of the driver and upload it to the service server; the service server then compares the collected face image with a face image pre-recorded in a database to detect whether the driver currently providing the service matches the driver registered on the platform, thereby verifying whether the driver's identity is legitimate.
The current common face recognition mode is as follows: extracting a feature vector of the acquired face image and a feature vector of a pre-stored face image, further performing Euclidean distance measurement or cosine distance measurement on the extracted feature vector to obtain the similarity between the acquired face image and the pre-stored face image, and then comparing the calculated similarity with a preset threshold value to determine whether the acquired face image and the pre-stored face image are the face image of the same person.
However, it has been found that, in the above manner, the value of the preset threshold affects the accuracy of face recognition, and the optimal threshold must be calculated through testing and verification with the image samples in a test set to ensure accuracy. Moreover, because the image samples in the test sets differ across application scenarios, the similarity between multiple images of the same object to be detected also differs, so recognition accuracy easily drops if the threshold is not updated for each scenario. The face recognition approach therefore ports poorly across application scenarios: considerable effort is needed to obtain test sets for each scenario and to test and verify them separately to derive scenario-specific optimal thresholds, the operation is cumbersome, and if the computed optimal threshold deviates, face recognition accuracy is easily degraded.
In order to solve the above problems, the present application provides an image recognition method and apparatus, which can solve the problem that the threshold value used in different application scenarios must be updated in the existing manner on the basis of ensuring the accuracy of image recognition.
Before introducing the technical solutions provided in the present application, the system architecture to which they apply is described by way of example. Fig. 1 is a schematic diagram of a system architecture to which an embodiment of the present application is applicable; the system architecture includes a client and a service server. The client may be installed in a terminal device, such as a mobile phone, a tablet computer, or a vehicle-mounted terminal. Specifically, a communication connection can be established between the service server and the client: the client can acquire an image to be detected (for example, a face image of a driver) and upload it to the service server, and the service server can identify, through an image processing algorithm, whether the image collected by the client and a pre-recorded image to be detected (for example, a face image of an authenticated driver) are images of the same object to be detected, and then send the detection result to the client. In addition, in one example, the system may further include a cloud storage center storing the pre-recorded images to be detected; when the service server identifies the image collected by the client, it may obtain the pre-recorded image from the cloud storage center and then perform image analysis on the two images to be detected to obtain a detection result.
The technical solution provided by the present application is described in detail below with reference to specific embodiments.
Example one
Referring to fig. 2, a schematic flow chart of an image recognition method provided in the embodiment of the present application is shown, including the following steps:
Step 201, acquiring different images to be detected.
Here, for different application scenarios, the service server may obtain the images to be detected from different data sources, for example, from at least one client that establishes a communication connection with the service server, or from a cloud storage center. Taking a ride-hailing scenario as an example, if the driver's identity needs to be authenticated, the service server may acquire from the client a face image of the current user of the terminal device (i.e., the driver), and may also acquire a locally recorded face image of the authenticated user (i.e., the authenticated driver) bound to the client's account, and use the two acquired face images as the different images to be detected for subsequent analysis. Alternatively, the service server may obtain the images to be detected from other storage devices such as the cloud storage center, which is not limited in the present application.
Step 202, extracting features from each of the different images to be detected to obtain a feature vector corresponding to each image to be detected.
In the embodiment of the application, a pre-trained feature extraction model can be used to extract features from the different images to be detected, obtaining a feature vector corresponding to each image. In one possible implementation, the pre-trained feature extraction model includes at least one first convolutional layer and at least one first fully-connected layer: after an image to be detected is input into the feature extraction model, convolution operations in the first convolutional layers produce a plurality of feature maps, which the first fully-connected layers then convert into a corresponding feature vector used to classify the image. In another possible implementation, the model may further include a first pooling layer, which reduces the dimensionality of the feature maps obtained after convolution. Configuration parameters of the first convolutional layers, first fully-connected layers, and first pooling layer, for example the size and number of convolution kernels and the number of neurons in the fully-connected layers, can be adjusted during model training. The training process for the feature extraction model is described in detail in later embodiments and is not covered here.
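For concreteness, the following is a minimal PyTorch sketch of a feature extraction model of the kind just described (first convolutional layers, a first pooling layer, first fully-connected layers). All class names, layer counts, and sizes are illustrative assumptions; the patent does not fix a concrete architecture.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the feature extraction model: first convolutional layers,
    a first pooling layer, and first fully-connected layers.
    All dimensions are assumed for demonstration only."""

    def __init__(self, num_identities: int, feature_dim: int = 1024):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # first pooling layer: reduces feature-map size
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((7, 7)),
        )
        # penultimate first fully-connected layer: outputs the feature
        # vector that is later merged and fed to the verification model
        self.fc_feature = nn.Linear(64 * 7 * 7, feature_dim)
        # last first fully-connected layer: one neuron per object class
        self.fc_classify = nn.Linear(feature_dim, num_identities)

    def forward(self, x: torch.Tensor):
        x = self.conv(x).flatten(1)
        feature = self.fc_feature(x)        # e.g. a 1 x 1024 feature vector
        logits = self.fc_classify(feature)  # first classification result
        return feature, logits
```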
Step 203, merging the feature vectors respectively corresponding to the different images to be detected to obtain a merged feature vector.
In specific implementation, after the feature vectors corresponding to the different images to be detected are obtained, they may be merged into a single feature vector; the merged feature vector carries the parameter values describing the image features of the different images to be detected, and is then input into a pre-trained verification classification model for analysis.
There are various ways to merge the feature vectors corresponding to different images to be detected. In one embodiment, the element-wise difference of the feature vectors may be taken to obtain a merged feature vector; for example, if the feature vectors corresponding to two images to be detected are both 1 × 1024, the merged feature vector obtained by subtracting them element-wise is also 1 × 1024. In another embodiment, the feature vectors may be concatenated to obtain a merged feature vector; for example, if the feature vectors corresponding to two images to be detected are both 1 × 1024, the merged feature vector obtained by concatenating them is 1 × 2048. In both of these embodiments, the merging can be understood as a serial combination of the feature vectors.
In the embodiment of the application, a plurality of feature vectors can be combined in parallel to synthesize the feature vector comprising a plurality of feature channels, wherein each feature channel is mapped with the feature vector corresponding to each image to be detected, that is, each feature channel in the combined feature vector can reflect the image feature of the corresponding image to be detected. By adopting the method, the aim of inputting a plurality of feature vectors into the verification classification model in parallel can be achieved. For example, the feature vectors corresponding to the two images to be detected are all feature vectors of 1 × 1024, and then the combined feature vector obtained by combining the two feature vectors in parallel is equivalent to a feature vector of 2 × 1024, where 2 represents the number of feature channels of the feature vector.
Compared with the serial merging modes described above, the parallel merging adopted in the present application produces a feature vector with multiple feature channels. A convolutional layer can subsequently perform feature fusion on the multi-channel feature vector, better capturing the correlation between the feature vectors corresponding to the different images to be detected. This improves recognition accuracy when the different images to be detected are images of the same object to be detected.
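The three merging modes can be sketched as follows; this is a toy illustration with assumed 1 × 1024 vectors and a leading batch dimension, and the function names are not from the patent.

```python
import torch

def merge_subtract(v1: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
    # element-wise subtraction: two 1 x 1024 vectors -> one 1 x 1024 vector
    return v1 - v2

def merge_concat(v1: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
    # serial concatenation: two 1 x 1024 vectors -> one 1 x 2048 vector
    return torch.cat([v1, v2], dim=1)

def merge_parallel(v1: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
    # parallel merge: two 1 x 1024 vectors -> one 2 x 1024 tensor whose
    # two feature channels map to the two images to be detected
    return torch.stack([v1, v2], dim=1)

v1, v2 = torch.randn(1, 1024), torch.randn(1, 1024)
print(merge_subtract(v1, v2).shape)  # torch.Size([1, 1024])
print(merge_concat(v1, v2).shape)    # torch.Size([1, 2048])
print(merge_parallel(v1, v2).shape)  # torch.Size([1, 2, 1024])
```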
Step 204, inputting the merged feature vector into a pre-trained verification classification model, and identifying whether the different images to be detected are images of the same object to be detected.
Specifically, the pre-trained verification classification model includes at least one second convolutional layer, at least one second fully-connected layer, and a classifier. After the merged feature vector passes through the first second convolutional layer, the image features of the different images to be detected, described by the separate feature channels of the merged vector, are fused; subsequent second convolutional layers can then extract the fused image features more deeply, further improving recognition accuracy. The fused image features are then processed by the second fully-connected layers, where the number of neurons in the last fully-connected layer may be set to two for binary classification. The classifier analyzes the feature values computed by the second fully-connected layers and converts them into scalar values, which can be understood as score values for the different classification results; the classification result with the highest score is taken as the correct one. The configuration parameters of the second convolutional layers, second fully-connected layers, and classifier may be adjusted during model training; the training process for the verification classification model is described in detail in later embodiments and is not covered here.
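As a minimal PyTorch sketch of such a verification classification model (layer counts and sizes are assumed for illustration; the patent fixes only the two-neuron final layer and the fusion role of the first second convolutional layer):

```python
import torch
import torch.nn as nn

class VerificationClassifier(nn.Module):
    """Sketch of the verification classification model: the first second
    convolutional layer fuses the two feature channels of the merged
    vector, a further convolution deepens the fused features, and the
    last second fully-connected layer has two neurons for the binary
    decision. Layer counts and sizes are illustrative assumptions."""

    def __init__(self, feature_dim: int = 1024):
        super().__init__()
        self.conv = nn.Sequential(
            # fuses the two feature channels of the merged feature vector
            nn.Conv1d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            # deeper extraction of the fused image features
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(32 * feature_dim, 256), nn.ReLU(),
            nn.Linear(256, 2),  # two neurons: same object / not same object
        )

    def forward(self, merged: torch.Tensor) -> torch.Tensor:
        # merged: (batch, 2, feature_dim), i.e. the parallel-merged vector
        x = self.conv(merged).flatten(1)
        return self.fc(x)  # two raw scores, one per classification result
```

At inference time, applying `torch.softmax` to the two outputs yields the score values, and the classification result with the higher score is taken as final.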
In the embodiment of the application, the finally determined classification results include two types, one type of classification result represents that different images to be detected are images of the same object to be detected, and the other type of classification result represents that different images to be detected are not images of the same object to be detected. For example, in a face recognition scene, different face images can be acquired as different images to be detected, and then whether the different face images are face images of the same person or not can be recognized after the image recognition processing.
In specific implementation, after the classification result is identified, the service server may push it to the client, or make a decision based on it. For example, in a ride-hailing scenario, if the face image of the driver collected by the client and the pre-recorded face image of the authenticated driver are recognized as face images of the same person, the driver's identity authentication passes and the driver is allowed to take orders; otherwise, if the two face images are recognized as belonging to different persons, authentication fails and the driver is not allowed to take orders.
With the approach provided by the present application, after different images to be detected are obtained, the feature vectors corresponding to those images are extracted and merged, and the merged feature vector is input into a pre-trained verification classification model to identify whether the different images are images of the same object to be detected. Compared with the prior art, no similarity needs to be computed and no comparison with a preset threshold is required, which effectively avoids the need to update an optimal threshold with test sets for different application scenarios, makes the verification classification model more robust across scenarios, and improves its portability between scenarios.
Next, the model training processes of the feature extraction model and the verification classification model involved in the image recognition process are described. Before introducing the training processes, the model structures of the feature extraction model and the verification classification model are briefly described in the second embodiment, to aid understanding of the technical solution provided by the present application.
Example two
Exemplarily, referring to fig. 3, a schematic diagram of a possible model structure provided in an embodiment of the present application is shown. The feature extraction model includes first convolutional layers 1 to N1, first pooling layers 1 to N2, and first fully-connected layers 1 to N3, where the first convolutional layers and first pooling layers are connected in sequence. The number of neurons in the last first fully-connected layer is the same as the number of classes of objects to be detected, so that the model can classify which object to be detected an image represents. In the model training stage, the feature extraction model may further include a first classifier; the first classification result it outputs, indicating the object to be detected represented by the image, is compared with a manually pre-calibrated first preset result to determine whether the extracted image features are accurate and hence whether the feature extraction model has converged.
The penultimate first fully-connected layer may also be connected with a feature merging unit, which merges the feature vectors, output by that layer, corresponding to the different images to be detected. For the manner in which the feature merging unit merges multiple feature vectors, refer to the description in the first embodiment; details are not repeated here.
With continued reference to FIG. 3, the verification classification model includes second convolutional layers 1 to N4, second fully-connected layers 1 to N5, and a second classifier. The first convolutional layer of the verification classification model may be connected to the feature merging unit, so as to perform feature fusion on the merged feature vector output by that unit, which may be a feature vector with multiple feature channels. The second convolutional layers 2 to N4 can then further extract the fused image features, improving the accuracy of the subsequent verification classification. Here, the verification classification model may be a binary classification model: the last second fully-connected layer includes two neurons, and the second classifier connected to it outputs scores for two classification results, one representing that the different images to be detected are images of the same object to be detected, and the other that they are not.
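Putting the two sketched models together, an end-to-end pass corresponding to FIG. 3 might look like the following; all names, the input resolution, and the label convention are assumptions carried over from the earlier sketches.

```python
import torch

extractor = FeatureExtractor(num_identities=500)    # from the earlier sketch
verifier = VerificationClassifier(feature_dim=1024)

img_a = torch.randn(1, 3, 112, 112)  # assumed input resolution
img_b = torch.randn(1, 3, 112, 112)

feat_a, _ = extractor(img_a)  # penultimate-layer feature vectors
feat_b, _ = extractor(img_b)

merged = merge_parallel(feat_a, feat_b)      # feature merging unit
scores = torch.softmax(verifier(merged), 1)  # second classifier score values
same_object = scores.argmax(dim=1) == 1      # assumed label convention
```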
The model training process of the feature extraction model and the verification classification model is further described with reference to the model structure diagram shown in fig. 3.
Example three
Referring to fig. 4, a schematic flow chart of a training process of a feature extraction model provided in the embodiment of the present application is shown, including the following steps:
Step 401, acquiring a first sample training set, where the first sample training set includes a plurality of image samples of different objects to be detected.
Taking a face recognition scenario as an example, the first sample training set may include face images of many different people; for example, hundreds of thousands of face images of ten thousand people may be used as image samples, with about 50 face images per person, to ensure that the samples are evenly distributed.
Step 402, selecting a first preset number of image samples from the first sample training set and inputting the image samples into a feature extraction model to be trained.
Specifically, a first preset number of image samples may be sequentially input to the feature extraction model to be trained as image samples in the training process of the round for processing. The first preset number can be configured according to the actual model processing capacity.
Step 403, performing feature extraction on each input image sample to obtain a feature vector, classifying the feature vector of each image sample, and determining a first classification result indicating the object to be detected represented by the image sample.
Specifically, the number of neurons in the last fully-connected layer of the feature extraction model is the same as the total number of classes among the image samples. When the feature vector of each image sample is classified, a score is computed for each first classification result, i.e., for each object to be detected the sample might represent, and the first classification result with the highest score is taken as the final one. For example, in a face recognition scenario, if the image samples cover the faces of 500 people, the last fully-connected layer may have 500 neurons; classification then outputs scores for 500 classification results, each indicating that the face image belongs to one particular person, and the first classification result with the highest score among the 500 is taken as the final first classification result.
Step 404, calculating a first loss value of the training process based on the first classification result corresponding to each image sample and the first preset result corresponding to each image sample.
The first preset result corresponding to each image sample can be understood as the theoretical value of that sample's classification result and can be configured manually in advance. For each of the first preset number of image samples, the first loss value of the current training round can be calculated by comparing the first classification result with the first preset result; the first loss value represents the error of the current round in classifying the first preset number of image samples.
Step 405, judging whether the first loss value of the current training round is greater than the first set value.
If yes, go to step 406; if not, go to step 407.
Step 406, adjusting the model parameters of the feature extraction model to be trained, returning to step 402, and performing the next training round with the adjusted model, until the calculated first loss value is less than or equal to the first set value.
Step 407, determining that training of the feature extraction model is finished.
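One hedged sketch of the loop in steps 401 through 407 follows; the loss function, optimizer, and hyperparameters below are assumed choices, since the patent specifies only a first loss value compared against a first set value.

```python
import torch
import torch.nn as nn

def train_feature_extractor(model, loader, first_set_value=0.01,
                            max_rounds=100, lr=1e-3):
    criterion = nn.CrossEntropyLoss()  # assumed form of the first loss value
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(max_rounds):
        # each batch is one round's "first preset number" of image samples
        for images, labels in loader:
            _, logits = model(images)              # first classification results
            first_loss = criterion(logits, labels) # first loss value
            if first_loss.item() <= first_set_value:
                return model                       # training finished (step 407)
            optimizer.zero_grad()                  # otherwise adjust parameters
            first_loss.backward()
            optimizer.step()
    return model
```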
After the training of the feature extraction model is completed, the feature vector of each image sample extracted from the first sample training set by the feature extraction model can be subsequently utilized to determine a second sample training set for training the verification classification model to be trained, and the verification classification model is trained. See the description in example four for details.
Example four
Referring to fig. 5, a schematic diagram of the training process of the verification classification model according to an embodiment of the present application is shown. First, a second sample training set may be determined from the feature vectors of the image samples extracted by the feature extraction model. The second sample training set may include a plurality of positive sample pairs and a plurality of negative sample pairs. In one possible embodiment, the positive and negative sample pairs are determined by randomly pairing the feature vectors of different image samples: each positive sample pair consists of feature vectors of different image samples representing the same object to be detected, and each negative sample pair consists of feature vectors of different image samples representing different objects to be detected. For example, in a face recognition scenario, suppose the first sample training set contains 500,000 face image samples of 10,000 people, and the feature vectors of these 500,000 samples are obtained through the feature extraction model; randomly pairing these feature vectors two by two can form up to 500,000 × 500,000 feature vector pairs. Two feature vectors corresponding to image samples of the same person form a positive sample pair, and two feature vectors corresponding to image samples of different people form a negative sample pair. A simple sketch of this pairing step follows.
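The sketch assumes that the feature vectors and their identity labels are held in parallel lists; uniform random pairing without replacement is one possible reading of "randomly pairing", and the function name is not from the patent.

```python
import random

def build_sample_pairs(features, identities):
    # features: list of per-sample feature vectors (e.g. 1-D tensors)
    # identities: list of object labels, aligned with features
    positive_pairs, negative_pairs = [], []
    indices = list(range(len(features)))
    random.shuffle(indices)
    for i, j in zip(indices[0::2], indices[1::2]):
        pair = (features[i], features[j])
        if identities[i] == identities[j]:
            positive_pairs.append(pair)   # same object to be detected
        else:
            negative_pairs.append(pair)   # different objects to be detected
    return positive_pairs, negative_pairs
```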
Further, the verification classification model can be trained by utilizing a second sample training set, and the verification classification model is finally obtained through model training.
Referring to fig. 6, a specific flowchart of a model training process provided in the embodiment of the present application is schematically illustrated, and includes the following steps:
Step 601, selecting a second preset number of positive sample pairs and a third preset number of negative sample pairs from the second sample training set.
Specifically, positive and negative sample pairs may be selected from the second sample training set using a non-repeating random selection strategy (i.e., sampling without replacement), so as to traverse as many sample pairs in the set as possible. The selected second preset number of positive sample pairs and third preset number of negative sample pairs serve as the training samples for the current round and are processed by the verification classification model to be trained. The second and third preset numbers can be configured according to the actual model processing capacity; they may be the same or different. In one example, 128 positive sample pairs and 128 negative sample pairs may be selected in each training round.
In the embodiment of the present application, before inputting the selected positive sample pair and the selected negative sample pair into the verification classification model, the positive sample pair and the negative sample pair may be preprocessed. Specifically, the feature vectors of different image samples in each positive sample pair may be merged to obtain a positive sample pair feature vector corresponding to each positive sample pair, and the positive sample pair feature vector corresponding to each positive sample pair is input into the verification classification model to be trained. And combining the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair, and inputting the negative sample pair feature vector corresponding to each negative sample pair into the verification classification model to be trained.
In a possible implementation, for each positive sample pair, the feature vectors of the different image samples may be merged in parallel to obtain the positive sample pair feature vector; each positive sample pair feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of one image sample in the corresponding positive sample pair. For each negative sample pair, the feature vectors of the different image samples may likewise be merged in parallel to obtain the negative sample pair feature vector, whose feature channels each map to the feature vector of one image sample in the corresponding negative sample pair. For a detailed description of the merged feature vector, reference may be made to the first embodiment; the details are not repeated here.
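Read as stacking along a new channel axis, the parallel merge admits a very short sketch; PyTorch and a 512-dimensional feature vector are assumed here purely for illustration.

```python
import torch

def merge_pair(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Stack two equal-length feature vectors into a 2-channel tensor:
    channel 0 maps to the first image sample, channel 1 to the second."""
    assert feat_a.shape == feat_b.shape
    return torch.stack([feat_a, feat_b], dim=0)  # shape: (2, feat_dim)

# e.g. two 512-dimensional face features -> one (2, 512) pair feature vector
pair_vec = merge_pair(torch.randn(512), torch.randn(512))
```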
Step 602, inputting the selected positive sample pairs and negative sample pairs into the verification classification model to be trained, and obtaining a second classification result corresponding to each positive sample pair and each negative sample pair, where the second classification result indicates whether the feature vectors of different image samples represent the same object to be detected.
In specific implementation, for each input positive sample pair feature vector, feature fusion is performed on the positive sample pair feature vector, the fused feature vector is classified, and the second classification result corresponding to the positive sample pair is calculated. Each input negative sample pair feature vector is handled in the same way: feature fusion is performed, the fused feature vector is classified, and the second classification result corresponding to the negative sample pair is calculated.
The second classification result either indicates that the feature vectors of different image samples represent the same object to be detected (for example, faces of the same person), or that they represent different objects to be detected (for example, faces of different persons). Specifically, a score may be calculated for each candidate classification result, and the classification result with the highest score is taken as the final second classification result.
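In the usual two-class reading, scoring each candidate result and keeping the highest-scoring one amounts to a softmax over two logits followed by an argmax. The sketch below assumes PyTorch, a 512-dimensional feature vector, and a 1×1 convolution as the fusion step; none of these specifics are fixed by the embodiment.

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Two-class verification head: fuses a merged pair feature vector
    and scores 'same object' against 'different objects'."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fuse = nn.Conv1d(2, 1, kernel_size=1)  # fuse the two feature channels
        self.fc = nn.Linear(feat_dim, 2)            # two neurons: different / same

    def forward(self, pair_vec):                    # pair_vec: (batch, 2, feat_dim)
        fused = self.fuse(pair_vec).squeeze(1)      # (batch, feat_dim)
        logits = self.fc(fused)
        scores = torch.softmax(logits, dim=1)       # a score per candidate result
        return logits, scores.argmax(dim=1)         # highest score wins
```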
Step 603, calculating a second loss value of the training process of the current round based on the second classification result and the second preset result corresponding to each positive sample pair and the second classification result and the third preset result corresponding to each negative sample pair.
In this embodiment of the application, the second preset result and the third preset result are manually preconfigured theoretical values. For example, the theoretical value of a positive sample pair may be set to 1, indicating that the feature vectors of different image samples represent the same object to be detected, and the theoretical value of a negative sample pair may be set to 0, indicating that they represent different objects to be detected. The second classification result of each positive sample pair is compared with the second preset result, and the second classification result of each negative sample pair is compared with the third preset result, so as to calculate the second loss value of the current round of training. The second loss value characterizes the error of the current round in identifying whether different images to be detected are images of the same object to be detected.
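Under these preset values the second loss value reduces to an ordinary cross-entropy against labels of 1 and 0; a minimal sketch consistent with the classifier sketched above:

```python
import torch
import torch.nn.functional as F

def round_loss(pos_logits, neg_logits):
    """Cross-entropy against the preset results: 1 for every positive
    sample pair (same object), 0 for every negative pair (different)."""
    logits = torch.cat([pos_logits, neg_logits], dim=0)
    targets = torch.cat([torch.ones(len(pos_logits), dtype=torch.long),
                         torch.zeros(len(neg_logits), dtype=torch.long)])
    return F.cross_entropy(logits, targets)
```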
Step 604, judging whether the second loss value of the current round of training is greater than a second set value.
If yes, go to step 605; if not, go to step 606.
Step 605, adjusting the model parameters of the verification classification model to be trained, returning to step 601, and performing the next round of training with the adjusted model, until the calculated second loss value is less than or equal to the second set value.
Step 606, determining that training of the verification classification model is finished.
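Steps 601 to 606 combine into the loop sketched below; the SGD optimizer, learning rate and stopping threshold are assumptions, and `model` is taken to map a batch of merged pair feature vectors to two-class logits (as in the PairClassifier sketch above, whose forward pass returns logits first).

```python
import torch

def train_verifier(model, batches_fn, loss_fn, threshold=0.01, lr=1e-3):
    """Repeat steps 601-605 until the round loss is at most the
    second set value, then declare training finished (step 606)."""
    optim = torch.optim.SGD(model.parameters(), lr=lr)
    while True:
        for pos_batch, neg_batch in batches_fn():        # step 601
            pos_logits, _ = model(pos_batch)             # step 602
            neg_logits, _ = model(neg_batch)
            loss = loss_fn(pos_logits, neg_logits)       # step 603
            if loss.item() <= threshold:                 # step 604 -> step 606
                return model
            optim.zero_grad()                            # step 605
            loss.backward()
            optim.step()
```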
After the feature extraction model and the verification classification model are trained, the two trained models can be used to identify different images to be detected and to output a verification result indicating whether the different images to be detected represent the same object to be detected.
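Under the same assumptions as the sketches above, an end-to-end verification call then reduces to extract, merge, classify; no similarity score or threshold appears anywhere. The class-index convention below (1 meaning "same object") is an assumption.

```python
import torch

@torch.no_grad()
def verify(extractor, classifier, img_a, img_b):
    """Return True if the two images to be detected are judged to show
    the same object; extractor and classifier are the two trained models."""
    feat_a = extractor(img_a.unsqueeze(0)).squeeze(0)  # feature vector of image A
    feat_b = extractor(img_b.unsqueeze(0)).squeeze(0)  # feature vector of image B
    pair = torch.stack([feat_a, feat_b], dim=0)        # channel-wise merge
    _, pred = classifier(pair.unsqueeze(0))            # verification classification
    return bool(pred.item() == 1)                      # class 1: same object
```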
Based on the same technical concept, an embodiment of the present application further provides an image recognition device corresponding to the image recognition method described above. Since the principle by which the device solves the problem is similar to that of the image recognition method, the implementation of the device may refer to the implementation of the method; repeated details are omitted.
EXAMPLE FIVE
Referring to fig. 7, a schematic structural diagram of an image recognition apparatus provided in an embodiment of the present application is shown, where the apparatus 70 includes:
an obtaining module 71, configured to obtain different images to be detected;
the first processing module 72 is configured to perform feature extraction on the different images to be detected respectively to obtain a feature vector corresponding to each image to be detected;
the second processing module 73 is configured to combine the feature vectors corresponding to the different images to be detected, respectively, to obtain combined feature vectors;
and a third processing module 74, configured to input the combined feature vector into a pre-trained verification classification model, and identify whether the different images to be detected are images of the same object to be detected.
With the device provided in the embodiment of the present application, image recognition proceeds by acquiring the different images to be detected, extracting the feature vector corresponding to each of them, merging the feature vectors, and inputting the merged feature vector into the pre-trained verification classification model for identification and verification, so as to determine whether the different images to be detected are images of the same object to be detected. Compared with the prior art, the method provided by the application needs neither to calculate a similarity nor to compare one with a preset threshold, which effectively avoids having to update the optimal threshold with the test sets corresponding to different application scenarios; the verification classification model is therefore more robust across application scenarios, and its portability between application scenarios is improved.
In a possible implementation manner, the first processing module 72 is specifically configured to:
and inputting the different images to be detected into a pre-trained feature extraction model for feature extraction, and then obtaining a feature vector corresponding to each image to be detected.
In one possible design, the obtaining module 71 is further configured to: acquiring a first sample training set, wherein the first sample training set comprises a plurality of image samples of different objects to be detected;
the device further comprises:
a first model training module 75, configured to train the feature extraction model according to the following manner:
selecting a first preset number of image samples from the first sample training set and inputting the image samples into a feature extraction model to be trained;
for each input image sample, performing feature extraction on each image sample to obtain a feature vector, classifying the feature vector of each image sample, and determining a first classification result for indicating an object to be detected represented by the image sample;
calculating a first loss value of the training process of the round based on a first classification result corresponding to each image sample and a first preset result corresponding to each image sample;
and when the first loss value is greater than a first set value, adjusting model parameters of the feature extraction model to be trained, and performing the next round of training process by using the adjusted feature extraction model to be trained until the calculated first loss value is less than or equal to the first set value, and determining that the training is finished.
In a possible implementation manner, the second processing module 73 is specifically configured to:
combine in parallel the feature vectors respectively corresponding to the different images to be detected to obtain the combined feature vector; the combined feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of the corresponding image to be detected.
In one possible design, the apparatus further includes:
and a sample determining module 76, configured to determine, according to the feature vector of each image sample, a second sample training set used for training the verification classification model to be trained.
The sample determining module 76 is specifically configured to:
randomly pairing the feature vectors of different image samples to determine a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair comprises the feature vectors of different image samples for representing the same object to be detected, and each negative sample pair comprises the feature vectors of different image samples for representing different objects to be detected;
wherein the second training set of samples includes the determined plurality of pairs of positive samples and a plurality of pairs of negative samples.
In one possible design, the apparatus further includes:
a second model training module 77, configured to train the verification classification model according to the following manner:
selecting a second preset number of positive sample pairs and a third preset number of negative sample pairs from the second sample training set;
inputting the selected positive sample pairs and the selected negative sample pairs into the verification classification model to be trained to obtain second classification results corresponding to each positive sample pair and each negative sample pair respectively, wherein the second classification results are used for indicating whether the feature vectors of different image samples represent the same object to be detected;
calculating a second loss value of the training process of the round based on a second classification result and a second preset result corresponding to each positive sample pair and a second classification result and a third preset result corresponding to each negative sample pair;
and when the second loss value is greater than a second set value, adjusting the model parameters of the verification classification model to be trained, and performing the next round of training process by using the adjusted verification classification model to be trained until the calculated second loss value is less than or equal to the second set value, and determining that the training is finished.
In a possible implementation manner, the second model training module 77, when inputting the selected positive sample pair and the selected negative sample pair into the verification classification model to be trained, is specifically configured to:
combining the feature vectors of different image samples in each positive sample pair to obtain a positive sample pair feature vector corresponding to each positive sample pair, and inputting the positive sample pair feature vector corresponding to each positive sample pair into the verification classification model to be trained; and
combining the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair, and inputting the negative sample pair feature vector corresponding to each negative sample pair into the verification classification model to be trained.
In a possible implementation manner, when the feature vectors of different image samples in each positive sample pair are combined to obtain the positive sample pair feature vector corresponding to each positive sample pair, the second model training module 77 is specifically configured to:
combining the feature vectors of different image samples in each positive sample pair in parallel to obtain a positive sample pair feature vector corresponding to each positive sample pair, wherein each positive sample pair feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of one image sample in the corresponding positive sample pair;
the second model training module 77 is specifically configured to, when merging the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair:
and combining the feature vectors of different image samples in each negative sample pair in parallel to obtain a negative sample pair feature vector corresponding to each negative sample pair, wherein each negative sample pair feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of one image sample in the corresponding negative sample pair.
In a possible implementation manner, the second model training module 77, when inputting the selected positive sample pairs and negative sample pairs into the verification classification model to be trained to obtain second classification results corresponding to each positive sample pair and each negative sample pair, is specifically configured to:
for each input positive sample pair feature vector, performing feature fusion on the positive sample pair feature vector, classifying the fused positive sample pair feature vector, and calculating a second classification result corresponding to each positive sample pair; and
for each input negative sample pair feature vector, performing feature fusion on the negative sample pair feature vector, classifying the fused negative sample pair feature vector, and calculating the second classification result corresponding to each negative sample pair.
In one possible design, the feature extraction model includes at least one first convolution layer, at least one first fully-connected layer; the number of the neurons included in the last first full connection layer in the at least one first full connection layer is the same as the number of the types of the objects to be detected; the last but one first full connection layer of the at least one first full connection layer is connected with a feature merging unit, and the feature merging unit is used for merging the feature vectors corresponding to the different images to be detected output by the last but one first full connection layer.
In one possible design, the verification classification model includes at least one second convolutional layer, at least one second fully-connected layer, and a classifier; a first second convolution layer in the at least one second convolution layer is connected with the feature merging unit, and the first second convolution layer is used for performing feature fusion on the merged feature vector; the last second full-connection layer in the at least one second full-connection layer comprises two neurons, and the classifier is used for outputting a classification result of whether the different images to be detected are images of the same object to be detected.
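Read literally, the feature extraction design above is a small CNN whose penultimate fully-connected layer emits the feature vector handed to the merging unit, while the final fully-connected layer has one neuron per object class; together with the PairClassifier sketched earlier this covers both designs. All layer widths below are assumptions, not part of the embodiment.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """At least one first convolution layer plus two first fully-connected
    layers; the penultimate FC output is the feature vector, the last FC
    has as many neurons as there are object classes."""
    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))                     # (batch, 32, 4, 4)
        self.fc_feat = nn.Linear(32 * 4 * 4, feat_dim)   # penultimate FC layer
        self.fc_cls = nn.Linear(feat_dim, num_classes)   # last FC layer

    def forward(self, x, return_features=True):
        feat = self.fc_feat(self.conv(x).flatten(1))
        return feat if return_features else self.fc_cls(feat)
```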
In this embodiment, specific functions and interaction manners of the obtaining module 71, the first processing module 72, the second processing module 73, the third processing module 74, the first model training module 75, the sample determining module 76, and the second model training module 77 may refer to the descriptions of the embodiments corresponding to fig. 1 to 6, and are not described herein again.
Based on the same technical concept, an embodiment of the present application further provides a computer device. Referring to fig. 8, a schematic structural diagram of a computer device 80 provided in the embodiment of the present application is shown; the device includes a processor 81, a memory 82 and a bus 83. The memory 82 is used for storing execution instructions and includes an internal memory 821 and an external memory 822. The internal memory 821 temporarily stores operation data for the processor 81 and data exchanged with external memory 822 such as a hard disk; the processor 81 exchanges data with the external memory 822 through the internal memory 821. When the computer device 80 runs, the processor 81 communicates with the memory 82 through the bus 83, so that the processor 81 executes the following instructions:
acquiring different images to be detected; respectively extracting features from the different images to be detected to obtain a feature vector corresponding to each image to be detected; combining the feature vectors respectively corresponding to the different images to be detected to obtain combined feature vectors; and inputting the combined feature vectors into a pre-trained verification classification model to identify whether the different images to be detected are images of the same object to be detected.
The specific processing flow of the processor 81 may refer to the steps in the image recognition method described in fig. 1 to 6, and is not described herein again.
Based on the same technical concept, embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the image recognition method.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the image recognition method described above can be executed, effectively avoiding the problem of having to update the optimal threshold with the test sets corresponding to different application scenarios.
Based on the same technical concept, embodiments of the present application further provide a computer program product, which includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the steps of the image recognition method, and specific implementation may refer to the above method embodiments, and will not be described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a logical division, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.

In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through communication interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

1. An image recognition method, comprising:
acquiring different images to be detected;
respectively extracting the features of the different images to be detected to obtain a feature vector corresponding to each image to be detected;
combining the feature vectors respectively corresponding to the different images to be detected to obtain combined feature vectors;
and inputting the combined feature vectors into a pre-trained verification classification model, and identifying whether the different images to be detected are images of the same object to be detected.
2. The method of claim 1, wherein the performing feature extraction on the different images to be detected respectively to obtain a feature vector corresponding to each image to be detected comprises:
and inputting the different images to be detected into a pre-trained feature extraction model for feature extraction, and then obtaining a feature vector corresponding to each image to be detected.
3. The method of claim 2, wherein the method further comprises:
acquiring a first sample training set, wherein the first sample training set comprises a plurality of image samples of different objects to be detected;
training to obtain the feature extraction model according to the following modes:
selecting a first preset number of image samples from the first sample training set and inputting the image samples into a feature extraction model to be trained;
for each input image sample, performing feature extraction on each image sample to obtain a feature vector, classifying the feature vector of each image sample, and determining a first classification result for indicating an object to be detected represented by the image sample;
calculating a first loss value of the training process of the round based on a first classification result corresponding to each image sample and a first preset result corresponding to each image sample;
and when the first loss value is greater than a first set value, adjusting model parameters of the feature extraction model to be trained, and performing the next round of training process by using the adjusted feature extraction model to be trained until the calculated first loss value is less than or equal to the first set value, and determining that the training is finished.
4. The method according to any one of claims 1 to 3, wherein the merging the feature vectors respectively corresponding to the different images to be detected to obtain merged feature vectors comprises:
merging in parallel the feature vectors respectively corresponding to the different images to be detected to obtain the merged feature vector; the merged feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of the corresponding image to be detected.
5. The method of claim 3, wherein after determining that the first loss value is less than or equal to the first set point, the method further comprises:
and determining a second sample training set used for training the verification classification model to be trained according to the feature vector of each image sample.
6. The method of claim 5, wherein determining a second training set of samples from the feature vector for each image sample comprises:
randomly pairing the feature vectors of different image samples to determine a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair comprises the feature vectors of different image samples for representing the same object to be detected, and each negative sample pair comprises the feature vectors of different image samples for representing different objects to be detected;
wherein the second training set of samples includes the determined plurality of pairs of positive samples and a plurality of pairs of negative samples.
7. The method of claim 6, wherein the verification classification model is trained according to:
selecting a second preset number of positive sample pairs and a third preset number of negative sample pairs from the second sample training set;
inputting the selected positive sample pairs and the selected negative sample pairs into the verification classification model to be trained to obtain second classification results corresponding to each positive sample pair and each negative sample pair respectively, wherein the second classification results are used for indicating whether the feature vectors of different image samples represent the same object to be detected;
calculating a second loss value of the training process of the round based on a second classification result and a second preset result corresponding to each positive sample pair and a second classification result and a third preset result corresponding to each negative sample pair;
and when the second loss value is greater than a second set value, adjusting the model parameters of the verification classification model to be trained, and performing the next round of training process by using the adjusted verification classification model to be trained until the calculated second loss value is less than or equal to the second set value, and determining that the training is finished.
8. The method of claim 7, wherein inputting the selected pairs of positive and negative examples into the verification classification model to be trained comprises:
combining the feature vectors of different image samples in each positive sample pair to obtain a positive sample pair feature vector corresponding to each positive sample pair, and inputting the positive sample pair feature vector corresponding to each positive sample pair into the verification classification model to be trained; and
combining the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair, and inputting the negative sample pair feature vector corresponding to each negative sample pair into the verification classification model to be trained.
9. The method of claim 8, wherein the combining the feature vectors of the different image samples in each positive sample pair to obtain a positive sample pair feature vector for each positive sample pair comprises:
combining the feature vectors of different image samples in each positive sample pair in parallel to obtain a positive sample pair feature vector corresponding to each positive sample pair, wherein each positive sample pair feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of one image sample in the corresponding positive sample pair;
the merging the feature vectors of the different image samples in each negative sample pair to obtain the feature vector of the negative sample pair corresponding to each negative sample pair includes:
and combining the feature vectors of different image samples in each negative sample pair in parallel to obtain a negative sample pair feature vector corresponding to each negative sample pair, wherein each negative sample pair feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of one image sample in the corresponding negative sample pair.
10. The method according to any one of claims 8 to 9, wherein the inputting the selected positive sample pairs and negative sample pairs into the verification classification model to be trained to obtain second classification results corresponding to each positive sample pair and each negative sample pair respectively comprises:
for each input positive sample pair feature vector, performing feature fusion on the positive sample pair feature vector, classifying the fused positive sample pair feature vector, and calculating a second classification result corresponding to each positive sample pair; and
for each input negative sample pair feature vector, performing feature fusion on the negative sample pair feature vector, classifying the fused negative sample pair feature vector, and calculating the second classification result corresponding to each negative sample pair.
11. The method of claim 2 or 3, wherein the feature extraction model comprises at least one first convolutional layer and at least one first fully-connected layer;
the number of the neurons included in the last first full connection layer in the at least one first full connection layer is the same as the number of the types of the objects to be detected;
the last but one first full connection layer of the at least one first full connection layer is connected with a feature merging unit, and the feature merging unit is used for merging the feature vectors corresponding to the different images to be detected output by the last but one first full connection layer.
12. The method of claim 1 or 7, wherein the verification classification model comprises at least one second convolutional layer, at least one second fully-connected layer, and a classifier;
a first second convolution layer in the at least one second convolution layer is connected with the feature merging unit, and the first second convolution layer is used for performing feature fusion on the merged feature vector;
the last second full-connection layer in the at least one second full-connection layer comprises two neurons, and the classifier is used for outputting a classification result of whether the different images to be detected are images of the same object to be detected.
13. An image recognition apparatus, comprising:
the acquisition module is used for acquiring different images to be detected;
the first processing module is used for respectively extracting the characteristics of the different images to be detected to obtain a characteristic vector corresponding to each image to be detected;
the second processing module is used for merging the feature vectors respectively corresponding to the different images to be detected to obtain merged feature vectors;
and the third processing module is used for inputting the combined feature vectors into a pre-trained verification classification model and identifying whether the different images to be detected are images of the same object to be detected.
14. The apparatus of claim 13, wherein the first processing module is specifically configured to:
and inputting the different images to be detected into a pre-trained feature extraction model for feature extraction, and then obtaining a feature vector corresponding to each image to be detected.
15. The apparatus of claim 14, wherein the obtaining module is further configured to:
acquiring a first sample training set, wherein the first sample training set comprises a plurality of image samples of different objects to be detected;
the device further comprises:
the first model training module is used for training to obtain the feature extraction model according to the following modes:
selecting a first preset number of image samples from the first sample training set and inputting the image samples into a feature extraction model to be trained;
for each input image sample, performing feature extraction on each image sample to obtain a feature vector, classifying the feature vector of each image sample, and determining a first classification result for indicating an object to be detected represented by the image sample;
calculating a first loss value of the training process of the round based on a first classification result corresponding to each image sample and a first preset result corresponding to each image sample;
and when the first loss value is greater than a first set value, adjusting model parameters of the feature extraction model to be trained, and performing the next round of training process by using the adjusted feature extraction model to be trained until the calculated first loss value is less than or equal to the first set value, and determining that the training is finished.
16. The apparatus according to any one of claims 13 to 15, wherein the second processing module is specifically configured to:
merge in parallel the feature vectors respectively corresponding to the different images to be detected to obtain the merged feature vector; the merged feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of the corresponding image to be detected.
17. The apparatus of claim 15, wherein the apparatus further comprises:
and the sample determining module is used for determining a second sample training set used for training the verification classification model to be trained according to the feature vector of each image sample.
18. The apparatus of claim 17, wherein the sample determination module is specifically configured to:
randomly pairing the feature vectors of different image samples to determine a plurality of positive sample pairs and a plurality of negative sample pairs, wherein each positive sample pair comprises the feature vectors of different image samples for representing the same object to be detected, and each negative sample pair comprises the feature vectors of different image samples for representing different objects to be detected;
wherein the second training set of samples includes the determined plurality of pairs of positive samples and a plurality of pairs of negative samples.
19. The apparatus of claim 18, wherein the apparatus further comprises:
the second model training module is used for training to obtain the verification classification model according to the following modes:
selecting a second preset number of positive sample pairs and a third preset number of negative sample pairs from the second sample training set;
inputting the selected positive sample pairs and the selected negative sample pairs into the verification classification model to be trained to obtain second classification results corresponding to each positive sample pair and each negative sample pair respectively, wherein the second classification results are used for indicating whether the feature vectors of different image samples represent the same object to be detected;
calculating a second loss value of the training process of the round based on a second classification result and a second preset result corresponding to each positive sample pair and a second classification result and a third preset result corresponding to each negative sample pair;
and when the second loss value is greater than a second set value, adjusting the model parameters of the verification classification model to be trained, and performing the next round of training process by using the adjusted verification classification model to be trained until the calculated second loss value is less than or equal to the second set value, and determining that the training is finished.
20. The apparatus of claim 19, wherein the second model training module, when inputting the selected pairs of positive and negative examples into the verification classification model to be trained, is specifically configured to:
combining the feature vectors of different image samples in each positive sample pair to obtain a positive sample pair feature vector corresponding to each positive sample pair, and inputting the positive sample pair feature vector corresponding to each positive sample pair into the verification classification model to be trained; and
combining the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair, and inputting the negative sample pair feature vector corresponding to each negative sample pair into the verification classification model to be trained.
21. The apparatus of claim 20, wherein the second model training module, when merging the feature vectors of the different image samples in each positive sample pair to obtain the positive sample pair feature vector corresponding to each positive sample pair, is specifically configured to:
combining the feature vectors of different image samples in each positive sample pair in parallel to obtain a positive sample pair feature vector corresponding to each positive sample pair, wherein each positive sample pair feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of one image sample in the corresponding positive sample pair;
the second model training module is specifically configured to, when merging the feature vectors of different image samples in each negative sample pair to obtain a negative sample pair feature vector corresponding to each negative sample pair:
and combining the feature vectors of different image samples in each negative sample pair in parallel to obtain a negative sample pair feature vector corresponding to each negative sample pair, wherein each negative sample pair feature vector has a plurality of feature channels, and each feature channel maps to the feature vector of one image sample in the corresponding negative sample pair.
22. The apparatus according to any one of claims 20 to 21, wherein the second model training module, when inputting the selected positive sample pairs and negative sample pairs into the verification classification model to be trained and obtaining the second classification result corresponding to each positive sample pair and each negative sample pair, is specifically configured to:
for each input positive sample pair feature vector, performing feature fusion on the positive sample pair feature vector, classifying the fused positive sample pair feature vector, and calculating a second classification result corresponding to each positive sample pair; and
for each input negative sample pair feature vector, performing feature fusion on the negative sample pair feature vector, classifying the fused negative sample pair feature vector, and calculating the second classification result corresponding to each negative sample pair.
23. The apparatus of claim 14 or 15, wherein the feature extraction model comprises at least one first convolutional layer and at least one first fully-connected layer;
the number of the neurons included in the last first full connection layer in the at least one first full connection layer is the same as the number of the types of the objects to be detected;
the last but one first full connection layer of the at least one first full connection layer is connected with a feature merging unit, and the feature merging unit is used for merging the feature vectors corresponding to the different images to be detected output by the last but one first full connection layer.
24. The apparatus of claim 13 or 19, wherein the verification classification model comprises at least one second convolutional layer, at least one second fully-connected layer, and a classifier;
a first second convolution layer in the at least one second convolution layer is connected with the feature merging unit, and the first second convolution layer is used for performing feature fusion on the merged feature vector;
the last second full-connection layer in the at least one second full-connection layer comprises two neurons, and the classifier is used for outputting a classification result of whether the different images to be detected are images of the same object to be detected.
25. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the image recognition method according to any one of claims 1 to 12.
26. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the image recognition method according to one of claims 1 to 12.
CN201811003602.7A 2018-08-30 2018-08-30 Image identification method and device Pending CN110874602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811003602.7A CN110874602A (en) 2018-08-30 2018-08-30 Image identification method and device

Publications (1)

Publication Number Publication Date
CN110874602A true CN110874602A (en) 2020-03-10

Family

ID=69715210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811003602.7A Pending CN110874602A (en) 2018-08-30 2018-08-30 Image identification method and device

Country Status (1)

Country Link
CN (1) CN110874602A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103679158A (en) * 2013-12-31 2014-03-26 北京天诚盛业科技有限公司 Face authentication method and device
US20170039418A1 (en) * 2013-12-31 2017-02-09 Beijing Techshino Technology Co., Ltd. Face authentication method and device
CN106096535A (en) * 2016-06-07 2016-11-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of face verification method based on bilinearity associating CNN
CN106845357A (en) * 2016-12-26 2017-06-13 银江股份有限公司 A kind of video human face detection and recognition methods based on multichannel network
CN107967461A (en) * 2017-12-08 2018-04-27 深圳云天励飞技术有限公司 The training of SVM difference models and face verification method, apparatus, terminal and storage medium
CN108009528A (en) * 2017-12-26 2018-05-08 广州广电运通金融电子股份有限公司 Face authentication method, device, computer equipment and storage medium based on Triplet Loss
CN108416323A (en) * 2018-03-27 2018-08-17 百度在线网络技术(北京)有限公司 The method and apparatus of face for identification

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523605A (en) * 2020-04-28 2020-08-11 新疆维吾尔自治区烟草公司 Image identification method and device, electronic equipment and medium
CN114881937A (en) * 2022-04-15 2022-08-09 北京医准智能科技有限公司 Detection method and device for ultrasonic section and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200310