WO2020155939A1

WO2020155939A1 - Image recognition method and device, storage medium and processor

Info

Publication number: WO2020155939A1
Application number: PCT/CN2019/127817
Authority: WO
Inventors: 张玉兵
Original assignee: 广州视源电子科技股份有限公司
Priority date: 2019-01-31
Filing date: 2019-12-24
Publication date: 2020-08-06
Also published as: CN109766872A; CN109766872B

Abstract

Disclosed are an image recognition method and device, a storage medium, and a processor. The method comprises: acquiring an image to be recognized; acquiring a pre-established image recognition model, wherein the image recognition model is obtained by training an initial model by means of a plurality of training sets, the initial model is a recognition model established based on a branch training algorithm, and the same training set is extracted from the same data set, and different training sets are extracted from different data sets; and recognizing an image to be recognized by means of an image recognition model to obtain a recognition result.

Description

Image recognition method, device, storage medium and processor

This application claims priority to Chinese patent application No. 201910101257.9, filed with the Chinese Patent Office on January 31, 2019, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of image recognition, for example, to an image recognition method, device, storage medium, and processor.

Background art

In the field of image recognition, especially in mainstream face recognition, recognition is performed by image recognition models, which are trained as deep learning models. The quality of deep learning model training has an important impact on recognition accuracy. Throughout the training process, the data set used for training is the top priority and has a decisive impact on the final performance of the deep learning model.

Deep learning models are generally trained on a single training data set. For example, in the field of face recognition, the training data set can be face data collected in a certain scene or a public face database downloaded from the Internet. Because different data sets may cover the same person, and because naming rules are not uniform across data sets, it is difficult to merge face pictures of the same person according to their file names. In face recognition classification training, the face pictures of the same person must share the same label category number, so multiple face data sets that may overlap cannot be used at the same time. A deep learning model trained on only a single training data set has low accuracy in image recognition and cannot meet the needs of different applications.

For the problem of low recognition accuracy of image recognition methods in related technologies, no effective solutions have been proposed yet.

Summary of the invention

The embodiments of the present application provide an image recognition method, device, storage medium, and processor to at least solve the problem of low recognition accuracy of the image recognition method in the related art.

According to one aspect of the embodiments of the present application, an image recognition method is provided, which includes: obtaining an image to be recognized; obtaining a pre-established image recognition model, wherein the image recognition model is obtained by training an initial model through multiple training sets, the initial model is a recognition model established based on a branch training algorithm, the same training set is extracted from the same data set, and different training sets are extracted from different data sets; and using the image recognition model to recognize the image to be recognized to obtain a recognition result.

In an embodiment, the above method further includes: acquiring multiple data sets; classifying each image in the multiple data sets to obtain a label of each image, wherein the label is used to represent the classification result of each image, and the labels of at least two images contained in each data set are the same; and extracting sample images from each classified data set to obtain multiple training sets.

In an embodiment, before extracting sample images from each classified data set to obtain multiple training sets, the above method further includes: extracting preset features of each image in each classified data set; performing an alignment operation on each image based on the preset features of each image; and extracting sample images from each data set after the operation to obtain multiple training sets.

In an embodiment, when each image is a face image, the preset feature includes at least one of the following: eyes, eyebrows, nose tip, and mouth corners.

In an embodiment, extracting sample images from each data set after the operation to obtain multiple training sets includes: randomly extracting sample images from each data set after the operation; and obtaining the storage paths and labels of the sample images to obtain multiple training sets.

In one embodiment, acquiring multiple data sets includes: acquiring a video image and a preset data set collected by an acquisition device; and detecting the video image and the preset data set to obtain multiple data sets.

In one embodiment, the above method further includes: establishing an initial model based on a branch training algorithm, where the initial model at least includes multiple loss functions that correspond one-to-one to the multiple training sets; inputting the multiple training sets into the initial model in parallel to train the initial model; determining whether the trained model meets a preset condition; and, if the trained model meets the preset condition, determining that the trained model is the image recognition model.

In one embodiment, inputting multiple training sets into the initial model in parallel to train the initial model includes: inputting the multiple training sets into the initial model in parallel to obtain the function values of the multiple loss functions; obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain derivation algorithm (chain rule); and updating each parameter according to its gradient value using the stochastic gradient descent algorithm to obtain the trained model.

In one embodiment, judging whether the trained model satisfies the preset condition includes: obtaining a verification set; verifying the trained model using the verification set to obtain the accuracy of the trained model; judging whether the accuracy of the trained model is the same as a historical accuracy, where the historical accuracy is the accuracy obtained by the trained model in the previous verification; and, if the accuracy of the trained model is the same as the historical accuracy, determining that the trained model meets the preset condition.

In an embodiment, if the accuracy of the trained model is different from the historical accuracy, the accuracy of the trained model is recorded as the new historical accuracy, and training of the initial model continues.

In one embodiment, accuracy is used to characterize the ratio of the sum of the verification results of all verification samples in the verification set to the total number of all verification samples.

In an embodiment, obtaining the verification set includes: obtaining images other than the sample images in the multiple data sets; randomly extracting image verification pairs from other images to obtain the verification set.

In an embodiment, the image verification pair includes: a positive sample pair and a negative sample pair, the positive sample pair includes two images with the same label, and the negative sample pair includes two images with different labels.

In an embodiment, the loss function is a square loss function.

According to another aspect of the embodiments of the present application, there is also provided an image recognition device, including: a first acquisition module for acquiring an image to be recognized; a second acquisition module for acquiring a pre-established image recognition model, wherein the image recognition model is obtained by training an initial model through multiple training sets, the initial model is a recognition model established based on a branch training algorithm, the same training set is extracted from the same data set, and different training sets are extracted from different data sets; and a recognition module for recognizing the image to be recognized by using the image recognition model to obtain the recognition result.

According to another aspect of the embodiments of the present application, a storage medium is also provided, the storage medium includes a stored program, wherein the device where the storage medium is located is controlled to execute the above-mentioned image recognition method when the program runs.

According to another aspect of the embodiments of the present application, a processor is also provided, which is configured to run a program, wherein the image recognition method described above is executed when the program is running.

In the embodiments of this application, an initial model can be established based on a branch training algorithm, and the initial model can be trained through multiple training sets generated from different data sets to obtain an image recognition model. The image recognition model is then used to recognize the image input by the user and obtain the final recognition result. Compared with the related art, an image recognition model that combines branch training with multiple data sets has a higher accuracy than an image recognition model trained on a single data set, which achieves the technical effect of improving recognition accuracy and thereby solves the problem of low recognition accuracy of image recognition methods in the related art.

Description of the drawings

Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of an optional face picture according to an embodiment of the present application;

Fig. 3 is a schematic diagram of an optional aligned face picture according to an embodiment of the present application;

Fig. 4 is a schematic diagram of an optional face recognition deep neural network model based on a single data set input according to an embodiment of the present application;

Fig. 5 is a schematic diagram of an optional deep neural network model for face recognition based on input of multiple data sets according to an embodiment of the present application;

Fig. 6 is a flowchart of an optional image recognition method according to an embodiment of the present application; and

Fig. 7 is a schematic diagram of an image recognition device according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments.

The terms "first" and "second" in the description and claims of the application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. Data used in this way can be interchanged under appropriate circumstances, so that the embodiments of the present application described herein can be implemented in a sequence other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units that are clearly listed, but may include other steps or units that are not clearly listed or that are inherent to such processes, methods, products, or devices.

Example 1

According to an embodiment of the present application, an embodiment of an image recognition method is provided. The steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described can be performed in a different order than the one given here.

Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:

Step S102: Obtain an image to be recognized.

In an embodiment, the above-mentioned image to be recognized may be an image that needs to be recognized. In the embodiment of the present application, a face image is taken as an example for description.

Step S104: Obtain a pre-established image recognition model. The image recognition model is obtained by training an initial model through multiple training sets. The initial model is a recognition model established based on a branch training algorithm. The same training set is extracted from the same data set, and different training sets are extracted from different data sets.

In one embodiment, in order to improve the accuracy of image recognition, multiple training sets may be constructed through multiple different data sets in advance, and the initial model may be trained through the training sets, so as to obtain the final image recognition model.

In the field of face recognition, because different data sets may contain face pictures of the same people, and users cannot determine which people appear in more than one data set, different data sets cannot simply be merged into one single data set. Instead, a deep neural network model can be built in combination with the branch training method to obtain the initial model. By keeping the different data sets separate for branch training, a trained image recognition model can be obtained and then deployed to application scenarios.

Step S106: Use the image recognition model to recognize the image to be recognized and obtain the recognition result.

In one embodiment, in the field of face recognition, the face recognition process can be performed by comparing the facial feature feat-ID (using Euclidean distance).
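The comparison of facial features by Euclidean distance can be illustrated with a minimal sketch. The source does not specify the feature dimension, the decision threshold, or any function names; `euclidean_distance`, `is_same_person`, `identify`, and the threshold value of 1.0 are hypothetical and chosen only for illustration.

```python
import math


def euclidean_distance(feat_a, feat_b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))


def is_same_person(feat_a, feat_b, threshold=1.0):
    """Assumed decision rule: same identity if the distance is below a threshold.

    The threshold value is a placeholder, not a value given in the source.
    """
    return euclidean_distance(feat_a, feat_b) < threshold


def identify(probe_feat, gallery):
    """Return (identity, distance) of the gallery feature closest to the probe.

    gallery maps an identity name to its enrolled feature vector.
    """
    best_id = min(gallery, key=lambda name: euclidean_distance(probe_feat, gallery[name]))
    return best_id, euclidean_distance(probe_feat, gallery[best_id])
```

In practice the feature vectors would be produced by the trained image recognition model; here they are plain lists of floats so that the sketch stays self-contained.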

In the above-mentioned embodiment of this application, an initial model can be established based on a branch training algorithm, and the initial model can be trained through multiple training sets generated from different data sets to obtain an image recognition model. The image recognition model is then used to recognize the image input by the user and obtain the final recognition result. Compared with the related art, an image recognition model that combines branch training with multiple data sets has a higher accuracy than an image recognition model trained on a single data set, which achieves the technical effect of improving recognition accuracy and thereby solves the technical problem of low recognition accuracy of image recognition methods in the related art.

Optionally, in the foregoing embodiment of the present application, the method further includes: acquiring multiple data sets; classifying each image in the multiple data sets to obtain a label of each image, wherein the label is used to characterize the classification result of each image, and the labels of at least two images contained in the multiple data sets are the same; and extracting sample images from each classified data set to obtain multiple training sets.

In one embodiment, in the field of face recognition, in order to construct multiple training sets, face pictures in different application scenarios may be obtained in advance to obtain multiple data sets. Public face data sets downloaded from the Internet are generally already labeled. For unlabeled data sets, face images can be detected and extracted, then manually classified and labeled: face images belonging to the same person are placed together and labeled, so that each picture receives a label. Suppose the total number of people is N, and each person has M face pictures. A certain number of face images can be randomly selected from each labeled data set to obtain each training set.

Optionally, in the above-mentioned embodiment of the present application, before extracting sample images from each classified data set to obtain multiple training sets, the method further includes: extracting preset features of each image in each classified data set; performing an alignment operation on each image based on the preset features of each image; and extracting sample images from each data set after the operation to obtain multiple training sets.

Optionally, when each image is a face image, the preset feature includes at least one of the following: eyes, eyebrows, nose tip, and mouth corners.

In one embodiment, in the field of face recognition, the angle and position of the face vary from one face picture to another. To ensure stable feature extraction and achieve a better face recognition effect, each image needs to be aligned by key points to remove the influence of face angle on face recognition. The key points include the positions of the eyes, nose tip, and mouth corners, as shown in Fig. 2. The aligned face is shown in Fig. 3.
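The source does not state which geometric transform is used for key point alignment. As one possible illustration, the sketch below assumes a pure in-plane rotation that makes the line between the two eyes horizontal and applies it to the key point coordinates only (not to the pixels). The function names and the landmark dictionary keys `left_eye` and `right_eye` are hypothetical.

```python
import math


def eye_alignment_angle(left_eye, right_eye):
    """Angle in radians of the line joining the eyes, relative to the x axis."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx)


def rotate_point(point, center, angle):
    """Rotate point about center by -angle, which undoes a tilt of +angle."""
    x, y = point[0] - center[0], point[1] - center[1]
    c, s = math.cos(-angle), math.sin(-angle)
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)


def align_landmarks(landmarks):
    """Rotate all key points about the eye midpoint so the eyes become level.

    landmarks maps a key point name to an (x, y) coordinate and must contain
    the assumed keys "left_eye" and "right_eye".
    """
    left, right = landmarks["left_eye"], landmarks["right_eye"]
    angle = eye_alignment_angle(left, right)
    center = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
    return {name: rotate_point(p, center, angle) for name, p in landmarks.items()}
```

A full alignment step would typically also rescale and crop the image and warp the pixels with the same transform; those parts are omitted here because the source gives no details about them.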

Optionally, in the foregoing embodiment of the present application, extracting sample images from each data set after the operation to obtain multiple training sets includes: randomly extracting sample images from each data set after the operation; and obtaining the storage paths and labels of the sample images to obtain multiple training sets.

In one embodiment, face images containing both face identity information and verification information may be randomly selected from the face images that have been annotated and aligned to obtain the sample images. Each extracted training sample has the following form: face picture img_1, identity information (category number) of img_1, ..., face picture img_N, identity information (category number) of img_N.

Here, the face picture img_1 refers to the storage path of the first face picture, and the category number refers to the label assigned in advance to the person; category numbers generally start from 0. Different labels are the numerical codes of different people within the same data set. For example, if the first data set contains 100 people, the category numbers are 1-0, 1-1, 1-2, ..., 1-99; if the second data set or scene covers 50 people, the category numbers are 2-0, 2-1, 2-2, ..., 2-49. The two groups of category numbers do not overlap because they come from different data sets.
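The labeling scheme above can be sketched as follows, assuming each data set is available as a mapping from a person identifier to the storage paths of that person's pictures. The function name `build_training_set`, its parameters, and the fixed random seed are hypothetical; only the "data set index, then category number" label format comes from the text.

```python
import random


def build_training_set(dataset_index, images_by_person, samples_per_set, seed=0):
    """Randomly draw (storage_path, label) records from one labeled data set.

    images_by_person maps a person identifier to a list of storage paths.
    Labels have the form "<dataset_index>-<category_number>", with category
    numbers starting from 0, so labels from different data sets never collide.
    """
    rng = random.Random(seed)
    records = []
    for category_number, person in enumerate(sorted(images_by_person)):
        label = f"{dataset_index}-{category_number}"
        for path in images_by_person[person]:
            records.append((path, label))
    # Never request more samples than the data set contains.
    count = min(samples_per_set, len(records))
    return rng.sample(records, count)
```

Because the data set index is part of every label, the same person appearing in two data sets simply receives two unrelated labels, which is what allows overlapping data sets to be used side by side.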

Optionally, in the foregoing embodiment of the present application, acquiring multiple data sets includes: acquiring a video image and a preset data set collected by an acquisition device; and detecting the video image and the preset data set to obtain multiple data sets.

In one embodiment, in the field of face recognition, the acquisition device can be a camera installed in different application scenarios. The camera collects video pictures, which are stored in a computer system via network transmission and data lines. The application scenarios can be the usage scenarios of an engineering project, such as bank remote teller machine (Video Teller Machine, VTM) verification, jewelry store VIP identification, and so on. The aforementioned preset data set may be a public face data set downloaded from the Internet.

The face data sets obtained by the above methods may cover the same people. For example, the photos of customers captured by cameras in banks and jewelry stores may also appear on the Internet and be sorted into public face data sets. Moreover, the public face data sets A and B on the Internet may also contain face pictures of the same person.

For the video pictures collected by the camera, face detection is performed on the collected video pictures, and the face pictures are extracted and stored in the hard disk of the computer system.

Optionally, in the foregoing embodiment of the present application, the method further includes: establishing an initial model based on a branch training algorithm, where the initial model at least includes multiple loss functions that correspond one-to-one to the multiple training sets; inputting the multiple training sets into the initial model in parallel to train the initial model; determining whether the trained model meets a preset condition; and, if the trained model meets the preset condition, determining that the trained model is the image recognition model.

In one embodiment, the image recognition model in the related art uses only one Softmax loss function as the training target. As shown in Fig. 4, the image recognition model based on a single data set input contains only one classification loss function, Loss = SoftmaxLoss 1.

Different data sets can be separated for branch training and input into the same image recognition model in parallel. After the aligned face images in the i-th data set are forward-propagated to obtain features, the features are connected to the corresponding loss function SoftmaxLoss i, which is optimized as an independent objective function. As shown in Fig. 5, when the face pictures in the i-th face data set are input to the initial model for branch training, the corresponding loss function is Loss = SoftmaxLoss i.
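The routing of a feature to the loss of its own data set can be sketched as follows. This is only an illustration under simplifying assumptions: the shared network is taken as already evaluated (the feature is passed in directly), each branch is a plain linear classifier without bias, and the names `softmax`, `softmax_loss`, `branch_loss`, and `branch_weights` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits, target):
    """Softmax cross-entropy loss for a single sample."""
    return -math.log(softmax(logits)[target])


def branch_loss(feature, branch_weights, dataset_index, target):
    """Compute SoftmaxLoss i for a sample drawn from the i-th data set.

    branch_weights[i] is the classifier matrix of branch i, with one row per
    category of that data set, so each branch can have its own class count.
    """
    weights = branch_weights[dataset_index]
    logits = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    return softmax_loss(logits, target)
```

The point of the design is that the category numbers of one data set are only ever compared with each other, so two data sets that happen to contain the same person never produce conflicting labels inside a single classifier.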

In an embodiment, the image recognition models shown in Fig. 4 and Fig. 5 are simplified schematic diagrams of a general residual network.

Optionally, the loss function is a square loss function.

In one embodiment, in the field of face recognition, in order to use Euclidean distance to perform the face recognition process, multiple loss functions in the initial model may be square loss functions.

In an embodiment, the aforementioned preset condition may be a training end judgment condition. When the model obtained by training satisfies the preset condition, it is determined that the training ends, and the final trained model is a trained image recognition model.

Optionally, in the foregoing embodiment of the present application, inputting multiple training sets into the initial model in parallel to train the initial model includes: inputting the multiple training sets into the initial model in parallel to obtain the function values of the multiple loss functions; obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain derivation algorithm (chain rule); and updating each parameter according to its gradient value using the stochastic gradient descent algorithm to obtain the trained model.

In one embodiment, after multiple training sets are input into the initial model in parallel, the function value Loss of the loss function can be obtained through branch training, and the gradient value of each parameter in the image recognition model shown in Fig. 5 can be obtained according to Loss and the chain derivation algorithm (chain rule). The model parameters are then updated according to the stochastic gradient descent algorithm to obtain a trained model. After the trained model meets the training end judgment condition, the trained model can be determined as the final image recognition model.
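A single update of this kind can be sketched on a deliberately tiny model. The sketch assumes a shared linear layer in place of the residual network, one linear softmax classifier per data set, a batch size of one, and a learning rate of 0.1; none of these choices, nor the names `sgd_step`, `shared`, and `heads`, come from the source.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(shared, heads, branch, x, target, lr=0.1):
    """One SGD update on a sample from data set `branch`.

    shared is the weight matrix of the shared layer, heads[branch] is the
    classifier matrix of that branch. Both are updated in place. Returns the
    loss value computed before the update.
    """
    # Forward pass: shared feature layer, then the branch-specific classifier.
    feat = matvec(shared, x)
    head = heads[branch]
    probs = softmax(matvec(head, feat))
    loss = -math.log(probs[target])
    # Backward pass (chain rule): dLoss/dlogits = probs - one_hot(target).
    d_logits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # dLoss/dfeat = head^T * d_logits, computed before the head is modified.
    d_feat = [sum(head[k][j] * d_logits[k] for k in range(len(head)))
              for j in range(len(feat))]
    # Update only the selected head: dLoss/dhead[k][j] = d_logits[k] * feat[j].
    for k, row in enumerate(head):
        for j in range(len(row)):
            row[j] -= lr * d_logits[k] * feat[j]
    # Update the shared layer: dLoss/dshared[j][i] = d_feat[j] * x[i].
    for j, row in enumerate(shared):
        for i in range(len(row)):
            row[i] -= lr * d_feat[j] * x[i]
    return loss
```

Each sample changes the shared parameters and the classifier of its own branch only; the classifiers of the other data sets receive no gradient from it, while the shared layer accumulates updates from all data sets.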

Optionally, in the foregoing embodiment of the present application, judging whether the trained model satisfies the preset condition includes: obtaining a verification set; verifying the trained model using the verification set to obtain the accuracy of the trained model; judging whether the accuracy of the trained model is the same as a historical accuracy, where the historical accuracy is the accuracy of the trained model in the previous verification; and, if the accuracy of the trained model is the same as the historical accuracy, determining that the trained model meets the preset condition.

In one embodiment, during the training of the image recognition model, the currently trained model can be tested on the validation set every fixed number of iterations. As training proceeds, the accuracy of the trained model on the validation set keeps improving; however, once the model tends to converge or overfitting occurs, the accuracy on the validation set no longer increases steadily, which indicates that model training can be stopped.

Optionally, accuracy is used to characterize the ratio of the sum of the verification results of all verification samples in the verification set to the total number of all verification samples.

In one embodiment, in the field of face recognition, the verification set consists of randomly selected face picture verification pairs. Following the rules of the international standard face verification test set LFW, the verification set contains 6000 face image verification pairs. For a verification set containing 6000 pairs of face images, the test accuracy can be defined as:

\mathrm{accuracy} = \frac{1}{6000} \sum_{i=1}^{6000} x_i

where x_i denotes the verification result of the i-th face image verification pair. If the recognition result of the model is the same as the actual label of the face image verification pair, the verification is determined to be correct, that is, x_i = 1; if the recognition result of the model is different from the actual label of the face image verification pair, the verification is determined to be wrong, that is, x_i = 0.

In an embodiment, the aforementioned historical accuracy may be the accuracy of the trained model obtained when the trained model was verified last time. If during this verification process, the accuracy of the trained model is the same as the historical accuracy, that is, the accuracy of the trained model is no longer steadily improving, the training can be determined to end, and the trained model will be used as the final image recognition model.

Optionally, in the foregoing embodiment of the present application, if the accuracy of the trained model is different from the historical accuracy, the accuracy of the trained model is recorded as the new historical accuracy, and training of the initial model continues.

In an optional solution, if the accuracy of the trained model is different from the historical accuracy, that is, the trained model does not meet the preset condition, it is determined that training has not ended and needs to continue. The accuracy of the trained model is taken as the historical accuracy for the next model verification, at which point it is judged again whether the accuracy of the trained model is the same as the historical accuracy, so as to determine whether the trained model meets the preset condition.
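The stopping rule described above can be sketched as a small control loop. The callables `train_step` and `evaluate`, the evaluation interval, the iteration cap, and the tolerance `tol` are hypothetical additions for illustration; with `tol=0.0` the comparison reduces to the exact "same as the historical accuracy" test in the text.

```python
def train_until_converged(train_step, evaluate, eval_every=100, max_iters=10000, tol=0.0):
    """Train until the validation accuracy stops changing.

    train_step() runs one training iteration; evaluate() returns the current
    accuracy on the verification set. Every eval_every iterations the new
    accuracy is compared with the historical accuracy from the previous check.
    Returns (iteration, accuracy) at the point where training stops.
    """
    historical_accuracy = None
    for iteration in range(1, max_iters + 1):
        train_step()
        if iteration % eval_every != 0:
            continue
        accuracy = evaluate()
        if historical_accuracy is not None and abs(accuracy - historical_accuracy) <= tol:
            # Accuracy equals the historical accuracy: the preset condition is met.
            return iteration, accuracy
        # Otherwise record this accuracy as the historical accuracy and continue.
        historical_accuracy = accuracy
    return max_iters, historical_accuracy
```

Comparing floating-point accuracies for exact equality can be brittle in practice, which is why the sketch exposes a tolerance; that tolerance is an assumption of this example rather than part of the described method.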

Optionally, in the foregoing embodiment of the present application, obtaining the verification set includes: obtaining images other than the sample images in the multiple data sets; and randomly extracting image verification pairs from other images to obtain the verification set.

Optionally, the image verification pair includes: a positive sample pair and a negative sample pair, the positive sample pair includes two images with the same label, and the negative sample pair includes two images with different labels.

In an embodiment, in the field of face recognition, assuming that the face pictures of K people are used to prepare the training set, the face pictures of the remaining N-K people can be used to prepare the verification set. The verification set consists of randomly selected face picture verification pairs. Both positive sample pairs and negative sample pairs are drawn, in equal numbers: for a verification set containing 6000 face image verification pairs, 3000 positive sample pairs and 3000 negative sample pairs are taken. A positive sample pair consists of the a-th picture and the b-th picture of the n-th person; a negative sample pair consists of the c-th picture of the i-th person and the d-th picture of the j-th person, where i and j denote different people. When the image recognition model judges the two face pictures in a positive sample pair to be the same person, the verification result is determined to be correct; when the image recognition model judges the two face pictures in a negative sample pair not to be the same person, the verification result is likewise determined to be correct; otherwise, the verification result is wrong.
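The construction of the verification set and the accuracy computation can be sketched as follows, assuming the held-out pictures are given as a mapping from person identifier to storage paths and that the model is wrapped in a hypothetical callable `predict_same(path_a, path_b)` returning True or False. The function names and the fixed seed are illustrative, and pairs are drawn with replacement, so duplicates are possible.

```python
import random


def build_verification_pairs(images_by_person, pairs_per_kind, seed=0):
    """Draw equal numbers of positive and negative pairs.

    Returns a list of (path_a, path_b, is_same) tuples, positives first.
    """
    rng = random.Random(seed)
    people = sorted(images_by_person)
    # Positive pairs need a person with at least two pictures.
    multi = [p for p in people if len(images_by_person[p]) >= 2]
    if not multi or len(people) < 2:
        raise ValueError("need at least two people and one person with two pictures")
    pairs = []
    for _ in range(pairs_per_kind):
        person = rng.choice(multi)
        path_a, path_b = rng.sample(images_by_person[person], 2)
        pairs.append((path_a, path_b, True))
    for _ in range(pairs_per_kind):
        person_i, person_j = rng.sample(people, 2)  # two different people
        pairs.append((rng.choice(images_by_person[person_i]),
                      rng.choice(images_by_person[person_j]), False))
    return pairs


def verification_accuracy(pairs, predict_same):
    """Fraction of pairs whose predicted relation matches the true one."""
    results = [1 if predict_same(a, b) == is_same else 0 for a, b, is_same in pairs]
    return sum(results) / len(results)
```

With `pairs_per_kind=3000` this yields the 6000-pair verification set described in the text; a predictor that always answers "same person" scores exactly 0.5 on such a balanced set, which is a useful sanity check.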

Fig. 6 is a flowchart of an optional image recognition method according to an embodiment of the present application, described using the field of face recognition as an example. As shown in Fig. 6, the method includes: collecting face pictures in multiple scenes; performing face detection on the collected pictures, extracting the face pictures, and storing them on the computer hard disk; manually classifying and labeling the detected and extracted face pictures, with face pictures belonging to the same person placed together and labeled; performing key point alignment on the face pictures to remove the influence of face angle on face recognition; randomly selecting, from the labeled and aligned pictures, face image pairs containing face identity information and verification information for training, that is, extracting the face identity-verification training set; building a face recognition deep neural network model in combination with the branch training algorithm, where the model contains multiple loss functions; training the face recognition deep neural network model on the multiple data sets to obtain a trained network model; judging whether the test accuracy of the trained network model on the verification set is still improving, that is, whether the training end condition is reached; if the condition is not met, continuing model training; if it is met, obtaining the face recognition algorithm network model and model parameters; and deploying the trained face recognition algorithm network model to the application scenario, where the face recognition process can be performed by comparing the facial features feat-ID (using Euclidean distance).

The solution provided by the above embodiments can be used in a bank VIP recognition project: face pictures are collected in the real application scenario, and some public face data sets are downloaded from the Internet; detection and alignment operations are performed on the face pictures in these data sets, and the corresponding face identity-verification training sets are made; the face recognition algorithm model is then trained using the method described above, yielding a face recognition algorithm with a high recognition rate and good recognition effect in the bank VIP recognition scenario. This method can better combine the face data in multiple data sets and thereby obtain a face recognition model with a better recognition effect. The branch-trained face deep neural network model that combines multiple data sets has a higher accuracy than a general deep learning network trained on a single data set (including successive fine-tuning on multiple data sets).

Example 2

According to an embodiment of the present application, an embodiment of an image recognition device is provided.

Fig. 7 is a schematic diagram of an image recognition device according to an embodiment of the present application. As shown in Fig. 7, the device includes:

The first acquisition module 72 is used to acquire the image to be recognized.

The second acquisition module 74 is used to acquire a pre-established image recognition model. The image recognition model is obtained by training the initial model through multiple training sets. The initial model is a recognition model established based on the branch training algorithm. The same training set is extracted from the same data set, and different training sets are extracted from different data sets.

The recognition module 76 is configured to recognize the image to be recognized by using the image recognition model to obtain the recognition result.

In the above-mentioned embodiments of this application, an initial model can be established based on a branch training algorithm and trained through multiple training sets generated from different data sets to obtain an image recognition model, and the image recognition model is used to recognize the image to be recognized to obtain the final recognition result. Compared with the related art, an image recognition model obtained by branch training on multiple data sets has a higher accuracy than an image recognition model trained on a single data set. This achieves the technical effect of improving recognition accuracy and thereby solves the technical problem of the low recognition accuracy of image recognition methods in the related art.

Optionally, in the foregoing embodiment of the present application, the device further includes: a third acquisition module for acquiring multiple data sets; a classification module for classifying each image in the multiple data sets to obtain a label of each image, where the label is used to characterize the classification result of each image, and at least two images contained in the multiple data sets have the same label; and a first extraction module for extracting sample images from each classified data set to obtain the multiple training sets.

Optionally, in the above-mentioned embodiment of the present application, the device further includes: a second extraction module, configured to extract preset features of each image in each classified data set; an alignment module, configured to perform an alignment operation on each image based on the preset features of each image; and a third extraction module, configured to extract sample images from each data set after the operation to obtain the multiple training sets.
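As a rough illustration of an alignment operation driven by preset features, the sketch below rotates a set of face key points about the midpoint between the eyes so that the line through the eyes becomes horizontal. This is only one possible alignment and is an assumption; the application does not specify the transform. The function names are hypothetical, and the sketch operates on key point coordinates rather than on image pixels.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def eye_roll_angle(left_eye: Point, right_eye: Point) -> float:
    """Angle in radians of the line through the eyes relative to horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx)


def align_points(points: List[Point], left_eye: Point,
                 right_eye: Point) -> List[Point]:
    """Rotate key points about the eye midpoint so the eye line is horizontal."""
    angle = -eye_roll_angle(left_eye, right_eye)  # undo the measured roll
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    aligned = []
    for x, y in points:
        tx, ty = x - cx, y - cy  # translate so the midpoint is the origin
        aligned.append((cx + tx * cos_a - ty * sin_a,
                        cy + tx * sin_a + ty * cos_a))
    return aligned
```

The same rotation would be applied to the image itself in a full pipeline, so that faces captured at different angles are presented to the model in a consistent pose.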

Optionally, in the above-mentioned embodiment of the present application, the third extraction module includes: an extraction unit for randomly extracting sample images from each data set after the operation; and a first acquisition unit for acquiring the storage paths and labels of the sample images to obtain the multiple training sets.
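The extraction described above can be sketched as follows, assuming each data set is represented as a mapping from an image storage path to its label. That representation, the function names, and the fixed sample size are assumptions made for illustration only.

```python
import random
from typing import Dict, List, Tuple

TrainingSet = List[Tuple[str, str]]  # (storage path, label) entries


def build_training_set(dataset: Dict[str, str], sample_size: int,
                       seed: int = 0) -> TrainingSet:
    """Randomly pick images from ONE data set and record path and label."""
    rng = random.Random(seed)
    k = min(sample_size, len(dataset))
    paths = rng.sample(sorted(dataset), k)  # sorted for reproducibility
    return [(path, dataset[path]) for path in paths]


def build_training_sets(datasets: List[Dict[str, str]], sample_size: int,
                        seed: int = 0) -> List[TrainingSet]:
    """One training set per data set, so sets never mix data sources."""
    return [build_training_set(d, sample_size, seed + i)
            for i, d in enumerate(datasets)]
```

Storing only paths and labels keeps each training set small; the images themselves would be loaded from disk at training time.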

Optionally, in the foregoing embodiment of the present application, the third acquisition module includes: a second acquisition unit, configured to acquire a video image collected by the collection device and a preset data set; and a detection unit, configured to perform detection on the video image and the preset data set to obtain the multiple data sets.

Optionally, in the foregoing embodiment of the present application, the device further includes: a building module for building an initial model based on a branch training algorithm, where the initial model at least includes multiple loss functions, and the multiple loss functions are in one-to-one correspondence with the multiple training sets; a training module for inputting the multiple training sets into the initial model in parallel to train the initial model; a judgment module for judging whether the trained model meets the preset condition; and a determination module for determining that the trained model is the image recognition model if the trained model meets the preset condition.

Optionally, the loss function is a square loss function.

Optionally, in the above-mentioned embodiment of the present application, the training module includes: an input unit for inputting multiple training sets into the initial model in parallel to obtain function values of multiple loss functions; a unit for obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain derivation algorithm (the chain rule); and an update unit for updating the gradient value of each parameter according to the stochastic gradient descent algorithm to obtain the trained model.
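A toy sketch of one such training step is given below. It assumes a model with a single shared parameter `w` and one scalar head per branch, a square loss per branch, and a total loss equal to the sum of the branch losses; these architectural choices, the names, and the learning rate are illustrative assumptions and are not the network described in this application.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def sgd_step(w: float, heads: List[float], batch: List[Sample],
             lr: float) -> Tuple[float, List[float], List[float]]:
    """One step with one sample per branch, fed in parallel.

    batch[k] comes from training set k and is scored by loss function k:
        h = w * x, y_hat = heads[k] * h, loss_k = (y_hat - y) ** 2
    """
    grad_w = 0.0
    grad_heads = [0.0] * len(heads)
    losses = []
    for k, (x, y) in enumerate(batch):
        h = w * x
        y_hat = heads[k] * h
        losses.append((y_hat - y) ** 2)
        d_yhat = 2.0 * (y_hat - y)        # d loss_k / d y_hat
        grad_heads[k] = d_yhat * h        # chain rule into branch head k
        grad_w += d_yhat * heads[k] * x   # shared parameter sums all branches
    new_w = w - lr * grad_w               # gradient descent update
    new_heads = [v - lr * g for v, g in zip(heads, grad_heads)]
    return new_w, new_heads, losses
```

Because the shared parameter receives gradient from every branch in the same step, it is shaped by all data sets at once rather than by one data set after another.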

Optionally, in the foregoing embodiment of the present application, the judgment module includes: a third acquisition unit, configured to acquire a verification set; a verification unit, configured to verify the trained model using the verification set to obtain the accuracy of the trained model; a judging unit, configured to judge whether the accuracy of the trained model is the same as the historical accuracy, where the historical accuracy is the accuracy obtained by the trained model in the last verification process; and a determining unit, configured to determine that the trained model meets the preset condition if the accuracy of the trained model is the same as the historical accuracy.

Optionally, in the foregoing embodiment of the present application, the training module is further configured to, if the accuracy of the trained model is different from the historical accuracy, take the accuracy of the trained model as the new historical accuracy and continue to train the initial model.
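The judgment and update of the historical accuracy can be sketched as a simple loop. The callables `train_one_round` and `evaluate`, the round limit, and the tolerance used to decide that two accuracies are "the same" are all assumptions introduced for illustration.

```python
from typing import Callable, Optional, Tuple


def train_until_accuracy_stalls(
    train_one_round: Callable[[], None],
    evaluate: Callable[[], float],
    max_rounds: int = 100,
    tol: float = 1e-6,
) -> Tuple[int, Optional[float]]:
    """Train until verification accuracy equals the previous round's value.

    Returns (rounds run, last accuracy). The tolerance is an assumption;
    the text only says the two accuracies are "the same".
    """
    history: Optional[float] = None
    for round_index in range(1, max_rounds + 1):
        train_one_round()
        acc = evaluate()
        if history is not None and abs(acc - history) <= tol:
            return round_index, acc  # preset condition met
        history = acc  # current accuracy becomes the historical accuracy
    return max_rounds, history
```

The round limit guards against a run whose accuracy keeps fluctuating and never repeats exactly.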

Optionally, in the foregoing embodiment of the present application, the third acquiring unit is configured to acquire images other than the sample images in the multiple data sets, and randomly extract image verification pairs from the other images to obtain a verification set.
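A minimal sketch of building such a verification set follows. It assumes images are identified by storage path with a known label, treats a pair as positive when both images share a label and negative otherwise, and computes accuracy as the fraction of pairs judged correctly. The names and data layout are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str, bool]  # (path a, path b, same label?)


def build_verification_pairs(labels: Dict[str, str], sampled: Set[str],
                             num_pairs: int, seed: int = 0) -> List[Pair]:
    """Draw random image pairs from images NOT used as training samples."""
    rng = random.Random(seed)
    remaining = sorted(p for p in labels if p not in sampled)
    if len(remaining) < 2:
        raise ValueError("need at least two held-out images")
    pairs = []
    for _ in range(num_pairs):
        a, b = rng.sample(remaining, 2)  # two distinct images
        pairs.append((a, b, labels[a] == labels[b]))
    return pairs


def accuracy(predictions: List[bool], pairs: List[Pair]) -> float:
    """Fraction of pairs whose predicted same/different flag is correct."""
    if not pairs or len(predictions) != len(pairs):
        raise ValueError("predictions and pairs must be non-empty and aligned")
    correct = sum(1 for pred, (_, _, same) in zip(predictions, pairs)
                  if pred == same)
    return correct / len(pairs)
```

Purely random pairing tends to produce far more negative than positive pairs when there are many identities, so a practical version might balance the two kinds explicitly.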

Example 3

According to an embodiment of the present application, an embodiment of a storage medium is provided. The storage medium includes a stored program, wherein the device where the storage medium is located is controlled to execute the image recognition method in Embodiment 1 when the program is running.

Example 4

According to an embodiment of the present application, an embodiment of a processor is provided, and the processor is used to run a program, where the image recognition method in the foregoing embodiment 1 is executed when the program is running.

The serial numbers of the foregoing embodiments of the present application are only for description, and do not represent the advantages and disadvantages of the embodiments.

In the above-mentioned embodiments of the present application, the description of each embodiment has its own focus. For a part that is not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in this application, the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units may be a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, units, or modules, and may be in electrical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. All or part of the technical solution of this application can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which can be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk, optical disk, and other media that can store program code.

Claims

An image recognition method, including:

Obtain the image to be recognized;

Obtain a pre-established image recognition model, where the image recognition model is obtained by training an initial model through multiple training sets, the initial model is a recognition model established based on a branch training algorithm, the same training set is extracted from the same data set, and different training sets are extracted from different data sets;

The image recognition model is used to recognize the image to be recognized to obtain a recognition result.
The method according to claim 1, further comprising:

Get multiple data sets;

Classify each image in the multiple data sets to obtain a label of each image, where the label is used to represent the classification result of each image, and at least two images included in the multiple data sets have the same label;

Sample images are extracted from each classified data set to obtain the multiple training sets.
The method according to claim 2, wherein before extracting the sample images from each classified data set to obtain the multiple training sets, the method further comprises:

Extracting preset features of each image in each data set after the classification;

Performing an alignment operation on each image based on the preset feature of each image;

The sample images are extracted from each data set after the operation to obtain the multiple training sets.
The method according to claim 3, wherein, when each image is a face image, the preset feature includes at least one of the following: eyes, eyebrows, nose tip, and mouth corners.
The method according to claim 3, wherein extracting the sample image from each data set after the operation to obtain the multiple training sets comprises:

Randomly extract the sample image from each data set after the operation;

Obtain the storage path and label of the sample image, and obtain the multiple training sets.
The method according to claim 2, wherein acquiring a plurality of data sets includes:

Obtain video images and preset data sets collected by the collection device;

The video image and the preset data set are detected to obtain the multiple data sets.
The method according to claim 2, further comprising:

Establishing the initial model based on the branch training algorithm, wherein the initial model at least includes: a plurality of loss functions, and the plurality of loss functions are in a one-to-one correspondence with the plurality of training sets;

Input the multiple training sets into the initial model in parallel, and train the initial model;

Determine whether the trained model meets the preset conditions;

If the trained model satisfies the preset condition, it is determined that the trained model is the image recognition model.
The method according to claim 7, wherein inputting the multiple training sets into the initial model in parallel to train the initial model comprises:

Input the multiple training sets into the initial model in parallel to obtain the function values of the multiple loss functions;

Obtaining the gradient value of each parameter in the initial model according to the function values of the multiple loss functions and the chain derivation algorithm;

The gradient value of each parameter is updated according to the stochastic gradient descent algorithm to obtain the trained model.
The method according to claim 7, wherein determining whether the trained model meets a preset condition comprises:

Get validation set;

Verifying the model obtained by the training by using the verification set to obtain the accuracy of the model obtained by the training;

Judging whether the accuracy of the trained model is the same as the historical accuracy, where the historical accuracy is the accuracy obtained by the trained model in the last verification process;

If the accuracy of the trained model is the same as the historical accuracy, it is determined that the trained model meets the preset condition.
The method according to claim 9, wherein if the accuracy of the model obtained by the training is different from the historical accuracy, the accuracy of the model obtained by the training is determined as the historical accuracy, and training of the initial model continues.
The method according to claim 10, wherein the accuracy is used to characterize the ratio of the sum of the verification results of all verification samples in the verification set to the total number of all verification samples.
The method according to claim 9, wherein obtaining the verification set comprises:

Acquiring images other than the sample images in the multiple data sets;

An image verification pair is randomly extracted from the other images to obtain the verification set.
The method according to claim 12, wherein the image verification pair comprises: a positive sample pair and a negative sample pair, the positive sample pair includes two images with the same label, and the negative sample pair includes two images with different labels.
The method according to claim 7, wherein the loss function is a square loss function.
An image recognition device, including:

The first acquisition module is used to acquire the image to be recognized;

The second acquisition module is used to acquire a pre-established image recognition model, wherein the image recognition model is obtained by training an initial model through multiple training sets, the initial model is a recognition model established based on a branch training algorithm, the same training set is extracted from the same data set, and different training sets are extracted from different data sets;

The recognition module is used to recognize the image to be recognized by using the image recognition model to obtain a recognition result.
A storage medium comprising a stored program, wherein the device where the storage medium is located is controlled to execute the image recognition method according to any one of claims 1 to 14 when the program is running.
A processor for running a program, wherein the image recognition method according to any one of claims 1 to 14 is executed when the program is running.