WO2023184958A1 - Target Recognition and Neural Network Training - Google Patents

Target Recognition and Neural Network Training

Info

Publication number
WO2023184958A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
neural network
target
feature
sample image
Application number
PCT/CN2022/128143
Other languages
English (en)
French (fr)
Inventor
王海波
朱烽
赵瑞
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023184958A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • the present disclosure relates to the field of artificial intelligence technology, specifically to target recognition and neural network training.
  • Data islands refer to the data resources accumulated by different enterprises or clients which, for reasons of privacy protection or data security, behave like isolated islands that cannot be connected or allowed to interact. In other words, siloed data sets lack correlation with each other. Data is the cornerstone of deep neural networks (DNNs). However, because of data islands, the data that a single enterprise or client can obtain is very limited, making it difficult to raise the upper limit of DNN accuracy.
  • embodiments of the present disclosure provide a target recognition method and device, a neural network training method and device, electronic equipment, and storage media.
  • embodiments of the present disclosure provide a neural network training method, applied to the second client in a neural network training system including at least a first client and a second client.
  • the method may include: acquiring a first neural network, which is pre-trained by the first client using first sample image data, the first sample image data being the first island data that the first client can obtain; inputting second sample image data into the first neural network to obtain first feature data output by the first neural network, the second sample image data being the second island data that the second client can obtain; training the second neural network to be trained according to the second sample image data and the first feature data to obtain a trained second neural network; and using the second sample image data to train a target neural network including the trained second neural network and the first neural network until convergence conditions are met.
  • the second sample image data may include multiple target categories, wherein each target category includes at least one target sample image.
  • the inputting of the second sample image data into the first neural network to obtain the first feature data output by the first neural network may include: for each target category, inputting each target sample image included in the target category into the first neural network to obtain the first image feature corresponding to each target sample image output by the first neural network; determining, according to the first image feature of each target sample image in the target category, the first class central feature and the class feature range corresponding to the target category; and determining the first class central feature and the class feature range as the first feature data corresponding to the target category.
  • determining the first class central feature and class feature range corresponding to the target category according to the first image feature of each target sample image in the target category may include: determining the first class central feature corresponding to the target category according to the first image feature of each target sample image in the target category; determining the similarity between the first image feature corresponding to each target sample image in the target category and the first class central feature; and determining the class feature range corresponding to the target category based on the maximum value and the minimum value of the similarity.
  • training the second neural network to be trained according to the second sample image data and the first feature data to obtain the trained second neural network may include: inputting the second sample image data into the second neural network to be trained to obtain second feature data output by the second neural network; and adjusting the network parameters of the second neural network based on the first difference between the second feature data and the label data included in the second sample image data, and the second difference between the second feature data and the first feature data, until the convergence conditions are met, obtaining the trained second neural network.
  • the second sample image data may include multiple target categories, wherein each target category includes at least one target sample image.
  • the inputting of the second sample image data into the second neural network to be trained to obtain the second feature data output by the second neural network may include: for each target category, inputting each target sample image included in the target category into the second neural network to be trained to obtain the second image feature corresponding to each target sample image output by the second neural network; determining the second class central feature corresponding to the target category according to the second image feature of each target sample image in the target category; and determining the second class central feature and the second image feature corresponding to each target sample image as the second feature data corresponding to the target category.
  • the first feature data of each target category includes the first class central feature and the class feature range of the target category to which each target sample image belongs.
  • adjusting the network parameters of the second neural network may include: for each target category, determining the first difference according to the second image features and the label data corresponding to each target sample image in the target category; determining the second difference according to the difference between the second class central feature corresponding to the target category and the first class central feature, and the difference between the second image features corresponding to the target sample images in the target category and the class feature range; and adjusting the network parameters of the second neural network based on the first difference and the second difference.
  • using the second sample image data to train a target neural network including the trained second neural network and the first neural network may include: inputting the second sample image data into the trained second neural network to obtain third feature data output by the second neural network; fusing the first feature data and the third feature data according to a fusion weight determined based on the second sample image data to obtain fused feature data; and adjusting the network parameters of the target neural network based on the third difference between the fused feature data and the label data included in the second sample image data until convergence conditions are met.
  • fusing the first feature data and the third feature data according to the fusion weight determined based on the second sample image data to obtain the fused feature data may include: inputting the second sample image data into the attribute network of the target neural network to obtain the attribute information output by the attribute network; determining, according to the attribute information, the first weight of the first feature data and the second weight of the third feature data; and fusing the first feature data and the third feature data based on the first weight and the second weight to obtain the fused feature data.
  • the first neural network may include a first face recognition network, and the first sample image data is first face data.
  • the second neural network may include a second face recognition network, and the second sample image data is second face data.
  • embodiments of the present disclosure provide a target recognition method, which may include: acquiring an image to be tested, the image to be tested including the target to be tested; and inputting the image to be tested into a pre-trained target recognition network to obtain the recognition result output by the target recognition network, the target recognition network being a target neural network obtained according to the training method described in any embodiment of the first aspect.
  • embodiments of the present disclosure provide a neural network training device, applied to the second client in a neural network training system including at least a first client and a second client.
  • the device may include:
  • a network acquisition module configured to acquire a first neural network, which is pre-trained by the first client using first sample image data, the first sample image data being the first island data that the first client can obtain; a first processing module configured to input second sample image data into the first neural network to obtain first feature data output by the first neural network, the second sample image data being the second island data that the second client can obtain; a first training module configured to train the second neural network to be trained based on the second sample image data and the first feature data to obtain a trained second neural network; and a second training module configured to use the second sample image data to train a target neural network including the trained second neural network and the first neural network until convergence conditions are met.
  • the second sample image data may include multiple target categories, wherein each target category includes at least one target sample image.
  • the first processing module is configured to: for each target category, input each target sample image included in the target category into the first neural network to obtain the first image feature corresponding to each target sample image output by the first neural network; determine, based on the first image feature of each target sample image in the target category, the first class central feature and class feature range corresponding to the target category; and determine the first class central feature and the class feature range as the first feature data corresponding to the target category.
  • the first processing module is configured to: determine the first class central feature corresponding to the target category according to the first image feature of each target sample image in the target category; determine the similarity between the first image feature corresponding to each target sample image in the target category and the first class central feature; and determine the class feature range corresponding to the target category based on the maximum value and minimum value of the similarity.
  • the first training module is configured to: input the second sample image data into a second neural network to be trained to obtain second feature data output by the second neural network; and adjust the network parameters of the second neural network based on the first difference between the second feature data and the label data included in the second sample image data, and the second difference between the second feature data and the first feature data, until the convergence conditions are met, obtaining the trained second neural network.
  • the second sample image data may include multiple target categories, wherein each target category includes at least one target sample image.
  • the first training module is configured to: for each target category, input each target sample image included in the target category into the second neural network to be trained to obtain the second image feature corresponding to each target sample image output by the second neural network; determine the second class central feature corresponding to the target category according to the second image feature of each target sample image in the target category; and determine the second class central feature and the second image feature corresponding to each target sample image as the second feature data corresponding to the target category.
  • the first training module is configured to: for each target category, determine the first difference according to the second image feature and the label data corresponding to each target sample image in the target category; determine the second difference according to the difference between the second class central feature corresponding to the target category and the first class central feature, and the difference between the second image feature corresponding to each target sample image in the target category and the class feature range; and adjust the network parameters of the second neural network based on the first difference and the second difference.
  • the second training module is configured to: input the second sample image data into the trained second neural network to obtain third feature data output by the second neural network; fuse the first feature data and the third feature data according to the fusion weight determined based on the second sample image data to obtain fused feature data; and adjust the network parameters of the target neural network based on the third difference between the fused feature data and the label data included in the second sample image data until the convergence condition is met.
  • the second training module is configured to: input the second sample image data into the attribute network of the target neural network to obtain attribute information output by the attribute network; determine, based on the attribute information, the first weight of the first feature data and the second weight of the third feature data; and fuse the first feature data and the third feature data based on the first weight and the second weight to obtain the fused feature data.
  • the first neural network may include a first face recognition network, and the first sample image data is first face data.
  • the second neural network may include a second face recognition network, and the second sample image data is second face data.
  • embodiments of the present disclosure provide a target recognition device, which may include: an image acquisition module configured to acquire an image to be tested, where the image to be tested includes a target to be tested; and a second processing module configured to input the image to be tested into a pre-trained target recognition network to obtain a recognition result output by the target recognition network, the target recognition network being a target neural network obtained according to the training method described in any embodiment of the first aspect.
  • embodiments of the present disclosure provide an electronic device, which may include a processor and a memory, the memory storing computer instructions for causing the processor to execute the method according to any embodiment of the first aspect or the second aspect.
  • an embodiment of the present disclosure provides a storage medium that stores computer instructions, and the computer instructions are used to cause a computer to execute the method according to any embodiment of the first or second aspect.
  • the neural network training method in the embodiments of the present disclosure is applied to the second client in a neural network training system including at least a first client and a second client. The method may include: obtaining a first neural network pre-trained by the first client using first sample image data; inputting second sample image data into the first neural network to obtain first feature data output by the first neural network; training the second neural network to be trained according to the second sample image data and the first feature data to obtain a trained second neural network; and using the second sample image data to train a target neural network including the second neural network and the first neural network until convergence conditions are met. In this way, the simultaneous utilization of multiple parties' island data is realized, the target neural network obtained by training has better compatibility, the network's ability to recognize multi-party data is improved, and target recognition accuracy is greatly improved.
  • Figure 1 is a schematic structural diagram of a neural network training system according to some embodiments of the present disclosure.
  • Figure 2 is a flow chart of a training method for a neural network in some embodiments of the present disclosure.
  • Figure 3 is a flow chart of a training method for a neural network in some embodiments of the present disclosure.
  • Figure 4 is a flow chart of a training method for a neural network in some embodiments of the present disclosure.
  • Figure 5 is a flow chart of a training method for a neural network in some embodiments of the present disclosure.
  • Figure 6 is a flow chart of a training method for a neural network in some embodiments of the present disclosure.
  • Figure 7 is a flow chart of a training method of a neural network in some embodiments of the present disclosure.
  • Figure 8 is a flow chart of a training method of a neural network according to some embodiments of the present disclosure.
  • Figure 9 is a schematic diagram of a training method for a neural network in some embodiments of the present disclosure.
  • Figure 10 is a flow chart of a training method of a neural network in some embodiments of the present disclosure.
  • Figure 11 is a flow chart of a training method of a neural network in some embodiments of the present disclosure.
  • Figure 12 is a schematic diagram of a training method for a neural network in some embodiments of the present disclosure.
  • Figure 13 is a flowchart of a target recognition method according to some embodiments of the present disclosure.
  • Figure 14 is a structural block diagram of a neural network training device according to some embodiments of the present disclosure.
  • Figure 15 is a structural block diagram of a target recognition device according to some embodiments of the present disclosure.
  • Figure 16 is a structural block diagram of an electronic device according to some embodiments of the present disclosure.
  • Data is the cornerstone of deep neural networks (DNNs). If more data can be used, there is greater potential to break through the accuracy ceiling of a neural network. However, because of data islands, the data that a single enterprise or client can accumulate is very limited, making it difficult to raise the upper limit of DNN accuracy.
  • For example, Company A has accumulated a large amount of face data of users wearing masks, collected through cameras, while Company B has accumulated a large amount of face data uploaded by users without masks. If the face data of Company A and Company B could be used to train a face recognition network at the same time, the network could learn the face features of both data sets simultaneously, greatly improving the accuracy of face recognition. However, the island data of Company A and Company B cannot be obtained at the same time, making it difficult to further raise the accuracy ceiling of the face recognition network.
  • One related technique is federated learning (Federated Learning). The basic principle of federated learning is to build an encrypted communication environment between a server and multiple clients.
  • Each client has independently stored and maintained island data.
  • the server sends the DNN to each client, so that each client can use its own island data to train the DNN locally and send the trained network parameters and gradients to the server in encrypted form.
  • the server aggregates the network parameters and gradients from different clients to obtain the final trained DNN, thereby achieving the purpose of allowing island data to participate in network training without leaving the local area.
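  • For background, the server-side aggregation step that federated learning relies on can be sketched as follows. This is a minimal FedAvg-style sketch; the function and variable names are illustrative and not part of this disclosure, and real systems additionally encrypt the exchanged parameters:

```python
import torch

def fedavg_aggregate(client_state_dicts):
    """Average network parameters collected from multiple clients.

    Illustrative sketch of federated aggregation: each entry in
    client_state_dicts is one client's locally trained state dict,
    and the server averages parameters element-wise.
    """
    avg_state = {}
    for name in client_state_dicts[0]:
        avg_state[name] = torch.stack(
            [sd[name].float() for sd in client_state_dicts]
        ).mean(dim=0)
    return avg_state
```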
  • federated learning requires the construction of a huge encrypted communication environment to achieve encrypted communication between the server and each client. Its deployment is difficult, time-consuming, and costly. It also requires the DNN network structure of each client to be the same, which greatly limits the flexibility of island data utilization and network training efficiency.
  • In view of this, embodiments of the present disclosure provide a training method and device for a target recognition network (such as a neural network for face recognition), a method and device for target (such as face) recognition, an electronic device, and a storage medium, aiming to break down data silos, effectively utilize the siloed data of all parties in network training, improve target recognition accuracy, and keep deployment simple and low-cost.
  • FIG. 1 shows the structure of a training system for a neural network (for example, a neural network used for face recognition) according to an embodiment of the present disclosure.
  • the training system may include a first client 100 and a second client 200 , and the first client 100 and the second client 200 may establish a wired or wireless communication connection through a network 300 .
  • the first client 100 stores first sample image data
  • the second client 200 stores second sample image data.
  • the data types of the first sample image data and the second sample image data can be set as needed; for example, they can be face data, natural scene data, vehicle data, etc., and the present disclosure does not limit this.
  • the first sample image data and the second sample image data are island data of each other. That is, the first client 100 cannot obtain the second sample image data, and the second client 200 cannot obtain the first sample image data.
  • the first client 100 and the second client 200 may be two different enterprises.
  • the first client 100 is enterprise A, which has independently accumulated, stored and maintained face data, that is, first sample image data.
  • the second client 200 is enterprise B, which also has independently accumulated, stored and maintained face data, that is, second sample image data.
  • the data of enterprise A and enterprise B cannot interoperate. That is, enterprise A cannot obtain the second sample image data, and similarly, enterprise B cannot obtain the first sample image data.
  • first client 100 and the second client 200 may be different departments of the same enterprise.
  • first client 100 is department A of enterprise X, which has independently accumulated, stored and maintained natural scene data, that is, first sample image data.
  • the second client 200 is department B of enterprise X, which also has independently accumulated, stored and maintained natural scene data, that is, second sample image data.
  • the data of Department A and Department B cannot interoperate; that is, Department A cannot obtain the second sample image data, and similarly, Department B cannot obtain the first sample image data.
  • In view of this, the embodiment of the present disclosure provides a neural network training method, which can be applied to the second client 200, so that the first sample image data and the second sample image data can simultaneously participate in network training.
  • the neural network training method of the present disclosure may include steps S210 to S240.
  • the second client 200 obtains the first neural network.
  • the first neural network is pre-trained by the first client using the first sample image data.
  • the first client 100 can pre-train the first neural network using the first sample image data that it can obtain, and then send the trained first neural network to the second client 200 through the network 300 .
  • the second client 200 inputs the second sample image data into the first neural network to obtain the first feature data output by the first neural network.
  • the first neural network can be used to assist in training the second neural network, so that the second neural network acquires the ability to recognize the features learned by the first neural network.
  • the second client 200 can obtain the second sample image data stored by itself, and first input the second sample image data into the received first neural network, thereby obtaining the first feature data.
  • the first neural network is trained by the first client 100 using the first sample image data; that is to say, the first neural network has good recognition ability for the characteristics of the first sample image data. The second sample image data is input into the first neural network so that the identification features extracted by the first neural network from the second sample image data can be obtained, namely the first feature data. The first feature data extracted by the first neural network is then used to assist in training the second neural network on the second client 200 side, so that the second neural network can be compatible with the recognition capability of the first neural network, as explained below in S230.
  • the second client 200 trains the second neural network to be trained based on the second sample image data and the first feature data to obtain the trained second neural network.
  • On the second client side, there are the second sample image data stored locally and the first feature data obtained in S220 by performing feature extraction on the second sample image data with the first neural network.
  • Trained with the second sample image data alone, the second neural network could only learn the features of the second sample image data, not the features of the first sample image data. Therefore, the first feature data output by the first neural network trained on the first sample image data is integrated into the training process of the second neural network, so that the second neural network can be compatible with the recognition capability of the first neural network. The optimization of the objective function mainly includes the following two parts: 1) the classification loss of the second neural network on the second sample image data, which represents the recognition ability of the second neural network for the second sample image data, that is, the difference between the predicted values and the label values when the second neural network classifies the second sample image data; 2) the loss between the features extracted by the second neural network and the features extracted by the first neural network, which represents the difference between the features extracted by the two networks for the same second sample image data. In this way, the recognition accuracy of the network is higher.
  • the second client 200 uses the second sample image data to train the target neural network including the trained second neural network and the first neural network until the convergence condition is met.
  • the second sample image data is again used to train the target neural network including the second neural network and the first neural network until the convergence condition is satisfied.
  • the target neural network that will eventually be used in the prediction phase can be obtained.
  • the trained second neural network is not directly used as the final target neural network; instead, the previously obtained first neural network and the second neural network are merged to obtain the target neural network. The features extracted by the target network thus include both the features extracted by the first neural network and those extracted by the second neural network, so that the trained target neural network is robust to both the first sample image data and the second sample image data.
  • After training, the target neural network may also be sent to the first client 100. It can be understood that the first client 100 and the second client 200 only transmit the first neural network once at the beginning and the target neural network once at the end; there is no interaction involving island data between the two, thus ensuring data security while realizing the simultaneous utilization of both parties' island data.
  • In this way, neural network training that utilizes multi-party island data is realized, so that the target recognition neural network obtained by training has better compatibility, effectively improving the network's recognition ability on multi-party data and the target recognition accuracy.
  • In the following, the face recognition scene will be used as an example to illustrate the neural network training method of the present disclosure. That is, in the following embodiments, the first sample image data is the first face data, the second sample image data is the second face data, the first neural network may include the first face recognition network, and the second neural network may include the second face recognition network.
  • both the first sample image data and the second sample image data may include multiple sample data, and each sample data may include a sample image, corresponding label data, and the target category to which it belongs.
  • both the first face data and the second face data may include multiple sample data, and each sample data may include a face sample image, corresponding label data, and face category.
  • the face category indicates the category to which the face sample image belongs. For example, multiple face sample images of the same person belong to the same face category.
  • the label data represents the ground truth that the corresponding face sample image belongs to a certain face category, and the label data can be obtained by manual annotation.
  • both the first face data and the second face data may include multiple sample data, and these sample data belong to multiple face categories.
  • Each sample data may include a face sample image and label data corresponding to the face sample image.
  • the first client 100 can use its own stored first face data to train the first face recognition network, thereby obtaining the trained first face recognition network. The following description will be made with reference to the embodiment of FIG. 3.
  • the process of training the first face recognition network (hereinafter also referred to as the first neural network) by the first client 100 may include step S310 and step S320.
  • the first client 100 inputs the first sample image data into the first neural network to be trained, and obtains the output result of the first neural network.
  • the first client 100 adjusts the network parameters of the first neural network according to the difference between the output result and the label data until the convergence condition is met, and the trained first neural network is obtained.
  • the first face recognition network can adopt the FaceNet network structure, expressed as f_S = M_S(x_S) (1), where x_S represents a face sample image in the first face data, M_S represents the first face recognition network, and f_S is the network's output for x_S.
  • the face sample image included in the sample data is input into the first face recognition network to be trained, and the first face recognition network obtains a classification result for the face sample image through processing such as convolution, pooling, and classification.
  • the classification result represents the predicted value of the first face recognition network for the face sample image, and the label data represents the true value of the face sample image, so the difference between the classification result and the label data can be calculated through the pre-constructed objective function, and the network parameters of the first face recognition network can be optimized and adjusted by back-propagating this difference.
  • the above description only takes one of the sample data as an example. For the multiple sample data in the first face data, the above process can be repeated to continuously and iteratively optimize the first face recognition network until the convergence conditions are met, thus obtaining the trained first face recognition network.
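  • As an illustration, this pre-training stage (steps S310 and S320) could look like the following PyTorch-style sketch, assuming a generic supervised classification setup; the optimizer choice, hyperparameters, and all names are placeholders, not the patent's:

```python
import torch
import torch.nn.functional as F

def pretrain_first_network(model, loader, epochs=10, lr=1e-3):
    """Supervised pre-training of the first face recognition network M_S
    on the first face data. Sketch only: `model` maps images to class
    logits and `loader` yields (images, labels) from the first island data."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            logits = model(images)                   # classification result
            loss = F.cross_entropy(logits, labels)   # predicted vs. label value
            opt.zero_grad()
            loss.backward()                          # back propagation
            opt.step()
    return model
```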
  • After training, the first client 100 can send the first face recognition network to the second client 200 through the network 300. It can be understood that only the first face recognition network is transmitted between the first client 100 and the second client 200, without any transmission of the first face data; therefore, the data on the first client 100 side never leaves the local environment, ensuring data security.
  • After receiving the first face recognition network sent by the first client 100, the second client 200 can use the first face recognition network and its own stored second face data to perform compatibility training on the second face recognition network. The following will describe the embodiment with reference to FIGS. 4 to 6.
  • the process by which the second client 200 uses the first face recognition network (hereinafter also referred to as the first neural network) to obtain the first face feature data from the second face data may include steps S410 to S430.
  • the second client 200 inputs each target sample image into the first neural network, and obtains the first image feature corresponding to each target sample image output by the first neural network.
  • the target sample image is also the face sample image in the sample data included in the second face data.
  • the face sample image included in the sample data can be input into the first face recognition network, so that the first face recognition network can perform feature extraction on the face sample image through, for example, a convolution layer, The first image feature corresponding to the face sample image is obtained.
  • the specific process can be similar to the aforementioned formula (1), and will not be described again.
  • the second client 200 determines the first category central feature and category feature range corresponding to the target category based on the first image feature of each target sample image in the target category.
  • the target category is the face category to which the face sample image belongs.
  • the second face data may include multiple face categories. For example, multiple face sample images belonging to the same person in the second face data can belong to the same face category. That is, each face category includes at least one face sample image.
  • Suppose a face category includes N face sample images; the N first image features corresponding to the N face sample images can then be extracted through the aforementioned S410.
  • the first class central feature corresponding to the face category can be calculated based on the first image features corresponding to the N face sample images included in the face category, expressed as c_S = (1/N) Σ_{i=1}^{N} f_S,i (2), where f_S,i is the first image feature of the i-th face sample image in the category and c_S is the first class central feature.
  • the first image feature corresponding to each face sample image in the face category and the first class central feature corresponding to the face category can be calculated through the above process. In addition, the class feature range corresponding to the face category can be calculated based on the first class central feature, which will be described below with reference to the embodiment in Figure 5. The first class central feature corresponding to each face category can be obtained through the aforementioned formula (2), and it represents the average feature of the corresponding face category.
  • the face category includes a total of N face sample images and corresponding N first image features.
  • the face category also includes corresponding class center features.
  • the similarity between the first image feature and the class center feature of each face sample image may be calculated. This similarity represents the degree of similarity between each face sample image and the average feature.
  • the cosine similarity between each first image feature and the class center feature can be calculated, expressed as s_i = cos(f_S,i, c_S) = (f_S,i · c_S) / (‖f_S,i‖ ‖c_S‖) (3).
  • the minimum and maximum values of the similarity are then determined, expressed as S_min = min_i s_i (4) and S_max = max_i s_i (5).
  • the maximum similarity value S_max can be taken as the inner feature boundary of the face category and the minimum similarity value S_min as the outer feature boundary; the range between the inner and outer feature boundaries is the class feature range corresponding to the face category.
  • the class center feature and class feature range corresponding to each face category can be obtained.
  • the first class central feature c_S of the face category obtained above and the class feature range (S_min, S_max) together serve as the first feature data of the face category (hereinafter also referred to as the first face feature data).
  • the first face feature data corresponding to each face category will be obtained.
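  • The computation of formulas (2) to (5) for one face category can be sketched as follows; tensor shapes and names are assumptions, and the network is assumed to expose a feature-vector output:

```python
import torch
import torch.nn.functional as F

def first_feature_data(first_net, category_images):
    """Compute the first class central feature and class feature range
    (S_min, S_max) for one face category. Sketch: `category_images` is a
    (N, C, H, W) batch of one category's face sample images, and
    `first_net` returns (N, D) first image features."""
    with torch.no_grad():
        feats = first_net(category_images)            # first image features
    center = feats.mean(dim=0)                        # formula (2): class center
    sims = F.cosine_similarity(feats, center.unsqueeze(0))  # formula (3)
    s_min, s_max = sims.min().item(), sims.max().item()     # formulas (4), (5)
    return center, (s_min, s_max)
```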
  • the training of the second face recognition network can be assisted based on the first face feature data, which will be described below with reference to the implementation in Figure 6 .
  • the process of training the second face recognition network (hereinafter also referred to as the second neural network) by the second client 200 may include step S610 and step S620.
  • the second client 200 inputs the second sample image data into the second neural network to be trained, and obtains the second feature data output by the second neural network.
  • Based on the first difference between the second feature data and the label data of the second sample image data, and the second difference between the second feature data and the first feature data, the second client 200 adjusts the network parameters of the second neural network until the convergence conditions are met, obtaining the trained second neural network.
  • the target items for compatibility training of the second face recognition network mainly include two parts: 1) the classification loss of the second face recognition network on the second face data; 2) the loss between the features extracted by the second face recognition network from the second face data and the features extracted by the first face recognition network from the second face data.
  • the second face data (i.e., the second sample image data) can be input into the second face recognition network to be trained to obtain the second feature data output by the second face recognition network (hereinafter also referred to as the second face feature data).
  • the second face feature data may include second image features corresponding to each face sample image and second type central features corresponding to each face category. The following description will be made with reference to the embodiment of FIG. 7 .
  • the process of obtaining the second facial feature data may include steps S611 to S613.
  • the process of calculating the second image feature of each face sample image and the second class central feature of each face category is similar to the aforementioned process of calculating the first image feature and the first class central feature. The main difference is that the aforementioned first image features and first class central features are obtained based on the first face recognition network, while the second image features and second class central features in this embodiment are obtained based on the second face recognition network.
  • each face sample image of the second face data is input into the second face recognition network to be trained.
  • any suitable face recognition network can be used for the second face recognition network.
  • the network structure of the second face recognition network can be the same as that of the first face recognition network or different; the present disclosure does not limit this.
  • the second face recognition network can also adopt the FaceNet network structure, expressed as f_A = M_A(x_A) (6), where x_A represents a face sample image in the second face data and M_A represents the second face recognition network.
  • During training, the face sample image x_A included in the sample data can be input into the second face recognition network M_A to be trained, so that the second face recognition network extracts features from the face sample image x_A according to formula (6), obtaining the second image features corresponding to the face sample image.
  • the second image features corresponding to each face sample image can be obtained.
  • the second class central feature corresponding to the face category can be determined based on each second image feature included in the face category. For example, if a face category includes N face sample images, the second class central feature calculated for this face category is expressed as c_A = (1/N) Σ_{i=1}^{N} f_A,i (7), where f_A,i is the second image feature of the i-th face sample image in the category.
  • the second image feature corresponding to each face sample image in the face category and the second class central feature corresponding to the face category can be calculated through the above process. Thereafter, the second image features and the second class central feature may be used together as the second face feature data of the face category.
  • the second face recognition network can be trained in a supervised manner, which will be described below with reference to the implementation in Figure 8 .
  • the process of adjusting the network parameters of the second face recognition network may include steps S621 to S623.
  • the second image feature corresponding to the face sample image of the sample data can be obtained through the foregoing process.
  • the second face recognition network can predict and obtain the classification result of the face sample image based on the second image features.
  • the classification result represents the predicted value of the face sample image by the second face recognition network
  • the label data represents the true value of the category of the face sample image
  • For each target category, the second difference is determined based on the difference between the second class central feature corresponding to the target category and the first class central feature, and the difference between the second image feature corresponding to each target sample image and the class feature range.
  • the second difference may include two parts: one is the difference between the second class central feature and the first class central feature; the other is a loss term that constrains the second image features based on the class feature range.
  • Through the foregoing embodiments, the first class central feature and the second class central feature corresponding to each face category can both be calculated, so the difference between the two class central features can be obtained directly. The corresponding class feature range (S_min, S_max) of each face category can also be obtained through the aforementioned embodiment of Figure 5, so the difference between the second image feature of each face sample image and the class feature range can be constrained at the same time. For example, the cosine similarity between the second image feature of a face sample image and the first class central feature of the face category to which it belongs can be calculated, and this cosine similarity can be constrained to fall within the class feature range (S_min, S_max). Therefore, the difference between the first class central feature and the second class central feature, together with the difference between the second image features and the class feature range, can collectively be regarded as the second difference.
  • the above first difference and second difference can be combined, and the network parameters of the second face recognition network can be optimized and adjusted through back propagation according to the combined loss.
  • the second face recognition network can be iteratively optimized by repeating the above process until the convergence conditions are met, thereby obtaining the trained second face recognition network.
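  • One plausible form of the combined objective, with the first difference as a classification loss and the second difference as a center-alignment term plus a hinge-style penalty on the class feature range, can be sketched as follows. The weighting factors and the exact penalty form are assumptions, not the patent's loss:

```python
import torch
import torch.nn.functional as F

def compatibility_loss(logits, labels, feats2, center1, center2,
                       s_min, s_max, w_cls=1.0, w_center=1.0, w_range=1.0):
    """Sketch of one training step's loss for the second network M_A.

    First difference: classification loss of M_A on the second face data.
    Second difference: (a) distance between the second and first class
    central features; (b) penalty when the similarity of a second image
    feature to the first class center falls outside (S_min, S_max)."""
    first_diff = F.cross_entropy(logits, labels)
    center_diff = 1.0 - F.cosine_similarity(center1, center2, dim=0)
    sims = F.cosine_similarity(feats2, center1.unsqueeze(0))
    range_pen = (F.relu(s_min - sims) + F.relu(sims - s_max)).mean()
    return w_cls * first_diff + w_center * center_diff + w_range * range_pen
```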
  • the first difference represents a constraint on the second face recognition network's ability to recognize the second face data, while the second difference represents a constraint on the compatibility of the second face recognition network with the first face recognition network trained on the first face data.
  • the second face recognition network obtained based on the above training process can have good compatible recognition capabilities for both the first face data and the second face data, thereby improving the accuracy of face recognition.
  • FIG. 9 shows a schematic diagram of compatibility training for the second face recognition network M_A in the training method of the present disclosure. This will be further described below in conjunction with FIG. 9.
  • the training process is performed on the second client 200 side.
  • the first face feature data can be obtained using the second face data and the first face recognition network M_S, and the second face feature data can be obtained using the second face data and the second face recognition network M_A to be trained.
  • the pre-constructed loss function can then be used to calculate the loss value from the first difference and the second difference, and the network parameters of the second face recognition network M_A are adjusted by back propagation according to the loss value until the convergence conditions are met, obtaining the trained second face recognition network M_A.
  • the first face recognition network is used to perform auxiliary compatibility training on the second face recognition network, so that the second face recognition network has good compatible recognition capabilities for both the mutually isolated first face data and second face data, converges better, and improves face recognition accuracy.
  • the trained second face recognition network can be obtained through the above process.
  • the second face recognition network is not directly used as the final target face recognition network; instead, the target face recognition network is obtained by fusing the first face recognition network and the second face recognition network, and the second face data is again used to train the target face recognition network to improve the network's compatibility with each set of island data.
  • the following description will be made with reference to the embodiment of FIG. 10 .
  • the process of training a target face recognition network may include steps S1010 to S1030.
  • the second client 200 inputs the second sample image data into the trained second neural network to obtain the third feature data output by the second neural network.
  • According to the fusion weight determined based on the second sample image data, the second client 200 performs fusion processing on the first feature data and the third feature data to obtain fused feature data.
  • Based on the third difference between the fused feature data and the label data included in the second sample image data, the second client 200 adjusts the network parameters of the target neural network until the convergence condition is met.
  • a fusion layer can be added after the feature extraction layer of the first face recognition network and the second face recognition network to perform fusion processing on the extracted features of the two.
  • the second face data is input into the trained second face recognition network.
  • the second face recognition network can output the third feature data (hereinafter also referred to as the third face feature data). The third face feature data is essentially of the same form as the second face feature data.
  • the second face data is input into the first face recognition network.
  • the first face recognition network can output the first face feature data.
  • the fusion weight may be determined according to the attribute information of the second face data.
  • the attribute information represents the difference in attributes between the first face data and the second face data.
  • the first face data mainly includes the face data of "children”
  • the second face data mainly includes the face data of "adults”
  • “age” is the attribute difference between the two island data.
  • the first face data mainly includes the face data of "men”
  • the second face data mainly includes the face data of "women”
  • “gender” is the attribute difference between the two island data.
  • the attribute information is not limited to the above examples, and can also be any other attribute information suitable for implementation. As long as the attribute information can make the first face data and the second face data have a certain difference as a whole, the present disclosure does not limit this.
  • the target face recognition network may also include an attribute recognition network, thereby using the attribute recognition network to extract attribute information of the face sample image, so that the target face recognition network can determine the corresponding fusion weight based on the attribute information.
  • After determining the fusion weights of the first face feature data and the third face feature data, the two can be fused according to the fusion weights to obtain fused feature data. It can be understood that the fused feature data simultaneously fuses the feature information extracted by the first face recognition network and the second face recognition network respectively; therefore, the fused feature data is representative of the features of both sets of island data.
  • the target face recognition network can add a classification layer after the fusion layer.
  • the classification layer is, for example, a fully connected layer.
  • the fully connected layer can predict and output the classification results corresponding to the face sample data based on the input fusion feature data.
  • the classification result represents the predicted value of the face sample image by the target face recognition network
  • the label data represents the true value of the face sample image
  • the third difference, that is, the loss between the prediction results output by the target face recognition network and the existing label data, can be calculated through the pre-built loss function. Therefore, the classification layer parameters of the target face recognition network can be optimized and adjusted by back-propagating the third difference. For multiple sample data in the second face data, the above process can be repeated to continuously and iteratively optimize the target face recognition network until the convergence conditions are met, thereby obtaining the trained target face recognition network.
  • In this way, the target face recognition network is obtained by fusing the first face recognition network and the second face recognition network, thereby improving the target face recognition network's compatibility with multi-party island data and its face recognition accuracy.
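  • The forward pass of such a target network during this training stage can be sketched as follows, assuming a simple linear weighted fusion followed by a fully connected classification layer; the feature dimension, class count, and all names are assumptions, and per the description only the classification layer parameters would be adjusted at this stage (the weight computation is detailed in the next section):

```python
import torch
import torch.nn as nn

class TargetNet(nn.Module):
    """Sketch of the target face recognition network: the backbones M_S
    and M_A feed a weighted fusion followed by a classification layer."""
    def __init__(self, m_s, m_a, feat_dim, num_classes):
        super().__init__()
        self.m_s, self.m_a = m_s, m_a
        self.classifier = nn.Linear(feat_dim, num_classes)  # fully connected

    def forward(self, x, w1, w2):
        f1 = self.m_s(x)               # first face feature data
        f3 = self.m_a(x)               # third face feature data
        fused = w1 * f1 + w2 * f3      # linear weighted fusion
        return self.classifier(fused)  # classification result
```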
  • the process of fusion processing of the first facial feature data and the third facial feature data may include steps S1021 to S1023.
  • Figure 12 shows a schematic diagram of training the target face recognition network in the training method of the present disclosure, which will be described in detail below with reference to Figure 12.
  • the target face recognition network may include an attribute network M_attr, which represents a network for identifying attribute information of the second face data.
  • the attribute network M_attr may be pre-trained based on the type of attribute information. The purpose of introducing attribute information into the target face recognition network is to better fuse the first face feature data and the third face feature data; therefore, the type of attribute information can mainly be attribute information that differentiates the first face data from the second face data.
  • For example, the attribute network M_attr can be pre-trained based on whether the user wears a mask on the face, and is mainly used to extract the characteristics of the user's face and predict the corresponding attribute information. After the second face data is input into the attribute network M_attr, the attribute information of the second face data can be obtained. The second face data is input into the first face recognition network M_S to obtain the first face feature data, and the second face data is input into the trained second face recognition network M_A to obtain the third face feature data.
  • When determining the first weight of the first face feature data and the second weight of the third face feature data based on the attribute information, the attribute information may be further smoothed based on a smoothing coefficient.
  • For example, a fully connected layer branch can be added to the second face recognition network, so that the corresponding smoothing coefficient T can be output based on the second face data, and the smoothing coefficient T can be used to smooth the attribute information.
  • Attr s represents the attribute information after smoothing processing
  • Attr represents the attribute information output by the attribute network
  • T represents the smoothing coefficient output by the second face recognition network.
  • the first weight of the first facial feature data and the second weight of the third facial feature data can be determined based on the attribute information. Then, based on the first weight and the second weight, a linear weighted fusion process is performed on the first facial feature data and the third facial feature data to obtain fused feature data.
  • the classification layer predicts the corresponding classification result based on the fused feature data, and then based on the difference between the classification result and the label data, and back-propagates the classification layer parameters of the target face recognition network based on the difference. Optimize until the convergence conditions are met and the training of the target face recognition network is completed.
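As a hedged illustration of this fusion: the exact smoothing g in formula (8) is not recoverable from the text, so a temperature-style sigmoid smoothing and a convex weighting are assumed below; they are not the patent's definitions.

```python
import torch

def fuse_features(f1, f3, attr, t):
    # Linear weighted fusion of the first (f1) and third (f3) facial feature data.
    # attr: attribute information from the attribute network M_attr, assumed in [0, 1]
    # t:    smoothing coefficient from the assumed branch on the second network
    attr_s = torch.sigmoid((attr - 0.5) / t.clamp(min=1e-3))  # assumed smoothing g(Attr, T)
    w1 = attr_s.unsqueeze(-1)     # first weight, applied to the first feature data
    w2 = 1.0 - w1                 # second weight, applied to the third feature data
    return w1 * f1 + w2 * f3      # fused feature data

# usage with placeholder tensors
f1 = torch.randn(8, 512)
f3 = torch.randn(8, 512)
attr = torch.rand(8)              # e.g. predicted probability of wearing a mask
t = torch.full((8,), 2.0)
fused = fuse_features(f1, f3, attr, t)
```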
For the second client 200, after the trained target face recognition network is obtained, the target face recognition network can be sent to the first client 100 through the network 300. It can be understood that, in the embodiments of the present disclosure, the first client 100 only needs to send the first face recognition network to the second client 200 once, and the second client 200 only needs to send the target face recognition network to the first client 100 once. Beyond this, no communication involving island data is needed, which effectively protects the security of the island data; in addition, the network architecture is simple, easy to deploy, and low in cost.

As can be seen from the above, in the embodiments of the present disclosure, multiple parties' island data is utilized simultaneously while its security is guaranteed. The trained target neural network has better compatibility, improving the network's ability to recognize multi-party data and greatly improving target recognition accuracy. Furthermore, attribute information is fused into the recognition and prediction of target attributes, further enhancing the network's compatibility with island data carrying different attribute information and improving target recognition accuracy.
Embodiments of the present disclosure provide a target recognition method, which can be applied to an electronic device. The electronic device in the embodiments of the present disclosure may be any device type suitable for implementation, such as a mobile terminal, a wearable device, a vehicle-mounted device, a server, or a cloud platform; the disclosure is not limited in this respect.

In some embodiments, the target recognition method of the present disclosure may include step S1310 and step S1320. The target recognition network described in the embodiments of the present disclosure is a target neural network trained according to the training method of any of the aforementioned embodiments.

Taking the face recognition scenario as an example, the image to be tested is an image in which the face is expected to be recognized. The image to be tested, including the face to be tested, can be input into the target recognition network described in the present disclosure, and the recognition result output by the target recognition network can be obtained.
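As a usage illustration only; the model variable, input size, and preprocessing below are assumptions, not specified by the disclosure:

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((112, 112)),   # assumed input size
    transforms.ToTensor(),
])

def recognize(target_net, image_path):
    # steps S1310/S1320: load the image to be tested and run the target network
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    target_net.eval()
    with torch.no_grad():
        logits = target_net(image)        # recognition result
    return logits.argmax(dim=1).item()    # predicted identity/class index
```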
Of course, it can be understood that target recognition in the embodiments of the present disclosure is not limited to face recognition scenarios; it can also be applied to any other suitable scenario, such as vehicle recognition or natural scene recognition, which will not be described in detail in this disclosure.

As can be seen from the above, because the target recognition network has good compatible recognition capabilities for multi-party island data, the recognition accuracy on the image to be tested is higher, satisfying high-precision target recognition scenarios.
Embodiments of the present disclosure provide a neural network training device, which can be applied to a second client so that the first sample image data and the second sample image data can participate in network training at the same time.

In some embodiments, the neural network training device of the present disclosure may include: a network acquisition module 10, configured to acquire a first neural network, where the first neural network is pre-trained by a first client using first sample image data, the first sample image data being the first island data that the first client can obtain; a first processing module 20, configured to input second sample image data into the first neural network to obtain first feature data output by the first neural network, the second sample image data being the second island data that the second client can obtain; a first training module 30, configured to train a second neural network to be trained according to the second sample image data and the first feature data, to obtain a trained second neural network; and a second training module 40, configured to train, using the second sample image data, a target neural network including the trained second neural network and the first neural network until the convergence conditions are met.

As can be seen from the above, in the embodiments of the present disclosure, multiple parties' island data is utilized simultaneously while its security is guaranteed; the trained target neural network has better compatibility, improving the network's ability to recognize multi-party data and greatly improving target recognition accuracy.
In some embodiments, the second sample image data may include multiple target categories, where each target category includes at least one target sample image. The first processing module 20 is configured to: for each target category, input each target sample image included in the target category into the first neural network, to obtain the first image feature corresponding to each target sample image output by the first neural network; for each target category, determine, according to the first image feature of each target sample image in the target category, the first class center feature and the class feature range corresponding to the target category; and determine the first class center feature and the class feature range as the first feature data corresponding to the target category.

In some embodiments, the first processing module 20 is configured to: determine the first class center feature corresponding to the target category according to the first image feature of each target sample image in the target category; and determine the similarity between the first image feature corresponding to each target sample image in the target category and the first class center feature, and determine the class feature range corresponding to the target category based on the maximum and minimum values of the similarity.
In some embodiments, the first training module 30 is configured to: input the second sample image data into a second neural network to be trained, to obtain second feature data output by the second neural network; and adjust the network parameters of the second neural network based on the first difference between the second feature data and the label data included in the second sample image data, and the second difference between the second feature data and the first feature data, until the convergence conditions are met, to obtain the trained second neural network.

In some embodiments, the second sample image data may include multiple target categories, where each target category includes at least one target sample image. The first training module 30 is configured to: for each target category, input each target sample image included in the target category into the second neural network to be trained, to obtain the second image feature corresponding to each target sample image output by the second neural network; determine the second class center feature corresponding to the target category according to the second image feature of each target sample image in the target category; and determine the second class center feature and the second image feature corresponding to each target sample image as the second feature data corresponding to the target category.

In some embodiments, the first feature data of each target category includes the first class center feature and the class feature range of the target category to which each target sample image belongs. The first training module 30 is configured to: for each target category, determine the first difference according to the second image feature corresponding to each target sample image in the target category and the label data; determine the second difference according to the difference between the second class center feature corresponding to the target category and the first class center feature, together with the difference between the second image feature corresponding to each target sample image in the target category and the class feature range; and adjust the network parameters of the second neural network based on the first difference and the second difference.
In some embodiments, the second training module 40 is configured to: input the second sample image data into the trained second neural network to obtain third feature data output by the second neural network; fuse the first feature data and the third feature data according to a fusion weight determined from the second sample image data, to obtain fused feature data; and adjust the network parameters of the target neural network based on the third difference between the fused feature data and the label data included in the second sample image data, until the convergence conditions are met.

In some embodiments, the second training module 40 is configured to: input the second sample image data into the attribute network of the target neural network to obtain attribute information output by the attribute network; determine the first weight of the first feature data and the second weight of the third feature data according to the attribute information; and fuse the first feature data and the third feature data based on the first weight and the second weight, to obtain the fused feature data.

In some embodiments, the first neural network may include a first face recognition network, and the first sample image data is first face data. In some embodiments, the second neural network may include a second face recognition network, and the second sample image data is second face data.

As can be seen from the above, in the embodiments of the present disclosure, multiple parties' island data is utilized simultaneously while its security is guaranteed; the trained target neural network has better compatibility, improving the network's ability to recognize multi-party data and greatly improving target recognition accuracy. Furthermore, attribute information is fused into the recognition and prediction of target attributes, further enhancing the network's compatibility with island data carrying different attribute information and improving target recognition accuracy.
In some embodiments, the present disclosure provides a target recognition device, which may include: an image acquisition module 50, configured to acquire an image to be tested, where the image to be tested includes a target to be tested; and a second processing module 60, configured to input the image to be tested into a pre-trained target recognition network to obtain a recognition result output by the target recognition network, where the target recognition network is a target neural network obtained by the training method of any of the above embodiments.

As can be seen from the above, because the target recognition network has good compatible recognition capabilities for multi-party island data, the recognition accuracy on the image to be tested is higher, satisfying high-precision target recognition scenarios.
In some embodiments, examples of the present disclosure provide an electronic device, which may include: a processor; and a memory, the memory storing computer instructions, the computer instructions being used to cause the processor to perform the method described in any of the above embodiments.

In some embodiments, examples of the present disclosure provide a storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method described in any of the above embodiments.

Specifically, FIG. 16 shows a schematic structural diagram of an electronic device 600 suitable for implementing the method of the present disclosure; the corresponding functions of the above-mentioned processor and storage medium can be realized by the electronic device shown in FIG. 16.
As shown in FIG. 16, the electronic device 600 includes a processor 601 that can perform various appropriate actions and processes according to programs stored in the memory 602 or loaded into the memory 602 from the storage section 608. Various programs and data required for the operation of the electronic device 600 are also stored in the memory 602. The processor 601 and the memory 602 are connected to each other via a bus 604, and an input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), speakers, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. Removable media 611, such as magnetic disks, optical disks, magneto-optical disks, or semiconductor memories, are mounted on the drive 610 as needed, so that a computer program read from them is installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the above method process may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program including program code for performing the above-described method. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 609 and/or installed from the removable media 611.

The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by a special-purpose hardware-based system that performs the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to target recognition and the training of a neural network for target recognition. According to an example of the present disclosure, in a neural network training system including at least a first client and a second client, the second client, after acquiring a first neural network trained by the first client using first sample image data, inputs second sample image data into the first neural network to obtain first feature data output by the first neural network; trains a second neural network to be trained according to the second sample image data and the first feature data to obtain a trained second neural network; and trains, using the second sample image data, a target neural network including the trained second neural network and the first neural network until convergence conditions are met.

Description

Target Recognition and Training of Neural Networks

Technical Field

The present disclosure relates to the field of artificial intelligence technology, and specifically to target recognition and the training of neural networks.

Background

Data islands refer to the data resources accumulated separately by different enterprises or clients which, for purposes such as privacy protection or data security, cannot be connected or interact, like isolated islands. In other words, island data lacks mutual correlation. Data is the cornerstone of deep neural networks (DNN, Deep Neural Network), but due to the existence of data islands, the data that a single enterprise or client can obtain is very limited, making it difficult to raise the upper limit of DNN accuracy.
Summary

To improve neural network accuracy, embodiments of the present disclosure provide a target recognition method and device, a neural network training method and device, an electronic device, and a storage medium.

In a first aspect, embodiments of the present disclosure provide a neural network training method applied to a second client in a neural network training system including at least a first client and the second client. The method may include: acquiring a first neural network, the first neural network being pre-trained by the first client using first sample image data, the first sample image data being the first island data that the first client can obtain; inputting second sample image data into the first neural network to obtain first feature data output by the first neural network, the second sample image data being the second island data that the second client can obtain; training a second neural network to be trained according to the second sample image data and the first feature data, to obtain a trained second neural network; and training, using the second sample image data, a target neural network including the trained second neural network and the first neural network until convergence conditions are met.

In some embodiments, the second sample image data may include multiple target categories, where each target category includes at least one target sample image. Inputting the second sample image data into the first neural network to obtain the first feature data output by the first neural network may include: for each target category, inputting each target sample image included in the target category into the first neural network to obtain the first image feature corresponding to each target sample image output by the first neural network; determining, according to the first image feature of each target sample image in the target category, the first class center feature and class feature range corresponding to the target category; and determining the first class center feature and the class feature range as the first feature data corresponding to the target category.

In some embodiments, determining the first class center feature and class feature range corresponding to the target category according to the first image feature of each target sample image in the target category may include: determining the first class center feature corresponding to the target category according to the first image feature of each target sample image in the target category; and determining the similarity between the first image feature corresponding to each target sample image in the target category and the first class center feature, and determining the class feature range corresponding to the target category according to the maximum and minimum values of the similarity.

In some embodiments, training the second neural network to be trained according to the second sample image data and the first feature data to obtain the trained second neural network may include: inputting the second sample image data into the second neural network to be trained to obtain second feature data output by the second neural network; and adjusting the network parameters of the second neural network based on a first difference between the second feature data and the label data included in the second sample image data, and a second difference between the second feature data and the first feature data, until convergence conditions are met, to obtain the trained second neural network.

In some embodiments, the second sample image data may include multiple target categories, where each target category includes at least one target sample image. Inputting the second sample image data into the second neural network to be trained to obtain the second feature data output by the second neural network may include: for each target category, inputting each target sample image included in the target category into the second neural network to be trained, to obtain the second image feature corresponding to each target sample image output by the second neural network; determining the second class center feature corresponding to the target category according to the second image feature of each target sample image in the target category; and determining the second class center feature and the second image feature corresponding to each target sample image as the second feature data corresponding to the target category.

In some embodiments, the first feature data of each target category includes the first class center feature and class feature range of the target category to which each target sample image belongs. Adjusting the network parameters of the second neural network based on the first difference between the second feature data and the label data included in the second sample image data, and the second difference between the second feature data and the first feature data, may include: for each target category, determining the first difference according to the second image feature corresponding to each target sample image in the target category and the label data; determining the second difference according to the difference between the second class center feature corresponding to the target category and the first class center feature, together with the difference between the second image feature corresponding to each target sample image in the target category and the class feature range; and adjusting the network parameters of the second neural network based on the first difference and the second difference.

In some embodiments, training, using the second sample image data, the target neural network including the trained second neural network and the first neural network may include: inputting the second sample image data into the trained second neural network to obtain third feature data output by the second neural network; fusing the first feature data and the third feature data according to a fusion weight determined from the second sample image data, to obtain fused feature data; and adjusting the network parameters of the target neural network based on a third difference between the fused feature data and the label data included in the second sample image data, until convergence conditions are met.

In some embodiments, fusing the first feature data and the third feature data according to the fusion weight determined from the second sample image data to obtain the fused feature data may include: inputting the second sample image data into an attribute network of the target neural network to obtain attribute information output by the attribute network; determining a first weight of the first feature data and a second weight of the third feature data according to the attribute information; and fusing the first feature data and the third feature data based on the first weight and the second weight, to obtain the fused feature data.

In some embodiments, the first neural network may include a first face recognition network, and the first sample image data is first face data.

In some embodiments, the second neural network may include a second face recognition network, and the second sample image data is second face data.

In a second aspect, embodiments of the present disclosure provide a target recognition method, which may include: acquiring an image to be tested, the image to be tested including a target to be tested; and inputting the image to be tested into a pre-trained target recognition network to obtain a recognition result output by the target recognition network, the target recognition network being a target neural network obtained by the training method of any embodiment of the first aspect.

In a third aspect, embodiments of the present disclosure provide a neural network training device applied to a second client in a neural network training system including at least a first client and the second client. The device may include: a network acquisition module, configured to acquire a first neural network, the first neural network being pre-trained by the first client using first sample image data, the first sample image data being the first island data that the first client can obtain; a first processing module, configured to input second sample image data into the first neural network to obtain first feature data output by the first neural network, the second sample image data being the second island data that the second client can obtain; a first training module, configured to train a second neural network to be trained according to the second sample image data and the first feature data, to obtain a trained second neural network; and a second training module, configured to train, using the second sample image data, a target neural network including the trained second neural network and the first neural network until convergence conditions are met.

In some embodiments, the second sample image data may include multiple target categories, where each target category includes at least one target sample image. The first processing module is configured to: for each target category, input each target sample image included in the target category into the first neural network to obtain the first image feature corresponding to each target sample image output by the first neural network; for each target category, determine, according to the first image feature of each target sample image in the target category, the first class center feature and class feature range corresponding to the target category; and determine the first class center feature and the class feature range as the first feature data corresponding to the target category.

In some embodiments, the first processing module is configured to: determine the first class center feature corresponding to the target category according to the first image feature of each target sample image in the target category; and determine the similarity between the first image feature corresponding to each target sample image in the target category and the first class center feature, and determine the class feature range corresponding to the target category according to the maximum and minimum values of the similarity.

In some embodiments, the first training module is configured to: input the second sample image data into the second neural network to be trained, to obtain second feature data output by the second neural network; and adjust the network parameters of the second neural network based on the first difference between the second feature data and the label data included in the second sample image data, and the second difference between the second feature data and the first feature data, until convergence conditions are met, to obtain the trained second neural network.

In some embodiments, the second sample image data may include multiple target categories, where each target category includes at least one target sample image. The first training module is configured to: for each target category, input each target sample image included in the target category into the second neural network to be trained, to obtain the second image feature corresponding to each target sample image output by the second neural network; determine the second class center feature corresponding to the target category according to the second image feature of each target sample image in the target category; and determine the second class center feature and the second image feature corresponding to each target sample image as the second feature data corresponding to the target category.

In some embodiments, the first training module is configured to: for each target category, determine the first difference according to the second image feature corresponding to each target sample image in the target category and the label data; determine the second difference according to the difference between the second class center feature corresponding to the target category and the first class center feature, together with the difference between the second image feature corresponding to each target sample image in the target category and the class feature range; and adjust the network parameters of the second neural network based on the first difference and the second difference.

In some embodiments, the second training module is configured to: input the second sample image data into the trained second neural network to obtain third feature data output by the second neural network; fuse the first feature data and the third feature data according to the fusion weight determined from the second sample image data, to obtain fused feature data; and adjust the network parameters of the target neural network based on the third difference between the fused feature data and the label data included in the second sample image data, until convergence conditions are met.

In some embodiments, the second training module is configured to: input the second sample image data into the attribute network of the target neural network to obtain attribute information output by the attribute network; determine the first weight of the first feature data and the second weight of the third feature data according to the attribute information; and fuse the first feature data and the third feature data based on the first weight and the second weight, to obtain the fused feature data.

In some embodiments, the first neural network may include a first face recognition network, and the first sample image data is first face data.

In some embodiments, the second neural network may include a second face recognition network, and the second sample image data is second face data.

In a fourth aspect, embodiments of the present disclosure provide a target recognition device, which may include: an image acquisition module, configured to acquire an image to be tested, the image to be tested including a target to be tested; and a second processing module, configured to input the image to be tested into a pre-trained target recognition network to obtain a recognition result output by the target recognition network, the target recognition network being a target neural network obtained by the training method of any embodiment of the first aspect.

In a fifth aspect, embodiments of the present disclosure provide an electronic device, which may include a processor and a memory, the memory storing computer instructions, the computer instructions being used to cause the processor to perform the method according to any embodiment of the first or second aspect.

In a sixth aspect, embodiments of the present disclosure provide a storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method according to any embodiment of the first or second aspect.

The neural network training method of the embodiments of the present disclosure is applied to a second client in a neural network training system including at least a first client and the second client. The method may include: acquiring a first neural network pre-trained by the first client using first sample image data; inputting second sample image data into the first neural network to obtain first feature data output by the first neural network; training a second neural network to be trained according to the second sample image data and the first feature data, to obtain a trained second neural network; and training a target neural network including the second neural network and the first neural network using the second sample image data until convergence conditions are met. In the embodiments of the present disclosure, multiple parties' island data is utilized simultaneously while its security is guaranteed; the trained target neural network has better compatibility, improving the network's ability to recognize multi-party data and greatly improving target recognition accuracy.
Brief Description of the Drawings

Figure 1 is a schematic structural diagram of a neural network training system according to some embodiments of the present disclosure.

Figure 2 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 3 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 4 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 5 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 6 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 7 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 8 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 9 is a schematic diagram of a neural network training method according to some embodiments of the present disclosure.

Figure 10 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 11 is a flowchart of a neural network training method according to some embodiments of the present disclosure.

Figure 12 is a schematic diagram of a neural network training method according to some embodiments of the present disclosure.

Figure 13 is a flowchart of a target recognition method according to some embodiments of the present disclosure.

Figure 14 is a structural block diagram of a neural network training device according to some embodiments of the present disclosure.

Figure 15 is a structural block diagram of a target recognition device according to some embodiments of the present disclosure.

Figure 16 is a structural block diagram of an electronic device according to some embodiments of the present disclosure.
Detailed Description

Data is the cornerstone of deep neural networks (DNN, Deep Neural Network); if more data could be utilized, there would be greater potential to break through the accuracy ceiling of a neural network. However, due to the existence of data islands, the data that a single enterprise or client can accumulate and obtain is very limited, making it difficult to raise the upper limit of DNN accuracy.

Taking the face recognition scenario as an example, training a face recognition network with more face image data is an effective way to improve face recognition accuracy. However, because of the high privacy of face data, the data accumulated by different enterprises or clients is stored and maintained independently, and their databases are isolated from one another, thus forming data islands.

For example, enterprise A has accumulated a large amount of camera-captured face data of users wearing masks, while enterprise B has accumulated a large amount of user-uploaded face data without masks. If the face data of enterprise A and enterprise B could be used simultaneously to train a face recognition network, the network could learn the facial features of both kinds of data at the same time, greatly improving face recognition accuracy. However, because of data islands, the island data of enterprise A and enterprise B cannot be obtained simultaneously, making it difficult to further raise the accuracy ceiling of the face recognition network.

In the related art, data islands can be broken and island data utilized through federated learning (Federated Learning). The basic principle of federated learning is to build an encrypted communication environment between a server and multiple clients, where each client has island data that it stores and maintains independently. The server sends the DNN to each client, so that each client can train the DNN locally using its own island data and send the trained network parameters and gradients, encrypted, to the server. The server then aggregates the network parameters and gradients from the different clients to obtain the final trained DNN, so that island data can participate in network training without leaving its local premises.

However, federated learning requires building a large encrypted communication environment to realize encrypted communication between the server and each client. Besides being difficult to deploy, time-consuming, and costly, it also requires the DNN structure of every client to be identical, greatly limiting the flexibility of island-data utilization and the efficiency of network training.

On this basis, embodiments of the present disclosure provide a training method and device for a target recognition network (for example, a neural network for face recognition), a target (for example, face) recognition method and device, an electronic device, and a storage medium, aiming to break data islands, effectively utilize all parties' island data in network training, and improve target recognition accuracy, while being simple to deploy and low in cost.
Figure 1 shows the structure of a training system for a neural network (for example, a neural network for face recognition) according to embodiments of the present disclosure; the application environment of the embodiments is described below with reference to Figure 1.

As shown in Figure 1, the training system may include a first client 100 and a second client 200, which can establish a wired or wireless communication connection through a network 300.

In embodiments of the present disclosure, the first client 100 stores first sample image data and the second client 200 stores second sample image data. It can be understood that, depending on the application scenario, the data types of the first and second sample image data can be set accordingly, for example face data, natural scene data, or vehicle data; the present disclosure is not limited in this respect.

The first sample image data and the second sample image data are island data with respect to each other. That is, the first client 100 cannot obtain the second sample image data, and the second client 200 likewise cannot obtain the first sample image data.

In one example, the first client 100 and the second client 200 may be two different enterprises. For example, the first client 100 is enterprise A, which has independently accumulated, stored, and maintained face data, i.e. the first sample image data. The second client 200 is enterprise B, which likewise has independently accumulated, stored, and maintained face data, i.e. the second sample image data. For purposes such as data value or privacy protection, the data of enterprise A and enterprise B cannot be exchanged: enterprise A cannot obtain the second sample image data, and enterprise B cannot obtain the first sample image data.

In another example, the first client 100 and the second client 200 may be different departments of the same enterprise. For example, the first client 100 is department A of enterprise X, which has independently accumulated, stored, and maintained natural scene data, i.e. the first sample image data. The second client 200 is department B of enterprise X, which likewise has independently accumulated, stored, and maintained natural scene data, i.e. the second sample image data. For purposes such as data value or privacy protection, the data of department A and department B cannot be exchanged: department A cannot obtain the second sample image data, and department B cannot obtain the first sample image data.

Of course, it can be understood that the application scenarios of the system of the embodiments of the present disclosure are not limited to the above examples; the system can be applied to any scenario suitable for joint network training over two or more sets of island data, which the present disclosure does not enumerate further.
On the basis shown in Figure 1, embodiments of the present disclosure provide a neural network training method, which can be applied to the second client 200 so that the first sample image data and the second sample image data can participate in network training at the same time.

As shown in Figure 2, in some embodiments the neural network training method of the present disclosure may include steps S210 to S240.

S210: The second client 200 acquires a first neural network.

In embodiments of the present disclosure, the first neural network is pre-trained by the first client using the first sample image data.

As can be seen from Figure 1, although the island data stored by the first client 100 and the second client 200 cannot be exchanged, a communication connection is established between them. The first client 100 can therefore pre-train a first neural network using the first sample image data it can obtain, and then send the trained first neural network to the second client 200 through the network 300.

The process by which the first client 100 trains the first neural network is described in detail in the following embodiments of the present disclosure and is not elaborated here.

S220: The second client 200 inputs the second sample image data into the first neural network to obtain first feature data output by the first neural network.

In embodiments of the present disclosure, after the second client 200 receives the first neural network sent by the first client 100, it can use the first neural network to assist the training of the second neural network, so that the second neural network can also acquire the ability to recognize the features learned by the first neural network.

Specifically, the second client 200 can obtain the second sample image data it stores. In embodiments of the present disclosure, the second sample image data is first input into the received first neural network to obtain the first feature data.

It can be understood that the first neural network was trained by the first client 100 using the first sample image data; in other words, the first neural network has a good ability to recognize the features of the first sample image data. In embodiments of the present disclosure, on the second client 200 side, the second sample image data is input into the first neural network, so that the recognition features extracted by the first neural network from the second sample image data, i.e. the first feature data, can be obtained.

In embodiments of the present disclosure, it is precisely the first feature data extracted by the first neural network that assists the training of the second neural network on the second client 200 side, making the second neural network compatible with the recognition capability of the first neural network, as described in S230 below.
S230: The second client 200 trains the second neural network to be trained according to the second sample image data and the first feature data, obtaining a trained second neural network.

Specifically, the second client side has the second sample image data it stores, and also the first feature data obtained in S220 by feature extraction on the second sample image data with the first neural network.

It can be understood that if only the second sample image data were used to train the second neural network, the second neural network would only learn features of the second sample image data and would not learn features of the first sample image data. In embodiments of the present disclosure, the first feature data output by the first neural network, which was trained on the first sample image data, is fused into the training process of the second neural network, so that the second neural network becomes compatible with feature recognition over the first feature data.

That is, in embodiments of the present disclosure, when training the second neural network on the second client 200 side, the optimization of the objective function mainly includes the following two parts: 1) the classification loss of the second neural network on the second sample image data, which represents the second neural network's recognition ability for the second sample image data, i.e. the difference between the second neural network's classification prediction for the second sample image data and the label value; 2) the loss between the features extracted by the second neural network and the features extracted by the first neural network, which represents, for the same second sample image data, the difference between the features extracted with the first neural network and those extracted with the second neural network. It can be understood that the smaller the difference between the features extracted by the two networks, the stronger the second neural network's compatibility with the first neural network, and the higher the recognition accuracy when the second neural network is used to recognize targets closer to the first sample image data.
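Expressed compactly (this notation is ours for illustration, not the patent's, and λ is an assumed balance weight), the objective optimized in this step combines the two parts as:

    L_total = L_cls(M_A(x_A), y_A) + λ · L_compat(M_A(x_A), M_S(x_A))

where L_cls is the classification loss of the second network on the second sample image data (part 1), and L_compat measures the difference between the features the two networks extract from the same second sample image data (part 2).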
Therefore, by constraining the above two loss terms and using the first feature data and the second sample image data to train the second neural network to be trained, a second neural network compatible with feature recognition over both the first sample image data and the second sample image data can be obtained.

S240: The second client 200 uses the second sample image data to train a target neural network including the trained second neural network and the first neural network, until convergence conditions are met.

Specifically, in embodiments of the present disclosure, after the trained second neural network is obtained, the second sample image data is used again to train the target neural network including the second neural network and the first neural network until convergence conditions are met. After training, the target neural network to be used in the prediction stage is obtained.

It is worth noting that, in embodiments of the present disclosure, the trained second neural network is not directly taken as the final target neural network; instead, the target neural network is obtained by fusing the previously obtained first neural network and second neural network. In this way, when the target neural network is trained, the features extracted by the network include both those extracted by the first neural network and those extracted by the second neural network, so that the trained target neural network is robust to both the first sample image data and the second sample image data.

The process of training the fused network is described in the following embodiments of the present disclosure and is not elaborated here.

In some embodiments, after the target neural network is trained, it can also be sent to the first client 100. It can be understood that the first client 100 and the second client 200 transmit the first neural network only once at the beginning and the target neural network only once at the end; there is no interaction involving island data between them, so that both parties' island data is utilized simultaneously while data security is guaranteed.

As can be seen from the above, in embodiments of the present disclosure, neural network training utilizes multiple parties' island data while its security is guaranteed, so that the trained target recognition neural network has better compatibility, effectively improving the network's ability to recognize multi-party data and its target recognition accuracy.
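To make the division of work concrete, here is a hedged end-to-end sketch of steps S210 to S240; the toy linear models and helper names are illustrative placeholders only, and only models ever cross the network:

```python
import torch
import torch.nn as nn

def pretrain_first_network(first_data):
    return nn.Linear(128, 128)        # stand-in for the trained M_S (S310/S320)

def train_second_network(second_data, first_feature_data):
    return nn.Linear(128, 128)        # stand-in for the compatibility-trained M_A

def train_target_network(m_s, m_a, second_data):
    return nn.Sequential(m_s, m_a)    # stand-in for the fused target network

first_data = torch.randn(100, 128)    # island data of the first client (stays local)
second_data = torch.randn(100, 128)   # island data of the second client (stays local)

m_s = pretrain_first_network(first_data)                       # S210: sent to client 2 once
with torch.no_grad():
    first_feature_data = m_s(second_data)                      # S220
m_a = train_second_network(second_data, first_feature_data)    # S230
target_net = train_target_network(m_s, m_a, second_data)       # S240: sent back once
```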
It is worth noting that in the following embodiments of the present disclosure, the neural network training method is described taking the face recognition scenario as an example; that is, in the following embodiments, the first sample image data is first face data, the second sample image data is second face data, the first neural network may include a first face recognition network, and the second neural network may include a second face recognition network.

It can be understood, however, that the present disclosure is not limited to face recognition scenarios; any other application scenario suitable for implementation is possible, such as the aforementioned vehicle recognition or natural scene recognition, which will not be elaborated here. The description below follows the embodiment of Figure 2.

In embodiments of the present disclosure, both the first sample image data and the second sample image data may include multiple sample data, and each sample data may include a sample image, corresponding label data, and the target category it belongs to. Taking the face recognition scenario as an example, both the first face data and the second face data may include multiple sample data, and each sample data may include a face sample image, corresponding label data, and a face category. The face category represents the category to which the face sample image belongs; for example, multiple face sample images of the same person belong to the same face category. The label data represents the ground truth that the corresponding face sample image belongs to a certain face category, and can be obtained by manual annotation.

In short, both the first face data and the second face data may include multiple sample data belonging to multiple face categories, where each sample data may include a face sample image and the label data corresponding to that face sample image.

In some embodiments, on the first client 100 side, the first client 100 can use the first face data it stores to train the first face recognition network, thereby obtaining the trained first face recognition network. This is described below with reference to the embodiment of Figure 3.

As shown in Figure 3, in some embodiments the process by which the first client 100 trains the first face recognition network (hereinafter also called the first neural network) may include steps S310 and S320.

S310: The first client 100 inputs the first sample image data into the first neural network to be trained, obtaining the output result of the first neural network.

S320: The first client 100 adjusts the network parameters of the first neural network according to the difference between the output result and the label data, until convergence conditions are met, obtaining the trained first neural network.

Embodiments of the present disclosure place no restriction on the specific structure of the first face recognition network; any face recognition network suitable for implementation can be used. In one example, the first face recognition network can adopt the FaceNet network structure, expressed as:

    f_S = M_S(x_S)    (1)

In formula (1), x_S denotes a face sample image in the first face data, M_S denotes the first face recognition network, and f_S denotes the feature of the face sample image extracted from the first face data with the first face recognition network. The training process of the first face recognition network is described below taking one sample data as an example.

The face sample image included in the sample data is input into the first face recognition network to be trained, and the first face recognition network obtains the classification result for the face sample image through processing such as convolution, pooling, and classification. It can be understood that the classification result represents the first face recognition network's predicted value for the face sample image, while the label data represents the true value of the face sample image, so the difference between the classification result and the label data can be computed with a pre-built objective function. Afterwards, the network parameters of the first face recognition network can be optimized by back-propagating this difference.

The above description takes just one sample data as an example; the process can be repeated for the multiple sample data in the first face data, iteratively optimizing the first face recognition network until convergence conditions are met, thereby obtaining the trained first face recognition network.
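As a minimal illustration of steps S310/S320 only (the toy backbone, random placeholder data, and hyperparameters are assumptions, not specified by the disclosure):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

num_classes, feat_dim = 10, 128
m_s = nn.Sequential(                      # stand-in for the FaceNet-style backbone
    nn.Flatten(),
    nn.Linear(3 * 112 * 112, feat_dim),
    nn.ReLU(),
)
cls_head = nn.Linear(feat_dim, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    list(m_s.parameters()) + list(cls_head.parameters()), lr=0.01
)

x = torch.randn(64, 3, 112, 112)          # placeholder first face data
y = torch.randint(0, num_classes, (64,))  # placeholder label data
loader = DataLoader(TensorDataset(x, y), batch_size=16)

for epoch in range(5):                    # in practice, iterate until convergence
    for images, labels in loader:
        logits = cls_head(m_s(images))    # output result of the first network
        loss = criterion(logits, labels)  # difference vs. the label data
        optimizer.zero_grad()
        loss.backward()                   # back-propagate to adjust parameters
        optimizer.step()
```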
After obtaining the trained first face recognition network, the first client 100 can send it to the second client 200 through the network 300. It can be understood that only the first face recognition network is transmitted between the first client 100 and the second client 200, and no transmission of the first face data takes place, ensuring that the data on the first client 100 side never leaves its local premises and guaranteeing data security.

On the second client 200 side, after receiving the first face recognition network sent by the first client 100, compatibility training of the second face recognition network can be performed using the first face recognition network and the second face data stored by the second client itself. This is described below with reference to the embodiments of Figures 4 to 6.

As shown in Figure 4, in some embodiments of the training method of the present disclosure, the process by which the second client 200 obtains the first facial feature data from the second face data using the first face recognition network (hereinafter also called the first neural network) may include steps S410 to S430.

S410: The second client 200 inputs each target sample image into the first neural network, obtaining the first image feature corresponding to each target sample image output by the first neural network.

Specifically, in the face recognition scenario, the target sample images are the face sample images in the sample data included in the second face data. Taking one sample data as an example, the face sample image included in the sample data can be input into the first face recognition network, which can extract features from the face sample image through, for example, convolutional layers, obtaining the first image feature corresponding to the face sample image. The specific process is similar to formula (1) above and is not repeated here.

S420: For each target category, the second client 200 determines the first class center feature and the class feature range corresponding to the target category according to the first image feature of each target sample image in the target category.

Specifically, in the face recognition scenario, the target category is the face category to which the face sample images belong. As noted above, the second face data may include multiple face categories; for example, multiple face sample images of the same person in the second face data belong to the same face category. That is, each face category includes at least one face sample image.

Taking an arbitrary face category that includes N face sample images as an example, the N first image features corresponding to its N face sample images can be extracted through S410 above. For this face category, the first class center feature corresponding to the face category can then be computed from the first image features of its N face sample images, expressed as:

    c_i^S = (1/N) · Σ_{k=1}^{N} f_{i,k}^S    (2)

In formula (2), f_{i,k}^S denotes the first image feature corresponding to the k-th face sample image of the i-th category in the second face data, extracted with the first face recognition network M_S, and c_i^S denotes the first class center feature. It can be understood that the first class center feature represents the average feature of the face sample images of this face category and reflects the category's mean feature value.

For any face category, the first image feature corresponding to each face sample image in the category, as well as the first class center feature corresponding to the category, can be computed in turn through the above process.
After the first class center feature is obtained, the class feature range corresponding to the face category can be computed from it; this is described below with reference to the embodiment of Figure 5.

S421: Determine the first class center feature corresponding to the target category according to the first image feature of each target sample image in the target category.

S422: Determine the similarity between each first image feature in the target category and the first class center feature, and determine the class feature range corresponding to the target category according to the maximum and minimum values of the similarity.

Specifically, formula (2) above gives the first class center feature corresponding to each face category, which represents the average feature of the corresponding face category.

Taking an arbitrary face category as an example, the category includes N face sample images, the corresponding N first image features, and the corresponding class center feature. In embodiments of the present disclosure, the similarity between the first image feature of each face sample image and the class center feature can be computed. This similarity represents how similar each face sample image is to the average feature. In one example, the cosine similarity between each first image feature and the class center feature can be computed, expressed as:

    s_{i,k}^S = cos(f_{i,k}^S, c_i^S)    (3)

In formula (3), s_{i,k}^S denotes the similarity between the k-th first image feature of the i-th category, extracted with the first face recognition network M_S, and the class center feature; f_{i,k}^S denotes the k-th first image feature of the i-th category extracted with M_S; and c_i^S denotes the class center feature of the i-th category.

After the similarity between each first image feature and the class center feature is obtained, the minimum and maximum of the similarities are determined, expressed as:

    S_min^i = min_k { s_{i,k}^S }    (4)
    S_max^i = max_k { s_{i,k}^S }    (5)

In formulas (4) and (5), S_min^i denotes the minimum similarity within the i-th category, S_max^i denotes the maximum similarity within the i-th category, and s_{i,k}^S denotes the individual similarities within the i-th category.

After the minimum and maximum similarities are determined, the maximum S_max^i can be taken as the inner feature boundary of the face category and the minimum S_min^i as its outer feature boundary; the range between the inner and outer feature boundaries is the class feature range corresponding to the face category.

The above describes one face category; performing the above process in turn for each face category included in the second face data yields the class center feature and class feature range corresponding to each face category.
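Purely as an illustration of formulas (2) to (5), a minimal sketch (tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def class_center_and_range(features):
    # features: (N, D) tensor, the N first image features of one face
    # category extracted with the first face recognition network M_S
    center = features.mean(dim=0)                                      # formula (2)
    sims = F.cosine_similarity(features, center.unsqueeze(0), dim=1)   # formula (3)
    s_min, s_max = sims.min().item(), sims.max().item()                # formulas (4)-(5)
    return center, (s_min, s_max)   # first class center feature, class feature range

# usage with placeholder features
feats = torch.randn(16, 128)
center, (s_min, s_max) = class_center_and_range(feats)
```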
S430: Determine the first class center feature and the class feature range as the first feature data corresponding to the target category.

Specifically, for any face category included in the second face data, the first class center feature c_i^S obtained above and the class feature range (S_min, S_max) together serve as the first feature data of that face category (hereinafter also called the first facial feature data).

As can be seen from the above, after feature extraction on the second face data of the second client 200 with the first face recognition network, the first facial feature data corresponding to each face category is obtained. The first facial feature data can then assist the training of the second face recognition network, described below with reference to the embodiment of Figure 6.

As shown in Figure 6, in some embodiments the process by which the second client 200 trains the second face recognition network (hereinafter also called the second neural network) may include steps S610 and S620.

S610: The second client 200 inputs the second sample image data into the second neural network to be trained, obtaining second feature data output by the second neural network.

S620: Based on the first difference between the second feature data and the label data of the second sample image data, and the second difference between the second feature data and the first feature data, the second client 200 adjusts the network parameters of the second neural network until convergence conditions are met, obtaining the trained second neural network.

Specifically, as noted above, the objective of the compatibility training of the second face recognition network (i.e. the second neural network) mainly includes two parts: 1) the classification loss of the second face recognition network on the second face data; 2) the loss between the features extracted by the second face recognition network from the second face data and the features extracted by the first face recognition network from the second face data.

Thus, in embodiments of the present disclosure, the second face data (i.e. the second sample image data) can be input into the second face recognition network to be trained, obtaining the second feature data output by the second face recognition network (hereinafter also called the second facial feature data). The second facial feature data may include the second image feature corresponding to each face sample image and the second class center feature corresponding to each face category. This is described below with reference to the embodiment of Figure 7.

As shown in Figure 7, in some embodiments of the training method of the present disclosure, the process of obtaining the second facial feature data may include steps S611 to S613.

S611: Input each target sample image into the second neural network to be trained, obtaining the second image feature corresponding to each target sample image output by the second neural network.

S612: For each target category, determine the second class center feature corresponding to the target category according to the second image feature of each target sample image in the target category.

S613: Determine the second class center feature and the second image feature corresponding to each target sample image as the second feature data corresponding to the target category.

In general, the process of computing the second image feature of each face sample image and the second class center feature of each face category is similar to the computation of the first image feature and the first class center feature described above; the main difference is that the first image feature and the first class center feature are obtained with the first face recognition network, whereas the second image feature and the second class center feature of this embodiment are obtained with the second face recognition network.

Specifically, each face sample image of the second face data is input into the second face recognition network to be trained. In embodiments of the present disclosure, any face recognition network suitable for implementation can be used as the second face recognition network; its structure may be the same as or different from that of the first face recognition network, and the present disclosure is not limited in this respect. In one example, the second face recognition network can likewise adopt the FaceNet network structure, expressed as:

    f_A = M_A(x_A)    (6)

In formula (6), x_A denotes a face sample image in the second face data, M_A denotes the second face recognition network, and f_A denotes the feature of the face sample image extracted from the second face data with the second face recognition network, i.e. the second image feature described in the present disclosure.

Taking one sample data of the second face data as an example, the face sample image x_A included in the sample data can be input into the second face recognition network M_A to be trained, which extracts features from x_A based on formula (6), obtaining the second image feature f_A corresponding to the face sample image. Performing the above processing on each sample data of the second face data in turn yields the second image feature corresponding to each face sample image.

For any face category, the second class center feature corresponding to the category can be determined from the second image features included in the category. For example, a face category includes N face sample images; for this category, the computed second class center feature is expressed as:

    c_i^A = (1/N) · Σ_{k=1}^{N} f_{i,k}^A    (7)

In formula (7), f_{i,k}^A denotes the second image feature corresponding to the k-th face sample image of the i-th category in the second face data, extracted with the second face recognition network M_A, and c_i^A denotes the second class center feature.

For any face category, the second image feature corresponding to each face sample image in the category and the second class center feature corresponding to the category can be computed in turn through the above process. Thereafter, the second image features and the second class center feature together serve as the second facial feature data of the face category.
After the first facial feature data and the second facial feature data are obtained, supervised training of the second face recognition network can be performed, described below with reference to the embodiment of Figure 8.

As shown in Figure 8, in some embodiments of the training method of the present disclosure, the process of adjusting the network parameters of the second face recognition network may include steps S621 to S623.

S621: Determine the first difference according to the second image feature and the label data corresponding to each target sample image.

Specifically, taking one sample data in the second face data as an example, the second image feature corresponding to the face sample image of the sample data is obtained through the preceding process. The second face recognition network can predict the classification result for the face sample image from this second image feature.

It can be understood that the classification result represents the second face recognition network's predicted value for the face sample image, while the label data represents the true category of the face sample image; the loss between the classification result and the label data, i.e. the first difference described in the present disclosure, can therefore be computed with a pre-built loss function.

S622: For each target category, determine the second difference according to the difference between the second class center feature corresponding to the target category and the first class center feature, together with the difference between the second image feature corresponding to each target sample image and the class feature range.

In embodiments of the present disclosure, the second difference may include two parts: first, the difference between the second class center feature and the first class center feature; second, a loss term constraining the second image features with the class feature range.

Specifically, taking any face category in the second face data as an example, the first class center feature c_i^S corresponding to the category can be computed through the embodiment of Figure 5 above, and the second class center feature c_i^A through the embodiment of Figure 7. The difference between the two can then be computed from c_i^S and c_i^A.

Meanwhile, for each face category, the corresponding class feature range (S_min, S_max) can also be obtained through the embodiment of Figure 5. In embodiments of the present disclosure, the difference between the second image feature of each face sample image and the class feature range can be constrained at the same time. In one example, for any sample data, the cosine similarity between the second image feature f_{i,k}^A corresponding to the face sample image and the first class center feature c_i^S of the face category to which the image belongs can be computed, and this cosine similarity can be constrained to be less than S_max.

Thus, the difference between the first class center feature and the second class center feature, together with the difference between the second image features and the class feature range, can jointly serve as the second difference.

S623: Adjust the network parameters of the second neural network based on the first difference and the second difference until convergence conditions are met, obtaining the trained second face recognition network.

Specifically, the first and second differences above can be combined, and the network parameters of the second face recognition network optimized by back-propagating the combined result. For the multiple sample data in the second face data, the above process can be repeated, iteratively optimizing the second face recognition network until convergence conditions are met, thereby obtaining the trained second face recognition network.

It can be understood that, in the above supervised training of the second face recognition network, the first difference represents a constraint on the second face recognition network's ability to recognize the second face data, while the second difference represents a constraint on the second face recognition network's compatibility with the first face recognition network trained on the first face data. Thus, the second face recognition network obtained through the above training process has good compatible recognition capabilities for both the first face data and the second face data, improving face recognition accuracy.

Figure 9 shows a schematic diagram of the compatibility training of the second face recognition network M_A in the training method of the present disclosure, further described below with reference to Figure 9.

As shown in Figure 9, this training process takes place on the second client 200 side. Based on the processes of Figures 4 and 5 above, the first facial feature data can be obtained using the second face data and the first face recognition network M_S. Meanwhile, based on the processes of Figures 6 and 7, the second facial feature data can be obtained using the second face data and the second face recognition network M_A to be trained. Then, following the process of Figure 8, a loss value comprising the first difference and the second difference can be computed from the first facial feature data, the second facial feature data, and the label data in the second face data using a pre-built loss function, and the network parameters of the second face recognition network M_A are adjusted by back-propagating this loss value until convergence conditions are met, obtaining the trained second face recognition network M_A.
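A hedged sketch of such a joint loss follows; the specific loss functions (cross-entropy, mean squared error between class centers, a hinge on the similarity boundary) and the balance weight lam are assumptions, since the text does not fix them:

```python
import torch
import torch.nn.functional as F

def compatibility_loss(logits, labels, c_a, c_s, feats_a, s_max, lam=1.0):
    # logits:  classification predictions of M_A for one batch
    # labels:  label data (ground-truth face categories)
    # c_a:     second class center feature from M_A, shape (D,)
    # c_s:     first class center feature from M_S, shape (D,)
    # feats_a: second image features of the batch, shape (N, D)
    # s_max:   upper similarity boundary of the class feature range
    first_diff = F.cross_entropy(logits, labels)        # first difference
    center_diff = F.mse_loss(c_a, c_s)                  # class-center difference
    sims = F.cosine_similarity(feats_a, c_s.unsqueeze(0), dim=1)
    range_diff = F.relu(sims - s_max).mean()            # constrain similarities < S_max
    second_diff = center_diff + range_diff              # second difference
    return first_diff + lam * second_diff

# quick check with placeholder tensors
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = compatibility_loss(logits, labels, torch.randn(128), torch.randn(128),
                          torch.randn(8, 128), s_max=0.9)
```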
As can be seen from the above, in embodiments of the present disclosure, the first face recognition network is used for auxiliary compatibility training of the second face recognition network, so that the second face recognition network has good compatible recognition capabilities for both the mutually isolated first face data and second face data, while the network converges better, improving face recognition accuracy.

It can be understood that the trained second face recognition network is obtained through the above process. In embodiments of the present disclosure, the second face recognition network is not directly taken as the final target face recognition network; instead, the target face recognition network is obtained by fusing the first face recognition network and the second face recognition network, and the second face data is used once more to train the target face recognition network, improving the network's compatibility with each party's island data. This is described below with reference to the embodiment of Figure 10.

As shown in Figure 10, in some embodiments of the training method of the present disclosure, the process of training the target face recognition network (hereinafter also called the target neural network) may include steps S1010 to S1030.

S1010: The second client 200 inputs the second sample image data into the trained second neural network, obtaining third feature data output by the second neural network.

S1020: According to a fusion weight determined from the second sample image data, the second client 200 fuses the first feature data and the third feature data, obtaining fused feature data.

S1030: Based on the third difference between the fused feature data and the label data included in the second sample image data, the second client 200 adjusts the network parameters of the target neural network until convergence conditions are met.

In some embodiments, a fusion layer can be added after the feature extraction layers of the first face recognition network and the second face recognition network, so that the features extracted by the two are fused.

Specifically, the second face data is input into the trained second face recognition network; based on the same process as the extraction of the second facial feature data above, the second face recognition network can output the third feature data (hereinafter also called the third facial feature data). It can be understood that the third facial feature data is essentially the same as the second facial feature data; those skilled in the art can understand and fully implement this with reference to the preceding embodiments, which is not repeated here. Meanwhile, the second face data is input into the first face recognition network, which, based on the process of the preceding embodiments, can output the first facial feature data.

When fusing the first facial feature data and the third facial feature data, the fusion weights of the two must first be determined, and the two data are then fused based on these fusion weights.

In some embodiments, the fusion weights can be determined according to the attribute information of the second face data. The attribute information represents the difference in attributes between the first face data and the second face data. In one example, the first face data mainly includes face data of children while the second face data mainly includes face data of adults; age is then the attribute difference between the two sets of island data. In another example, the first face data mainly includes face data of men while the second face data mainly includes face data of women; gender is then the attribute difference between the two sets of island data.

Those skilled in the art can understand that the attribute information is not limited to the above examples; it can be any other attribute information suitable for implementation. As long as the attribute information makes the first face data and the second face data differ overall, the present disclosure places no restriction on it.

In some examples, the target face recognition network may also include an attribute recognition network, so that the attribute information of the face sample images is extracted with the attribute recognition network and the target face recognition network can determine the corresponding fusion weight according to the attribute information. This is described in the following embodiments of the present disclosure and not elaborated here.

After the fusion weights of the first facial feature data and the third facial feature data are determined, the two can be fused according to these weights, obtaining the fused feature data. It can be understood that the fused feature data combines the feature information extracted with the first face recognition network and with the second face recognition network; the fused feature data is therefore representative of the features of both sets of island data.

The target face recognition network can add a classification layer after the fusion layer. The classification layer is, for example, a fully connected layer, which can predict and output the classification result corresponding to the face sample data from the input fused feature data.
It can be understood that the classification result represents the target face recognition network's predicted value for the face sample image, while the label data represents the true value of the face sample image; the third difference, i.e. the loss, between the prediction results output by the target face recognition network and the existing label data can therefore be computed with a pre-built loss function. The classification-layer parameters of the target face recognition network can then be optimized by back-propagating the third difference. For the multiple sample data in the second face data, the above process can be repeated, iteratively optimizing the target face recognition network until convergence conditions are met, thereby obtaining the trained target face recognition network.

As can be seen from the above, in embodiments of the present disclosure, the target face recognition network is obtained by fusing the first face recognition network and the second face recognition network, thereby improving the target face recognition network's compatibility with multi-party island data and its face recognition accuracy.

As shown in Figure 11, in some embodiments of the training method of the present disclosure, the process of fusing the first facial feature data and the third facial feature data may include steps S1021 to S1023.

S1021: Input the second sample image data into the attribute network of the target neural network, obtaining attribute information output by the attribute network.

S1022: Determine the first weight of the first feature data and the second weight of the third feature data according to the attribute information.

S1023: Fuse the first feature data and the third feature data based on the first weight and the second weight, obtaining the fused feature data.

Figure 12 shows a schematic diagram of training the target face recognition network in the training method of the present disclosure, described in detail below with reference to Figure 12.

As shown in Figure 12, in some embodiments the target face recognition network may include an attribute network M_attr. The attribute network M_attr is a network that recognizes the attribute information of the second face data and may be pre-trained according to the type of the attribute information.

As noted above, the purpose of introducing attribute information into the target face recognition network is to better fuse the first facial feature data and the third facial feature data; the type of attribute information may therefore be attribute information that chiefly differentiates the first face data from the second face data.

For example, suppose the first face data mainly consists of face data in which the user's face has no wear, while the second face data mainly consists of face data in which the user's face wears a mask. The attribute network M_attr can then be pre-trained on whether the user's face wears a mask, and is mainly used to extract features of the user's facial wear and predict the corresponding attribute information.

As shown in Figure 12, in this embodiment, the second face data is input into the attribute network M_attr to obtain the attribute information of the second face data. The second face data is input into the first face recognition network M_S to obtain the first facial feature data, and into the trained second face recognition network M_A to obtain the third facial feature data.

In some embodiments, when determining the first weight of the first facial feature data and the second weight of the third facial feature data based on the attribute information, the attribute information may first be smoothed based on a smoothing coefficient.

In one example, a fully connected layer branch can be added to the second face recognition network M_A, so that a corresponding smoothing coefficient T can be output from the second face data, and the attribute information is smoothed with T, expressed as:

    Attr_s = g(Attr, T)    (8)

In formula (8), Attr_s denotes the smoothed attribute information, Attr denotes the attribute information output by the attribute network, T denotes the smoothing coefficient output by the second face recognition network, and g(·) denotes the smoothing operation applied to Attr with coefficient T.

After the smoothed attribute information is obtained according to formula (8), the first weight of the first facial feature data and the second weight of the third facial feature data can be determined from it. Based on the first weight and the second weight, linear weighted fusion is then performed on the first facial feature data and the third facial feature data, obtaining the fused feature data.

After the fused feature data is obtained, the classification layer predicts the corresponding classification result from it; then, based on the difference between the classification result and the label data, the classification-layer parameters of the target face recognition network are optimized by back-propagating this difference until convergence conditions are met, completing the training of the target face recognition network.

For the second client 200, after the trained target face recognition network is obtained, the target face recognition network can be sent to the first client 100 through the network 300. It can be understood that, in embodiments of the present disclosure, the first client 100 only needs to send the first face recognition network to the second client 200 once, and the second client 200 only needs to send the target face recognition network to the first client 100 once; beyond this, no communication involving island data is required, which effectively protects island-data security, and the network architecture is simple, easy to deploy, and low in cost.

As can be seen from the above, in embodiments of the present disclosure, multiple parties' island data is utilized simultaneously while its security is guaranteed; the trained target neural network has better compatibility, improving the network's ability to recognize multi-party data and greatly improving target recognition accuracy. Furthermore, attribute information is fused into the recognition and prediction of target attributes, further enhancing the network's compatibility with island data carrying different attribute information and improving target recognition accuracy.
Embodiments of the present disclosure provide a target recognition method, which can be applied to an electronic device. The electronic device of the embodiments of the present disclosure may be any device type suitable for implementation, such as a mobile terminal, a wearable device, a vehicle-mounted device, a server, or a cloud platform; the present disclosure is not limited in this respect.

As shown in Figure 13, in some embodiments the target recognition method of the present disclosure may include steps S1310 and S1320.

S1310: Acquire an image to be tested.

S1320: Input the image to be tested into a pre-trained target recognition network, obtaining a recognition result output by the target recognition network.

Specifically, the target recognition network described in the embodiments of the present disclosure is a target neural network trained according to the training method of any of the preceding embodiments.

Taking the face recognition scenario as an example, the image to be tested is an image in which the face is expected to be recognized; the image to be tested including the face to be tested can be input into the target recognition network described in the present disclosure, obtaining the recognition result output by the target recognition network.

Of course, it can be understood that target recognition in the embodiments of the present disclosure is not limited to face recognition scenarios; any other scenario suitable for implementation is possible, such as vehicle recognition or natural scene recognition, which will not be elaborated in the present disclosure.

As can be seen from the above, in embodiments of the present disclosure, because the target recognition network has good compatible recognition capabilities for multi-party island data, the recognition accuracy on the image to be tested is higher, satisfying high-precision target recognition scenarios.
Embodiments of the present disclosure provide a neural network training device, which can be applied to the second client so that the first sample image data and the second sample image data can participate in network training at the same time.

As shown in Figure 14, in some embodiments the neural network training device of the present disclosure may include: a network acquisition module 10, configured to acquire a first neural network, the first neural network being pre-trained by the first client using the first sample image data, the first sample image data being the first island data that the first client can obtain; a first processing module 20, configured to input the second sample image data into the first neural network to obtain the first feature data output by the first neural network, the second sample image data being the second island data that the second client can obtain; a first training module 30, configured to train the second neural network to be trained according to the second sample image data and the first feature data, obtaining the trained second neural network; and a second training module 40, configured to train, using the second sample image data, the target neural network including the trained second neural network and the first neural network until convergence conditions are met.

As can be seen from the above, in embodiments of the present disclosure, multiple parties' island data is utilized simultaneously while its security is guaranteed; the trained target neural network has better compatibility, improving the network's ability to recognize multi-party data and greatly improving target recognition accuracy.

In some embodiments, the second sample image data may include multiple target categories, where each target category includes at least one target sample image. The first processing module 20 is configured to: for each target category, input each target sample image included in the target category into the first neural network, obtaining the first image feature corresponding to each target sample image output by the first neural network; for each target category, determine, according to the first image feature of each target sample image in the target category, the first class center feature and class feature range corresponding to the target category; and determine the first class center feature and the class feature range as the first feature data corresponding to the target category.

In some embodiments, the first processing module 20 is configured to: determine the first class center feature corresponding to the target category according to the first image feature of each target sample image in the target category; and determine the similarity between the first image feature corresponding to each target sample image in the target category and the first class center feature, and determine the class feature range corresponding to the target category according to the maximum and minimum values of the similarity.

In some embodiments, the first training module 30 is configured to: input the second sample image data into the second neural network to be trained, obtaining the second feature data output by the second neural network; and adjust the network parameters of the second neural network based on the first difference between the second feature data and the label data included in the second sample image data, and the second difference between the second feature data and the first feature data, until convergence conditions are met, obtaining the trained second neural network.

In some embodiments, the second sample image data may include multiple target categories, where each target category includes at least one target sample image. The first training module 30 is configured to: for each target category, input each target sample image included in the target category into the second neural network to be trained, obtaining the second image feature corresponding to each target sample image output by the second neural network; determine the second class center feature corresponding to the target category according to the second image feature of each target sample image in the target category; and determine the second class center feature and the second image feature corresponding to each target sample image as the second feature data corresponding to the target category.

In some embodiments, the first feature data of each target category includes the first class center feature and class feature range of the target category to which each target sample image belongs. The first training module 30 is configured to: for each target category, determine the first difference according to the second image feature corresponding to each target sample image in the target category and the label data; determine the second difference according to the difference between the second class center feature corresponding to the target category and the first class center feature, together with the difference between the second image feature corresponding to each target sample image in the target category and the class feature range; and adjust the network parameters of the second neural network based on the first difference and the second difference.

In some embodiments, the second training module 40 is configured to: input the second sample image data into the trained second neural network, obtaining the third feature data output by the second neural network; fuse the first feature data and the third feature data according to the fusion weight determined from the second sample image data, obtaining the fused feature data; and adjust the network parameters of the target neural network based on the third difference between the fused feature data and the label data included in the second sample image data, until convergence conditions are met.

In some embodiments, the second training module 40 is configured to: input the second sample image data into the attribute network of the target neural network, obtaining the attribute information output by the attribute network; determine the first weight of the first feature data and the second weight of the third feature data according to the attribute information; and fuse the first feature data and the third feature data based on the first weight and the second weight, obtaining the fused feature data.

In some embodiments, the first neural network may include a first face recognition network, and the first sample image data is first face data.

In some embodiments, the second neural network may include a second face recognition network, and the second sample image data is second face data.

As can be seen from the above, in embodiments of the present disclosure, multiple parties' island data is utilized simultaneously while its security is guaranteed; the trained target neural network has better compatibility, improving the network's ability to recognize multi-party data and greatly improving target recognition accuracy. Furthermore, attribute information is fused into the recognition and prediction of target attributes, further enhancing the network's compatibility with island data carrying different attribute information and improving target recognition accuracy.

As shown in Figure 15, in some embodiments the present disclosure provides a target recognition device, which may include: an image acquisition module 50, configured to acquire an image to be tested, the image to be tested including a target to be tested; and a second processing module 60, configured to input the image to be tested into a pre-trained target recognition network to obtain a recognition result output by the target recognition network, the target recognition network being a target neural network obtained by the training method of any of the above embodiments.

As can be seen from the above, in embodiments of the present disclosure, because the target recognition network has good compatible recognition capabilities for multi-party island data, the recognition accuracy on the image to be tested is higher, satisfying high-precision target recognition scenarios.
In some embodiments, the present disclosure provides an electronic device, which may include: a processor; and a memory, the memory storing computer instructions, the computer instructions being used to cause the processor to perform the method described in any of the above embodiments.

In some embodiments, the present disclosure provides a storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method described in any of the above embodiments.

Specifically, Figure 16 shows a schematic structural diagram of an electronic device 600 suitable for implementing the method of the present disclosure; the corresponding functions of the above processor and storage medium can be realized by the electronic device shown in Figure 16.

As shown in Figure 16, the electronic device 600 includes a processor 601 that can perform various appropriate actions and processes according to programs stored in a memory 602 or loaded into the memory 602 from a storage section 608. Various programs and data required for the operation of the electronic device 600 are also stored in the memory 602. The processor 601 and the memory 602 are connected to each other via a bus 604; an input/output (I/O) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), speakers, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. Removable media 611, such as magnetic disks, optical disks, magneto-optical disks, or semiconductor memories, are mounted on the drive 610 as needed, so that a computer program read from them is installed into the storage section 608 as needed.

In particular, according to embodiments of the present disclosure, the above method process may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program including program code for performing the above method. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609 and/or installed from the removable media 611.

The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two successive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Obviously, the above embodiments are merely examples given for clarity of description and are not limitations on the embodiments. For those of ordinary skill in the art, other changes or variations in different forms can be made on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here; obvious changes or variations derived therefrom remain within the scope of protection of the present disclosure.

Claims (14)

  1. A neural network training method, applied to a second client in a neural network training system including at least a first client and the second client, the method comprising:
    acquiring a first neural network, the first neural network being pre-trained by the first client using first sample image data, the first sample image data being first island data that the first client can obtain;
    inputting second sample image data into the first neural network to obtain first feature data output by the first neural network, the second sample image data being second island data that the second client can obtain;
    training a second neural network to be trained according to the second sample image data and the first feature data, to obtain a trained second neural network;
    training, using the second sample image data, a target neural network comprising the trained second neural network and the first neural network, until convergence conditions are met.
  2. The method according to claim 1, wherein
    the second sample image data comprises multiple target categories, each target category including at least one target sample image;
    inputting the second sample image data into the first neural network to obtain the first feature data output by the first neural network comprises:
    for each target category,
    inputting each target sample image included in the target category into the first neural network to obtain a first image feature corresponding to each target sample image output by the first neural network;
    determining, according to the first image feature of each target sample image in the target category, a first class center feature and a class feature range corresponding to the target category;
    determining the first class center feature and the class feature range as the first feature data corresponding to the target category.
  3. The method according to claim 2, wherein determining the first class center feature and the class feature range corresponding to the target category according to the first image feature of each target sample image in the target category comprises:
    determining the first class center feature corresponding to the target category according to the first image feature of each target sample image in the target category;
    determining a similarity between the first image feature corresponding to each target sample image in the target category and the first class center feature; and
    determining the class feature range corresponding to the target category according to the maximum and minimum values of the similarity.
  4. The method according to any one of claims 1 to 3, wherein training the second neural network to be trained according to the second sample image data and the first feature data to obtain the trained second neural network comprises:
    inputting the second sample image data into the second neural network to be trained, to obtain second feature data output by the second neural network;
    adjusting network parameters of the second neural network based on a first difference between the second feature data and label data included in the second sample image data, and a second difference between the second feature data and the first feature data, until convergence conditions are met, to obtain the trained second neural network.
  5. The method according to claim 4, wherein
    the second sample image data comprises multiple target categories, each target category including at least one target sample image;
    inputting the second sample image data into the second neural network to be trained to obtain the second feature data output by the second neural network comprises:
    for each target category,
    inputting each target sample image included in the target category into the second neural network to be trained, to obtain a second image feature corresponding to each target sample image output by the second neural network;
    determining, according to the second image feature of each target sample image in the target category, a second class center feature corresponding to the target category;
    determining the second class center feature and the second image feature corresponding to each target sample image as the second feature data corresponding to the target category.
  6. The method according to claim 5, wherein
    the first feature data of each target category comprises the first class center feature and the class feature range of the target category to which each target sample image belongs;
    adjusting the network parameters of the second neural network based on the first difference between the second feature data and the label data included in the second sample image data, and the second difference between the second feature data and the first feature data, comprises:
    for each target category,
    determining the first difference according to the second image feature corresponding to each target sample image in the target category and the label data;
    determining the second difference according to the difference between the second class center feature corresponding to the target category and the first class center feature, and the difference between the second image feature corresponding to each target sample image in the target category and the class feature range;
    adjusting the network parameters of the second neural network based on the first difference and the second difference.
  7. The method according to any one of claims 1 to 6, wherein training, using the second sample image data, the target neural network comprising the trained second neural network and the first neural network comprises:
    inputting the second sample image data into the trained second neural network to obtain third feature data output by the second neural network;
    fusing the first feature data and the third feature data according to a fusion weight determined based on the second sample image data, to obtain fused feature data;
    adjusting network parameters of the target neural network based on a third difference between the fused feature data and the label data included in the second sample image data, until convergence conditions are met.
  8. The method according to claim 7, wherein fusing the first feature data and the third feature data according to the fusion weight determined based on the second sample image data to obtain the fused feature data comprises:
    inputting the second sample image data into an attribute network of the target neural network to obtain attribute information output by the attribute network;
    determining a first weight of the first feature data and a second weight of the third feature data according to the attribute information;
    fusing the first feature data and the third feature data based on the first weight and the second weight, to obtain the fused feature data.
  9. The method according to any one of claims 1 to 8, wherein
    the first neural network comprises a first face recognition network, and the first sample image data is first face data; and/or
    the second neural network comprises a second face recognition network, and the second sample image data is second face data.
  10. A target recognition method, comprising:
    acquiring an image to be tested, the image to be tested including a target to be tested;
    inputting the image to be tested into a pre-trained target recognition network to obtain a recognition result output by the target recognition network, the target recognition network being a target neural network obtained by the training method according to any one of claims 1 to 9.
  11. A neural network training device, applied to a second client in a neural network training system including at least a first client and the second client, the device comprising:
    a network acquisition module, configured to acquire a first neural network, the first neural network being pre-trained by the first client using first sample image data, the first sample image data being first island data that the first client can obtain;
    a first processing module, configured to input second sample image data into the first neural network to obtain first feature data output by the first neural network, the second sample image data being second island data that the second client can obtain;
    a first training module, configured to train a second neural network to be trained according to the second sample image data and the first feature data, to obtain a trained second neural network;
    a second training module, configured to train, using the second sample image data, a target neural network comprising the trained second neural network and the first neural network, until convergence conditions are met.
  12. A target recognition device, comprising:
    an image acquisition module, configured to acquire an image to be tested, the image to be tested including a target to be tested;
    a second processing module, configured to input the image to be tested into a pre-trained target recognition network to obtain a recognition result output by the target recognition network, the target recognition network being a target neural network obtained by the training method according to any one of claims 1 to 9.
  13. An electronic device, comprising:
    a processor; and
    a memory storing computer instructions, the computer instructions being used to cause the processor to perform the method according to any one of claims 1 to 10.
  14. A storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method according to any one of claims 1 to 10.
PCT/CN2022/128143 2022-03-29 2022-10-28 Target recognition and neural network training WO2023184958A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210322086.4 2022-03-29
CN202210322086.4A CN114912572A (zh) 2022-03-29 2022-03-29 Target recognition method and neural network training method

Publications (1)

Publication Number Publication Date
WO2023184958A1 true WO2023184958A1 (zh) 2023-10-05

Family

ID=82763493

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/128143 WO2023184958A1 (zh) 2022-03-29 2022-10-28 Target recognition and neural network training

Country Status (2)

Country Link
CN (1) CN114912572A (zh)
WO (1) WO2023184958A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114912572A (zh) * 2022-03-29 2022-08-16 深圳市商汤科技有限公司 Target recognition method and neural network training method
CN116798103B (zh) * 2023-08-29 2023-12-01 广州诚踏信息科技有限公司 Face image processing method and system based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256423A (zh) * 2017-05-05 2017-10-17 深圳市丰巨泰科电子有限公司 Augmented neural network architecture, training method therefor, and computer-readable storage medium
CN110598504A (zh) * 2018-06-12 2019-12-20 北京市商汤科技开发有限公司 Image recognition method and apparatus, electronic device, and storage medium
CN113191479A (zh) * 2020-01-14 2021-07-30 华为技术有限公司 Federated learning method, system, node, and storage medium
US20210279528A1 (en) * 2020-03-03 2021-09-09 Assa Abloy Ab Systems and methods for fine tuning image classification neural networks
CN114912572A (zh) * 2022-03-29 2022-08-16 深圳市商汤科技有限公司 Target recognition method and neural network training method


Also Published As

Publication number Publication date
CN114912572A (zh) 2022-08-16

Similar Documents

Publication Publication Date Title
WO2023184958A1 (zh) Target recognition and neural network training
WO2021164365A1 (zh) Graph neural network model training method, apparatus, and system
CN111400504B (zh) Method and apparatus for identifying key persons of an enterprise
CN112580826B (zh) Business model training method, apparatus, and system
CN113128701A (zh) Federated learning method and system oriented to sample sparsity
WO2023143178A1 (zh) Object segmentation method, apparatus, device, and storage medium
WO2024104376A1 (zh) Audio-visual-assisted fine-grained tactile signal reconstruction method
CN113191479A (zh) Federated learning method, system, node, and storage medium
CN113689372A (zh) Image processing method, device, storage medium, and program product
CN115660116A (zh) Sparse-adapter-based federated learning method and system
CN113033824B (zh) Model hyperparameter determination method, model training method, and system
CN115114329A (zh) Method and apparatus for data stream anomaly detection, electronic device, and storage medium
CN116644439B (zh) Model security evaluation method based on a denoising diffusion model
CN116935083A (zh) Image clustering method and apparatus
WO2023217117A1 (zh) Image evaluation method, apparatus, device, storage medium, and program product
US20230316278A1 (en) Face recognition method, apparatus, electronic device, and storage medium
CN115937020B (zh) Image processing method, apparatus, device, medium, and program product
WO2023116744A1 (zh) Image processing method, apparatus, device, and medium
Krishna et al. Integrated intelligent computing, communication and security
CN114648712A (zh) Video classification method and apparatus, electronic device, and computer-readable storage medium
WO2024139666A1 (zh) Training method and apparatus for a dual-target-domain recommendation model
CN114925744B (zh) Joint training method and apparatus
US20230084507A1 (en) Servers, methods and systems for fair and secure vertical federated learning
TWI834530B (zh) Encryption and decryption system and method
CN114429420B (zh) Image generation method and apparatus, readable medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934776

Country of ref document: EP

Kind code of ref document: A1