Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary architecture 100 to which embodiments of the disclosed method for generating a model, or of the disclosed apparatus for generating a model, may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few.
Various client applications may be installed on the terminal devices 101, 102, 103. Such as image processing applications, search applications, content sharing applications, beauty applications, instant messaging applications, model training applications, and the like. The terminal devices 101, 102, 103 may interact with the server 105 via the network 104 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices capable of receiving user operations, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and may be implemented either as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
The server 105 may be a server providing various services; for example, it may be a model training server that performs model training using a training sample set uploaded by the terminal devices 101, 102, 103. The model training server may recognize the sample images in the obtained training samples to obtain labeling information corresponding to the training samples in the training sample set, and then perform model training using the training samples and their corresponding labeling information to generate an information prediction model. After the information prediction model is trained, the server may transmit it to the terminal devices 101, 102, 103, or may itself use the information prediction model to predict on face images and transmit the prediction results.
The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be noted that the method for generating the model provided by the embodiment of the present disclosure may be executed by the server 105, and may also be executed by the terminal devices 101, 102, and 103. Accordingly, the means for generating the model may be provided in the server 105, or may be provided in the terminal devices 101, 102, 103. Furthermore, the information prediction method provided by the embodiment of the present disclosure may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the information prediction apparatus may be provided in the server 105, or may be provided in the terminal devices 101, 102, and 103.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation. In the case where neither the training sample set required to train the model nor the facial image to be recognized needs to be obtained from a remote location, the system architecture described above may omit the network and include only a terminal device or a server.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a model according to the present disclosure is shown. The method for generating a model comprises the following steps:
Step 201, a training sample set is obtained.
In this embodiment, the execution body of the method for generating a model (for example, the terminal device 101, 102, 103 or the server 105 shown in Fig. 1) may acquire the training sample set via a wired or wireless connection. Here, each training sample in the training sample set includes a sample face image. A sample face image may be captured by an imaging device attached to or connected with the execution body, or may be stored locally in advance. The execution body may also acquire a sample face image via path information indicating the location where the image is stored. The object presented by a sample face image may be a human face.
Step 202, identifying sample facial images in the training sample set by using a preset number of pre-trained information identification models to obtain associated information corresponding to objects represented by the sample facial images.
In this embodiment, the specific number of information recognition models is set manually and may be determined according to the needs of the application scenario. The preset number of information recognition models are distinct information recognition models, which may include, but are not limited to, an information recognition model for recognizing facial expressions, one for recognizing age, one for recognizing race, and one for recognizing gender. That is, each information recognition model has a specific recognition function and can usually recognize only one or two specific kinds of information. Here, which information recognition models are selected is determined by the associated information that needs to be obtained. The associated information includes, but is not limited to, facial expression information and attribute information. Typically, the attribute information includes, but is not limited to, age, gender, and race. For example, in an application scenario where the associated information to be obtained includes facial expression information and age information, the selected information recognition models may be an information recognition model for facial expression recognition and an information recognition model for age recognition.
In the present embodiment, each of the preset number of information recognition models is used to characterize a correspondence between a sample face image and a recognition result. Therefore, for the same training sample set, the recognition result obtained by each information recognition model for a sample face image is used as that image's associated information.
As an example, in a certain application scenario, the preset number of information recognition models include an information recognition model for recognizing facial expressions, an information recognition model for recognizing age and gender, and an information recognition model for recognizing race. The execution body may input each sample face image in the training sample set to each of these information recognition models, obtaining, for each sample face image, the facial expression information output by the model for recognizing facial expressions, the age and gender information output by the model for recognizing age and gender, and the race information output by the model for recognizing race.
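The labeling step described above can be sketched as follows. This is an illustrative sketch only, not code from the disclosure: the three stand-in recognition models are hypothetical stubs that return fixed values, whereas real information recognition models would be pre-trained networks (or correspondence tables) producing image-dependent output.

```python
def expression_model(image):
    # Hypothetical stub: a real model would classify the facial expression.
    return {"expression": "smile"}

def age_gender_model(image):
    # Hypothetical stub for a model that recognizes age and gender together.
    return {"age": "20-30", "gender": "male"}

def race_model(image):
    # Hypothetical stub for a race recognition model.
    return {"race": "Asian"}

def label_training_set(sample_images, recognition_models):
    """Run every pre-trained recognition model on every sample face image
    and merge the outputs into one associated-information record per image."""
    labeled = []
    for image in sample_images:
        associated_info = {}
        for model in recognition_models:
            associated_info.update(model(image))
        labeled.append((image, associated_info))
    return labeled

# Placeholder file names stand in for actual sample face images.
training_set = label_training_set(
    ["face_001.jpg", "face_002.jpg"],
    [expression_model, age_gender_model, race_model],
)
```

The merged records then serve as the expected output of step 203, in place of manual labels.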
Here, each information recognition model may be a correspondence table storing a plurality of face images and their corresponding information, prepared in advance by a technician based on statistics over a large number of face images and the information corresponding to them (facial expression information, age information, race information); alternatively, it may be a model obtained by training an initial model (e.g., a neural network) with a machine learning method on preset training samples.
Step 203, using a machine learning method, training with the sample face images in the training sample set as input and the associated information corresponding to the sample face images as expected output, to obtain an information prediction model.
In this embodiment, based on the training sample set obtained in step 201 and the associated information obtained in step 202 for each training sample in that set, the execution body may input each sample face image in the training sample set to the information prediction model to be trained to obtain an output result, compare the output result with the associated information, and determine, based on the comparison, whether training of the information prediction model is complete. Specifically, it may be determined whether the difference between the output result and the associated information falls within a preset threshold. If it does, training is determined to be complete; if it does not, the parameters of the information prediction model to be trained may be adjusted and training continued. Here, the information prediction model to be trained may be a convolutional neural network, a deep neural network, or the like.
In some optional implementation manners of this embodiment, the information prediction model may be further trained by the following steps:
step 2021, inputting each sample face image in the training sample set to a feature extraction layer of an information prediction model to be trained to obtain a feature map of the sample face image; and inputting the obtained feature map into the full-connection layer to obtain a sample output result of the sample face image.
Here, the information prediction model to be trained may be a neural network (e.g., a convolutional neural network or a deep neural network). The neural network may include a feature extraction layer and a fully connected layer. The feature extraction layer extracts features of the face image and generates a feature map corresponding to the input sample face image. The feature map may capture the texture, shape, contour, and the like of the image. The fully connected layer is connected to the feature extraction layer and determines, from the fully connected features extracted by the feature extraction layer, the sample output result corresponding to the sample face image.
Here, the information prediction model to be trained is preset with facial expression information of a plurality of categories, a plurality of age group segments, a plurality of race categories, and gender information. The fully connected layer may determine probability values for each piece of information based on the features indicated by the feature maps. For example, a probability value for each kind of facial expression information, a probability value for each age group, a probability value for each race category, a probability value for female, and a probability value for male.
The execution body may select, for each category, the information indicated by the maximum probability value among the probability values corresponding to the different kinds of information in that category, as the sample output result.
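The per-category selection can be sketched as below. This is a hedged illustration: the probability values are written out directly as nested dictionaries, whereas in the model they would be computed by the fully connected layer from the feature map, and all category and label names are illustrative.

```python
def select_per_category(probabilities):
    """For each category (facial expression, age group, gender, ...),
    pick the label with the highest probability value."""
    return {category: max(scores, key=scores.get)
            for category, scores in probabilities.items()}

# Stand-in for the fully connected layer's output on one sample face image.
fc_output = {
    "expression": {"smile": 0.7, "neutral": 0.2, "frown": 0.1},
    "age_group": {"0-20": 0.1, "20-40": 0.8, "40+": 0.1},
    "gender": {"female": 0.3, "male": 0.7},
}
sample_output_result = select_per_category(fc_output)
```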
Step 2022, determining whether the preset loss function converges based on the obtained sample output result.
Here, the preset loss function may be, for example, a logarithmic loss function. Determining whether the preset loss function converges means determining whether the loss value of the loss function has fallen to a preset threshold, or whether the absolute value of the change in the loss value is smaller than a preset threshold. The preset loss function may be determined to converge in response to the loss value reaching the preset threshold, or in response to the absolute value of the change in the loss value being smaller than the preset threshold. Note that the absolute value of the change in the loss value is the absolute value of the difference between the loss value computed by the loss function in the current training iteration and the loss value obtained in the previous one. Here, the loss value of the preset loss function indicates the error between a sample output result in the obtained set of sample output results and the associated information of the corresponding sample face image.
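The two convergence criteria described above can be expressed as a small check. The threshold values below are illustrative defaults, not values from the disclosure:

```python
def has_converged(loss, previous_loss,
                  loss_threshold=0.01, delta_threshold=1e-4):
    """Return True when the loss value has reached the preset threshold,
    or when the absolute change from the previous iteration's loss value
    is smaller than the preset threshold."""
    if loss <= loss_threshold:
        return True
    if previous_loss is not None and abs(loss - previous_loss) < delta_threshold:
        return True
    return False
```

On the first iteration there is no previous loss value, so only the first criterion applies.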
Step 2023, determining that training of the information prediction model is complete in response to determining that the preset loss function converges.
In this embodiment, based on the determination in step 2022 of whether the preset loss function converges, it may be determined that training of the information prediction model is complete when the preset loss function converges.
Step 2024, in response to determining that the preset loss function is not converged, updating parameters of the information prediction model to be trained by using a back propagation algorithm, and continuing to execute the training steps shown in steps 2021 to 2023.
In this embodiment, the parameters of the neural network to be trained that are updated may be, for example, the values of the filters of each layer of the neural network, the sizes of the filters, the strides, and the like; the number of layers of the neural network may also be updated. In response to determining that the preset loss function has not converged, the execution body may update the parameters of the neural network to be trained using a back propagation algorithm and then continue with the training steps shown in steps 2021 to 2023.
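Steps 2021 to 2024 together form the loop sketched below. It is a deliberately minimal stand-in: a one-parameter model fitted by gradient descent on scalar data, with back-propagation reduced to the single-weight gradient of a mean squared error loss. A real information prediction model would be a deep network with many layers of parameters, and the loss would compare the sample output results against the associated information.

```python
def train(samples, lr=0.05, delta_threshold=1e-9, max_epochs=10000):
    """samples: list of (input, expected_output) pairs."""
    w = 0.0  # the single trainable parameter
    previous_loss = None
    for _ in range(max_epochs):
        # Step 2021: forward pass to obtain the sample output results.
        outputs = [w * x for x, _ in samples]
        # Step 2022: mean squared error against the expected outputs.
        loss = sum((o - y) ** 2 for o, (_, y) in zip(outputs, samples)) / len(samples)
        # Step 2023: stop when the loss has converged.
        if previous_loss is not None and abs(loss - previous_loss) < delta_threshold:
            break
        # Step 2024: back-propagate the error and update the parameter.
        grad = sum(2 * (o - y) * x for o, (x, y) in zip(outputs, samples)) / len(samples)
        w -= lr * grad
        previous_loss = loss
    return w

weight = train([(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)])  # data drawn from y = 3x
```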
In the method for generating a model provided by the above embodiment of the present disclosure, a training sample set is obtained, and a preset number of pre-trained information recognition models are used to recognize the sample face images in the training sample set, obtaining the associated information corresponding to the objects presented by the sample face images. Manual labeling is therefore not needed, which saves labor cost and improves working efficiency.
With further reference to Fig. 3, a flow 300 of one embodiment of an information prediction method of the present disclosure is shown. The flow 300 of the information prediction method includes the following steps:
step 301, an image of the face of the target user is acquired.
In the present embodiment, the execution body of the information prediction method (e.g., the terminal device 101, 102, 103 or the server 105 shown in Fig. 1) may acquire the face image of the target user via a wired or wireless connection. Here, the face image of the target user may be captured by an imaging device attached to or connected with the execution body, or may be stored locally in advance. The execution body may also acquire the face image via path information indicating the location where the face image of the target user is stored.
Step 302, inputting the face image to a pre-trained information prediction model to obtain the associated information corresponding to the face image.
In this embodiment, the information prediction model is generated by the generation method of the information prediction model described in the embodiment corresponding to fig. 2.
Here, the associated information includes at least one of: facial expression information, attribute information. Specifically, the feature extraction layer of the information prediction model may extract features of the face image to obtain a corresponding feature map. Then, the fully connected layer of the information prediction model may fully connect the feature maps to obtain probability values for each kind of information preset in the information prediction model. The preset kinds of information may include facial expression information, age group segments, race category information, gender information, and the like; the specific categories of the associated information are determined by how the information prediction model was trained. For example, a probability value for each kind of facial expression information, a probability value for each age group, a probability value for each race category, a probability value for female, and a probability value for male.
The execution body may select, for each category, the information indicated by the maximum probability value among the probability values corresponding to the different kinds of information in that category, as the associated information.
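End to end, the prediction step can be sketched as follows. The model here is a hypothetical stub returning fixed probability values; in practice the probabilities come from the trained information prediction model's fully connected layer, and the category and label names are illustrative.

```python
def stub_information_prediction_model(face_image):
    # Hypothetical stub: a real model computes these probability values
    # from the feature map of the input face image.
    return {
        "expression": {"smile": 0.9, "neutral": 0.1},
        "gender": {"male": 0.8, "female": 0.2},
        "race": {"Asian": 0.95, "other": 0.05},
    }

def predict_associated_info(face_image, model):
    """Run the model and keep, for each category, the label whose
    probability value is the largest."""
    probabilities = model(face_image)
    return {category: max(scores, key=scores.get)
            for category, scores in probabilities.items()}

info = predict_associated_info("target_user.jpg", stub_information_prediction_model)
```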
In the method provided by the above embodiment of the present disclosure, a facial image of the target user is acquired and then input to the pre-trained information prediction model to obtain the associated information corresponding to the facial image, making the predicted associated information more accurate.
Further referring to fig. 4, an application scenario of the information prediction method of the present disclosure is shown.
In the application scenario shown in Fig. 4, a photographing device inputs the captured user face image 401 to the server 402. The server 402 inputs the acquired user face image 401 to the information prediction model 403 to obtain the associated information corresponding to the user presented in the user face image 401. The associated information includes: smile, male, Asian.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating a model, which corresponds to the embodiment of the method shown in fig. 2, and which may be applied in various electronic devices in particular.
As shown in Fig. 5, the apparatus 500 for generating a model provided in this embodiment includes an obtaining unit 501, a recognition unit 502, and a training unit 503. The obtaining unit 501 is configured to obtain a training sample set, the training sample set including sample face images; the recognition unit 502 is configured to recognize the sample face images in the training sample set using a preset number of pre-trained information recognition models, obtaining the associated information corresponding to the objects presented by the sample face images; and the training unit 503 is configured to train, using a machine learning method, with the sample face images in the training sample set as input and the associated information corresponding to the sample face images as expected output, to obtain an information prediction model.
In the present embodiment, in the apparatus 500 for generating a model: the specific processing of the obtaining unit 501, the identifying unit 502, and the training unit 503 and the technical effects thereof can refer to the related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of this embodiment, the information prediction model to be trained includes a feature extraction layer and a fully connected layer, and the training unit 503 is further configured to perform the following training steps based on the sample face images in the training sample set: for each sample face image in the training sample set, inputting the sample face image to the feature extraction layer of the information prediction model to be trained to obtain a feature map of the sample face image, and inputting the obtained feature map to the fully connected layer to obtain a sample output result for the sample face image; determining whether a preset loss function converges based on the sample output results corresponding to the training samples in the training sample set, the preset loss function indicating the error between a sample output result in the obtained set of sample output results and the corresponding associated information; and determining that training of the information prediction model is complete in response to determining that the preset loss function converges.
In some optional implementations of this embodiment, the apparatus 500 for generating a model further includes: and an adjusting unit (not shown) configured to adjust parameters of the information prediction model to be trained by using a back propagation algorithm in response to determining that the preset loss function is not converged, and to continue to perform the training step.
In the apparatus for generating a model provided by the above embodiment of the present disclosure, a training sample set is obtained, and a preset number of pre-trained information recognition models are used to recognize the sample face images in the training sample set, obtaining the associated information corresponding to the objects presented by the sample face images. Manual labeling is therefore not needed, which saves labor cost and improves working efficiency.
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an information prediction apparatus, which corresponds to the embodiment of the method shown in fig. 3, and which can be applied in various electronic devices.
As shown in fig. 6, the information prediction apparatus 600 provided in the present embodiment includes an image acquisition unit 601 and a generation unit 602. An image acquisition unit 601 configured to acquire a face image of a target user; a generating unit 602 configured to input the face image to the information prediction model generated by the method according to any one of the embodiments of the first aspect, and obtain associated information corresponding to the face image, wherein the associated information includes at least one of the following: facial expression information, attribute information.
Referring now to fig. 7, shown is a schematic diagram of an electronic device (e.g., terminal device in fig. 1) 700 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the use range of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device 700. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be included in the terminal device; or may exist separately without being assembled into the terminal device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a training sample set, wherein the training sample set comprises a sample face image; identifying sample facial images in a training sample set by using a preset number of pre-trained information identification models to obtain associated information corresponding to an object represented by the sample facial images; the information prediction model is obtained by training using a machine learning method, with the sample face images in the training sample set as input and the related information corresponding to the sample face images as expected output.
Further, the one or more programs, when executed by the electronic device, may further cause the electronic device to: acquiring a facial image of a target user; inputting the face image to the information prediction model generated by the method according to the first aspect, and obtaining the associated information corresponding to the face image, wherein the associated information comprises at least one of the following: facial expression information, attribute information.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an obtaining unit, a recognition unit, and a training unit. The names of these units do not in some cases limit the units themselves; for example, the obtaining unit may also be described as a "unit that obtains a training sample set".
The foregoing description presents only preferred embodiments of the present disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the inventive scope of the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.