Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary architecture 100 to which embodiments of the disclosed method for generating a model, or of the disclosed apparatus for generating a model, may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few.
Various client applications may be installed on the terminal devices 101, 102, 103. Such as image processing applications, search applications, content sharing applications, beauty applications, instant messaging applications, model training applications, and the like. The terminal devices 101, 102, 103 may interact with the server 105 via the network 104 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices capable of receiving user operations, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and may be implemented either as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
The server 105 may be a server providing various services; for example, it may be a model training server that performs model training using a training sample set uploaded by the terminal devices 101, 102, 103. The model training server may recognize the sample images in the obtained training samples to obtain labeling information corresponding to the training samples in the training sample set, and then perform model training using the training samples and their corresponding labeling information to generate an information prediction model. After the information prediction model is trained, the server may transmit it to the terminal devices 101, 102, 103, or may itself use the information prediction model to predict on face images and transmit the prediction results.
The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules for providing distributed services), or as a single piece of software or software module. No particular limitation is imposed here.
It should be noted that the method for generating the model provided by the embodiment of the present disclosure may be executed by the server 105, and may also be executed by the terminal devices 101, 102, and 103. Accordingly, the means for generating the model may be provided in the server 105, or may be provided in the terminal devices 101, 102, 103. Furthermore, the information prediction method provided by the embodiment of the present disclosure may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the information prediction apparatus may be provided in the server 105, or may be provided in the terminal devices 101, 102, and 103.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation. In the case where neither the training sample set required to train the model nor the facial image to be recognized needs to be obtained from a remote location, the system architecture described above may omit the network and include only a terminal device or a server.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a model according to the present disclosure is shown. The method for generating a model comprises the following steps:
Step 201, a training sample set is obtained.
In this embodiment, the execution body of the method for generating a model (for example, the terminal device 101, 102, 103 or the server 105 shown in Fig. 1) may acquire the training sample set via a wired or wireless connection. Here, each training sample in the training sample set includes a sample face image. A sample face image may be captured by an imaging device attached to or connected with the execution body, or may be stored locally in advance. The execution body may also acquire a sample face image via path information indicating the location where the image is stored. The object presented by a sample face image may be a human face.
Step 202, identifying sample facial images in the training sample set by using a preset number of pre-trained information identification models to obtain associated information corresponding to objects represented by the sample facial images.
In this embodiment, the specific number of information recognition models is set manually and may be determined according to the needs of the application scenario. The preset number of information recognition models are distinct information recognition models, which may include, but are not limited to, an information recognition model for recognizing facial expressions, one for recognizing age, one for recognizing race, and one for recognizing gender. That is, each information recognition model has a specific recognition function and can usually recognize only one or two specific kinds of information. Here, which information recognition models are selected is determined by the associated information that needs to be obtained. The associated information includes, but is not limited to, facial expression information and attribute information. Typically, the attribute information includes, but is not limited to, age, gender, and race. For example, in an application scenario where the associated information to be obtained includes facial expression information and age information, the selected information recognition models may be an information recognition model for facial expression recognition and an information recognition model for age recognition.
In the present embodiment, each of the preset number of information recognition models is used to characterize a correspondence between a sample face image and a recognition result. Therefore, for the same training sample set, the recognition result obtained by each information recognition model for a sample face image is used as that image's associated information.
As an example, in a certain application scenario, the preset number of information recognition models include an information recognition model for recognizing facial expressions, an information recognition model for recognizing age and gender, and an information recognition model for recognizing race. The execution body may input each sample face image in the training sample set to each of these information recognition models, obtaining, for each sample face image, the facial expression information output by the model for recognizing facial expressions, the age and gender information output by the model for recognizing age and gender, and the race information output by the model for recognizing race.
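The labeling step described above can be sketched as follows. This is an illustrative sketch only, not code from the disclosure: the three stand-in recognition models are hypothetical stubs that return fixed values, whereas real information recognition models would be pre-trained networks (or correspondence tables) producing image-dependent output.

```python
def expression_model(image):
    # Hypothetical stub: a real model would classify the facial expression.
    return {"expression": "smile"}

def age_gender_model(image):
    # Hypothetical stub for a model that recognizes age and gender together.
    return {"age": "20-30", "gender": "male"}

def race_model(image):
    # Hypothetical stub for a race recognition model.
    return {"race": "Asian"}

def label_training_set(sample_images, recognition_models):
    """Run every pre-trained recognition model on every sample face image
    and merge the outputs into one associated-information record per image."""
    labeled = []
    for image in sample_images:
        associated_info = {}
        for model in recognition_models:
            associated_info.update(model(image))
        labeled.append((image, associated_info))
    return labeled

# Placeholder file names stand in for actual sample face images.
training_set = label_training_set(
    ["face_001.jpg", "face_002.jpg"],
    [expression_model, age_gender_model, race_model],
)
```

The merged records then serve as the expected output of step 203, in place of manual labels.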
Here, each information recognition model may be a correspondence table storing a plurality of face images and their corresponding information, prepared in advance by a technician based on statistics over a large number of face images and the information corresponding to them (facial expression information, age information, race information); alternatively, it may be a model obtained by training an initial model (e.g., a neural network) with a machine learning method on preset training samples.
Step 203, using a machine learning method, training with the sample face images in the training sample set as input and the associated information corresponding to the sample face images as expected output, to obtain an information prediction model.
In this embodiment, based on the training sample set obtained in step 201 and the associated information obtained in step 202 for each training sample in that set, the execution body may input each sample face image in the training sample set to the information prediction model to be trained to obtain an output result, compare the output result with the associated information, and determine, based on the comparison, whether training of the information prediction model is complete. Specifically, it may be determined whether the difference between the output result and the associated information falls within a preset threshold. If it does, training is determined to be complete; if it does not, the parameters of the information prediction model to be trained may be adjusted and training continued. Here, the information prediction model to be trained may be a convolutional neural network, a deep neural network, or the like.
In some optional implementation manners of this embodiment, the information prediction model may be further trained by the following steps:
step 2021, inputting each sample face image in the training sample set to a feature extraction layer of an information prediction model to be trained to obtain a feature map of the sample face image; and inputting the obtained feature map into the full-connection layer to obtain a sample output result of the sample face image.
Here, the information prediction model to be trained may be a neural network (e.g., a convolutional neural network or a deep neural network). The neural network may include a feature extraction layer and a fully connected layer. The feature extraction layer extracts features of the face image and generates a feature map corresponding to the input sample face image. The feature map may capture the texture, shape, contour, and the like of the image. The fully connected layer is connected to the feature extraction layer and determines, from the fully connected features extracted by the feature extraction layer, the sample output result corresponding to the sample face image.
Here, the information prediction model to be trained is preset with facial expression information of a plurality of categories, a plurality of age group segments, a plurality of race categories, and gender information. The fully connected layer may determine probability values for each piece of information based on the features indicated by the feature maps. For example, a probability value for each kind of facial expression information, a probability value for each age group, a probability value for each race category, a probability value for female, and a probability value for male.
The execution body may select, for each category, the information indicated by the maximum probability value among the probability values corresponding to the different kinds of information in that category, as the sample output result.
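The per-category selection can be sketched as below. This is a hedged illustration: the probability values are written out directly as nested dictionaries, whereas in the model they would be computed by the fully connected layer from the feature map, and all category and label names are illustrative.

```python
def select_per_category(probabilities):
    """For each category (facial expression, age group, gender, ...),
    pick the label with the highest probability value."""
    return {category: max(scores, key=scores.get)
            for category, scores in probabilities.items()}

# Stand-in for the fully connected layer's output on one sample face image.
fc_output = {
    "expression": {"smile": 0.7, "neutral": 0.2, "frown": 0.1},
    "age_group": {"0-20": 0.1, "20-40": 0.8, "40+": 0.1},
    "gender": {"female": 0.3, "male": 0.7},
}
sample_output_result = select_per_category(fc_output)
```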
Step 2022, determining whether the preset loss function converges based on the obtained sample output result.
Here, the preset loss function may be, for example, a logarithmic loss function. Determining whether the preset loss function converges means determining whether the loss value of the loss function has fallen to a preset threshold, or whether the absolute value of the change in the loss value is smaller than a preset threshold. The preset loss function may be determined to converge in response to the loss value reaching the preset threshold, or in response to the absolute value of the change in the loss value being smaller than the preset threshold. Note that the absolute value of the change in the loss value is the absolute value of the difference between the loss value computed by the loss function in the current training iteration and the loss value obtained in the previous one. Here, the loss value of the preset loss function indicates the error between a sample output result in the obtained set of sample output results and the associated information of the corresponding sample face image.
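The two convergence criteria described above can be expressed as a small check. The threshold values below are illustrative defaults, not values from the disclosure:

```python
def has_converged(loss, previous_loss,
                  loss_threshold=0.01, delta_threshold=1e-4):
    """Return True when the loss value has reached the preset threshold,
    or when the absolute change from the previous iteration's loss value
    is smaller than the preset threshold."""
    if loss <= loss_threshold:
        return True
    if previous_loss is not None and abs(loss - previous_loss) < delta_threshold:
        return True
    return False
```

On the first iteration there is no previous loss value, so only the first criterion applies.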
Step 2023, determining that training of the information prediction model is complete in response to determining that the preset loss function converges.
In this embodiment, based on the determination in step 2022 of whether the preset loss function converges, it may be determined that training of the information prediction model is complete when the preset loss function converges.
Step 2024, in response to determining that the preset loss function is not converged, updating parameters of the information prediction model to be trained by using a back propagation algorithm, and continuing to execute the training steps shown in steps 2021 to 2023.
In this embodiment, the parameters of the neural network to be trained that are updated may be, for example, the values of the filters of each layer of the neural network, the sizes of the filters, the strides, and the like; the number of layers of the neural network may also be updated. In response to determining that the preset loss function has not converged, the execution body may update the parameters of the neural network to be trained using a back propagation algorithm and then continue with the training steps shown in steps 2021 to 2023.
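Steps 2021 to 2024 together form the loop sketched below. It is a deliberately minimal stand-in: a one-parameter model fitted by gradient descent on scalar data, with back-propagation reduced to the single-weight gradient of a mean squared error loss. A real information prediction model would be a deep network with many layers of parameters, and the loss would compare the sample output results against the associated information.

```python
def train(samples, lr=0.05, delta_threshold=1e-9, max_epochs=10000):
    """samples: list of (input, expected_output) pairs."""
    w = 0.0  # the single trainable parameter
    previous_loss = None
    for _ in range(max_epochs):
        # Step 2021: forward pass to obtain the sample output results.
        outputs = [w * x for x, _ in samples]
        # Step 2022: mean squared error against the expected outputs.
        loss = sum((o - y) ** 2 for o, (_, y) in zip(outputs, samples)) / len(samples)
        # Step 2023: stop when the loss has converged.
        if previous_loss is not None and abs(loss - previous_loss) < delta_threshold:
            break
        # Step 2024: back-propagate the error and update the parameter.
        grad = sum(2 * (o - y) * x for o, (x, y) in zip(outputs, samples)) / len(samples)
        w -= lr * grad
        previous_loss = loss
    return w

weight = train([(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)])  # data drawn from y = 3x
```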
In the method for generating a model provided by the above embodiment of the present disclosure, a training sample set is obtained, and a preset number of pre-trained information recognition models are used to recognize the sample face images in the training sample set, obtaining the associated information corresponding to the objects presented by the sample face images. Manual labeling is therefore not needed, which saves labor cost and improves working efficiency.
With further reference to Fig. 3, a flow 300 of one embodiment of an information prediction method of the present disclosure is shown. The flow 300 of the information prediction method includes the following steps:
step 301, an image of the face of the target user is acquired.
In the present embodiment, the execution body of the information prediction method (e.g., the terminal device 101, 102, 103 or the server 105 shown in Fig. 1) may acquire the face image of the target user via a wired or wireless connection. Here, the face image of the target user may be captured by an imaging device attached to or connected with the execution body, or may be stored locally in advance. The execution body may also acquire the face image via path information indicating the location where the face image of the target user is stored.
Step 302, inputting the face image to a pre-trained information prediction model to obtain the associated information corresponding to the face image.
In this embodiment, the information prediction model is generated by the generation method of the information prediction model described in the embodiment corresponding to fig. 2.
Here, the associated information includes at least one of: facial expression information, attribute information. Specifically, the feature extraction layer of the information prediction model may extract features of the face image to obtain a corresponding feature map. Then, the fully connected layer of the information prediction model may fully connect the feature maps to obtain probability values for each kind of information preset in the information prediction model. The preset kinds of information may include facial expression information, age group segments, race category information, gender information, and the like; the specific categories of the associated information are determined by how the information prediction model was trained. For example, a probability value for each kind of facial expression information, a probability value for each age group, a probability value for each race category, a probability value for female, and a probability value for male.
The execution body may select, for each category, the information indicated by the maximum probability value among the probability values corresponding to the different kinds of information in that category, as the associated information.
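End to end, the prediction step can be sketched as follows. The model here is a hypothetical stub returning fixed probability values; in practice the probabilities come from the trained information prediction model's fully connected layer, and the category and label names are illustrative.

```python
def stub_information_prediction_model(face_image):
    # Hypothetical stub: a real model computes these probability values
    # from the feature map of the input face image.
    return {
        "expression": {"smile": 0.9, "neutral": 0.1},
        "gender": {"male": 0.8, "female": 0.2},
        "race": {"Asian": 0.95, "other": 0.05},
    }

def predict_associated_info(face_image, model):
    """Run the model and keep, for each category, the label whose
    probability value is the largest."""
    probabilities = model(face_image)
    return {category: max(scores, key=scores.get)
            for category, scores in probabilities.items()}

info = predict_associated_info("target_user.jpg", stub_information_prediction_model)
```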
In the method provided by the above embodiment of the present disclosure, a facial image of the target user is acquired and then input to the pre-trained information prediction model to obtain the associated information corresponding to the facial image, making the predicted associated information more accurate.
Further referring to fig. 4, an application scenario of the information prediction method of the present disclosure is shown.
In the application scenario shown in Fig. 4, a photographing device inputs the captured user face image 401 to the server 402. The server 402 inputs the acquired user face image 401 to the information prediction model 403 to obtain the associated information corresponding to the user presented in the user face image 401. The associated information includes: smile, male, Asian.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating a model, which corresponds to the embodiment of the method shown in fig. 2, and which may be applied in various electronic devices in particular.
As shown in Fig. 5, the apparatus 500 for generating a model provided in this embodiment includes an obtaining unit 501, a recognition unit 502, and a training unit 503. The obtaining unit 501 is configured to obtain a training sample set, the training sample set including sample face images; the recognition unit 502 is configured to recognize the sample face images in the training sample set using a preset number of pre-trained information recognition models, obtaining the associated information corresponding to the objects presented by the sample face images; and the training unit 503 is configured to train, using a machine learning method, with the sample face images in the training sample set as input and the associated information corresponding to the sample face images as expected output, to obtain an information prediction model.
In the present embodiment, in the apparatus 500 for generating a model: the specific processing of the obtaining unit 501, the identifying unit 502, and the training unit 503 and the technical effects thereof can refer to the related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of this embodiment, the information prediction model to be trained includes a feature extraction layer and a fully connected layer, and the training unit 503 is further configured to perform the following training steps based on the sample face images in the training sample set: for each sample face image in the training sample set, inputting the sample face image to the feature extraction layer of the information prediction model to be trained to obtain a feature map of the sample face image, and inputting the obtained feature map to the fully connected layer to obtain a sample output result for the sample face image; determining whether a preset loss function converges based on the sample output results corresponding to the training samples in the training sample set, the preset loss function indicating the error between a sample output result in the obtained set of sample output results and the corresponding associated information; and determining that training of the information prediction model is complete in response to determining that the preset loss function converges.
In some optional implementations of this embodiment, the apparatus 500 for generating a model further includes: and an adjusting unit (not shown) configured to adjust parameters of the information prediction model to be trained by using a back propagation algorithm in response to determining that the preset loss function is not converged, and to continue to perform the training step.
In the apparatus for generating a model provided by the above embodiment of the present disclosure, a training sample set is obtained, and a preset number of pre-trained information recognition models are used to recognize the sample face images in the training sample set, obtaining the associated information corresponding to the objects presented by the sample face images. Manual labeling is therefore not needed, which saves labor cost and improves working efficiency.
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an information prediction apparatus, which corresponds to the embodiment of the method shown in fig. 3, and which can be applied in various electronic devices.
As shown in fig. 6, the information prediction apparatus 600 provided in the present embodiment includes an image acquisition unit 601 and a generation unit 602. An image acquisition unit 601 configured to acquire a face image of a target user; a generating unit 602 configured to input the face image to the information prediction model generated by the method according to any one of the embodiments of the first aspect, and obtain associated information corresponding to the face image, wherein the associated information includes at least one of the following: facial expression information, attribute information.
Referring now to fig. 7, shown is a schematic diagram of an electronic device (e.g., terminal device in fig. 1) 700 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the use range of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device 700. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be included in the terminal device; or may exist separately without being assembled into the terminal device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a training sample set, wherein the training sample set comprises a sample face image; identifying sample facial images in a training sample set by using a preset number of pre-trained information identification models to obtain associated information corresponding to an object represented by the sample facial images; the information prediction model is obtained by training using a machine learning method, with the sample face images in the training sample set as input and the related information corresponding to the sample face images as expected output.
Further, the one or more programs, when executed by the electronic device, may further cause the electronic device to: acquiring a facial image of a target user; inputting the face image to the information prediction model generated by the method according to the first aspect, and obtaining the associated information corresponding to the face image, wherein the associated information comprises at least one of the following: facial expression information, attribute information.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an obtaining unit, a recognition unit, and a training unit. The names of these units do not in some cases limit the units themselves; for example, the obtaining unit may also be described as a "unit that obtains a training sample set".
The foregoing description presents only preferred embodiments of the present disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the inventive scope of the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.