WO2021102655A1 - Network model training method, image attribute recognition method, device, and electronic device - Google Patents

Network model training method, image attribute recognition method, device, and electronic device

Info

Publication number
WO2021102655A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
neural network
loss function
network model
Prior art date
Application number
PCT/CN2019/120749
Other languages
English (en)
French (fr)
Inventor
高洪涛
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司 and Oppo广东移动通信有限公司
Priority to CN201980100863.7A (published as CN114450690A)
Priority to PCT/CN2019/120749
Publication of WO2021102655A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition

Definitions

  • the embodiments of the application relate to computer technology, and in particular to a network model training method, image attribute recognition method, device, and electronic equipment.
  • Image recognition refers to the use of computers to process, analyze, and understand images to identify targets and objects in various patterns. It is a practical application of deep learning algorithms.
  • the current image recognition method feeds the image directly into the convolutional neural network for feature extraction, and processes the extracted features in the fully connected layer of the convolutional neural network to obtain the final prediction result for the image.
  • the image recognition results obtained in this way ignore many image attributes as well as the correlation and order between image attributes. For example, when recognizing a human body image containing hats, glasses, tops, bags, bottoms, shoes, etc., there is correlation and order between tops and bottoms. If the image is recognized directly through the above-mentioned prior art, the correlation and order between tops and bottoms are ignored, leading to an inaccurate recognition result.
  • This application provides a network model training method, image attribute recognition method, device, and electronic equipment, which can accurately recognize image attributes and the correlation between each attribute.
  • an embodiment of the present application provides a network model training method, and the method includes:
  • the image sample set including a plurality of initial values of image attributes
  • the basic model including a convolutional neural network model and a recurrent neural network model;
  • the convergent basic model is used as a recognition model for recognizing image attributes.
  • an embodiment of the present application also provides an image attribute recognition method, which includes:
  • the image attribute recognition model is obtained by training with the network model training method provided in the embodiment of the present application.
  • an embodiment of the present application provides a network model training device, including:
  • the first obtaining module is configured to obtain an image sample set, the image sample set including a plurality of initial values of image attributes
  • the first recognition module is configured to input the image sample set into the basic model for image attribute recognition, so as to obtain the first training result obtained according to the recurrent neural network model and the second training result obtained according to the convolutional neural network model;
  • a training module configured to perform joint training on the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial value of the image attribute, and the target loss function, until the basic model converges;
  • the determining module is used to use the converged basic model as a recognition model for recognizing image attributes.
  • an image attribute recognition device including:
  • the receiving module is used to receive the image attribute recognition request
  • the second obtaining module is configured to obtain the image to be recognized according to the image attribute recognition request
  • the calling module is used to call the pre-trained image attribute recognition model
  • the second recognition module is used to input the image to be recognized into a pre-trained image attribute recognition model, and to recognize the image attributes of the image to be recognized to obtain an image attribute recognition result;
  • the image attribute recognition model is obtained by the network model training method provided in the embodiment of the present application.
  • an embodiment of the present application provides a storage medium on which a computer program is stored, wherein when the computer program is executed on a computer, the computer is caused to execute the network model training method or the image attribute recognition method provided in the embodiments.
  • an embodiment of the present application provides an electronic device including a memory and a processor, the memory stores a computer program, and the processor invokes the computer program stored in the memory to execute:
  • the image sample set including a plurality of initial values of image attributes
  • the basic model including a convolutional neural network model and a recurrent neural network model;
  • the convergent basic model is used as a recognition model for recognizing image attributes.
  • an embodiment of the present application provides an electronic device including a memory and a processor, the memory stores a computer program, and the processor invokes the computer program stored in the memory to execute:
  • the image attribute recognition model is obtained by training with the network model training method provided in the embodiment of the present application.
  • FIG. 1 is a schematic diagram of the first process of a network model training method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the second process of the network model training method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an image attribute recognition method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a network model training device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an image attribute recognition device provided by an embodiment of the present application.
  • FIG. 6 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of the first network model training method provided by an embodiment of the present application.
  • the process of the network model training method may include:
  • the image sample set includes a variety of images, such as human body images, animal images, plant images, etc.
  • multiple human body images in the image sample set may be used as training images for the network model.
  • the images in the image sample set may be chosen to be human body images containing a variety of different clothing combinations.
  • the correlation between facial features or the correlation between limbs can also be selected as the direction of model training.
  • the images containing facial features in multiple human images can be cropped into different images of the same size.
  • the preset number of feature points of the face in the human image can be obtained.
  • the portions of the human body image containing facial features are cropped out and used to construct training images.
  • otherwise, the human body image is not included in the image sample set.
  • each training image in the image sample set has its own corresponding initial value of image attributes.
  • the human body image contains attributes such as tops, bottoms, hats, shoes, etc.
  • the correlation between the top and bottom is 10%, and the 10% correlation between the top and bottom can be used as the initial value of the image attribute.
  • the initial value of the image attribute may be one or multiple, determined according to specific needs, the number of attributes in the training image, and the degree of association between the attributes.
  • the basic model can be jointly created by using different types of network models.
  • a convolutional neural network (Convolutional Neural Networks, CNN) model
  • a recurrent neural network (Recurrent Neural Network, RNN) model
  • an input layer can be set, which is used to input the training images in the image sample set into the basic model; the input layer is connected with the convolutional layer, the convolutional layer is connected with the pooling layer and the recurrent neural network respectively, the recurrent neural network is connected to the first fully connected layer, the pooling layer is connected to the second fully connected layer, and the first fully connected layer and the second fully connected layer serve as the output layers of the basic model.
  • the input layer, the convolutional layer, the pooling layer, and the second fully connected layer are connected in sequence to form a convolutional neural network, and the recurrent neural network is arranged between the convolutional layer and the first fully connected layer.
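The dual-branch structure described above can be sketched in PyTorch as follows. This is a minimal illustration, not the patented implementation: all layer counts, channel sizes, and the choice to feed the feature map to the GRU as a spatial sequence are assumptions.

```python
import torch
import torch.nn as nn

class BasicModel(nn.Module):
    """Sketch of the basic model: a shared convolutional trunk whose
    output feeds both a pooling -> fully-connected branch (second
    training result) and a GRU -> fully-connected branch (first
    training result). All sizes are illustrative assumptions."""

    def __init__(self, num_attrs=10, hidden=128):
        super().__init__()
        # Input layer + convolutional layers (shared trunk).
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # CNN branch: pooling layer + second fully connected layer.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc2 = nn.Linear(64, num_attrs)
        # Recurrent branch: GRU over the feature map's spatial positions,
        # followed by the first fully connected layer.
        self.gru = nn.GRU(input_size=64, hidden_size=hidden, batch_first=True)
        self.fc1 = nn.Linear(hidden, num_attrs)

    def forward(self, x):
        f = self.conv(x)                            # first feature value
        second = self.fc2(self.pool(f).flatten(1))  # second training result
        b, c, h, w = f.shape
        seq = f.flatten(2).permute(0, 2, 1)         # (batch, h*w, channels)
        out, _ = self.gru(seq)                      # second feature value
        first = self.fc1(out[:, -1])                # first training result
        return first, second

model = BasicModel()
first, second = model(torch.randn(2, 3, 16, 16))
```

Both branches emit one score per attribute; how their outputs are merged into a final result is discussed below in the description of the training procedure.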
  • the training sample input is a continuous sequence
  • the length of the sequence varies; examples include time-based sequences such as a segment of continuous speech or a passage of continuous handwritten text.
  • the recurrent neural network can handle inputs of uncertain length.
  • since the recurrent neural network also suffers from the vanishing gradient problem, it is difficult for it to process long sequences of data. Therefore, the recurrent neural network can adopt a gated recurrent unit network (Gated Recurrent Unit networks, GRU); the recurrent unit of a gated recurrent unit network contains only two gates, an update gate and a reset gate, which do not form a self-loop but recurse directly between system states.
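For illustration, one step of a gated recurrent unit can be written out directly. This is a minimal scalar sketch with illustrative weights; it only shows how the update gate and reset gate recurse on the state without a separate self-loop.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, w):
    """One step of a scalar GRU cell: only two gates, an update gate z
    and a reset gate r, recursing directly on the state h_prev.
    `w` holds illustrative scalar weights."""
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)       # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)       # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))
    return (1.0 - z) * h_prev + z * h_cand            # new state

w = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "wh": 1.0, "uh": 1.0}
h = 0.0
for x in [1.0, -0.5, 0.25]:   # a short variable-length input sequence
    h = gru_step(x, h, w)
```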
  • the gated recurrent unit network corresponds to the first loss function
  • the convolutional neural network corresponds to the second loss function
  • the first loss function and the second loss function can be loss functions of different types or of the same type, and the target loss function corresponding to the basic model is obtained from the first loss function and the second loss function.
  • the first loss function can be multiplied by a loss coefficient and then added to the second loss function to obtain the objective function corresponding to the basic model, where the loss coefficient can be a parameter obtained through experiments and can be set between 0.8 and 1.
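The combination described above, with the first loss scaled by a coefficient and added to the second loss, can be sketched as follows (the coefficient value 0.8 is one point in the stated 0.8 to 1 range):

```python
def target_loss(first_loss, second_loss, loss_coef=0.8):
    """Target loss of the basic model: the first (recurrent-branch)
    loss scaled by a loss coefficient, assumed to lie in [0.8, 1.0],
    plus the second (convolutional-branch) loss."""
    assert 0.8 <= loss_coef <= 1.0
    return loss_coef * first_loss + second_loss

loss = target_loss(first_loss=0.6, second_loss=0.4, loss_coef=0.8)
```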
  • the training image in the image sample set is input into the input layer, where the training image can be a human body image, and the attributes of the image are identified through the constructed basic model and the objective function corresponding to the basic model.
  • the training image is first processed by the convolutional layer to obtain the first feature value, and the first feature value output by the last convolutional layer is input to the pooling layer and the recurrent neural network respectively; the second feature value output by the recurrent neural network is input to the first fully connected layer to obtain the first training result, and the third feature value output by the pooling layer is input to the second fully connected layer to obtain the second training result.
  • the first training result and the second training result are not completely the same, and the final training result of the basic model is obtained based on the first training result and the second training result.
  • the intersection of the first training result and the second training result can be taken, and the target training result in the intersection is the final training result obtained by the basic model.
  • for the final training result, one can also choose to take the union of the first training result and the second training result, or select part of the training results from the first training result and the second training result according to a preset rule as the final training result.
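A minimal sketch of combining the two branches' predictions, treating each training result as a set of recognized attribute names (the attribute names are illustrative):

```python
def combine_results(first, second, mode="intersection"):
    """Combine the attribute sets predicted by the two branches.
    `first` and `second` are sets of recognized attribute names
    (the names below are illustrative)."""
    if mode == "intersection":   # keep only attributes both branches agree on
        return first & second
    if mode == "union":          # keep everything either branch found
        return first | second
    raise ValueError(mode)

first = {"padded coat", "gloves", "shorts"}
second = {"padded coat", "gloves", "short skirt"}
final = combine_results(first, second)
```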
  • multiple first training results and second training results are obtained according to the basic model, the final training result is then obtained from the first training result and the second training result, and the final training result together with the initial value of the image attribute is input into the target loss function to obtain the target loss value.
  • the input training image is a human body image, where the person in the image wears a top, bottoms, shoes, a hat, and other items.
  • Each wearing object can be regarded as one of the attributes of the human body image.
  • the two are input into the target loss function to obtain the corresponding target loss value, and it can be judged whether the target loss value is close to the preset loss value. If the difference between the target loss value and the preset loss value is within the preset range, the basic model is considered trained and in a state of convergence.
  • the electronic device inputs the human body image into the basic model, outputs the first training result from the first fully connected layer, and outputs the second training result from the second fully connected layer.
  • the output result of the first fully connected layer contains the correlation between the attributes.
  • the first training result has three attributes: a cotton-padded coat, gloves, and shorts; there is a correlation between the cotton-padded coat and the gloves, and no correlation between the cotton-padded coat and the shorts.
  • the second training result has three attributes: a cotton-padded coat, gloves, and a short skirt. The intersection of the first training result and the second training result can be taken as the final training result.
  • the final training result contains the two attributes cotton-padded coat and gloves, and the correlation between them.
  • for example, the preset range of the difference from the preset loss value is 1-10
  • the first training result and the initial value of the image attribute can be directly input to the first loss function
  • and the second training result and the initial value of the image attribute can be input to the second loss function. Since the target loss function is obtained from the first loss function and the second loss function, the target loss value can be calculated directly once the first training result and the second training result are obtained. The target loss value is compared with the preset loss value to judge whether the basic model converges; for example, when the target loss value is less than or equal to the preset loss value, the target loss value is considered to have reached the preset condition, and the basic model is considered to have converged.
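The convergence test described here, comparing the target loss value with a preset loss value, can be sketched as follows (the preset loss and range values are illustrative assumptions):

```python
def has_converged(target_loss_value, preset_loss=0.05, preset_range=0.01):
    """Convergence test: the target loss value reaches the preset
    condition when it is <= the preset loss value, or within the
    preset range of it. Threshold values are illustrative."""
    return (target_loss_value <= preset_loss
            or abs(target_loss_value - preset_loss) <= preset_range)

losses = [0.9, 0.4, 0.12, 0.055, 0.04]   # a simulated loss trajectory
first_converged = next(i for i, l in enumerate(losses) if has_converged(l))
```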
  • training can be performed multiple times; through multiple rounds of training, the basic model finally converges and the expected effect of basic model training is achieved.
  • the convergent basic model is used as the image attribute recognition model for image attribute recognition.
  • the image attribute recognition model can be applied to electronic devices to recognize the attributes of images stored on the electronic device by the user.
  • the attribute recognition results include the correlations between image attributes, which can also improve the accuracy of image attribute recognition.
  • the network model training method obtains an image sample set, which includes multiple initial values of image attributes; constructs a basic model and a target loss function corresponding to the basic model, where the basic model includes a convolutional neural network model and a recurrent neural network model; inputs the image sample set into the basic model for image attribute recognition to obtain the first training result according to the recurrent neural network model and the second training result according to the convolutional neural network model; jointly trains the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial value of the image attribute, and the target loss function until the basic model converges; and uses the converged basic model as the recognition model for recognizing image attributes.
  • the image attribute recognition model obtained in this way can improve the accuracy of image attribute recognition and can also identify the correlation between image attributes.
  • FIG. 2 is a schematic diagram of the second process of the network model training method provided by an embodiment of the present application.
  • the network model training method may include:
  • the image sample set includes a variety of images, such as human body images, animal images, plant images, etc.
  • multiple human images in the image sample set may be used as training images for the network model.
  • multiple human body images are used as training images, and the attributes of each human body image and the degree of association between the attributes can be extracted.
  • a human body image there are image attributes such as hats, glasses, tops, bottoms, shoes, etc.
  • the correlations between different attributes differ; for example, there is no close correlation between wearing glasses and wearing a top, and no close correlation between wearing shoes and wearing a hat.
  • the image attributes and the corresponding correlations between image attributes can be used as the initial values of the image attributes.
  • the input layer is used to input the training image; a convolutional layer is then set, which performs preliminary image feature extraction on the input training image to obtain the first feature value, and the first feature value is input into the next layer of the basic model structure.
  • the recurrent neural network can process the first feature value output by the convolutional layer and output the second feature value.
  • the other side of the recurrent neural network is connected to the first fully connected layer.
  • the first fully connected layer can be used as an output layer of the basic model to process the second feature value and output the first training result.
  • the input layer, convolutional layer, pooling layer, and second fully connected layer are sequentially connected to form a convolutional neural network.
  • the first feature value output by the convolutional layer is processed by the pooling layer to obtain the third feature value, and finally the second fully connected layer processes the third feature value and outputs the second training result.
  • the entire basic model can be seen as a combination of a convolutional neural network model and a recurrent neural network model.
  • the convolutional layer can be set to multiple layers.
  • the convolutional layer includes conv3, conv6, conv9, etc.
  • the input human body image is processed by the multi-layer convolutional layers to obtain a feature map in a certain dimension, and the feature map is used as the first feature value.
  • the recurrent neural network can be a gated recurrent unit network, that is, a GRU neural network.
  • the first loss function and the second loss function may be the same type of loss function or different types of loss functions; for example, both may be cross-entropy loss functions.
  • the output of the last layer of the convolutional neural network can be processed by the softmax algorithm; this step is usually used to obtain the probability that the output belongs to a certain class. For a single sample, the output is a vector.
  • the formula of softmax is: y_i = e^{z_i} / Σ_j e^{z_j}, where z_i is the i-th element of the network's output vector; the cross-entropy loss is then L = -Σ_i y'_i log(y_i), where y'_i represents the actual value of the label of the i-th element, and y_i is the i-th element of the softmax output vector [y_1, y_2, y_3, ...].
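A plain-Python sketch of softmax followed by the cross-entropy loss mentioned above (the logit values are illustrative):

```python
import math

def softmax(z):
    """y_i = exp(z_i) / sum_j exp(z_j), with the usual max-shift
    for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(y_true, y_pred):
    """L = -sum_i y'_i * log(y_i): y'_i is the actual label value,
    y_i the i-th element of the softmax output vector."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y = softmax([2.0, 1.0, 0.1])        # illustrative logits
loss = cross_entropy([1, 0, 0], y)  # true class is the first element
```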
  • the first loss function can be multiplied by a loss coefficient and then added to the second loss function to obtain the objective function corresponding to the basic model, where the loss coefficient can be a parameter obtained through experiments and can be set between 0.8 and 1.
  • the setting of the target loss function also needs to be adjusted according to the training direction; setting a loss coefficient to multiply the first loss function is one method of adjusting the target function.
  • the training image in the image sample set is input into the input layer, where the training image can be a human body image, and the attributes of the image are identified through the constructed basic model and the objective function corresponding to the basic model.
  • the training image is first processed by the convolutional layer to obtain the first feature value, and the first feature value output by the last convolutional layer is input to the pooling layer and the recurrent neural network respectively; the second feature value output by the recurrent neural network is input to the first fully connected layer to obtain the first training result, and the third feature value output by the pooling layer is input to the second fully connected layer to obtain the second training result.
  • the first training result and the second training result are not completely the same, and the final training result of the basic model is obtained based on the first training result and the second training result.
  • the intersection of the first training result and the second training result can be taken, and the target training result in the intersection is the final training result obtained by the basic model.
  • for the final training result, one can also choose to take the union of the first training result and the second training result, or select part of the training results from the first training result and the second training result according to a preset rule as the final training result.
  • multiple first training results and second training results are obtained according to the basic model, the final training result is then obtained from the first training result and the second training result, and the final training result together with the initial value of the image attribute is input into the target loss function to obtain the target loss value.
  • the first training result and the initial value of the image attribute can be directly input to the first loss function
  • and the second training result and the initial value of the image attribute can be input to the second loss function. Since the target loss function is derived from the first loss function and the second loss function, the target loss value can be calculated directly once the first training result and the second training result are obtained. The target loss value is compared with the preset loss value to judge whether the basic model converges; for example, when the target loss value is less than or equal to the preset loss value, the target loss value is deemed to have reached the preset condition, and the basic model is considered to have converged.
  • when the target loss value does not meet the preset condition, for example, the target loss value is not within the preset range or does not reach the preset loss value, it can be considered that training of the basic model is not complete and the training results output by the basic model cannot reach the expected results, so the model parameters of the basic model need to be adjusted.
  • some model parameters of the convolutional neural network model and the recurrent neural network model can be adjusted, where the parameters of the model can be adjusted through the back-propagation algorithm.
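The parameter adjustment by back-propagation reduces, at each step, to moving each parameter against the gradient of the loss. A minimal single-parameter gradient-descent sketch (learning rate and target values are illustrative):

```python
def gradient_step(param, grad, lr=0.05):
    """One back-propagation update: move the parameter against the
    gradient of the loss, scaled by the learning rate."""
    return param - lr * grad

# Minimal illustration: adjust w so that w * x approximates y_true.
w, x, y_true = 0.0, 2.0, 4.0
for _ in range(200):
    y_pred = w * x
    grad = 2.0 * (y_pred - y_true) * x   # d/dw of (y_pred - y_true)**2
    w = gradient_step(w, grad)
```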
  • the convergent basic model is used as the image attribute recognition model for image attribute recognition.
  • the image attribute recognition model can be applied to electronic devices to identify the image attributes stored in the electronic device by the user according to the image attribute recognition model.
  • the attribute recognition results obtain the correlation between the image attributes, which can also improve the accuracy of image attribute recognition.
  • inputting a random human body image into the converged basic model can accurately identify the types of clothing worn in the human body image and the associations between the clothes, or can accurately identify the features of the face's five sense organs and the relationships between them. This shows that the converged basic model can accurately identify the attributes of the input image and the correlations between the attributes, and the converged basic model can be used as the image attribute recognition model.
  • the network model training method obtains the image sample set and the initial values of the image attributes included in the image sample set, and then constructs a basic model based on the convolutional neural network and the recurrent neural network. The first loss function is set for the recurrent neural network and the second loss function is set for the convolutional neural network, and the target loss function is obtained according to the first loss function and the second loss function; the image sample set is input into the basic model to train it and obtain the first training result and the second training result.
  • the parameters of the basic model are adjusted according to the initial value of the image attribute, the target loss function, the first training result, and the second training result until the basic model converges, and the converged model is used as the image attribute recognition model to accurately identify the image attributes and the degree of association between attributes.
  • FIG. 3 is a schematic flowchart of an image attribute recognition method provided by an embodiment of the present application.
  • the image attribute recognition method may include the following processes:
  • the image attribute recognition request may be triggered when the electronic device receives a touch operation, a voice operation, or a start instruction of a corresponding target application.
  • a variety of clothing or accessories can be added to a virtual human body image; tops, bottoms, shoes, hats, earrings, necklaces, etc. can all be attribute information of the human body image. The user can place these items on the virtual human body image in the corresponding positions to obtain a new virtual human body image.
  • the user can then choose to recognize it, and the electronic device receives the image attribute recognition request for the new virtual human body image and performs recognition on it.
  • when the user's browsing interface contains multiple images, the user can click a specific location on the electronic device or use a finger to outline a region to select the image to be recognized; the electronic device then acquires the image to be recognized according to the selected location or region.
  • the electronic device can actively obtain the image that the user needs to recognize according to the image recognition request; for example, when the user browses a picture, the electronic device can actively search for the image to be recognized according to the image recognition request.
  • the image attribute recognition request contains the specific type of the target subject.
  • when the electronic device receives the image attribute recognition request, it can obtain the target subject in the image to be recognized. For example, if a group photo of the user contains multiple person objects, the person subject that needs to be identified can be extracted; if a landscape photo contains a variety of plants or animals, the animal subject that needs to be identified can be extracted. In the process of identifying image attributes, non-target objects need to be excluded and the target subject kept.
  • the image where the target subject is located can be cropped to obtain the target image.
  • in this way, subjects that do not need to be recognized are prevented from interfering with the recognition of the target subject, the recognition speed is faster, and the recognition result is more accurate.
  • the image attribute recognition model is the model trained by the network model training method provided in this embodiment.
  • the target image is input into the image attribute recognition model and then the image attribute recognition is performed to obtain the recognition results of multiple attributes in the image and the correlation between the attributes.
  • the image attribute recognition model can recognize the correlations between items of human clothing; for example, the correlation between shorts and short sleeves is 100%, the correlation between jeans and sports shoes is 80%, and the correlation between hats and glasses is 50%, so as to obtain the correlations between various wearable items, and users can better refer to these when matching clothes.
  • the image attribute recognition method provided by the embodiment of the application receives the image attribute recognition request, obtains the image to be recognized according to the request, calls the pre-trained image attribute recognition model, and inputs the image to be recognized into the pre-trained image attribute recognition model to identify the image attributes of the image to be recognized, thereby obtaining the image attribute recognition result and the correlations between the attributes of the image.
  • the network model training device 400 may include: a first acquisition module 410, a construction module 420, a first recognition module 430, a training module 440, and a determination module 450.
  • the first obtaining module 410 is configured to obtain an image sample set, the image sample set including a plurality of initial values of image attributes;
  • the construction module 420 is configured to construct a basic model and a target loss function corresponding to the basic model, where the basic model includes a convolutional neural network model and a recurrent neural network model;
  • the first recognition module 430 is configured to input the image sample set into the basic model for image attribute recognition, so as to obtain the first training result obtained according to the recurrent neural network model and the second training result obtained according to the convolutional neural network model;
  • the training module 440 is configured to jointly train the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial value of the image attribute, and the target loss function, until the basic model converges;
  • the determining module 450 is configured to use the converged basic model as a recognition model for recognizing image attributes.
  • the construction module 420 includes a setting sub-module 421, a first connection sub-module 422, and a second connection sub-module 423.
  • the setting sub-module 421 is used to set a convolutional layer, a pooling layer, a first fully connected layer, and a second fully connected layer; the first connection sub-module 422 is used to connect the convolutional layer, the pooling layer, and the second fully connected layer to obtain the convolutional neural network model;
  • the second connection sub-module 423 is configured to connect the recurrent neural network to the convolutional layer and to connect the recurrent neural network to the first fully connected layer to obtain the recurrent neural network model.
  • the construction module 420 is specifically configured to construct a first loss function corresponding to the convolutional neural network model and a second loss function corresponding to the recurrent neural network model, and to obtain the target loss function corresponding to the basic model according to the first loss function and the second loss function.
  • the second loss function may be multiplied by a loss coefficient to obtain a target second loss function, and the target second loss function and the first loss function may be added to obtain the target loss function.
  • the training module 440 is specifically configured to input the first training result, the second training result, and the initial value of the image attribute into the target loss function to obtain a target loss value, and to adjust the parameters of the basic model according to the target loss value.
  • the determining module 450 is specifically configured to input the image sample set into the convolutional layer to obtain the first feature value; input the first feature value into the recurrent neural network to obtain the second feature value; and input the second feature value into the first fully connected layer to obtain the first training result.
  • the first acquiring module 410 acquires an image sample set, the image sample set including a plurality of initial values of image attributes; the building module 420 builds a basic model, including a convolutional neural network model and a recurrent neural network model, and the target loss function corresponding to the basic model; the first recognition module 430 inputs the image sample set into the basic model for image attribute recognition, so as to obtain the first training result obtained according to the recurrent neural network model and the second training result obtained according to the convolutional neural network model; the training module 440 performs joint training on the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial value of the image attribute, and the target loss function, until the basic model converges; and the determination module 450 uses the converged basic model as the recognition model for recognizing image attributes.
  • the trained basic network model can improve the accuracy of image attribute recognition and identify the correlation between image attributes.
  • the network model training device provided in the embodiments of this application and the network model training method in the above embodiments belong to the same concept; any method provided in the network model training method embodiments can be run on the network model training device. For the specific implementation process, refer to the network model training method embodiments, which will not be repeated here.
  • the image attribute recognition device 500 may include: a receiving module 510, a second acquiring module 520, a calling module 530, and a second recognition module 540.
  • the receiving module 510 is configured to receive an image attribute recognition request.
  • the second obtaining module 520 is configured to obtain the image to be recognized according to the image attribute recognition request.
  • the calling module 530 is used to call a pre-trained image attribute recognition model.
  • the second recognition module 540 is configured to input the image to be recognized into a pre-trained image attribute recognition model, and to recognize the image attributes of the image to be recognized to obtain an image attribute recognition result.
  • the second obtaining module 520 is specifically configured to recognize the target subject in the image to be recognized according to the image attribute recognition request, and obtain the target image in the image to be recognized according to the target subject.
  • the image attribute recognition device 500 receives the image attribute recognition request through the receiving module 510; the second obtaining module 520 obtains the image to be recognized according to the image attribute recognition request; the calling module 530 calls the pre-trained image attribute recognition Model; the second recognition module 540 inputs the image to be recognized into the pre-trained image attribute recognition model, and recognizes the image attributes of the image to be recognized to obtain the image attribute recognition result.
  • the image attribute recognition device trained by the above network model training method can accurately recognize each attribute in the image and the correlation between each attribute, and improve the accuracy of image attribute recognition.
  • the image attribute recognition device provided in this embodiment of the application and the image attribute recognition method in the above embodiments belong to the same concept; any method provided in the method embodiments can be run on the image attribute recognition device. For the specific implementation process, refer to the image attribute recognition method embodiments, which will not be repeated here.
  • the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When the stored computer program is executed on a computer, the computer is caused to execute the network model training method or the image processing method provided in the embodiments of the present application.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
  • An embodiment of the present application also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory. By calling the computer program stored in the memory, the processor is configured to execute the network model training method or the image attribute recognition method provided in the embodiments of the present application.
  • the above-mentioned electronic device may be a mobile terminal such as a tablet computer or a smart phone.
  • FIG. 6 is a schematic diagram of the first structure of an electronic device provided by an embodiment of this application.
  • the electronic device 600 may include components such as a memory 601 and a processor 602. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 6 does not constitute a limitation on the electronic device; it may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
  • the memory 601 may be used to store software programs and modules.
  • the processor 602 executes various functional applications and data processing by running the computer programs and modules stored in the memory 601.
  • the memory 601 may mainly include a storage program area and a storage data area. The storage program area may store an operating system and computer programs required by at least one function (such as a sound playback function, an image playback function, etc.); the storage data area may store data created through the use of the electronic device.
  • the processor 602 is the control center of the electronic device. It connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 601 and calling the data stored in the memory 601, thereby monitoring the electronic device as a whole.
  • the memory 601 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the memory 601 may further include a memory controller to provide the processor 602 with access to the memory 601.
  • the processor 602 in the electronic device will load the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 will run the application programs stored in the memory 601, thereby implementing the following flow:
  • acquiring an image sample set, the image sample set including a plurality of initial values of image attributes;
  • constructing a basic model and a target loss function corresponding to the basic model, the basic model including a convolutional neural network model and a recurrent neural network model;
  • inputting the image sample set into the basic model for image attribute recognition, so as to obtain a first training result obtained according to the recurrent neural network model and a second training result obtained according to the convolutional neural network model;
  • jointly training the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial values of the image attributes, and the target loss function, until the basic model converges;
  • using the convergent basic model as a recognition model for recognizing image attributes.
  • when the processor 602 executes the construction of the target loss function corresponding to the basic model, it may execute:
  • when the processor 602 executes obtaining the target loss function corresponding to the basic model according to the first loss function and the second loss function, it may execute:
  • the target loss function is obtained by adding the target second loss function and the first loss function.
  • when the processor 602 performs the joint training of the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial value of the image attribute, and the target loss function until the basic model converges, it may execute:
  • the parameters of the basic model are adjusted according to the target loss value.
  • when the processor 602 executes the construction of the basic model, it may execute:
  • the recurrent neural network is connected to the convolutional layer, and the recurrent neural network is connected to the first fully connected layer to obtain the recurrent neural network model.
  • when the processor 602 executes inputting the image sample set into the basic model for image attribute recognition to obtain the first training result obtained according to the recurrent neural network model, it may execute:
  • the second feature value is input to the first fully connected layer to obtain the first training result.
  • the processor 602 in the electronic device will load the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 will run the application programs stored in the memory 601, thereby implementing the following flow:
  • the image to be recognized is input to a pre-trained image attribute recognition model, and the image attributes of the image to be recognized are recognized to obtain an image attribute recognition result.
  • when the processor 602 executes obtaining the image to be identified according to the image attribute identification request, it may execute:
  • FIG. 7 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the application.
  • the electronic device further includes: a display 603, a radio frequency circuit 604, an audio circuit 605, and a power supply 606. The display 603, the radio frequency circuit 604, the audio circuit 605, and the power supply 606 are electrically connected to the processor 602, respectively.
  • the display 603 can be used to display information input by the user or information provided to the user, and various graphical user interfaces. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof.
  • the display 603 may include a display panel.
  • the display panel may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED).
  • the radio frequency circuit 604 may be used to transmit and receive radio frequency signals to establish wireless communication with network equipment or other electronic equipment through wireless communication, and to transmit and receive signals with the network equipment or other electronic equipment.
  • the audio circuit 605 can be used to provide an audio interface between the user and the electronic device through a speaker or a microphone.
  • the power supply 606 can be used to power various components of the electronic device 600.
  • the power supply 606 may be logically connected to the processor 602 through a power management system, so that functions such as management of charging, discharging, and power consumption management can be realized through the power management system.
  • the electronic device 600 may also include a camera component, a Bluetooth module, etc.
  • the camera component may include an image processing circuit, which may be implemented using hardware and/or software components and may include various processing units that define an image signal processing (Image Signal Processing) pipeline.
  • the image processing circuit may at least include: multiple cameras, an image signal processor (Image Signal Processor, ISP processor), a control logic, an image memory, a display, and the like.
  • Each camera may include at least one or more lenses and image sensors.
  • the image sensor may include a color filter array (such as a Bayer filter). The image sensor can obtain the light intensity and wavelength information captured by each imaging pixel of the image sensor, and provide a set of raw image data that can be processed by the image signal processor.
  • the network model training method/image processing method and device provided in the embodiments of the present application belong to the same concept as the network model training method/image processing method in the above embodiments.
  • the device can run any of the methods provided in the network model training method/image processing method embodiment.
  • For the specific implementation process, please refer to the network model training method/image processing method embodiments, which will not be repeated here.
  • the computer program may be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor; the execution process may include the flow of the network model training method/image processing method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), etc.
  • each functional module may be integrated in one processing chip, or each module may exist alone physically, or two or more modules may be integrated in one module.
  • the above integrated modules may be implemented in the form of hardware or in the form of software functional modules. If an integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.

Abstract

A network model training method, an image attribute recognition method, an image attribute recognition device (500), and an electronic device (600). The network model training method includes: acquiring an image sample set (101); constructing a basic model and a target loss function corresponding to the basic model (102); training the basic model according to the image sample set and the loss function until the basic model converges; and using the converged basic model as a recognition model for recognizing image attributes (105).

Description

Network Model Training Method, Image Attribute Recognition Method, Device, and Electronic Device

Technical Field
The embodiments of this application relate to computer technology, and in particular to a network model training method, an image attribute recognition method, a device, and an electronic device.
Background
Image recognition refers to the technology of using computers to process, analyze, and understand images in order to identify targets and objects of various patterns; it is a practical application of deep learning algorithms. In current image recognition methods, an image is fed directly into a convolutional neural network for feature extraction, and the extracted features are processed in the fully connected layer of the convolutional neural network to obtain the final prediction result for the image.
However, image recognition results obtained in this way ignore many image attributes as well as the correlation and ordering between image attributes. For example, when recognizing a human body image that contains a hat, glasses, a top, a bag, bottoms, shoes, and so on, there is correlation and ordering between the top and the bottoms. If recognition is performed directly with the above prior art, the correlation and ordering between the top and the bottoms are ignored, leading to inaccurate recognition.
Summary
This application provides a network model training method, an image attribute recognition method, a device, and an electronic device, which can accurately recognize image attributes and the correlation between the attributes.
In a first aspect, an embodiment of this application provides a network model training method, the method including:
acquiring an image sample set, the image sample set including a plurality of initial values of image attributes;
constructing a basic model and a target loss function corresponding to the basic model, the basic model including a convolutional neural network model and a recurrent neural network model;
inputting the image sample set into the basic model for image attribute recognition, so as to obtain a first training result obtained according to the recurrent neural network model and a second training result obtained according to the convolutional neural network model;
jointly training the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial values of the image attributes, and the target loss function, until the basic model converges; and
using the converged basic model as a recognition model for recognizing image attributes.
In a second aspect, an embodiment of this application further provides an image attribute recognition method, including:
receiving an image attribute recognition request;
acquiring an image to be recognized according to the image attribute recognition request;
calling a pre-trained image attribute recognition model; and
inputting the image to be recognized into the pre-trained image attribute recognition model and recognizing the image attributes of the image to be recognized, so as to obtain an image attribute recognition result;
wherein the image attribute recognition model is a model trained by the network model training method provided in the embodiments of this application.
In a third aspect, an embodiment of this application provides a network model training device, including:
a first acquisition module, configured to acquire an image sample set, the image sample set including a plurality of initial values of image attributes;
a construction module, configured to construct a basic model and a target loss function corresponding to the basic model, the basic model including a convolutional neural network model and a recurrent neural network model;
a first recognition module, configured to input the image sample set into the basic model for image attribute recognition, so as to obtain a first training result obtained according to the recurrent neural network model and a second training result obtained according to the convolutional neural network model;
a training module, configured to jointly train the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial values of the image attributes, and the target loss function, until the basic model converges; and
a determination module, configured to use the converged basic model as a recognition model for recognizing image attributes.
In a fourth aspect, an embodiment of this application provides an image attribute recognition device, including:
a receiving module, configured to receive an image attribute recognition request;
a second acquisition module, configured to acquire an image to be recognized according to the image attribute recognition request;
a calling module, configured to call a pre-trained image attribute recognition model; and
a second recognition module, configured to input the image to be recognized into the pre-trained image attribute recognition model and recognize the image attributes of the image to be recognized, so as to obtain an image attribute recognition result;
wherein the image attribute recognition model is a model obtained by the network model training method provided in the embodiments of this application.
In a fifth aspect, an embodiment of this application provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to execute the network model training method or the image attribute recognition method provided in this embodiment.
In a sixth aspect, an embodiment of this application provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor, by calling the computer program stored in the memory, is configured to execute:
acquiring an image sample set, the image sample set including a plurality of initial values of image attributes;
constructing a basic model and a target loss function corresponding to the basic model, the basic model including a convolutional neural network model and a recurrent neural network model;
inputting the image sample set into the basic model for image attribute recognition, so as to obtain a first training result obtained according to the recurrent neural network model and a second training result obtained according to the convolutional neural network model;
jointly training the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial values of the image attributes, and the target loss function, until the basic model converges; and
using the converged basic model as a recognition model for recognizing image attributes.
In a seventh aspect, an embodiment of this application provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor, by calling the computer program stored in the memory, is configured to execute:
receiving an image attribute recognition request;
acquiring an image to be recognized according to the image attribute recognition request;
calling a pre-trained image attribute recognition model; and
inputting the image to be recognized into the pre-trained image attribute recognition model and recognizing the image attributes of the image to be recognized, so as to obtain an image attribute recognition result;
wherein the image attribute recognition model is a model trained by the network model training method provided in the embodiments of this application.
Brief Description of the Drawings
The technical solutions of this application and their beneficial effects will become apparent from the following detailed description of the specific embodiments of this application in conjunction with the accompanying drawings.
FIG. 1 is a first schematic flowchart of a network model training method provided by an embodiment of this application.
FIG. 2 is a second schematic flowchart of the network model training method provided by an embodiment of this application.
FIG. 3 is a schematic flowchart of an image attribute recognition method provided by an embodiment of this application.
FIG. 4 is a schematic structural diagram of a network model training device provided by an embodiment of this application.
FIG. 5 is a schematic structural diagram of an image attribute recognition device provided by an embodiment of this application.
FIG. 6 is a first schematic structural diagram of an electronic device provided by an embodiment of this application.
FIG. 7 is a second schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed Description
Referring to the drawings, in which identical reference numerals represent identical components, the principles of this application are illustrated by implementation in a suitable computing environment. The following description is based on the illustrated specific embodiments of this application and should not be regarded as limiting other specific embodiments not detailed herein.
Referring to FIG. 1, FIG. 1 is a first schematic flowchart of the network model training method provided by an embodiment of this application. The flow of the network model training method may include:
101. Acquire an image sample set. The image sample set includes a variety of images, such as human body images, animal images, plant images, and so on. In the embodiments of this application, multiple human body images in the image sample set may be used as training images for the network model.
For example, images may be randomly downloaded from an online gallery, and the images containing human bodies are filtered out and combined to form an image sample set. Alternatively, multiple human body images may be selected as the image sample set according to the training objective of the model. For example, if the training objective is to find the correlation between items of clothing in human body images, the images in the sample set may be human body images containing a variety of different outfits.
It should be noted that, in order to keep the dimension of the vectors input into the neural network fixed and ensure that the neural network does not change dynamically, different human body images may be cropped during acquisition so that every cropped human body image has the same size, thereby achieving the purpose of network model training.
It can be understood that, depending on the training objective, the correlation between facial features or the correlation between limbs may also be chosen as the training objective. In this case, during acquisition of the image sample set, the portions of the human body images containing facial features may be cropped into images of the same size. For example, a preset number of feature points of the face in a human body image are acquired; if acquisition succeeds, the portion of the human body image containing the facial features is cropped out; conversely, if the number of feature points acquired from the human body image does not reach the preset number, the human body image is considered unsuitable for the image sample set.
It should be noted that, after the image sample set is acquired, each training image in the set has its own corresponding initial values of image attributes. For example, a human body image contains attributes such as a top, bottoms, a hat, and shoes; if the correlation between the top and the bottoms is 10%, this 10% correlation can serve as an initial value of an image attribute. There may be one or more initial values of image attributes, depending on the number of attributes in the training image and the degree of correlation between them.
102. Construct a basic model and a target loss function corresponding to the basic model.
The basic model may be created jointly from different types of network models. For example, a convolutional neural network (Convolutional Neural Network, CNN) model and a recurrent neural network (Recurrent Neural Network, RNN) model may be used together to create the basic model.
In some implementations, an input layer may be set up for feeding the training images of the image sample set into the basic model. The input layer is connected to a convolutional layer; the convolutional layer is connected to a pooling layer and to the recurrent neural network, respectively; the recurrent neural network is connected to a first fully connected layer; and the pooling layer is connected to a second fully connected layer. The first fully connected layer and the second fully connected layer serve as the output layers of the basic model.
It should be noted that the input layer, the convolutional layer, the pooling layer, and the second fully connected layer connected in sequence form the convolutional neural network, and the recurrent neural network is arranged between the convolutional layer and the first fully connected layer.
In some implementations, the training samples are continuous sequences of varying length, such as time-based sequences: a segment of continuous speech, or a segment of continuous handwriting. A recurrent neural network can handle such inputs of indeterminate length. However, since recurrent neural networks also suffer from the vanishing gradient problem and therefore have difficulty with long sequences, a gated recurrent unit network (Gated Recurrent Unit network, GRU) may be used as the recurrent neural network. The recurrent unit of a GRU contains only two gates, an update gate and a reset gate; these two gates do not form a self-loop, but instead recur directly between system states.
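The two-gate recurrent unit described above can be sketched in a few lines. This is a minimal NumPy illustration, not the patent's implementation; the weight shapes, the random initialization, and the example input sizes are all assumptions for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal gated recurrent unit with only the two gates described
    above: an update gate z and a reset gate r (no self-loop)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate plus one for the candidate state,
        # each acting on the concatenated [input, previous hidden state].
        shape = (hidden_size, input_size + hidden_size)
        self.W_z = rng.normal(0.0, 0.1, shape)  # update gate weights
        self.W_r = rng.normal(0.0, 0.1, shape)  # reset gate weights
        self.W_h = rng.normal(0.0, 0.1, shape)  # candidate state weights

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.W_z @ xh)                 # update gate
        r = sigmoid(self.W_r @ xh)                 # reset gate
        h_cand = np.tanh(self.W_h @ np.concatenate([x, r * h]))
        return (1.0 - z) * h + z * h_cand          # new hidden state

# Run the cell over a short random feature sequence.
cell = GRUCell(input_size=4, hidden_size=3)
h = np.zeros(3)
for x in np.random.default_rng(1).normal(size=(5, 4)):
    h = cell.step(x, h)
```

Because the new state is a convex combination of the previous state and a tanh candidate, the hidden state stays bounded, which is part of why the GRU handles longer sequences than a plain recurrent unit.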
For the basic model formed by combining the convolutional neural network and the gated recurrent unit network, the gated recurrent unit network corresponds to a first loss function and the convolutional neural network corresponds to a second loss function. The first and second loss functions may be of different types or of the same type; the target loss function corresponding to the basic model is obtained from the first loss function and the second loss function.
In some implementations, the first loss function may be multiplied by a loss coefficient and then added to the second loss function to obtain the target function corresponding to the basic model, where the loss coefficient may be a parameter obtained through experiments and may be set between 0.8 and 1.
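As a sketch, combining the two branch losses with the loss coefficient might look as follows; the coefficient value 0.9 is an assumed example within the stated 0.8 to 1 range, not a value from the patent.

```python
# A sketch of combining the two branch losses into the target loss.
def target_loss(first_loss, second_loss, loss_coefficient=0.9):
    """Target loss = loss coefficient * first (recurrent-branch) loss
    + second (convolutional-branch) loss."""
    return loss_coefficient * first_loss + second_loss

combined = target_loss(first_loss=1.0, second_loss=2.0)
```

A coefficient below 1 simply down-weights the recurrent branch relative to the convolutional branch during joint training.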
103. Input the image sample set into the basic model for image attribute recognition, so as to obtain the first training result obtained according to the recurrent neural network model and the second training result obtained according to the convolutional neural network model.
The training images of the image sample set, which may be human body images, are fed in at the input layer, and the attributes of the images are recognized by the constructed basic model and the target function corresponding to the basic model.
A training image first passes through the convolutional layer to compute a first feature value. The first feature value output by the last convolutional layer is then fed into the pooling layer and the recurrent neural network, respectively. The second feature value output by the recurrent neural network is fed into the first fully connected layer to obtain the first training result, and the third feature value output by the pooling layer is fed into the second fully connected layer to obtain the second training result.
It should be noted that the first training result and the second training result are not completely identical; the final training result of the basic model is obtained from the first and second training results. For example, the intersection of the first and second training results may be taken, and the target training results in the intersection constitute the final training result of the basic model.
In some implementations, the first training result and the second training result may instead be added together to obtain the final training result, or part of the first and second training results may be selected as the final training result according to a preset rule.
104. Jointly train the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial values of the image attributes, and the target loss function, until the basic model converges.
In some embodiments, different training images are input, multiple first and second training results are obtained from the basic model, and the final training result is obtained from the first and second training results. The target loss value is obtained by feeding the final training result and the initial values of the image attributes into the target loss function.
For example, suppose the input training image is a human body image in which the person wears a top, bottoms, shoes, a hat, and other wearable items. Each wearable item can be regarded as one attribute of the human body image; there is a specific correlation between attributes and a degree of correlation corresponding to that correlation, and the degree of correlation can be regarded as an initial value of an image attribute.
In some embodiments, after the final training result and the initial values of the image attributes are obtained, the two are fed into the target loss function to obtain the corresponding target loss value, and it can be judged whether the target loss value is close to a preset loss value. If the target loss value is within a preset range of the preset loss value, the basic model is considered trained and in a converged state.
For example, the electronic device inputs a human body image into the basic model; the first fully connected layer outputs the first training result, and the second fully connected layer outputs the second training result. The output of the first fully connected layer contains the correlation between attributes. For instance, the first training result contains three attributes: a padded coat, gloves, and shorts, where the padded coat and the gloves are correlated but the padded coat and the shorts are not. The second training result contains three attributes: a padded coat, gloves, and a short skirt. Taking the intersection of the first and second training results as the final training result, the final training result contains the two attributes padded coat and gloves, together with the correlation between them.
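The intersection step in this example can be illustrated with plain Python sets; the attribute names are the ones from the clothing example above.

```python
# The two branch outputs from the example above, fused by intersection.
first_result = {"padded coat", "gloves", "shorts"}        # recurrent branch
second_result = {"padded coat", "gloves", "short skirt"}  # convolutional branch

final_result = first_result & second_result  # attributes both branches agree on
print(sorted(final_result))  # ['gloves', 'padded coat']
```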
The final training result is fed into the target loss function to obtain the target loss value, the target loss value is compared with the preset loss value, and it is judged whether the target loss value satisfies a preset condition. For example, if the preset range around the preset loss value is 1-10, the preset loss value is 80, and the target loss value is 75, it can be determined that the target loss value is within the preset range; the target loss value thus satisfies the preset condition, and the basic model is considered to have converged.
In some embodiments, the first training result and the initial values of the image attributes may be fed directly into the first loss function, and the second training result and the initial values of the image attributes into the second loss function. Since the target loss function is obtained from the first and second loss functions, once the first and second training results are obtained, the target loss value can be computed directly through the loss functions. The target loss value is compared with the preset loss value to judge whether the basic model has converged. For example, when the target loss value is less than or equal to the preset loss value, the target loss value is considered to satisfy the preset condition, and the basic model is considered converged.
It should be noted that, in training the basic model, training may be performed multiple times. Through the joint training of the convolutional neural network model and the recurrent neural network model, the basic model eventually converges and the expected training effect is achieved.
105. Use the converged basic model as the recognition model for recognizing image attributes.
Using the converged basic model as the image attribute recognition model, the model can be applied in an electronic device to recognize the attributes of images stored by the user on the device. This both yields the correlation between image attributes from the recognition results and improves the accuracy of image attribute recognition.
As can be seen from the above, the network model training method provided by the embodiments of this application acquires an image sample set including a plurality of initial values of image attributes; constructs a basic model, including a convolutional neural network model and a recurrent neural network model, and a target loss function corresponding to the basic model; inputs the image sample set into the basic model for image attribute recognition to obtain a first training result from the recurrent neural network model and a second training result from the convolutional neural network model; jointly trains the convolutional and recurrent neural network models according to the first training result, the second training result, the initial values of the image attributes, and the target loss function until the basic model converges; and uses the converged basic model as the recognition model for recognizing image attributes.
An image attribute recognition model obtained in this way can improve the accuracy of image attribute recognition and can also recognize the correlation between image attributes.
Referring to FIG. 2, FIG. 2 is a second schematic flowchart of the network model training method provided by an embodiment of this application. The network model training method may include:
201. Acquire an image sample set.
The image sample set includes a variety of images, such as human body images, animal images, plant images, and so on. In the embodiments of this application, multiple human body images in the image sample set may be used as training images for the network model.
In some embodiments, with multiple human body images as training images, the attributes of each human body image and the degree of correlation between attributes can be extracted. For example, a human body image has image attributes such as a hat, glasses, a top, bottoms, and shoes, and the correlation differs between attributes: wearing glasses and wearing a top are not closely correlated, nor are wearing shoes and wearing a hat. Correlated attributes can be acquired, such as shoes and bottoms, or a top and bottoms, together with the degree of correlation corresponding to the correlated attributes. Specifically, sports shoes, sports pants, and a sports top are correlated with one another and have a degree of correlation; the attributes of the image and the corresponding degree of correlation between them can serve as an initial value of an image attribute.
202. Construct the basic model from the convolutional neural network and the recurrent neural network.
First, an input layer may be set up for feeding in the training images; then a convolutional layer is set up to perform preliminary image feature extraction on the input training image, yielding the first feature value, which is then fed into the next layer of the basic model structure.
The convolutional layer is connected to the recurrent neural network, which processes the first feature value output by the convolutional layer and outputs the second feature value. The other side of the recurrent neural network is connected to the first fully connected layer, which serves as one output layer of the basic model and processes the second feature value to output the first training result.
The input layer, the convolutional layer, the pooling layer, and the second fully connected layer are connected in sequence to form the convolutional neural network, in which the first feature value output by the convolutional layer is processed by the pooling layer into the third feature value, and finally the second fully connected layer processes the third feature value and outputs the second training result. The entire basic model can be regarded as composed jointly of the convolutional neural network model and the recurrent neural network.
In some embodiments, multiple convolutional layers may be set, for example conv3, conv6, conv9, etc. After the input human body image is processed by the multiple convolutional layers, a feature map of a certain dimension is obtained and used as the first feature value. The recurrent neural network may be a gated recurrent unit network, that is, a GRU neural network.
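The two-branch forward pass described above (convolutional features feeding both a GRU branch ending in the first fully connected layer and a pooling branch ending in the second fully connected layer) can be sketched in NumPy. All dimensions and random weights are illustrative assumptions, and a stand-in feature sequence replaces the real convolutional layers.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Toy dimensions and random weights; shapes only, not trained values.
feat_dim, hidden, n_attrs, seq_len = 8, 6, 4, 5
W_gru = rng.normal(0.0, 0.1, (3, hidden, feat_dim + hidden))  # z, r, candidate
W_fc1 = rng.normal(0.0, 0.1, (n_attrs, hidden))    # first fully connected layer
W_fc2 = rng.normal(0.0, 0.1, (n_attrs, feat_dim))  # second fully connected layer

def forward(feature_maps):
    """feature_maps: (seq_len, feat_dim) first feature values from the
    convolutional layers, treated as a sequence of spatial features."""
    # Branch 1: recurrent network over the features -> first FC layer.
    h = np.zeros(hidden)
    for x in feature_maps:
        xh = np.concatenate([x, h])
        z, r = sigmoid(W_gru[0] @ xh), sigmoid(W_gru[1] @ xh)
        h = (1 - z) * h + z * np.tanh(W_gru[2] @ np.concatenate([x, r * h]))
    first_result = W_fc1 @ h                 # second feature value -> FC1
    # Branch 2: pooling over the features -> second FC layer.
    pooled = feature_maps.mean(axis=0)       # third feature value
    second_result = W_fc2 @ pooled
    return first_result, second_result

r1, r2 = forward(rng.normal(size=(seq_len, feat_dim)))
```

Both branches see the same convolutional features, which is what makes the joint training of the two losses meaningful.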
203. Construct the first loss function of the recurrent neural network and the second loss function corresponding to the convolutional neural network.
The first loss function and the second loss function may be of the same type or of different types; for example, both may be cross-entropy loss functions.
For example, the output of the last layer of the convolutional neural network can be processed with the softmax algorithm. This step usually computes the probability that the output belongs to a certain class; for a single sample, the output is a vector. The softmax formula is:

y_i = exp(x_i) / sum_j exp(x_j)

The softmax output vector and the actual label of the sample are then combined into a cross entropy, with the formula:

H_{y'}(y) = -sum_i y'_i * log(y_i)

where y'_i is the i-th value of the actual label, and y_i is the i-th element of the softmax output vector [Y1, Y2, Y3, ...].
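The softmax and cross-entropy formulas above can be checked with a short NumPy sketch; the logit and label values are illustrative only.

```python
import numpy as np

def softmax(x):
    """y_i = exp(x_i) / sum_j exp(x_j), shifted by max(x) for stability."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_entropy(y_true, y_pred):
    """H_{y'}(y) = -sum_i y'_i * log(y_i)."""
    return float(-np.sum(y_true * np.log(y_pred)))

logits = np.array([2.0, 1.0, 0.1])   # last-layer output for one sample
probs = softmax(logits)              # class probabilities, summing to 1
label = np.array([1.0, 0.0, 0.0])    # one-hot actual label
loss = cross_entropy(label, probs)
```

With a one-hot label the cross entropy reduces to the negative log probability of the true class, so the loss shrinks as the model grows more confident in the correct class.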
204. Construct the target loss function corresponding to the basic model from the first loss function and the second loss function.
In some implementations, the first loss function may be multiplied by a loss coefficient and then added to the second loss function to obtain the target function corresponding to the basic model, where the loss coefficient may be a parameter obtained through experiments and may be set between 0.8 and 1.
It can be understood that the setting of the target loss function also needs to be adjusted according to the training objective of the basic model; placing a loss coefficient in front of the first loss function and multiplying the two is one way of adjusting the target function.
205. Input the image sample set into the basic model for image attribute recognition, so as to obtain the first training result obtained according to the recurrent neural network model and the second training result obtained according to the convolutional neural network model.
The training images of the image sample set, which may be human body images, are fed in at the input layer, and the attributes of the images are recognized by the constructed basic model and the target function corresponding to the basic model.
A training image first passes through the convolutional layer to compute a first feature value. The first feature value output by the last convolutional layer is then fed into the pooling layer and the recurrent neural network, respectively. The second feature value output by the recurrent neural network is fed into the first fully connected layer to obtain the first training result, and the third feature value output by the pooling layer is fed into the second fully connected layer to obtain the second training result.
It should be noted that the first training result and the second training result are not completely identical; the final training result of the basic model is obtained from the first and second training results. For example, the intersection of the first and second training results may be taken, and the target training results in the intersection constitute the final training result of the basic model.
In some implementations, the first training result and the second training result may instead be added together to obtain the final training result, or part of the first and second training results may be selected as the final training result according to a preset rule.
206. Feed the first training result, the second training result, and the initial values of the image attributes into the target loss function to obtain the target loss value.
In some embodiments, different training images are input, multiple first and second training results are obtained from the basic model, and the final training result is obtained from the first and second training results. The target loss value is obtained by feeding the final training result and the initial values of the image attributes into the target loss function.
In some embodiments, the first training result and the initial values of the image attributes may be fed directly into the first loss function, and the second training result and the initial values of the image attributes into the second loss function. Since the target loss function is obtained from the first and second loss functions, once the first and second training results are obtained, the target loss value can be computed directly through the loss functions. The target loss value is compared with the preset loss value to judge whether the basic model has converged. For example, when the target loss value is less than or equal to the preset loss value, the target loss value is considered to satisfy the preset condition, and the basic model is considered converged.
207. Adjust the parameters of the basic model according to the target loss value, until the basic model converges.
In some embodiments, when the target loss value does not satisfy the preset condition, for example when it is not within the preset range or has not reached the preset loss value, training of the basic model can be regarded as incomplete and the training results output by the basic model cannot reach the expected results, so the model parameters of the basic model need to be adjusted.
In some implementations, since the basic model is built from the convolutional neural network model and the recurrent neural network model, when the parameters of the basic model are adjusted, some model parameters in the convolutional and recurrent neural network models may be adjusted. The parameters of the model may be adjusted through the backpropagation algorithm.
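A toy sketch of this adjustment loop: a single stand-in parameter is updated by gradient descent until the target loss value satisfies the preset condition. The quadratic loss, the learning rate 0.1, and the preset loss value 1e-4 are all illustrative assumptions, not the patent's actual model.

```python
preset_loss, learning_rate = 1e-4, 0.1
w = 5.0  # a single stand-in model parameter

def model_loss(w):
    return (w - 2.0) ** 2    # target loss value for parameter w

def gradient(w):
    return 2.0 * (w - 2.0)   # gradient obtained by backpropagation

target = model_loss(w)
steps = 0
while target > preset_loss and steps < 1000:
    w -= learning_rate * gradient(w)   # adjust the parameter
    target = model_loss(w)
    steps += 1
# The loop exits once the target loss value meets the preset condition.
```

In the real model, the same loop runs over batches of training images, with backpropagation supplying the gradients for all parameters of both sub-networks at once.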
208. Use the converged basic model as the recognition model for recognizing image attributes.
Using the converged basic model as the image attribute recognition model, the model can be applied in an electronic device to recognize the attributes of images stored by the user on the device. This both yields the correlation between image attributes from the recognition results and improves the accuracy of image attribute recognition.
For example, if a random human body image is fed into the converged basic model, the types of clothing worn in the image and the correlation between the items of clothing can be accurately recognized, or the facial features and the correlation between them can be accurately recognized. This shows that the converged basic model can accurately recognize the attributes of the input image and the correlation between the attributes, and the converged basic model can serve as the image attribute recognition model.
In summary, the network model training method provided by the embodiments of this application acquires an image sample set and the initial values of the image attributes it includes; builds a basic model from a convolutional neural network and a recurrent neural network; sets a first loss function for the recurrent neural network and a second loss function for the convolutional neural network, and obtains the target loss function from the first and second loss functions; inputs the image sample set into the basic model to train it, obtaining the first and second training results; and finally adjusts the parameters of the basic model according to the initial values of the image attributes, the target loss function, and the first and second training results until the basic model converges. The converged model serves as the image attribute recognition model for accurately recognizing image attributes and the degree of correlation between the attributes.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of the image attribute recognition method provided by an embodiment of this application. The image attribute recognition method may include the following flow:
301. Receive an image attribute recognition request.
The image attribute recognition request may be triggered by the electronic device receiving a touch operation, a voice operation, a start instruction for a corresponding target application, and so on. In addition, the request may also be triggered automatically at preset intervals or based on certain trigger rules. For example, when it is detected that the current display interface of the electronic device includes multiple images, such as when the device launches a browser application to browse an article page containing images, an image attribute recognition request may be generated automatically, and image attribute recognition is performed on the images according to the image attribute recognition model, enabling the electronic device to accurately recognize the image attributes and the correlation between them.
In some implementations, when a user shops online with the electronic device, a variety of clothing or accessories can be placed on a virtual human body image; a top, bottoms, shoes, a hat, earrings, a necklace, and so on can all be attribute information of the human body image. The user can place these items on the virtual human body image, wearing them at the corresponding positions to obtain a new virtual human body image. After the input is completed, the user can choose to recognize, and the electronic device receives the image attribute recognition request and recognizes the new virtual human body image.
302. Acquire the image to be recognized according to the image attribute recognition request.
In some implementations, when the interface the user is browsing contains multiple images, the user may tap a specific position on the electronic device or delimit a region with a finger to select the image to be recognized; the electronic device then acquires the image to be recognized according to the position or region selected by the user.
In some implementations, after the user inputs the image attribute recognition request, the electronic device may actively acquire the image the user needs to recognize according to the request. For example, while the user is browsing pictures, the electronic device may actively look for the picture to be recognized according to the image recognition request.
303. Identify the target subject in the image to be recognized according to the image attribute recognition request.
In some embodiments, the image attribute recognition request contains the specific type of the target subject. When the electronic device receives the request, it can acquire the target subject in the image to be recognized. For example, in a group photo with multiple people, the person to be recognized can be extracted as the subject; in a landscape photo with multiple plants or animals, the animal image to be recognized can be extracted as the subject. During image attribute recognition, non-recognition objects need to be excluded and the target subject retained.
304. Acquire the target image in the image to be recognized according to the target subject.
In some embodiments, after the target subject is acquired, the image containing the target subject can be cropped to obtain the target image. This prevents subjects that do not need to be recognized from interfering with recognition of the target subject during image attribute recognition, making recognition faster and the recognition result more accurate.
305. Call the pre-trained image attribute recognition model.
The image attribute recognition model is a model trained by the network model training method provided in this embodiment. For the specific training process of the network model, refer to the relevant description of the above embodiments, which will not be repeated here.
306. Input the target image into the pre-trained image attribute recognition model to obtain the image attribute recognition result.
After the target image is input into the image attribute recognition model, image attribute recognition is performed, and the recognition results of the multiple attributes in the image as well as the correlation between the attributes are obtained.
For example, in a recognized human body image, the image attribute recognition model can recognize the correlation between items of clothing: for instance, the degree of correlation between shorts and short sleeves is 100%, between jeans and sports shoes 80%, and between a hat and glasses 50%. This yields the correlation between the various wearable items and clothes, which users can refer to when matching outfits.
As can be seen from the above, the image attribute recognition method provided by the embodiments of this application receives an image attribute recognition request, acquires the image to be recognized according to the request, calls the pre-trained image attribute recognition model, inputs the image to be recognized into the model, and recognizes the image attributes of the image to be recognized to obtain the image attribute recognition result, thereby obtaining the correlation between the attributes of the image.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of the network model training device provided by an embodiment of this application. The network model training device 400 may include: a first acquisition module 410, a construction module 420, a first recognition module 430, a training module 440, and a determination module 450.
The first acquisition module 410 is configured to acquire an image sample set, the image sample set including a plurality of initial values of image attributes.
The construction module 420 is configured to construct a basic model and a target loss function corresponding to the basic model, the basic model including a convolutional neural network model and a recurrent neural network model.
The first recognition module 430 is configured to input the image sample set into the basic model for image attribute recognition, so as to obtain the first training result obtained according to the recurrent neural network model and the second training result obtained according to the convolutional neural network model.
The training module 440 is configured to jointly train the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial values of the image attributes, and the target loss function, until the basic model converges.
The determination module 450 is configured to use the converged basic model as the recognition model for recognizing image attributes.
In some implementations, the construction module 420 includes a setting sub-module 421, a first connection sub-module 422, and a second connection sub-module 423. The setting sub-module 421 is configured to set a convolutional layer, a pooling layer, a first fully connected layer, and a second fully connected layer; the first connection sub-module 422 is configured to connect the convolutional layer, the pooling layer, and the second fully connected layer to obtain the convolutional neural network model; and the second connection sub-module 423 is configured to connect the recurrent neural network to the convolutional layer and connect the recurrent neural network to the first fully connected layer to obtain the recurrent neural network model.
The construction module 420 is specifically configured to construct the first loss function corresponding to the convolutional neural network model and the second loss function corresponding to the recurrent neural network model, and to obtain the target loss function corresponding to the basic model from the first and second loss functions. For example, the second loss function may be multiplied by a loss coefficient to obtain a target second loss function, and the target second loss function and the first loss function may be added to obtain the target loss function.
In some embodiments, the training module 440 is specifically configured to feed the first training result, the second training result, and the initial values of the image attributes into the target loss function to obtain the target loss value, and to adjust the parameters of the basic model according to the target loss value.
In some embodiments, the determination module 450 is specifically configured to input the image sample set into the convolutional layer to obtain the first feature value; input the first feature value into the recurrent neural network to obtain the second feature value; and input the second feature value into the first fully connected layer to obtain the first training result.
As can be seen from the above, the first acquisition module 410 acquires an image sample set including a plurality of initial values of image attributes; the construction module 420 constructs a basic model, including a convolutional neural network model and a recurrent neural network model, and the target loss function corresponding to the basic model; the first recognition module 430 inputs the image sample set into the basic model for image attribute recognition to obtain the first training result from the recurrent neural network model and the second training result from the convolutional neural network model; the training module 440 jointly trains the convolutional and recurrent neural network models according to the first training result, the second training result, the initial values of the image attributes, and the target loss function until the basic model converges; and the determination module 450 uses the converged basic model as the recognition model for recognizing image attributes. The trained basic network model can improve the accuracy of image attribute recognition and recognize the correlation between image attributes.
It should be noted that the network model training device provided by the embodiments of this application and the network model training method in the above embodiments belong to the same concept; any method provided in the network model training method embodiments can be run on the network model training device. For the specific implementation process, refer to the network model training method embodiments, which will not be repeated here.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of the image attribute recognition device provided by an embodiment of this application. The image attribute recognition device 500 may include: a receiving module 510, a second acquisition module 520, a calling module 530, and a second recognition module 540.
The receiving module 510 is configured to receive an image attribute recognition request.
The second acquisition module 520 is configured to acquire an image to be recognized according to the image attribute recognition request.
The calling module 530 is configured to call a pre-trained image attribute recognition model.
The second recognition module 540 is configured to input the image to be recognized into the pre-trained image attribute recognition model and recognize the image attributes of the image to be recognized, so as to obtain an image attribute recognition result.
In some implementations, the second acquisition module 520 is specifically configured to identify the target subject in the image to be recognized according to the image attribute recognition request, and acquire the target image in the image to be recognized according to the target subject.
As can be seen from the above, the image attribute recognition device 500 provided by the embodiments of this application receives an image attribute recognition request through the receiving module 510; the second acquisition module 520 acquires the image to be recognized according to the request; the calling module 530 calls the pre-trained image attribute recognition model; and the second recognition module 540 inputs the image to be recognized into the pre-trained image attribute recognition model and recognizes its image attributes to obtain the image attribute recognition result. An image attribute recognition device trained by the above network model training method can accurately recognize each attribute in an image and the correlation between the attributes, improving the accuracy of image attribute recognition.
It should be noted that the image attribute recognition device provided by the embodiments of this application and the image attribute recognition method in the above embodiments belong to the same concept; any method provided in the image attribute recognition method embodiments can be run on the image attribute recognition device. For the specific implementation process, refer to the image attribute recognition method embodiments, which will not be repeated here.
An embodiment of this application provides a computer-readable storage medium on which a computer program is stored. When the stored computer program is executed on a computer, the computer is caused to execute the network model training method or the image processing method provided in the embodiments of this application.
The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
An embodiment of this application further provides an electronic device, including a memory and a processor, where a computer program is stored in the memory. By calling the computer program stored in the memory, the processor is configured to execute the network model training method or the image attribute recognition method provided in the embodiments of this application.
For example, the above electronic device may be a mobile terminal such as a tablet computer or a smart phone. Referring to FIG. 6, FIG. 6 is a first schematic structural diagram of the electronic device provided by an embodiment of this application.
The electronic device 600 may include components such as a memory 601 and a processor 602. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 6 does not constitute a limitation on the electronic device; it may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The memory 601 may be used to store software programs and modules; the processor 602 executes various functional applications and data processing by running the computer programs and modules stored in the memory 601. The memory 601 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system and computer programs required by at least one function (such as a sound playback function, an image playback function, etc.), and the storage data area may store data created through the use of the electronic device.
The processor 602 is the control center of the electronic device. It connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 601 and calling the data stored in the memory 601, thereby monitoring the electronic device as a whole.
In addition, the memory 601 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 601 may further include a memory controller to provide the processor 602 with access to the memory 601.
In this embodiment, the processor 602 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 runs the application programs stored in the memory 601, thereby implementing the following flow:
acquiring an image sample set, the image sample set including a plurality of initial values of image attributes;
constructing a basic model and a target loss function corresponding to the basic model, the basic model including a convolutional neural network model and a recurrent neural network model;
inputting the image sample set into the basic model for image attribute recognition, so as to obtain a first training result obtained according to the recurrent neural network model and a second training result obtained according to the convolutional neural network model;
jointly training the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial values of the image attributes, and the target loss function, until the basic model converges; and
using the converged basic model as a recognition model for recognizing image attributes.
In some implementations, when the processor 602 executes the construction of the target loss function corresponding to the basic model, it may execute:
constructing the first loss function corresponding to the convolutional neural network model;
constructing the second loss function corresponding to the recurrent neural network model; and
obtaining the target loss function corresponding to the basic model according to the first loss function and the second loss function.
Specifically, when the processor 602 executes obtaining the target loss function corresponding to the basic model according to the first loss function and the second loss function, it may execute:
multiplying the second loss function by a loss coefficient to obtain a target second loss function; and
adding the target second loss function and the first loss function to obtain the target loss function.
In some implementations, when the processor 602 performs the joint training of the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial values of the image attributes, and the target loss function until the basic model converges, it may execute:
feeding the first training result, the second training result, and the initial values of the image attributes into the target loss function to obtain the target loss value; and
adjusting the parameters of the basic model according to the target loss value.
In some implementations, when the processor 602 executes the construction of the basic model, it may execute:
setting a convolutional layer, a pooling layer, a first fully connected layer, and a second fully connected layer;
connecting the convolutional layer, the pooling layer, and the second fully connected layer to obtain the convolutional neural network model; and
connecting the recurrent neural network to the convolutional layer, and connecting the recurrent neural network to the first fully connected layer, to obtain the recurrent neural network model.
In some implementations, when inputting the image sample set into the base model for image attribute recognition to obtain the first training result produced by the recurrent neural network model, the processor 602 may perform:
inputting the image sample set into the convolutional layer to obtain a first feature value;
inputting the first feature value into the recurrent neural network to obtain a second feature value; and
inputting the second feature value into the first fully connected layer to obtain the first training result.
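The data flow just described — convolutional layer to first feature value, recurrent network to second feature value, first fully connected layer to first training result, with the second fully connected layer taken directly from the convolutional features — can be sketched as follows. All layer sizes, weights, and the attribute count are arbitrary stand-ins (randomly initialized and untrained), not the embodiment's architecture.

```python
import numpy as np

rng = np.random.default_rng(42)
N_ATTRS = 6   # number of attributes to predict (illustrative)
HIDDEN = 8    # feature / hidden width (illustrative)

def conv_pool(image, n_features=HIDDEN):
    """Stand-in for the convolutional + pooling layers: maps an image to
    the 'first feature value' (a flat feature vector)."""
    w = rng.normal(size=(image.size, n_features))
    return np.tanh(image.reshape(-1) @ w)

def recurrent(first_feature, n_steps=N_ATTRS, hidden=HIDDEN):
    """Stand-in recurrent network: unrolled once per attribute position so
    each step conditions on the previous hidden state, yielding the
    'second feature value' (one state vector per step)."""
    w_in = rng.normal(size=(first_feature.size, hidden))
    w_h = rng.normal(size=(hidden, hidden))
    h = np.zeros(hidden)
    states = []
    for _ in range(n_steps):
        h = np.tanh(first_feature @ w_in + h @ w_h)
        states.append(h)
    return np.stack(states)                 # shape: (n_steps, hidden)

def fully_connected(x, n_out):
    """Sigmoid fully connected layer producing attribute scores in [0, 1].
    Weights are freshly random here, for illustration only."""
    w = rng.normal(size=(x.shape[-1], n_out))
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def forward(image):
    first_feature = conv_pool(image)                         # conv + pooling
    second_feature = recurrent(first_feature)                # RNN branch
    first_result = fully_connected(second_feature, 1)[:, 0]  # first FC layer
    second_result = fully_connected(first_feature, N_ATTRS)  # second FC layer
    return first_result, second_result
```

The RNN branch's step-by-step unrolling is what lets it capture the ordering and correlation between attributes that a purely convolutional branch would miss.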
In this embodiment, the processor 602 in the electronic device also loads the executable code corresponding to the processes of one or more application programs into the memory 601 according to the following instructions, and the processor 602 runs the application programs stored in the memory 601 to implement the following flow:
receiving an image attribute recognition request;
obtaining an image to be recognized according to the image attribute recognition request;
calling a pre-trained image attribute recognition model; and
inputting the image to be recognized into the pre-trained image attribute recognition model and recognizing the image attributes of the image to be recognized to obtain an image attribute recognition result.
In some implementations, when obtaining the image to be recognized according to the image attribute recognition request, the processor 602 may perform:
recognizing a target subject in the image to be recognized according to the image attribute recognition request; and
obtaining a target image from the image to be recognized according to the target subject.
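The second of the two steps — obtaining the target image once the subject has been located — reduces to cropping the subject's region. A minimal sketch, assuming the bounding box `(x0, y0, x1, y1)` comes from some subject detector that is out of scope here:

```python
import numpy as np

def crop_target(image, bbox):
    """Return the target image for a detected subject.

    `bbox` is (x0, y0, x1, y1) in pixel coordinates, assumed to be the
    output of a subject detector; only the cropping step is shown."""
    x0, y0, x1, y1 = bbox
    # Clamp to the image bounds so a loose detection cannot crash the crop.
    h, w = image.shape[:2]
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    return image[y0:y1, x0:x1]
```

Cropping before recognition keeps background clutter out of the attribute model's input.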
Referring to FIG. 7, FIG. 7 is a second schematic structural diagram of the electronic device provided by the embodiments of the present application. It differs from the electronic device shown in FIG. 6 in that the electronic device further includes a display 603, a radio frequency circuit 604, an audio circuit 605, and a power supply 606, each of which is electrically connected to the processor 602.
The display 603 may be used to display information entered by the user or information provided to the user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video, and any combination thereof. The display 603 may include a display panel, which in some implementations may be configured as a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED) display, or the like.
The radio frequency circuit 604 may be used to transmit and receive radio frequency signals so as to establish wireless communication with network devices or other electronic devices and to exchange signals with them.
The audio circuit 605 may be used to provide an audio interface between the user and the electronic device through a speaker and a microphone.
The power supply 606 may be used to supply power to the components of the electronic device 600. In some embodiments, the power supply 606 may be logically connected to the processor 602 through a power management system, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management system.
Although not shown in FIG. 7, the electronic device 600 may further include a camera assembly, a Bluetooth module, and the like. The camera assembly may include an image processing circuit, which may be implemented with hardware and/or software components and may include various processing units defining an image signal processing (Image Signal Processing) pipeline. The image processing circuit may include at least a plurality of cameras, an image signal processor (Image Signal Processor, ISP), control logic, an image memory, and a display. Each camera may include one or more lenses and an image sensor. The image sensor may include a color filter array (such as a Bayer filter); it may obtain the light intensity and wavelength information captured by each imaging pixel and provide a set of raw image data that can be processed by the image signal processor.
In the above embodiments, the description of each embodiment has its own emphasis. For the parts not detailed in a particular embodiment, refer to the detailed description of the network model training method / image attribute recognition method above, which will not be repeated here.
The apparatus for the network model training method / image attribute recognition method provided by the embodiments of the present application belongs to the same concept as the network model training method / image attribute recognition method in the above embodiments; any method provided in those method embodiments can run on the apparatus. For the specific implementation process, refer to the method embodiments, which will not be repeated here.
It should be noted that, for the network model training method / image attribute recognition method of the embodiments of the present application, those of ordinary skill in the art will understand that all or part of the flow of these methods can be implemented by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as the memory, and executed by at least one processor, and its execution may include the flow of the embodiments of the network model training method / image attribute recognition method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), or the like.
For the apparatus of the network model training method / image attribute recognition method of the embodiments of the present application, its functional modules may be integrated in one processing chip, each module may exist physically on its own, or two or more modules may be integrated in one module. The integrated modules may be implemented either in the form of hardware or in the form of software functional modules. If the integrated modules are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The network model training method, image attribute recognition method, apparatus, storage medium, and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the methods of the present application and their core ideas. Meanwhile, those skilled in the art will make changes to the specific implementations and the application scope based on the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A network model training method, wherein the method comprises:
    obtaining an image sample set, the image sample set including a plurality of initial image attribute values;
    constructing a base model and a target loss function corresponding to the base model, the base model including a convolutional neural network model and a recurrent neural network model;
    inputting the image sample set into the base model for image attribute recognition to obtain a first training result produced by the recurrent neural network model and a second training result produced by the convolutional neural network model;
    jointly training the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial image attribute values, and the target loss function until the base model converges; and
    using the converged base model as a recognition model for recognizing image attributes.
  2. The network model training method according to claim 1, wherein constructing the target loss function corresponding to the base model comprises:
    constructing a first loss function corresponding to the convolutional neural network model;
    constructing a second loss function corresponding to the recurrent neural network model; and
    obtaining the target loss function corresponding to the base model from the first loss function and the second loss function.
  3. The network model training method according to claim 2, wherein obtaining the target loss function corresponding to the base model from the first loss function and the second loss function comprises:
    multiplying the second loss function by a loss coefficient to obtain a target second loss function; and
    adding the target second loss function to the first loss function to obtain the target loss function.
  4. The network model training method according to claim 3, wherein jointly training the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial image attribute values, and the target loss function until the base model converges comprises:
    inputting the first training result, the second training result, and the initial image attribute values into the target loss function to obtain a target loss value; and
    adjusting the parameters of the base model according to the target loss value.
  5. The network model training method according to claim 1, wherein constructing the base model comprises:
    providing a convolutional layer, a pooling layer, a first fully connected layer, and a second fully connected layer;
    connecting the convolutional layer, the pooling layer, and the second fully connected layer to obtain the convolutional neural network model; and
    connecting the recurrent neural network to the convolutional layer, and connecting the recurrent neural network to the first fully connected layer, to obtain the recurrent neural network model.
  6. The network model training method according to claim 5, wherein inputting the image sample set into the base model for image attribute recognition to obtain the first training result produced by the recurrent neural network model comprises:
    inputting the image sample set into the convolutional layer to obtain a first feature value;
    inputting the first feature value into the recurrent neural network to obtain a second feature value; and
    inputting the second feature value into the first fully connected layer to obtain the first training result.
  7. An image attribute recognition method, wherein the method comprises:
    receiving an image attribute recognition request;
    obtaining an image to be recognized according to the image attribute recognition request;
    calling a pre-trained image attribute recognition model; and
    inputting the image to be recognized into the pre-trained image attribute recognition model and recognizing the image attributes of the image to be recognized to obtain an image attribute recognition result;
    wherein the image attribute recognition model is trained using the network model training method according to any one of claims 1 to 6.
  8. The image attribute recognition method according to claim 7, wherein obtaining the image to be recognized according to the image attribute recognition request comprises:
    recognizing a target subject in the image to be recognized according to the image attribute recognition request; and
    obtaining a target image from the image to be recognized according to the target subject.
  9. A network model training apparatus, comprising:
    a first obtaining module configured to obtain an image sample set, the image sample set including a plurality of initial image attribute values;
    a construction module configured to construct a base model and a target loss function corresponding to the base model, the base model including a convolutional neural network model and a recurrent neural network model;
    a first recognition module configured to input the image sample set into the base model for image attribute recognition to obtain a first training result produced by the recurrent neural network model and a second training result produced by the convolutional neural network model;
    a training module configured to jointly train the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial image attribute values, and the target loss function until the base model converges; and
    a determination module configured to use the converged base model as a recognition model for recognizing image attributes.
  10. The training apparatus according to claim 9, wherein the construction module comprises:
    a setup submodule configured to provide a convolutional layer, a pooling layer, a first fully connected layer, and a second fully connected layer;
    a first connection submodule configured to connect the convolutional layer, the pooling layer, and the second fully connected layer to obtain the convolutional neural network model; and
    a second connection submodule configured to connect the recurrent neural network to the convolutional layer and to connect the recurrent neural network to the first fully connected layer, to obtain the recurrent neural network model.
  11. An image attribute recognition apparatus, comprising:
    a receiving module configured to receive an image attribute recognition request;
    a second obtaining module configured to obtain an image to be recognized according to the image attribute recognition request;
    a calling module configured to call a pre-trained image attribute recognition model; and
    a second recognition module configured to input the image to be recognized into the pre-trained image attribute recognition model and recognize the image attributes of the image to be recognized to obtain an image attribute recognition result;
    wherein the image attribute recognition model is obtained using the network model training method according to any one of claims 1 to 6.
  12. A storage medium having a computer program stored therein, wherein, when the computer program runs on a computer, the computer is caused to perform the network model training method according to any one of claims 1 to 6 or the image attribute recognition method according to claim 7 or 8.
  13. An electronic device, wherein the electronic device comprises a processor and a memory, a computer program is stored in the memory, and the processor calls the computer program stored in the memory to perform:
    obtaining an image sample set, the image sample set including a plurality of initial image attribute values;
    constructing a base model and a target loss function corresponding to the base model, the base model including a convolutional neural network model and a recurrent neural network model;
    inputting the image sample set into the base model for image attribute recognition to obtain a first training result produced by the recurrent neural network model and a second training result produced by the convolutional neural network model;
    jointly training the convolutional neural network model and the recurrent neural network model according to the first training result, the second training result, the initial image attribute values, and the target loss function until the base model converges; and
    using the converged base model as a recognition model for recognizing image attributes.
  14. The electronic device according to claim 13, wherein the processor is configured to perform:
    constructing a first loss function corresponding to the convolutional neural network model;
    constructing a second loss function corresponding to the recurrent neural network model; and
    obtaining the target loss function corresponding to the base model from the first loss function and the second loss function.
  15. The electronic device according to claim 14, wherein the processor is configured to perform:
    multiplying the second loss function by a loss coefficient to obtain a target second loss function; and
    adding the target second loss function to the first loss function to obtain the target loss function.
  16. The electronic device according to claim 15, wherein the processor is configured to perform:
    inputting the first training result, the second training result, and the initial image attribute values into the target loss function to obtain a target loss value; and
    adjusting the parameters of the base model according to the target loss value.
  17. The electronic device according to claim 13, wherein the processor is configured to perform:
    providing a convolutional layer, a pooling layer, a first fully connected layer, and a second fully connected layer;
    connecting the convolutional layer, the pooling layer, and the second fully connected layer to obtain the convolutional neural network model; and
    connecting the recurrent neural network to the convolutional layer and connecting the recurrent neural network to the first fully connected layer to obtain the recurrent neural network model, the recurrent neural network model and the convolutional neural network model together forming the base model.
  18. The electronic device according to claim 17, wherein the processor is configured to perform:
    inputting the image sample set into the convolutional layer to obtain a first feature value;
    inputting the first feature value into the recurrent neural network to obtain a second feature value; and
    inputting the second feature value into the first fully connected layer to obtain the first training result.
  19. An electronic device, wherein the electronic device comprises a processor and a memory, a computer program is stored in the memory, and the processor calls the computer program stored in the memory to perform:
    receiving an image attribute recognition request;
    obtaining an image to be recognized according to the image attribute recognition request;
    calling a pre-trained image attribute recognition model; and
    inputting the image to be recognized into the pre-trained image attribute recognition model and recognizing the image attributes of the image to be recognized to obtain an image attribute recognition result;
    wherein the image attribute recognition model is trained using the network model training method according to any one of claims 1 to 6.
  20. The electronic device according to claim 19, wherein the processor is configured to perform:
    recognizing a target subject in the image to be recognized according to the image attribute recognition request; and
    obtaining a target image from the image to be recognized according to the target subject.
PCT/CN2019/120749 2019-11-25 2019-11-25 Network model training method, image attribute recognition method, apparatus, and electronic device WO2021102655A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980100863.7A CN114450690A (zh) 2019-11-25 2019-11-25 Network model training method, image attribute recognition method, apparatus, and electronic device
PCT/CN2019/120749 WO2021102655A1 (zh) 2019-11-25 2019-11-25 Network model training method, image attribute recognition method, apparatus, and electronic device


Publications (1)

Publication Number Publication Date
WO2021102655A1 true WO2021102655A1 (zh) 2021-06-03

Family

ID=76129757


Country Status (2)

Country Link
CN (1) CN114450690A (zh)
WO (1) WO2021102655A1 (zh)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596090A (zh) * 2018-04-24 2018-09-28 北京达佳互联信息技术有限公司 Face image key point detection method and apparatus, computer device, and storage medium
CN109522942A (zh) * 2018-10-29 2019-03-26 中国科学院深圳先进技术研究院 Image classification method and apparatus, terminal device, and storage medium
CN109871444A (zh) * 2019-01-16 2019-06-11 北京邮电大学 Text classification method and system
CN110298266A (zh) * 2019-06-10 2019-10-01 天津大学 Deep neural network object detection method based on multi-scale receptive field feature fusion


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220172462A1 * 2020-02-13 2022-06-02 Tencent Technology (Shenzhen) Company Limited Image processing method, apparatus, and device, and storage medium
US12033374B2 * 2020-02-13 2024-07-09 Tencent Technology (Shenzhen) Company Limited Image processing method, apparatus, and device, and storage medium
CN113505848A (zh) * 2021-07-27 2021-10-15 京东科技控股股份有限公司 Model training method and apparatus
CN113505848B (zh) * 2021-07-27 2023-09-26 京东科技控股股份有限公司 Model training method and apparatus
CN113673498A (zh) * 2021-07-28 2021-11-19 浙江大华技术股份有限公司 Object detection method, apparatus, and device, and computer-readable storage medium
CN113570707A (zh) * 2021-07-30 2021-10-29 集美大学 Three-dimensional human body reconstruction method and apparatus, computer device, and storage medium
CN113822199A (zh) * 2021-09-23 2021-12-21 浙江大华技术股份有限公司 Object attribute recognition method and apparatus, storage medium, and electronic apparatus
CN114092746A (zh) * 2021-11-29 2022-02-25 北京易华录信息技术股份有限公司 Multi-attribute recognition method and apparatus, storage medium, and electronic device
CN114358692A (zh) * 2022-01-07 2022-04-15 拉扎斯网络科技(上海)有限公司 Delivery duration adjustment method and apparatus, and electronic device
CN114494055A (zh) * 2022-01-19 2022-05-13 西安交通大学 Ghost imaging method, system, and device based on a recurrent neural network, and storage medium
CN114494055B (zh) * 2022-01-19 2024-02-06 西安交通大学 Ghost imaging method, system, and device based on a recurrent neural network, and storage medium
CN114528977A (zh) * 2022-01-24 2022-05-24 北京智源人工智能研究院 Equivariant network training method and apparatus, electronic device, and storage medium
CN114463559A (zh) * 2022-01-29 2022-05-10 新疆爱华盈通信息技术有限公司 Image recognition model training method, apparatus, and network, and image recognition method
CN114463559B (zh) * 2022-01-29 2024-05-10 芯算一体(深圳)科技有限公司 Image recognition model training method, apparatus, and network, and image recognition method
CN116698408A (zh) * 2023-06-09 2023-09-05 西安理工大学 Bearing lubrication state detection method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN114450690A (zh) 2022-05-06

Similar Documents

Publication Publication Date Title
WO2021102655A1 (zh) Network model training method, image attribute recognition method, apparatus, and electronic device
CN109902548B (zh) Object attribute recognition method and apparatus, computing device, and system
WO2019174439A1 (zh) Image recognition method and apparatus, terminal, and storage medium
KR102425578B1 (ko) Method and apparatus for recognizing an object
US9875258B1 Generating search strings and refinements from an image
WO2021227726A1 (zh) Neural network training methods, apparatuses, and devices for face detection and image detection
WO2022016556A1 (zh) Neural network distillation method and apparatus
US20180088677A1 Performing operations based on gestures
CN110009052A (zh) Image recognition method, image recognition model training method, and apparatus
KR20230141688A (ko) Method by which a device corrects an image, and the device
US10346893B1 Virtual dressing room
WO2020078119A1 (zh) Method, apparatus, and system for simulating a user wearing clothing and accessories
CN107679447A (zh) Facial feature point detection method, apparatus, and storage medium
KR102056806B1 (ko) Terminal and server providing a video call service
Zhang et al. Facial smile detection based on deep learning features
WO2021092808A1 (zh) Network model training method, image processing method, apparatus, and electronic device
CN113449573A (zh) Dynamic gesture recognition method and device
WO2019231130A1 (ko) Electronic device and control method thereof
WO2021047587A1 (zh) Gesture recognition method, electronic device, computer-readable storage medium, and chip
WO2019120031A1 (zh) Clothing matching recommendation method and apparatus, storage medium, and mobile terminal
CN112257645B (zh) Method and apparatus for locating facial key points, storage medium, and electronic apparatus
KR102637342B1 (ко) Method and apparatus for tracking a target object, and electronic device
CN111104911A (zh) Pedestrian re-identification method and apparatus based on big data training
CN115223239A (зh) Gesture recognition method and system, computer device, and readable storage medium
KR102476619B1 (ко) Electronic device and control method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19953924

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19953924

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.07.2022)
