CN112116589A - Method, device and equipment for evaluating an avatar, and computer-readable storage medium


Info

Publication number
CN112116589A
CN112116589A
Authority
CN
China
Prior art keywords: image, score, training, network model, neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011060263.3A
Other languages
Chinese (zh)
Other versions
CN112116589B (en)
Inventor
卢建东 (Lu Jiandong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011060263.3A
Publication of CN112116589A
Application granted
Publication of CN112116589B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/90 Dynamic range modification of images or parts thereof
    • G06T5/92 Dynamic range modification of images or parts thereof based on global image properties
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a method, device, equipment and computer-readable storage medium for evaluating an avatar. The method includes: obtaining at least one avatar image to be evaluated, where the at least one avatar image corresponds to the same avatar; inputting the at least one avatar image into a trained neural network model to obtain an appearance evaluation score of the avatar, where the training data used to train the neural network model includes at least training images of avatars and distorted images obtained by applying distortion processing to the training images; and outputting the appearance evaluation score of the avatar. Through the present application, accurate evaluation of an avatar can be achieved.

Description

Method, device and equipment for evaluating an avatar, and computer-readable storage medium
Technical Field
The embodiments of the present application relate, in particular but not exclusively, to a method, device and equipment for evaluating an avatar, and a computer-readable storage medium.
Background
The appearance of a virtual character in a game, comic, or animation can greatly affect the appeal of that work. A striking, attractive appearance design further stimulates people's interest in playing the game or watching the comic or animation, and so expands its user base. At present, after a character designer finishes designing a virtual character, the appearance design is usually evaluated manually. However, aesthetic tendencies differ from person to person, so there is no guarantee that the evaluators' taste matches the preferences of the general public, which in turn affects how attractive the game, comic, or animation is to users.
Disclosure of Invention
The embodiments of the present application provide a method, device, equipment and computer-readable storage medium for evaluating an avatar, which enable accurate evaluation of the avatar.
The technical scheme of the embodiment of the application is realized as follows:
The embodiments of the present application provide a method for evaluating an avatar, which includes:
obtaining at least one avatar image to be evaluated, where the at least one avatar image corresponds to the same avatar;
inputting the at least one avatar image into a trained neural network model to obtain an appearance evaluation score of the avatar;
where the training data of the neural network model during training includes at least training images of avatars and distorted images obtained by applying distortion processing to the training images;
and outputting the appearance evaluation score of the avatar.
The embodiments of the present application provide a device for evaluating an avatar, which includes:
a first acquisition module, configured to acquire at least one avatar image to be evaluated, where the at least one avatar image corresponds to the same avatar;
a prediction module, configured to input the at least one avatar image into a trained neural network model to obtain an appearance evaluation score of the avatar;
where the training data of the neural network model during training includes at least training images of avatars and distorted images obtained by applying distortion processing to the training images;
and a first output module, configured to output the appearance evaluation score of the avatar.
In some embodiments, the apparatus further comprises:
a second acquisition module, configured to acquire a plurality of training images of avatars and the appearance score corresponding to each training image;
a distortion processing module, configured to apply distortion processing to each of the training images to obtain a plurality of distorted images corresponding to the training images;
and a model training module, configured to train a preset neural network model based on the training images, the appearance scores and the distorted images to obtain the trained neural network model.
In some embodiments, the apparatus further comprises:
a third acquisition module, configured to acquire the sales information corresponding to each training image, the sales information including at least a selling price and a sales volume;
a fourth acquisition module, configured to acquire the annotation score of each training image;
and a weighting module, configured to apply weighted-average processing to the selling price, sales volume and annotation score of each training image to obtain the appearance score of each training image.
In some embodiments, the distortion processing module is further configured to:
determining at least one candidate region from each training image;
determining a target region from the at least one candidate region;
and setting the pixel values of the pixels in the target region to a preset pixel value to obtain the distorted image corresponding to each training image.
In some embodiments, the neural network model includes a first sub-network model and a second sub-network model, and the model training module is further configured to:
sequentially inputting each training image into the first sub-network model to obtain a first prediction score of each training image;
sequentially inputting at least one distorted image corresponding to each training image into the second sub-network model to obtain a second prediction score of each distorted image;
and performing back-propagation training on the neural network model using the first prediction scores, the second prediction scores and the appearance scores, so as to adjust the parameters of the neural network model and obtain the trained neural network model.
In some embodiments, the model training module is further configured to:
determining a first score difference between a training image and its corresponding distorted image based on the first prediction score and the second prediction score;
determining a second score difference based on the first prediction score and the appearance score;
back-propagating the first score difference and the second score difference through the neural network model,
and jointly training the neural network model with a first loss function and a second loss function, so as to adjust the parameters of the neural network model and obtain the trained neural network model;
where the first loss function constrains the first prediction score to be higher than the second prediction score, and the second loss function constrains the second score difference to be smaller than a difference threshold.
In some embodiments, the model training module is further configured to:
determining a first target training image and a second target training image, where the appearance score of the first target training image is higher than the appearance score of the second target training image;
acquiring the first prediction score of the first target training image and the second prediction score of a first distorted image corresponding to the first target training image, and acquiring the first prediction score of the second target training image and the second prediction score of a second distorted image corresponding to the second target training image;
jointly training the neural network model using the first prediction score and appearance score of the first target training image, the second prediction score of the first distorted image, the first prediction score and appearance score of the second target training image, the second prediction score of the second distorted image, and the first and second loss functions, so as to adjust the parameters of the neural network model and obtain the trained neural network model;
where the first loss function constrains the second prediction score of the first distorted image to be higher than the second prediction score of the second distorted image, the first prediction score of the first target training image to be higher than the second prediction score of the first distorted image, and the first prediction score of the second target training image to be higher than the second prediction score of the second distorted image.
In some embodiments, the first obtaining module is further configured to:
acquiring the appearance design data of the avatar;
and obtaining, based on the appearance design data, at least one avatar image of the avatar in different poses.
In some embodiments, the prediction module is further configured to:
input the at least one avatar image into the trained neural network model to obtain an evaluation score of each avatar image;
and determine the appearance evaluation score of the avatar based on the evaluation scores of the avatar images.
In some embodiments, the apparatus further comprises:
a second output module, configured to output first prompt information when the appearance evaluation score of the avatar is lower than a preset score threshold,
where the first prompt information prompts that the appearance of the avatar needs to be optimized.
The embodiments of the present application provide evaluation equipment for an avatar, which includes:
a memory for storing executable instructions; and a processor for executing the executable instructions stored in the memory to implement the method described above.
The embodiments of the present application provide a computer-readable storage medium storing executable instructions that, when executed, cause a processor to implement the method described above.
The embodiments of the present application have the following beneficial effects:
After at least one avatar image to be evaluated for the same avatar is acquired, the at least one avatar image is input into a trained neural network model to obtain an appearance evaluation score of the avatar, and the appearance evaluation score is output. Because the training data of the neural network model includes at least training images of avatars and distorted images obtained by distorting those training images, the training set is both enlarged and made more diverse, which suppresses overfitting of the network model and improves the accuracy of the evaluation.
Drawings
FIG. 1 is a schematic diagram of a network model used for skin evaluation with the SER-FIQ method in the related art;
FIG. 2 is a schematic diagram of the network architecture of an avatar evaluation system 20 according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an evaluation server 300 according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of an implementation of the avatar evaluation method according to an embodiment of the present application;
FIG. 5A is a schematic flow chart of an implementation of parameter adjustment of a neural network model according to an embodiment of the present application;
FIG. 5B is a schematic flow chart of another implementation of parameter adjustment of a neural network model according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of another implementation of the avatar evaluation method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a collected skin image according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a distorted skin image generated by an embodiment of the present application;
FIG. 9 is a schematic diagram of the training process of a neural network model provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of the implementation process of skin evaluation with a skin aesthetics evaluation system according to an embodiment of the present application;
FIG. 11 is a schematic interface diagram of evaluation with a skin aesthetics evaluation system according to an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments of the present application belong. The terminology used in the embodiments of the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained as follows.
1) Twin (Siamese) neural network: a network comprising two sub-networks whose structures and parameters are identical. The twin network takes two samples as input and outputs their embeddings in a high-dimensional space, so that the similarity of the two samples can be compared.
2) Loss function (also called cost function): a function that maps the value of a random event, or of its related random variables, to a non-negative real number representing the "risk" or "loss" of that event. In applications, the loss function is usually associated with an optimization problem as a learning criterion, i.e., the model is solved and evaluated by minimizing the loss function. For example, in statistics and machine learning it is used for parameter estimation of models, and it is the optimization objective of a machine learning model.
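To make the first term concrete, the following is a minimal, illustrative PyTorch sketch of a twin network: two inputs pass through one shared encoder, and the resulting embeddings can be compared for similarity. The backbone and embedding size are arbitrary choices for illustration, not taken from the patent.

```python
import torch
import torch.nn as nn

class TwinNetwork(nn.Module):
    """Two branches that share one set of weights (a Siamese network)."""
    def __init__(self):
        super().__init__()
        # A small CNN encoder; the real backbone is not specified by the patent.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64),
        )

    def forward(self, x1, x2):
        # Both inputs pass through the same encoder, so the two
        # "sub-networks" have identical structure and parameters.
        return self.encoder(x1), self.encoder(x2)

net = TwinNetwork()
a = torch.randn(4, 3, 128, 128)   # a batch of sample pairs
b = torch.randn(4, 3, 128, 128)
e1, e2 = net(a, b)
similarity = torch.cosine_similarity(e1, e2)  # compare the two samples
```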
In order to better understand the avatar evaluation method provided in the embodiments of the present application, a related-art method for evaluating the appearance of a virtual character is described first.
In the related art, when the appearance of a virtual character is evaluated, what is usually evaluated is the aesthetics of the character's skin, borrowing face image quality assessment methods such as the unsupervised face image quality assessment method SER-FIQ. The SER-FIQ method first encodes a face image with a trained face recognition model and then generates m sub-networks by applying m different dropout patterns in the intermediate layer. The structure of the sub-networks is shown in FIG. 1:
Since different sub-networks generate different stochastic face features x_s, the random embeddings of a face image I are represented as a set X(I) = {x_s}, s ∈ {1, 2, ..., m}. The face image quality score can be determined by formula (1-1):

q(I) = 2σ( -(2/m²) · Σ_{i<j} d(x_i, x_j) )    (1-1)

where d(x_i, x_j) is the Euclidean distance between features x_i and x_j, and the sigmoid function σ(·) ensures a quality score q ∈ [0, 1].
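As a hedged sketch, formula (1-1) can be computed from a stack of m stochastic embeddings as follows; the 2/m² normalization is taken from the reconstructed formula above, and the embedding dimension is an arbitrary example.

```python
import torch

def ser_fiq_score(embeddings: torch.Tensor) -> torch.Tensor:
    """Quality score from m stochastic embeddings of one face image,
    following the form of formula (1-1)."""
    m = embeddings.shape[0]
    d = torch.cdist(embeddings, embeddings)  # m x m pairwise Euclidean distances
    # d is symmetric with a zero diagonal, so d.sum() = 2 * sum_{i<j} d(x_i, x_j),
    # and d.sum() / m**2 equals (2/m^2) * sum_{i<j} d(x_i, x_j).
    mean_d = d.sum() / (m * m)
    return 2.0 * torch.sigmoid(-mean_d)  # scaled into (0, 1]

x = torch.randn(8, 512)  # m = 8 stochastic embeddings, 512-dimensional
q = ser_fiq_score(x)
```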
This scheme is built on the relationship between the robustness of the stochastic embeddings and face image quality. Increasing the number of random sub-networks significantly increases the computational load of the model, while reducing their number significantly lowers prediction accuracy. Moreover, the SER-FIQ method extracts features and computes feature distances with a pre-trained face recognition model, and the distribution of skin images differs greatly from that of face images, so the method cannot be directly used to predict the aesthetics of a virtual character's appearance design.
On this basis, the embodiments of the present application provide a method for evaluating the appearance of a virtual character that differs from the SER-FIQ method. First, the skin aesthetics evaluation method based on pairwise ranking can use skin images directly for model training. Second, an original hero skin image should have a higher aesthetic score than a distorted version of it, so this prior information is used to constrain the learning of skin aesthetic features. Finally, a large number of distorted skin images are generated with a random area erasing method, which effectively alleviates the problem of insufficient training samples.
An exemplary application of the avatar evaluation equipment provided in the embodiments of the present application is described below. The evaluation equipment may be implemented as a terminal, such as a notebook computer, tablet computer, desktop computer, mobile device (e.g., mobile phone, portable music player, personal digital assistant, dedicated messaging device, portable game device) or intelligent robot, or it may be implemented as a server. An exemplary application in which the evaluation equipment is implemented as a server is described next.
Referring to FIG. 2, FIG. 2 is a schematic diagram of the network architecture of an avatar evaluation system 20 according to an embodiment of the present application. As shown in FIG. 2, the avatar evaluation system 20 includes a design terminal 100, a network 200, an evaluation server 300, a service server 400 and a user terminal 500. An application program runs on the design terminal 100. In the avatar evaluation method of the embodiments of the present application, a character designer designs the appearance of a virtual character through the design terminal 100, including skin color, clothing, decorations, equipment and so on. After the appearance design is completed, the design terminal 100 responds to an evaluation instruction by sending the avatar image to be evaluated to the evaluation server 300; the evaluation server 300 evaluates the image with the trained neural network model to obtain an aesthetic evaluation score and returns it to the design terminal 100, so that the designer can decide, according to the score, whether to optimize the appearance or finalize the design. When the design is finalized, the appearance design data of the avatar can be sent to the service server 400 through the design terminal 100, optionally together with the appearance evaluation score, so that the service server 400 can determine a recommendation order of avatars according to their appearance evaluation scores and recommend the avatars to the user terminal 500 in that order.
The evaluation server 300 may be a server dedicated to appearance evaluation, or it may be the same server as the service server, for example the game server or video server corresponding to the avatar.
In some embodiments, based on the network architecture of FIG. 2, the evaluation server 300 may also construct training data from collected annotated original images and the distorted images generated from them, train the initial neural network model with these data to obtain the trained neural network model, and then send the trained model to the design terminal 100. After the appearance design of the virtual character is completed, the design terminal 100 responds to the evaluation instruction by evaluating the avatar image to be evaluated with the trained neural network model, and obtains and outputs the aesthetic evaluation score. The character designer then decides, according to the score, whether to optimize the appearance or finalize the design.
In this embodiment of the application, the evaluation server 300 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In some embodiments, the design terminal may itself acquire the training images and distorted images and train the initial neural network model with them to obtain the trained neural network model. After the appearance design of the virtual character is completed, the design terminal 100 responds to the evaluation instruction by evaluating the avatar image to be evaluated with the trained neural network model, and obtains and outputs the aesthetic evaluation score. The character designer then decides, according to the score, whether to optimize the appearance or finalize the design.
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of an evaluation server 300 according to an embodiment of the present application. The evaluation server 300 shown in FIG. 3 includes: at least one processor 310, a memory 350, at least one network interface 320 and a user interface 330. The components of the evaluation server 300 are coupled together by a bus system 340. It will be appreciated that the bus system 340 enables communication among the connected components. Besides a data bus, the bus system 340 includes a power bus, a control bus and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 340 in FIG. 3.
The processor 310 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 330 includes one or more output devices 331, including one or more speakers and/or one or more visual display screens, that enable presentation of media content. The user interface 330 also includes one or more input devices 332, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 350 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 350 optionally includes one or more storage devices physically located remote from processor 310. The memory 350 may include either volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 350 described in embodiments herein is intended to comprise any suitable type of memory. In some embodiments, memory 350 is capable of storing data, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below, to support various operations.
An operating system 351 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 352 for reaching other computing devices via one or more (wired or wireless) network interfaces 320, exemplary network interfaces 320 including Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
an input processing module 353 for detecting one or more user inputs or interactions from one of the one or more input devices 332 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided by the embodiments of the present application may be implemented in software, and fig. 3 illustrates an avatar evaluation apparatus 354 stored in the memory 350, where the avatar evaluation apparatus 354 may be an avatar evaluation apparatus in the evaluation server 300, which may be software in the form of programs and plug-ins, and includes the following software modules: the first obtaining module 3541, the predicting module 3542, and the first outputting module 3543, which are logical and thus may be arbitrarily combined or further separated depending on the functions implemented. The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present application may be a processor in the form of a hardware decoding processor, which is programmed to perform the method for evaluating the avatar provided in the embodiments of the present application, for example, the processor in the form of the hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The following describes an evaluation method of an avatar provided by the embodiment of the present application, with reference to an exemplary application and implementation of the evaluation server 300 provided by the embodiment of the present application.
In order to better understand the method provided by the embodiment of the present application, artificial intelligence, each branch of artificial intelligence, and the application field related to the method provided by the embodiment of the present application are explained first.
Artificial intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, giving machines the capabilities of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Its basic technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. The solution provided in the embodiments of the present application mainly involves the artificial intelligence technologies of natural language processing and machine learning, which are explained below.
Machine learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specifically studies how computers can simulate or implement human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning and inductive learning.
Referring to FIG. 4, FIG. 4 is a schematic flow chart of an implementation of the avatar evaluation method according to an embodiment of the present application. The method is applied to evaluation equipment for an avatar, which may be a server or a design terminal. In the embodiments of the present application, taking the evaluation server as the evaluation equipment for example, the avatar evaluation method is described with reference to the steps shown in FIG. 4.
Step S101: obtain at least one avatar image to be evaluated.
Here, the at least one avatar image corresponds to the same avatar of the same virtual object. The virtual object may be a virtual character in a game, or a virtual character in an animated video or comic; the avatar may include the skin, hairstyle, clothing, props and so on of the virtual object. When two or more avatar images are acquired, the poses of the virtual object differ across the images.
After the character designer finishes designing the avatar of the virtual object on the design terminal, the appearance design data of the virtual object, which may be a three-dimensional avatar model of the virtual character, may be transmitted to the server. When step S101 is implemented, the appearance design data of the avatar is acquired first; then, based on the appearance design data, at least one avatar image of the avatar in different poses is obtained.
Here, since the appearance design data of the avatar is the three-dimensional avatar model, "acquiring at least one avatar image of the avatar in different poses based on the appearance design data" may be implemented by changing the rotation-angle parameter of the three-dimensional model to capture two-dimensional avatar images in different poses.
For example, with the default initial rotation angle of 0 degrees (the face directly forward), an image of the current three-dimensional model is captured; the rotation angle of the model is then set to 60, 120, 240 and 300 degrees in turn, and avatar images of the model at the different angles are captured in sequence.
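A production pipeline would render textured snapshots with the game engine's own renderer, which the patent does not specify. The self-contained NumPy sketch below only illustrates the rotate-and-capture idea: yaw rotations at the example angles above, with an orthographic projection standing in for a 2D capture.

```python
import numpy as np

def yaw_rotation(deg: float) -> np.ndarray:
    """Rotation matrix about the vertical (y) axis."""
    t = np.radians(deg)
    return np.array([[ np.cos(t), 0.0, np.sin(t)],
                     [ 0.0,       1.0, 0.0      ],
                     [-np.sin(t), 0.0, np.cos(t)]])

def project_views(vertices: np.ndarray, angles=(0, 60, 120, 240, 300)):
    """Rotate the model to each yaw angle and take an orthographic
    front view (dropping the depth axis) as a stand-in 2D capture."""
    views = []
    for deg in angles:
        rotated = vertices @ yaw_rotation(deg).T
        views.append(rotated[:, :2])  # x, y coordinates of the 2D snapshot
    return views

verts = np.random.rand(1000, 3)  # placeholder vertex cloud of a 3D avatar model
views = project_views(verts)     # five 2D views at the angles from the example
```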
Step S102: input the at least one avatar image into the trained neural network model to obtain the appearance evaluation score of the avatar.
In the embodiments of the present application, the training data of the neural network model during training includes at least training images of avatars and distorted images obtained by distorting those training images. One training image may correspond to one distorted image or to several distorted images, which both multiplies the size of the training image set and increases its diversity.
When step S102 is implemented, the at least one avatar image may be input into the trained neural network model to obtain a predicted score for each avatar image; the predicted scores are then averaged to obtain the appearance evaluation score of the avatar, which improves the stability of the evaluation result.
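A minimal sketch of this inference step, assuming the trained model maps a batch of pose images to one predicted score per image:

```python
import torch

@torch.no_grad()
def appearance_score(model: torch.nn.Module, images: torch.Tensor) -> float:
    """Predict a score for each of the N pose images of one avatar
    (tensor of shape [N, C, H, W]) and average them, as in step S102."""
    model.eval()
    scores = model(images).squeeze(-1)  # one predicted score per image
    return scores.mean().item()
```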
Step S103: output the appearance evaluation score of the avatar.
Here, when this step is performed by the server, the server outputs the appearance evaluation score of the avatar by sending it to the design terminal, where it is shown on the terminal's display device or played through the terminal's voice output device, so that the character designer learns the evaluation result in time.
In the avatar evaluation method provided in the embodiments of the present application, after at least one avatar image to be evaluated for the same avatar is acquired, the at least one avatar image is input into the trained neural network model to obtain the appearance evaluation score of the avatar, and the score is output. Because the training data of the neural network model includes at least training images of avatars and distorted images obtained by distorting those training images, and the evaluation is based on at least one avatar image, the training-set and test-set images of the neural network model are multiplied several-fold. This suppresses the overfitting tendency of the network model, and the added diversity of the training-set images improves the accuracy of the aesthetic evaluation.
In some embodiments, the trained neural network model needs to be obtained before step S101. In practice, the training process of the neural network can be implemented through the following steps:
Step S001: acquire a plurality of training images of avatars and the appearance score corresponding to each training image.
Here, when the avatar is the appearance of a virtual character in an online game, one virtual character may have multiple appearances, differing in skin, clothing or even props. The training images of the avatars are acquired from the game server; in step S001, training images of multiple avatars are acquired, and each avatar has several corresponding training images. For example, suppose the virtual characters are Lü Bu, Xiao Qiao and Diao Chan; Lü Bu has appearances A1, A2 and A3, Xiao Qiao has appearances B1 and B2, and Diao Chan has appearances C1 and C2. Then several avatar images are acquired from the game server for each of appearances A1, A2 and A3 of Lü Bu, appearances B1 and B2 of Xiao Qiao, and appearances C1 and C2 of Diao Chan.
In step S001, the appearance scores corresponding to the training images also need to be acquired; the appearance score may be determined according to the selling price, sales volume and annotation score of the avatar corresponding to each training image.
Step S002: apply distortion processing to each of the training images to obtain a plurality of distorted images corresponding to the training images.
When step S002 is implemented, each training image may be distorted one or more times to obtain one or more distorted images; that is, one training image may correspond to one or more distorted images. For example, four rounds of distortion processing may be applied to each training image, giving four distorted images per training image and expanding the training image set to five times its original size.
Step S003: train a preset neural network model based on the training images, the appearance scores and the distorted images to obtain the trained neural network model.
Here, the neural network model may be a twin network model; that is, it includes two sub-network models that share the same model parameters. During training, an original training image is input into one sub-network model and a distorted image into the other, and the parameters of the neural network model are adjusted under the constraint that the score of the original training image is higher than the score of the distorted image, so as to obtain the trained neural network model.
In the embodiments of steps S001 to S003 above, after the original complete training images of the avatars are acquired, the corresponding distorted images are obtained by distorting them, and the distorted images are also used as training data. This not only multiplies the size of the training image set but also, since each distorted image differs from its training image, increases the diversity of the set. Moreover, because a distorted image is produced by degrading a training image, its evaluation score should in theory be lower than that of the training image; a contrastive loss function can be built from this constraint to train the neural network model, which improves the accuracy of the trained model when scoring avatars.
In the embodiments of the present application, the appearance score of each training image may be determined through the following steps S0011 to S0013, described below.
Step S0011: acquire the sales information corresponding to each training image.
In the embodiments of the present application, the sales information corresponding to each training image may be acquired from the service server; the sales information includes at least a selling price and a sales volume.
The sales information corresponding to a training image is the sales information of the avatar corresponding to that training image. When several different training images correspond to the same avatar, their sales information is identical. For example, if the training images P1, P2, P3 and P4 all correspond to appearance A1 of Lü Bu, then the sales information of P1, P2, P3 and P4 is the same, namely the sales information of appearance A1, which might be a selling price of 50 dollars and a sales volume of 3000.
Step S0012: acquire the annotation score of each training image.
Here, the annotation score of each training image may be annotated manually. The annotation scores of training images of the same avatar at different angles are the same, while the annotation scores of training images of different virtual objects may be the same or different. In practice, a value range may be preset for the annotation score, for example 0 to 100, with a higher annotation score meaning a more beautiful avatar.
Step S0013: apply weighted-average processing to the selling price, sales volume and annotation score of each training image to obtain the appearance score of each training image.
Here, when step S0013 is implemented, the selling price, sales volume and annotation score of each training image may first be normalized by their maximum and minimum values; the normalized values are then weighted and averaged to obtain the appearance score of each training image. The weights of the selling price, sales volume and annotation score may be preset and sum to 1; for example, the weight of the selling price may be 0.2, that of the sales volume 0.3, and that of the annotation score 0.5.
Since the selling price, sales volume and annotation score of different training images of the same avatar are identical, the appearance scores of different training images of the same avatar are also identical.
In the embodiments of steps S0011 to S0013 above, the selling price, sales volume and annotation score corresponding to each training image are processed together to determine the appearance score of each training image. This score reflects the aesthetics of the avatar, which ensures the accuracy of the appearance score.
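A small sketch of steps S0011 to S0013, using max-min normalization and the illustrative 0.2/0.3/0.5 weights mentioned above; the input values are made-up examples.

```python
import numpy as np

def appearance_scores(prices, sales, labels,
                      w_price=0.2, w_sales=0.3, w_label=0.5):
    """Max-min normalize each factor across the training set, then take
    the weighted average described in step S0013 (weights sum to 1)."""
    def norm(v):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return w_price * norm(prices) + w_sales * norm(sales) + w_label * norm(labels)

scores = appearance_scores(prices=[50, 30, 80],      # selling prices
                           sales=[3000, 1200, 500],  # sales volumes
                           labels=[85, 70, 90])      # annotation scores
```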
In some embodiments, step S002 above, "applying distortion processing to each of the training images to obtain a plurality of distorted images corresponding to the training images", can be implemented through the following steps:
step S0021, at least one candidate region is determined from each training image.
Here, in implementation, in step S0021, at least one candidate region may be randomly determined from each training image, or one or more specific regions in the training images may be determined as candidate regions.
Step S0022: determine a target region from the at least one candidate region.
Here, step S0022 may take one of the candidate regions as the target region, or take two or more of them as target regions.
Step S0023: set the pixel values of the pixels in the target region to random pixel values to obtain the distorted image corresponding to each training image.
Here, setting the pixels in the target region to random values hides the image information of that region; that is, the distorted image carries less image information than the training image.
In some embodiments, the pixels in the target region may instead be set to a preset pixel value, for example 0, which likewise hides the local details of the target region.
Steps S0021 to S0023 may be implemented with a random erasing augmentation (REA) algorithm to generate distorted images with missing local details. Because local details are lost in the distorted images, training the neural network model with them as training data reduces the risk of overfitting and improves the robustness of the model.
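A self-contained sketch of the random-erasing distortion described in steps S0021 to S0023: one rectangle is chosen at random and filled with random pixel values (or a preset value such as 0). The erased-area fractions are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def random_erase(img: np.ndarray, min_frac=0.1, max_frac=0.3,
                 value=None, rng=None):
    """Erase one randomly chosen rectangle of an HxWxC image, filling it
    with random pixel values (or a preset value), so the distorted copy
    loses the local details of that region."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    eh = int(h * rng.uniform(min_frac, max_frac))   # erased-region height
    ew = int(w * rng.uniform(min_frac, max_frac))   # erased-region width
    top = rng.integers(0, h - eh + 1)
    left = rng.integers(0, w - ew + 1)
    out = img.copy()
    patch = rng.integers(0, 256, (eh, ew) + img.shape[2:]) if value is None else value
    out[top:top + eh, left:left + ew] = patch
    return out

skin = np.zeros((256, 256, 3), dtype=np.uint8)       # placeholder skin image
distorted = [random_erase(skin) for _ in range(4)]   # four distorted copies
```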
In some embodiments, the neural network model may include a first sub-network model and a second sub-network model. Correspondingly, step S003 above, "training a preset neural network model based on the training images, the appearance scores and the distorted images to obtain the trained neural network model", can be implemented through the following steps:
and step S0031, sequentially inputting each training image into the first sub-network model to obtain a first prediction score of each training image.
In an embodiment of the application, the first sub-network model and the second sub-network model share model parameters, i.e. the model parameters of the first sub-network model are the same as the model parameters of the second sub-network model.
And S0032, sequentially inputting at least one distorted image corresponding to each training image into the second sub-network model to obtain a second prediction score of each distorted image.
Here, when the training image is subjected to the distortion processing, the distortion processing may be performed a plurality of times to obtain a plurality of distorted images, and therefore one training image may correspond to a plurality of distorted images. In step S0032, at least one distorted image corresponding to each training image input in step S0031 may be input to the second sub-network model, and in actual implementation, one training image corresponding to all distorted images is generally input to the second sub-network model, so as to obtain a second prediction score of each distorted image.
And S0033, performing back propagation training on the neural network model by using the first prediction score, the second prediction score and the image score so as to adjust parameters of the neural network model to obtain the trained neural network model.
In practice, step S0033 can be implemented in the following two ways.
The first implementation:
Since the appearance score of a training image is higher than that of its distorted image, the first implementation trains the neural network model with a pairwise ranking of the training image against the distorted image according to this constraint. As shown in FIG. 5A, the first implementation of step S0033 can be realized through the following steps:
in step S331A, a first score difference between a training image and a distorted image corresponding to the training image is determined based on the first prediction score and the second prediction score.
Because the distorted images are produced by distorting the training images online, one training image usually corresponds to several distorted images, and annotating a large number of distorted images would greatly increase labor costs; the distorted images therefore carry no annotation information in the embodiments of the present application. However, since a distorted image is missing local details, the neural network model can be trained under the constraint that the prediction score of the distorted image is lower than that of the training image.
When implemented, step S331A may subtract the second prediction score from the first prediction score to obtain the first score difference.
In step S332A, a second score difference is determined based on the first prediction score and the appearance score.
Here, when implemented, step S332A may subtract the appearance score from the first prediction score to obtain the second score difference.
Step S333A: back-propagate the first score difference and the second score difference through the neural network model, and jointly train the neural network model with the first loss function and the second loss function to adjust its parameters, obtaining the trained neural network model.
Here, the first loss function is a contrastive loss function constraining the first prediction score to be higher than the second prediction score, and the second loss function is a regression loss function constraining the second score difference to be smaller than the difference threshold. That is, when the trained neural network model predicts on a training image P1 and its corresponding distorted image S1, the first prediction score of P1 is higher than the second prediction score of S1.
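The patent does not give the loss formulas, so the following PyTorch sketch is one hedged reading of this first implementation: a margin ranking term as the first (contrastive) loss and a mean-squared-error term as the second (regression) loss. The margin value and the equal weighting of the two terms are assumptions, not patent text.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_orig, pred_dist, label_score, margin=0.1):
    """Contrastive term pushing the original image's prediction above the
    distorted image's, plus a regression term pulling the original image's
    prediction toward its appearance score."""
    # First loss: constrain first prediction score > second prediction score.
    rank = F.relu(margin - (pred_orig - pred_dist)).mean()
    # Second loss: constrain |first prediction score - appearance score| small.
    reg = F.mse_loss(pred_orig, label_score)
    return rank + reg

p_orig = torch.rand(8, requires_grad=True)   # first prediction scores
p_dist = torch.rand(8, requires_grad=True)   # second prediction scores
y = torch.rand(8)                            # appearance scores
loss = joint_loss(p_orig, p_dist, y)
loss.backward()                              # back-propagate both score differences
```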
The second implementation:
For the same avatar type, if the appearance score of one training image (the first target training image) is higher than that of another (the second target training image), then the prediction score of the first distorted image, corresponding to the first target training image, should be higher than that of the second distorted image, corresponding to the second target training image. In the second implementation, therefore, the neural network model is trained with pairwise rankings of training and distorted images based on this constraint together with the constraint that a training image's score is higher than that of its distorted image. As shown in FIG. 5B, the second implementation of step S0033 can be realized through the following steps:
in step S331B, a first target training image and a second target training image are determined.
Here, the first target training image has a higher avatar score than the second target training image;
step S332B, acquiring a first prediction score of the first target training image and a second prediction score of the first distorted image corresponding to the first target training image, and acquiring a first prediction score of the second target training image and a second prediction score of the distorted image corresponding to the second target training image;
step S333B, performing joint training on the neural network model by using the first prediction score of the first target training image, the image score of the first target training image, the second prediction score of the first distorted image, the first prediction score of the second target training image, the image score of the second target training image, the second prediction score of the second distorted image, and the first loss function and the second loss function, so as to adjust parameters of the neural network model, thereby obtaining a trained neural network model.
Here, when implemented, step S333B may determine a third score difference between the second prediction score of the first distorted image and the second prediction score of the second distorted image; a fourth score difference between the first prediction score of the first target training image and the second prediction score of the first distorted image; a fifth score difference between the first prediction score of the second target training image and the second prediction score of the second distorted image; a sixth score difference between the first prediction score of the first target training image and its appearance score; and a seventh score difference between the first prediction score of the second target training image and its appearance score. The third to seventh score differences are then back-propagated through the neural network model, and the neural network model is jointly trained with the first and second loss functions so as to adjust its parameters and obtain the trained neural network model.
The first loss function is a contrastive loss function constraining the second prediction score of the first distorted image to be higher than that of the second distorted image, the first prediction score of the first target training image to be higher than the second prediction score of the first distorted image, and the first prediction score of the second target training image to be higher than the second prediction score of the second distorted image. The second loss function is a regression loss function constraining the difference between the first prediction score of the first target training image and its appearance score, and the difference between the first prediction score of the second target training image and its appearance score, to be smaller than the difference threshold.
The second implementation considers not only the aesthetic relationship between an original training image and its distorted image, but also the relationship between distorted images. If the appearance score of the first target training image is higher than that of the second target training image, then under the same distortion, i.e., when the distorted images lose local regions at the same positions, the aesthetics of the first distorted image should be higher than that of the second distorted image. The contrastive loss function is thereby extended from the ranking between original and distorted images to the ranking between distorted images as well, further improving the prediction accuracy of the trained neural network model.
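Under the same assumptions as the previous sketch, the first loss of this second implementation can be extended to the distorted-versus-distorted ranking as follows, for a pair of avatars where the first has the higher appearance score; the regression term is unchanged and omitted here, and the margin is again assumed.

```python
import torch
import torch.nn.functional as F

def extended_rank_loss(p1, d1, p2, d2, margin=0.1):
    """Ranks each original above its own distorted copy, and additionally
    ranks the first avatar's distorted image above the second's."""
    terms = [
        F.relu(margin - (p1 - d1)),  # original 1 > distorted 1
        F.relu(margin - (p2 - d2)),  # original 2 > distorted 2
        F.relu(margin - (d1 - d2)),  # distorted 1 > distorted 2
    ]
    return torch.stack(terms).mean()

p1, d1 = torch.rand(8), torch.rand(8)  # predictions for avatar 1 and its distortion
p2, d2 = torch.rand(8), torch.rand(8)  # predictions for avatar 2 and its distortion
loss = extended_rank_loss(p1, d1, p2, d2)
```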
Based on the foregoing embodiments, an embodiment of the present application further provides an avatar evaluation method, which is applied to the network architecture shown in fig. 2, and fig. 6 is another schematic implementation flow diagram of the avatar evaluation method provided in the embodiment of the present application, as shown in fig. 6, where the implementation flow includes:
step S601, the design terminal carries out virtual image design based on the received image design operation.
Step S602, the design terminal responds to the received image evaluation operation and sends the image design data of the virtual image to be evaluated to the evaluation server.
Here, after the image designer completes the design of the avatar through the design terminal, the aesthetic measure of the avatar needs to be evaluated before the avatar is officially released online. The image designer can therefore trigger an image evaluation operation on the avatar through the design terminal; after receiving the image evaluation operation, the design terminal responds to it and sends the image design data of the avatar to be evaluated to the evaluation server.
Step S603, the evaluation server obtains at least one image of the virtual image in different postures based on the received image design data.
Here, the avatar design data is generally a three-dimensional avatar model. When implemented, step S603 may capture a plurality of character images of the avatar in different postures by changing a rotation angle parameter of the three-dimensional avatar model. That is, the at least one character image corresponds to the same avatar.
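As a minimal illustration of this step, the sketch below samples evenly spaced rotation angles and captures one image per pose. The render callable is a hypothetical stand-in for whatever engine or renderer API is actually available; the pose count of 8 matches the example used later in this document:

    import numpy as np

    def capture_pose_images(avatar_model, render, num_poses=8):
        """Capture character images of an avatar at evenly spaced yaw angles.

        `render` is a hypothetical callable (model, yaw_degrees) -> image;
        the real rotation parameter depends on the 3D engine in use.
        """
        yaws = np.linspace(0.0, 360.0, num=num_poses, endpoint=False)
        return [render(avatar_model, yaw) for yaw in yaws]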
Step S604, the evaluation server inputs the at least one image to the trained neural network model to obtain the evaluation score of each image.
Here, the evaluation score of each image can reflect the image aesthetics of the avatar. When the neural network model is trained, the training image set includes original training images and the distorted images obtained by distorting them; compared with a training image, its distorted image loses real local details, so the constraint that the image aesthetic measure of the distorted image is lower than that of the training image can be used as an additional optimization target during training, which improves the prediction accuracy of the trained neural network model.
Step S605, the evaluation server determines the image evaluation score of the virtual image based on the evaluation score of each image.
Here, in step S605, the evaluation scores of the character images of the avatar may be averaged to obtain the image evaluation score of the avatar.
Step S606, the evaluation server sends the image evaluation score of the virtual image to the design terminal.
Step S607, the design terminal determines whether the image evaluation score is lower than a preset score threshold.
Here, when the image evaluation score is lower than the score threshold, step S608 is entered to prompt that image optimization is required; when the image evaluation score is higher than or equal to the score threshold, step S610 is entered to prompt for confirmation of finalization.
Step S608, the design terminal outputs the first prompt information.
Here, the first prompt message is used to prompt that the avatar needs to be optimized.
And step S609, the design terminal updates the image design data based on the received image optimization operation to obtain the updated image design data.
Here, after step S609, step S602 is entered again to perform image evaluation on the updated image design data.
And step S610, the design terminal outputs second prompt information.
Here, the second prompt information is used to prompt the image designer to perform a secondary confirmation of the image design data, i.e. to confirm whether the image design of the avatar is complete. The secondary confirmation is mainly a manual check of the detail-optimized parts of the avatar.
In some embodiments, the evaluation server may further determine whether the image evaluation score is lower than a preset score threshold, and when the image evaluation score is lower than the score threshold, the evaluation server sends a first prompt message to the design terminal, where the first prompt message is used to prompt that the virtual image needs to be optimized, and the first prompt message may also carry the image evaluation score of the virtual image; when the image evaluation score is higher than or equal to the score threshold, the evaluation server sends second prompt information to the design terminal, the second prompt information is used for prompting secondary confirmation of the image design data, and the second prompt information can also carry the image evaluation score of the virtual image.
Step S611, the design terminal saves the image design data of the avatar based on the received design confirmation operation.
And step S612, the design terminal sends the image design data and the image evaluation score to the service server based on the received data uploading operation.
Step S613, the service server determines an avatar recommendation sequence based on the received avatar design data and the avatar evaluation score.
And step S614, the service server sends the recommended virtual image to the user terminal based on the virtual image recommendation sequence.
In the avatar evaluation method provided in the embodiment of the present application, after the design terminal completes the avatar design based on the designer's operations, it responds to the received image evaluation operation and sends the image design data to the evaluation server. The evaluation server captures at least one image to be evaluated based on the image design data, performs prediction processing on the at least one image through the trained neural network model to obtain an evaluation score for each image, and then averages the evaluation scores to obtain a comprehensive image evaluation score of the avatar, which improves the stability of image evaluation. After determining the image evaluation score, the evaluation server sends it to the design terminal so that the design terminal can determine whether the avatar needs to be optimized. When no optimization is needed, the design terminal can upload the avatar to the service server for online release, together with the image evaluation score of the avatar, so that the service server can determine an image recommendation sequence according to the image evaluation scores of avatars and thus achieve accurate recommendation.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
In the embodiment of the present application, the avatar is described by taking a game character as an example.
In the method for evaluating an avatar provided in the embodiment of the present application, an avatar image of a virtual character to be evaluated is input into a trained neural network model (such as a deep convolutional neural network), so as to obtain an aesthetic score of the avatar image of the virtual character.
The trained neural network model may be an aesthetic evaluation model based on image ranking. In the embodiment of the present application, image evaluation may refer to the skin evaluation of a virtual character, and the avatar evaluation method, when implemented, may include the steps of skin aesthetics definition, skin image collection, distorted skin generation, aesthetics ranking, and aesthetics regression, each of which is explained below.
1, skin aesthetics definition: skin aesthetics is a subjective quantity that is strongly related to the skin price a, the skin sales b, and the annotator score c. To make the range of the aesthetic score easy to control, the embodiment of the present application may use max-min normalization to map the skin price a, the skin sales b, and the annotator score c to a, b, c ∈ [0, 1]. In implementation, the aesthetic score may be defined as the weighted average shown in formula (2-1):

s = α_a·a + α_b·b + α_c·c (2-1);

wherein the three weights α_a, α_b, α_c satisfy α_a + α_b + α_c = 1 and α_a, α_b, α_c ≥ 0. According to formula (2-1), the higher the skin price, the better the skin sales, and the higher the annotator score, the higher the skin aesthetic score.
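As a small worked example of formula (2-1), the following sketch normalizes each factor by max-min normalization and combines them with a weighted average. The weight values and the min/max ranges are illustrative assumptions; the embodiment only requires non-negative weights that sum to 1:

    def min_max(v, lo, hi):
        # Max-min normalisation to [0, 1]; lo and hi are the observed extremes.
        return (v - lo) / (hi - lo) if hi > lo else 0.0

    def aesthetic_label(price, sales, annot,
                        price_rng=(1.0, 200.0),    # assumed price range
                        sales_rng=(0.0, 1e6),      # assumed sales range
                        annot_rng=(0.0, 10.0),     # assumed annotator scale
                        weights=(0.4, 0.3, 0.3)):  # illustrative weights
        """Aesthetic label s per formula (2-1)."""
        wa, wb, wc = weights
        assert min(weights) >= 0 and abs(wa + wb + wc - 1.0) < 1e-9
        a = min_max(price, *price_rng)
        b = min_max(sales, *sales_rng)
        c = min_max(annot, *annot_rng)
        return wa * a + wb * b + wc * c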
2, collecting skin images: this involves two main parts, namely training skin image collection and testing skin image collection. In order to enrich the diversity of the skin images of virtual characters (which may be heroes in a game), multiple images of each virtual character in different poses may be collected in the embodiment of the present application. Fig. 7 is a schematic diagram of skin images collected according to an embodiment of the present application; as shown in fig. 7, images 701 to 708 are 8 different whole-body posture images collected for the virtual character Luban No. 7.
After the skin images of each virtual character in different postures are collected, the skin images need to be labeled with aesthetic scores. During labeling, the annotator scores the comprehensive aesthetics of a skin, so that all posture images of the same skin receive the same aesthetic score.
3, distorted skin generation: distorted skin generation means applying an artificial distortion operation to an original skin image to generate a distorted skin whose aesthetic score is reduced. As can be seen from the 8 skin images shown in fig. 7, skin aesthetics is mainly carried by local details of the skin, such as Luban No. 7's hat, pistol, rocket tube, clothing, and shoes. If local details of the skin are manually removed, the aesthetics of the distorted skin should be lower than that of the original skin. To this end, a Random Erasing Augmentation algorithm (REA) may be employed in implementation to generate distorted skin images with such loss of local detail. During training, REA randomly selects a region in the original skin image and replaces the pixel values of that region with random values, producing a distorted skin image. Fig. 8 is a schematic diagram of distorted skin images generated by an embodiment of the present application; 801, 802, 803 and 804 in fig. 8 are images obtained by randomly erasing image 701 of fig. 7, where in 801 Luban No. 7's pistol is erased, in 802 a background area is erased, in 803 Luban No. 7's face and clothes are erased, and in 804 another background area is erased.
As can be seen from fig. 8, REA occludes the training skin image to different degrees, which can reduce the risk of overfitting and improve the robustness of the model. REA has no learnable parameters and can therefore be integrated into any convolutional neural network-based recognition model.
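The following is a minimal NumPy sketch of such a random-erasing operation. The area and aspect-ratio ranges follow common REA settings and are assumptions here, since the embodiment only specifies replacing a randomly selected region with random pixel values:

    import random
    import numpy as np

    def random_erase(img, area_frac=(0.02, 0.2), aspect=(0.3, 3.3)):
        """Randomly erase one rectangle of an HxWxC uint8 image with noise."""
        h, w = img.shape[:2]
        for _ in range(10):  # retry until a rectangle fits in the image
            area = random.uniform(*area_frac) * h * w
            ratio = random.uniform(*aspect)
            eh = int(round((area * ratio) ** 0.5))   # rectangle height
            ew = int(round((area / ratio) ** 0.5))   # rectangle width
            if 0 < eh < h and 0 < ew < w:
                y = random.randint(0, h - eh)
                x = random.randint(0, w - ew)
                out = img.copy()
                out[y:y + eh, x:x + ew] = np.random.randint(
                    0, 256, size=(eh, ew, img.shape[2]), dtype=np.uint8)
                return out
        return img.copy()  # fall back to the original if nothing fit

For convolutional models built with PyTorch, torchvision's transforms.RandomErasing implements the same idea as a ready-made transform.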
4, aesthetics ranking: aesthetics ranking refers to ranking the aesthetics of the original skin against that of the distorted skin. In implementation, a twin network (Siamese Network) can be utilized to learn the aesthetic ranking relationship between the original skin and the distorted skin. The twin network has two identical branches, i.e. two sub-networks whose model parameters are shared. A skin image x is input into a sub-network, and the last layer of the sub-network outputs the aesthetic score f(x; θ) of the skin image, where f(x; θ) ∈ [0, 1]. In general, the original skin x has more aesthetic detail than the distorted skin x̃, so their aesthetics satisfy f(x; θ) > f(x̃; θ). Since the distorted image is randomly generated and lacks label information, in the embodiment of the present application the ranking relationship is constrained by the pairwise contrastive loss (Contrastive Loss) shown in formula (2-2):

L_c = max(0, f(x̃; θ) - f(x; θ) + m) (2-2);

wherein m represents a distance interval (margin), and m > 0.
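In PyTorch, this pairwise ranking constraint corresponds to the built-in margin ranking loss; the sketch below shows the correspondence on example scores, with the margin value chosen arbitrarily:

    import torch
    import torch.nn as nn

    # With target y = 1, MarginRankingLoss(x1, x2, y) = max(0, x2 - x1 + m),
    # which matches formula (2-2) when x1 = f(x) and x2 = f(x_tilde).
    rank_loss = nn.MarginRankingLoss(margin=0.1)

    f_x = torch.tensor([0.80])        # score of an original skin image
    f_x_tilde = torch.tensor([0.75])  # score of its distorted version
    target = torch.ones(1)            # "first argument should rank higher"
    print(rank_loss(f_x, f_x_tilde, target))  # max(0, 0.75 - 0.80 + 0.1) = 0.05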
In some embodiments, the aesthetic relationship between distorted images may also be considered in the pairwise ranking. Suppose the skin image x_1 is more aesthetic than the skin image x_2; then, when the same REA erases local regions at the same positions, the distorted images should satisfy that x̃_1 is more aesthetic than x̃_2, i.e. f(x̃_1; θ) > f(x̃_2; θ). Based on this ordering relationship, the loss function L_c can be extended from the ranking between the original image and the distorted image to also cover the ranking between distorted images.
5, aesthetics regression: aesthetics regression refers to regressing the aesthetic score of the labeled original skin image. Suppose the original image x has the aesthetic label y, y ∈ [0, 1]; in implementation, the L2-norm regression shown in formula (2-3) can be adopted to fit the skin aesthetic score:

L_r = (f(x; θ) - y)^2 (2-3);
Finally, in the training phase, the network parameters are learned by combining the two loss functions according to formula (2-4):

L = λ_r·L_r + λ_c·L_c (2-4);

wherein λ_r and λ_c are the weights of the two losses.
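Putting formulas (2-2) to (2-4) together, the following is a minimal PyTorch sketch of one joint-training step, assuming the model maps a batch of images to one score per image. Applying the same module to both inputs is exactly the weight sharing of the twin network; the loss weights and margin are illustrative choices:

    import torch

    def train_step(model, optimizer, x, x_tilde, y,
                   lambda_r=1.0, lambda_c=1.0, m=0.1):
        """One joint-training step per formula (2-4). `x` and `x_tilde` are
        batches of original and distorted skin images; `y` holds the
        aesthetic labels of the originals."""
        optimizer.zero_grad()
        f_x = model(x).squeeze(-1)              # scores of original skins
        f_x_tilde = model(x_tilde).squeeze(-1)  # scores of distorted skins
        l_r = ((f_x - y) ** 2).mean()                 # formula (2-3)
        l_c = torch.relu(f_x_tilde - f_x + m).mean()  # formula (2-2)
        loss = lambda_r * l_r + lambda_c * l_c        # formula (2-4)
        loss.backward()
        optimizer.step()
        return loss.item()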
In the testing stage, the 8 posture images x_1, …, x_8 of the same skin are used to compute the average aesthetic score according to formula (2-5):

s̄ = (1/8)·Σ_{i=1}^{8} f(x_i; θ) (2-5).
here, in the embodiment of the present application, the average beauty score is used as the skin beauty of the final output, which can greatly improve the robustness of the beauty evaluation to the skin posture change.
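A corresponding inference-time sketch, assuming the pose images of one skin are stacked into a single batch and the model outputs one score per image:

    import torch

    @torch.no_grad()
    def skin_aesthetics(model, pose_images):
        """Average aesthetic score per formula (2-5). `pose_images` is a
        tensor of shape (N, C, H, W); N = 8 poses in this embodiment."""
        scores = model(pose_images).squeeze(-1)  # one score per pose image
        return scores.mean().item()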
Fig. 9 is a schematic diagram of the training process of the neural network model provided in an embodiment of the present application. As shown in fig. 9, an acquired original skin image 901 is first randomly erased to obtain a distorted skin image 902; then the original skin image 901 and the distorted skin image 902 are input into the neural network model, which is jointly trained with the contrastive loss function and the regression loss function to obtain the trained neural network model.
Skin aesthetics evaluation based on pairwise ranking is a key technology for avatar design and recommendation. A designer can use this method to evaluate the aesthetic score of a designed skin; if the aesthetic score is found to be insufficient, first prompt information is output to remind the skin designer to beautify the skin and improve the quality of the skin product. For personalized hero-skin recommendation, relevant hero skins can be recommended to a player according to their skin aesthetic scores.
Based on the avatar evaluation method provided by the embodiment of the present application, a skin beauty evaluation system can be constructed to assist design work. Fig. 10 is a schematic diagram of an implementation process of skin evaluation by using the skin beauty evaluation system according to an embodiment of the present application; as shown in fig. 10, the implementation process includes:
step S1001, obtaining an art hero skin design drawing.
Step S1002, the art hero skin design drawing is input into the skin beauty evaluation system, and the aesthetic score y of the hero skin is output.
Fig. 11 is an interface schematic diagram for evaluation by using a skin beauty evaluation system according to an embodiment of the present application, and as shown in fig. 11, a skin image 1101 to be evaluated is input to a skin beauty evaluation system 1102, and an evaluation score 1103 of the skin image is output.
Step S1003, if the aesthetic score is lower than 85, the design drawing needs to be re-optimized until the aesthetic score reaches 85 or higher.

Step S1004, if the aesthetic score is 85 or higher, manual secondary confirmation is performed.
The avatar evaluation method provided in the embodiment of the present application evaluates skin aesthetics based on pairwise ranking and multiplies the training-set and test-set images by using a plurality of poses of the avatar (8 in the example of this embodiment), which greatly suppresses the network's tendency to overfit. Meanwhile, considering the aesthetic relationship between the original skin and the distorted skin, pairwise contrastive loss is adopted to learn the aesthetic ranking relationship. This not only expands the diversity of the training images but also improves the accuracy of the aesthetic evaluation. In testing, the average aesthetic score is used to improve the stability and accuracy of the aesthetic prediction.
Continuing with the exemplary structure in which the avatar evaluation device 354 provided by the embodiment of the present application is implemented as software modules, in some embodiments, as shown in fig. 3, the software modules stored in the avatar evaluation device 354 of the memory 350 may constitute an avatar evaluation device in the evaluation server 300, including:
the first acquisition module is used for acquiring at least one image to be evaluated, and the at least one image corresponds to the same virtual image;
the prediction module is used for inputting the at least one image into the trained neural network model to obtain the image evaluation score of the virtual image;
training data of the neural network model during training at least comprise a training image of an avatar and a distorted image obtained by performing distortion processing on the training image;
and the first output module is used for outputting the image evaluation score of the virtual image.
In some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring a plurality of training images of the virtual image and image scores corresponding to the training images;
the distortion processing module is used for respectively carrying out distortion processing on the training images to obtain a plurality of distorted images corresponding to the training images;
and the model training module is used for training a preset neural network model based on the training images, the image scores and the distorted images to obtain the trained neural network model.
In some embodiments, the apparatus further comprises:
the third acquisition module is used for acquiring sales information corresponding to each training image, and the sales information at least comprises sales price and sales volume;
the fourth acquisition module is used for acquiring the annotation score of each training image;

and the weighting processing module is used for performing weighted average processing on the sales price, the sales volume and the annotation score of each training image to obtain the image score of each training image.
In some embodiments, the distortion handling module is further configured to:
determining at least one candidate region from each training image;
determining a target region from the at least one candidate region;
and setting the pixel value of the pixel point in the target area as a preset pixel value to obtain a distortion image corresponding to each training image.
In some embodiments, the neural network model includes a first sub-network model and a second sub-network model, and the model training module is further configured to:
sequentially inputting each training image into the first sub-network model to obtain a first prediction score of each training image;
sequentially inputting at least one distorted image corresponding to each training image into the second sub-network model to obtain a second prediction score of each distorted image;
and performing back propagation training on the neural network model by using the first prediction score, the second prediction score and the image score so as to adjust parameters of the neural network model and obtain the trained neural network model.
In some embodiments, the model training module is further configured to:
determining a first score difference value between a training image and a distorted image corresponding to the training image based on the first prediction score and the second prediction score;
determining a second score difference based on the first prediction score and the image score;
back propagating the first score difference and the second score difference to the neural network model,
performing joint training on the neural network model by using the first loss function and the second loss function so as to adjust parameters of the neural network model to obtain a trained neural network model;
the first loss function is used for restricting the first prediction score to be higher than the second prediction score, and the second loss function is used for restricting the difference value of the second score to be smaller than the difference threshold value.
In some embodiments, the model training module is further configured to:
determining a first target training image and a second target training image, wherein the image score of the first target training image is higher than that of the second target training image;
acquiring a first prediction score of the first target training image and a second prediction score of a first distorted image corresponding to the first target training image, and acquiring a first prediction score of the second target training image and a second prediction score of a second distorted image corresponding to the second target training image;
jointly training the neural network model by utilizing the first prediction score of the first target training image, the image score of the first target training image, the second prediction score of the first distorted image, the first prediction score of the second target training image, the image score of the second target training image, the second prediction score of the second distorted image, the first loss function and the second loss function so as to adjust the parameters of the neural network model and obtain a trained neural network model;
the first loss function is used for constraining the second prediction score of the first distorted image to be higher than the second prediction score of the second distorted image, the first prediction score of the first target training image is higher than the second prediction score of the first distorted image, and the first prediction score of the second target training image is higher than the second prediction score of the second distorted image.
In some embodiments, the first obtaining module is further configured to:
acquiring image design data of the virtual image;
based on the avatar design data, at least one avatar image is obtained when the avatar is in a different pose.
In some embodiments, the prediction module is further to:
inputting the at least one image into a trained neural network model to obtain the evaluation score of each image;
and determining the image evaluation score of the virtual image based on the evaluation score of each image.
In some embodiments, the apparatus further comprises:
a second output module for outputting a first prompt message when the image evaluation score of the avatar is lower than a preset score threshold,
the first prompt message is used for prompting that the image of the virtual image needs to be optimized.
It should be noted that the description of the apparatus in the embodiment of the present application is similar to the description of the method embodiment, and has similar beneficial effects to the method embodiment, and therefore, the description is not repeated. For technical details not disclosed in the embodiments of the apparatus, reference is made to the description of the embodiments of the method of the present application for understanding.
Embodiments of the present application provide a storage medium having stored therein executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, the method as illustrated in fig. 4.
In some embodiments, the storage medium may be a computer-readable storage medium, such as a Ferroelectric Random Access Memory (FRAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a flash memory, a magnetic surface memory, an optical disc, or a Compact Disc Read Only Memory (CD-ROM); or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (13)

1. An avatar evaluation method, comprising:
obtaining at least one image to be evaluated, wherein the at least one image corresponds to the same virtual image;
inputting the at least one image into a trained neural network model to obtain an image evaluation score of the virtual image;
training data of the neural network model during training at least comprise training images of an avatar and distorted images obtained by performing distortion processing on the training images;
and outputting the image evaluation score of the virtual image.
2. The method of claim 1, further comprising:
acquiring a plurality of training images of the virtual image and image scores corresponding to the training images;
respectively carrying out distortion processing on the training images to obtain a plurality of distorted images corresponding to the training images;
and training a preset neural network model based on the training images, the image scores and the distortion images to obtain the trained neural network model.
3. The method of claim 2, further comprising:
obtaining sales information corresponding to each training image, wherein the sales information at least comprises a sales price and a sales volume;
acquiring the annotation score of each training image;

and performing weighted average processing on the sales price, the sales volume and the annotation score of each training image to obtain the image score of each training image.
4. The method according to claim 2, wherein the performing distortion processing on each of the plurality of training images to obtain a plurality of distorted images corresponding to the plurality of training images comprises:
determining at least one candidate region from each training image;
determining a target region from the at least one candidate region;
setting the pixel values of the pixel points in the target area as preset pixel values to obtain distortion images corresponding to the training images.
5. The method of claim 2, wherein the neural network model comprises a first sub-network model and a second sub-network model, and wherein, correspondingly,
training a preset neural network model based on the training images, the image score and the distortion images to obtain a trained neural network model, comprising:
sequentially inputting each training image into the first sub-network model to obtain a first prediction score of each training image;
sequentially inputting at least one distorted image corresponding to each training image into the second sub-network model to obtain a second prediction score of each distorted image;
and carrying out back propagation training on the neural network model by utilizing the first prediction score, the second prediction score and the image score so as to adjust the parameters of the neural network model and obtain the trained neural network model.
6. The method of claim 5, wherein the back propagation training of the neural network model using the first predictive score, the second predictive score, and the image score to adjust parameters of the neural network model to obtain a trained neural network model comprises:
determining a first score difference value between a training image and a distorted image corresponding to the training image based on the first prediction score and the second prediction score;
determining a second score difference based on the first prediction score and the image score;
back-propagating the first score difference and the second score difference to the neural network model,
performing joint training on the neural network model by using a first loss function and a second loss function so as to adjust parameters of the neural network model to obtain a trained neural network model;
the first loss function is used for restricting the first prediction score to be higher than the second prediction score, and the second loss function is used for restricting the difference value of the second score to be smaller than the difference threshold value.
7. The method of claim 5, wherein the back propagation training of the neural network model using the first predictive score, the second predictive score, and the image score to adjust parameters of the neural network model to obtain a trained neural network model comprises:
determining a first target training image and a second target training image, the first target training image having a higher image score than the second target training image;
acquiring a first prediction score of the first target training image and a second prediction score of a first distorted image corresponding to the first target training image, and acquiring a first prediction score of the second target training image and a second prediction score of a second distorted image corresponding to the second target training image;
jointly training the neural network model by using the first prediction score of the first target training image, the image score of the first target training image, the second prediction score of the first distorted image, the first prediction score of the second target training image, the image score of the second target training image, the second prediction score of the second distorted image, the first loss function and the second loss function so as to adjust the parameters of the neural network model and obtain a trained neural network model;
the first loss function is used to constrain the second prediction score of the first distorted image to be higher than the second prediction score of the second distorted image, and the first prediction score of the first target training image is higher than the second prediction score of the first distorted image, and the first prediction score of the second target training image is higher than the second prediction score of the second distorted image.
8. The method according to claim 1, wherein said obtaining at least one visual image to be evaluated comprises:
acquiring image design data of the virtual image;
and acquiring at least one image of the virtual image in different postures based on the image design data.
9. The method of claim 1, wherein the inputting the at least one image into the trained neural network model to obtain the image evaluation score of the avatar comprises:
inputting the at least one image into a trained neural network model to obtain an evaluation score of each image;
and determining the image evaluation score of the virtual image based on the evaluation score of each image.
10. The method according to any one of claims 1 to 9, further comprising:
when the image evaluation score of the virtual image is determined to be lower than a preset score threshold value, outputting first prompt information,
the first prompt message is used for prompting that the virtual image needs to be optimized.
11. An avatar evaluation apparatus, comprising:
the first acquisition module is used for acquiring at least one image to be evaluated, and the at least one image corresponds to the same virtual image;
the prediction module is used for inputting the at least one image into a trained neural network model to obtain an image evaluation score of the virtual image;
training data of the neural network model during training at least comprise training images of an avatar and distorted images obtained by performing distortion processing on the training images;
and the first output module is used for outputting the image evaluation score of the virtual image.
12. An avatar evaluation apparatus, comprising:
a memory for storing executable instructions; a processor for implementing the method of any one of claims 1 to 10 when executing executable instructions stored in the memory.
13. A computer-readable storage medium having stored thereon executable instructions for causing a processor, when executing, to implement the method of any one of claims 1 to 10.
CN202011060263.3A 2020-09-30 2020-09-30 Method, device, equipment and computer readable storage medium for evaluating virtual image Active CN112116589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011060263.3A CN112116589B (en) 2020-09-30 2020-09-30 Method, device, equipment and computer readable storage medium for evaluating virtual image

Publications (2)

Publication Number Publication Date
CN112116589A true CN112116589A (en) 2020-12-22
CN112116589B CN112116589B (en) 2024-02-27

Family

ID=73798013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011060263.3A Active CN112116589B (en) 2020-09-30 2020-09-30 Method, device, equipment and computer readable storage medium for evaluating virtual image

Country Status (1)

Country Link
CN (1) CN112116589B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1816375A (en) * 2003-06-30 2006-08-09 微软公司 Personalized behavior of computer controlled avatars in a virtual reality environment
US20080163054A1 (en) * 2006-12-30 2008-07-03 Pieper Christopher M Tools for product development comprising collections of avatars and virtual reality business models for avatar use
CN108932697A (en) * 2017-05-26 2018-12-04 杭州海康威视数字技术股份有限公司 A kind of distorted image removes distortion methods, device and electronic equipment
CN109145956A (en) * 2018-07-26 2019-01-04 上海慧子视听科技有限公司 Methods of marking, device, computer equipment and storage medium
CN109871124A (en) * 2019-01-25 2019-06-11 华南理工大学 Emotion virtual reality scenario appraisal procedure based on deep learning
CN111081371A (en) * 2019-11-27 2020-04-28 昆山杜克大学 Virtual reality-based early autism screening and evaluating system and method
CN111192258A (en) * 2020-01-02 2020-05-22 广州大学 Image quality evaluation method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIU, Gangtian; CAO, Huimin: "Product Styling Design Method Based on Fuzzy Neural Network and Grey Theory", Coal Mine Machinery, no. 05, 15 May 2010 (2010-05-15) *
LIU, Cuijuan; LIU, Zhen; CHAI, Yanjie; LIU, Tingting; CHEN, Xiaoyi: "A Survey of Virtual Character Behavior Modeling in Serious Games", Journal of Image and Graphics, no. 07, 16 July 2020 (2020-07-16) *
CHEN, Xuefeng; LI, Shugang: "Personalized Design Recommendation of Virtual Goods Based on BP Neural Network", Computer Engineering, no. 10, 20 May 2008 (2008-05-20) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643417A (en) * 2021-08-17 2021-11-12 腾讯科技(深圳)有限公司 Image adjusting method and device, electronic equipment and storage medium
CN113643417B (en) * 2021-08-17 2023-06-27 腾讯科技(深圳)有限公司 Image adjustment method, device, electronic equipment and storage medium
CN115809696A (en) * 2022-12-01 2023-03-17 支付宝(杭州)信息技术有限公司 Virtual image model training method and device
CN115809696B (en) * 2022-12-01 2024-04-02 支付宝(杭州)信息技术有限公司 Virtual image model training method and device
CN116152403A (en) * 2023-01-09 2023-05-23 支付宝(杭州)信息技术有限公司 Image generation method and device, storage medium and electronic equipment
CN116152403B (en) * 2023-01-09 2024-06-07 支付宝(杭州)信息技术有限公司 Image generation method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112116589B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN112116589B (en) Method, device, equipment and computer readable storage medium for evaluating virtual image
Zhang et al. Computer models for facial beauty analysis
CN110378372A (en) Diagram data recognition methods, device, computer equipment and storage medium
CN112395979B (en) Image-based health state identification method, device, equipment and storage medium
CN109657554A (en) A kind of image-recognizing method based on micro- expression, device and relevant device
CN106295591A (en) Gender identification method based on facial image and device
CN110363086A (en) Diagram data recognition methods, device, computer equipment and storage medium
Solomon et al. Interactive evolutionary generation of facial composites for locating suspects in criminal investigations
CN116097320A (en) System and method for improved facial attribute classification and use thereof
Elmahmudi et al. A framework for facial age progression and regression using exemplar face templates
Duong et al. Learning from longitudinal face demonstration—where tractable deep modeling meets inverse reinforcement learning
Lin et al. R 2-resnext: A resnext-based regression model with relative ranking for facial beauty prediction
CN114611720A (en) Federal learning model training method, electronic device and storage medium
CN114549291A (en) Image processing method, device, equipment and storage medium
CN110598097B (en) Hair style recommendation system, method, equipment and storage medium based on CNN
CN116704085A (en) Avatar generation method, apparatus, electronic device, and storage medium
Xu et al. Text-guided human image manipulation via image-text shared space
CN111400525A (en) Intelligent fashionable garment matching and recommending method based on visual combination relation learning
CN116701706B (en) Data processing method, device, equipment and medium based on artificial intelligence
Liu et al. Multimodal face aging framework via learning disentangled representation
CN115392216B (en) Virtual image generation method and device, electronic equipment and storage medium
Liang et al. Controlled autoencoders to generate faces from voices
CN110489634A (en) A kind of build information recommended method, device, system and terminal device
CN112102304A (en) Image processing method, image processing device, computer equipment and computer readable storage medium
CN113744012A (en) Information processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant