CN116580445B - Large language model face feature analysis method, system and electronic equipment

Info

Publication number: CN116580445B (granted); earlier published as application CN116580445A
Application number: CN202310861560.5A
Authority: CN (China); original language: Chinese (zh)
Inventors: 刘雨飏, 李婷
Applicant and current assignee: Jiangxi Brain Control Technology Co., Ltd.
Priority: CN202310861560.5A
Legal status: Active (granted)
Prior art keywords: face, key points, edge points, feature


Classifications

    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/168 Feature extraction; Face representation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a large language model face feature analysis method and system and electronic equipment. The method comprises: performing image preprocessing on face images to obtain a processed image data set; inputting a test sample set into a trained convolutional neural network (CNN) for key point detection to obtain a plurality of basic key points; performing data labeling and filtering on first edge key points to obtain dense key points of the face; and performing face feature analysis and description on the dense key points to obtain feature recognition results. The method establishes a result threshold set and corpus matching rules, trains a large language model according to them, and analyzes and describes the dense key point features of the face, thereby improving the accuracy of face recognition, avoiding false recognition and missed recognition, and enabling the expression and emotion of the face to be recognized, which helps to understand face information more comprehensively.

Description

Large language model face feature analysis method, system and electronic equipment
Technical Field
The invention belongs to the technical field of large language models, and particularly relates to a large language model face feature analysis method, a large language model face feature analysis system and electronic equipment.
Background
A large language model is an artificial intelligence model that aims to understand and generate human language at the text level. Existing artificial intelligence techniques are limited to text interaction and lack the ability to understand multimodal facial information such as facial features, micro-expressions, speech speed and emotion.
Face analysis and recognition is a biometric technology that performs identity recognition based on facial feature information. It covers a series of related technologies, commonly called image recognition or face recognition, that use a camera or cameras to capture images or video streams containing faces, automatically detect and track the faces in the images, and thereby recognize the detected faces.
Existing face analysis recognizes only a few common facial characteristics, for example important features such as the nose, eyes, mouth and eyebrows. The recognized face information is neither rich nor fine enough and its accuracy is low, and existing large language models cannot recognize the expression and emotion of the face in a face image, so the prior art has large limitations in the field of face analysis and recognition.
Disclosure of Invention
In order to solve the technical problems, the invention provides a large language model face feature analysis method, a large language model face feature analysis system and electronic equipment, which are used for solving the technical problems in the prior art.
In a first aspect, the present invention provides the following technical solutions, a method for analyzing features of a face of a large language model, where the method includes:
acquiring a face image data set, and performing image preprocessing on face images in the face image data set to obtain a processed image data set;
dividing the processed image data set into a training sample set and a test sample set according to a preset proportion, inputting the training sample set into a convolutional neural network CNN for training, and inputting the test sample set into the convolutional neural network CNN after training for key point detection so as to obtain a plurality of basic key points;
performing edge point identification on the basic key points to obtain a plurality of first edge key points, performing data marking and filtering on the first edge key points to obtain second edge key points, and introducing a plurality of second edge key points into the plurality of basic key points to obtain dense key points of the face;
carrying out face feature analysis and description on the face dense key points based on the feature information of the face dense key points so as to obtain feature recognition results;
and based on the feature recognition result, establishing a result threshold set and a corpus matching rule, and training a large language model according to the result threshold set and the corpus matching rule.
Compared with the prior art, the beneficial effects of this application are: firstly, a face image data set is acquired and image preprocessing is carried out on the face images in it to obtain a processed image data set; the processed image data set is divided into a training sample set and a test sample set according to a preset proportion, the training sample set is input into a convolutional neural network CNN for training, and the test sample set is input into the trained convolutional neural network CNN for key point detection so as to obtain a plurality of basic key points; then edge point identification is carried out on the basic key points to obtain a plurality of first edge key points, data labeling and filtering are carried out on the first edge key points to obtain second edge key points, and the second edge key points are introduced into the basic key points to obtain dense key points of the face; face feature analysis and description are then carried out on the face dense key points based on their feature information so as to obtain feature recognition results; finally, a result threshold set and corpus matching rules are established based on the feature recognition results, and a large language model is trained according to them. This improves the accuracy of face recognition, avoids false recognition and missed recognition, and enables the expression and emotion of the face to be recognized, which helps to understand face information more comprehensively.
Preferably, the step of performing image preprocessing on the face image in the face image dataset to obtain a processed image dataset includes:
and sequentially carrying out image denoising, image enhancement and image alignment on the face images in the face image data set to obtain a processed image data set.
Preferably, the step of identifying the edge points of the basic key points to obtain a plurality of first edge key points includes:
two convolution kernels with Sobel operatorAnd->Calculating gradient of each pixel point outside the basic key point>
Selecting the gradientPixel points with a gradient larger than a preset gradient are used as edge points;
and introducing a Canny operator to carry out non-maximum suppression and screening on the edge points so as to obtain a plurality of first edge key points.
Preferably, the step of introducing a Canny operator to perform non-maximum suppression and screening on the edge points to obtain a plurality of first edge key points includes:
performing non-maximum suppression on the edge points through a Canny operator and redefining the edge points:

$$G_N=\begin{cases}G, & G\ge G_1\ \text{and}\ G\ge G_2\\ 0, & \text{otherwise}\end{cases}$$

where $G_N$ is the gradient after suppression, $p_1$ and $p_2$ are the points adjacent to point $p$ in the tangential direction of the edge, $G$ is the gradient of $p$, and $G_1$, $G_2$ are the gradients of points $p_1$ and $p_2$;
setting a first gradient threshold value and a second gradient threshold value through a hysteresis threshold value method, taking pixel points with gradients larger than the second gradient threshold value as first strong edge points, taking pixel points with gradients larger than the first gradient threshold value and smaller than the second gradient threshold value as weak edge points, and eliminating pixel points with gradients smaller than the first gradient threshold value;
and taking the weak edge points connected with the strong edge points and the weak edge points connected with the strong edge points through other adjacent weak edge points as second strong edge points, and obtaining a plurality of first edge key points based on the first strong edge points and the second strong edge points.
Preferably, the step of performing face feature analysis and description on the face dense key points based on the feature information of the face dense key points to obtain feature recognition results includes:
acquiring feature point coordinates and RGB values of key parts in the dense key points of the face, and calculating color features of the face according to the RGB values and the feature point coordinates;
acquiring pixel resolution of a face image in the face image data set and shooting parameters of corresponding shooting equipment, determining a scale factor between the face image and an actual size based on the resolution and the shooting parameters, and determining facial features of a face based on the scale factor and the face dense key points;
and sequentially determining the straight-curve features and volume-sense features of the face based on the facial features of the face, and outputting feature recognition results based on the color features, the facial features, the straight-curve features and the volume-sense features.
Preferably, the step of obtaining feature point coordinates and RGB values of key parts in the dense key points of the face and calculating color features of the face according to the RGB values and the feature point coordinates includes:
calculating the brightness $V$ of the face based on the RGB values and the feature point coordinates:

$$V=\left(\frac{R^{2.2}+G^{2.2}+B^{2.2}}{3}\right)^{1/2.2}$$

where $R$, $G$ and $B$ respectively represent the red, green and blue color values;
calculating the color difference degree $D$ of the face based on the RGB values and the feature point coordinates:

$$D=\omega_1 d_1+\omega_2 d_2+\omega_3 d_3,\qquad \omega_1+\omega_2+\omega_3=1$$

where $\omega_1$, $\omega_2$ and $\omega_3$ respectively represent the first, second and third color difference weights, and $d_1$, $d_2$ and $d_3$ respectively represent the first, second and third color differences, namely the differences of the hair color $c_{hair}$, the pupil color $c_{pupil}$ and the lip color $c_{lip}$ from the skin color $c_{skin}$;
converting the RGB values into the HSV representation through a preset formula and judging the hue characteristic of the skin color according to the hue $h$: if $h$ is smaller than a preset value the color is a cold tone, and if $h$ is not smaller than the preset value the color is a warm tone, so as to obtain the hue feature of the face;
and obtaining the color features of the human face based on the brightness $V$, the color difference degree $D$ and the hue feature.
Preferably, the step of establishing a result threshold set and a corpus matching rule based on the feature recognition result, and training the large language model according to the result threshold set and the corpus matching rule includes:
establishing a result threshold set according to the feature recognition result, establishing a corpus matching rule based on an application scene of the feature recognition result and the result threshold set, and determining a corpus based on the corpus matching rule;
encoding the feature recognition result into feature vectors by data in the corpus to obtain a plurality of input sequences;
calculating a first attention weight $A$ for each position in the input sequence:

$$A=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $Q$ is the query vector of the feature recognition results and corpus data, $K$ is the key vector of the feature recognition results and corpus data, $V$ is the value vector of the feature recognition results and corpus data, $K^{T}$ is the transposed matrix of $K$, and $d_k$ is the vector dimension;
and carrying out nonlinear transformation on the input sequence of each position, recalculating the second attention weight of each position, and training the large language model based on the input sequence and the second attention weight to obtain a trained large language model.
In a second aspect, the present invention provides a large language model face feature analysis system, the system comprising:
the acquisition module is used for acquiring a face image data set, and carrying out image preprocessing on face images in the face image data set to obtain a processed image data set;
the first detection module is used for dividing the processed image data set into a training sample set and a test sample set according to a preset proportion, inputting the training sample set into a convolutional neural network CNN for training, and inputting the test sample set into the trained convolutional neural network CNN for key point detection so as to obtain a plurality of basic key points;
the second detection module is used for carrying out edge point identification on the basic key points to obtain a plurality of first edge key points, carrying out data marking and filtering on the first edge key points to obtain second edge key points, and introducing a plurality of second edge key points into the basic key points to obtain dense key points of the face;
the analysis module is used for carrying out face feature analysis and description on the face dense key points based on the feature information of the face dense key points so as to obtain feature recognition results;
And the training module is used for establishing a result threshold set and a corpus matching rule based on the feature recognition result, and training the large language model according to the result threshold set and the corpus matching rule.
Preferably, the obtaining module is specifically configured to:
and sequentially carrying out image denoising, image enhancement and image alignment on the face images in the face image data set to obtain a processed image data set.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the above-mentioned large language model face feature analysis method when executing the computer program.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a face feature analysis method for a large language model according to a first embodiment of the present invention;
FIG. 2 is a detailed flowchart of step S3 in the large language model face feature analysis method according to the first embodiment of the present invention;
FIG. 3 is a detailed flowchart of step S33 in the large language model face feature analysis method according to the first embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S4 in the large language model face feature analysis method according to the first embodiment of the present invention;
FIG. 5 is a detailed flowchart of step S41 in the large language model face feature analysis method according to the first embodiment of the present invention;
FIG. 6 is a detailed flowchart of step S5 in the large language model face feature analysis method according to the first embodiment of the present invention;
FIG. 7 is a block diagram of a large language model face feature analysis system according to a second embodiment of the present invention;
fig. 8 is a block diagram of a hardware structure of an electronic device according to another embodiment of the present invention.
Embodiments of the present invention will be further described below with reference to the accompanying drawings.
Detailed Description
In order that the invention may be readily understood, a more complete description of the invention will be rendered by reference to the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Example 1
As shown in fig. 1, in a first embodiment of the present invention, the present invention provides a method for analyzing facial features of a large language model, where the method includes:
s1, acquiring a face image data set, and performing image preprocessing on face images in the face image data set to obtain a processed image data set;
specifically, the face image data set comprises face images under various conditions such as different ages, sexes, expressions, postures and the like, and after the face images are subjected to image preprocessing, the accuracy and the efficiency of a subsequent processing process can be improved;
the image preprocessing of the face image in the face image dataset specifically comprises the following steps: and sequentially carrying out image denoising, image enhancement and image alignment on the face images in the face image data set to obtain a processed image data set.
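As an illustration, the three preprocessing steps can be sketched as follows, assuming OpenCV is available; the concrete operators chosen here (non-local-means denoising, CLAHE enhancement, rotation to a horizontal eye line for alignment) are stand-ins, since the embodiment names the steps but not the algorithms:

```python
# Sketch of the preprocessing chain: denoise -> enhance -> align.
# The specific operators are assumptions; only the three steps are
# prescribed by the embodiment.
import cv2
import numpy as np

def preprocess_face(img_bgr, left_eye, right_eye):
    # 1) Image denoising: non-local means on the color image.
    denoised = cv2.fastNlMeansDenoisingColored(img_bgr, None, 10, 10, 7, 21)

    # 2) Image enhancement: CLAHE applied to the luminance channel.
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    # 3) Image alignment: rotate so the line through the eyes is horizontal.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    center = ((left_eye[0] + right_eye[0]) / 2.0, (left_eye[1] + right_eye[1]) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    h, w = enhanced.shape[:2]
    return cv2.warpAffine(enhanced, rot, (w, h))
```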
S2, dividing the processed image data set into a training sample set and a test sample set according to a preset proportion, inputting the training sample set into a convolutional neural network CNN for training, and inputting the test sample set into the trained convolutional neural network CNN for key point detection so as to obtain a plurality of basic key points;
specifically, in the model training process the model needs to be trained with a training set. In this embodiment the processed image dataset is divided according to a preset proportion, with 30% used as the test sample set and 70% as the training sample set. In the actual training process, public datasets such as 300-W, MVFW and OCFW can also be used to train the convolutional neural network CNN, with the processed image dataset used as the test sample set, so as to complete the training of the model. After the trained convolutional neural network CNN is obtained, the test sample set is input into the trained model, so that 68 basic key points of a person's face can be obtained;
the 68 basic key points are the basic features of the face; in order to further mine the feature information of the face, a number of second edge key points need to be determined to enrich the feature information of the face.
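For illustration, the sketch below shows the form the 68 basic key points take; dlib's pretrained 68-point predictor stands in here for the trained CNN of this embodiment, and the model file name is an assumption:

```python
# Stand-in for the trained CNN key point detector: dlib's pretrained
# 68-landmark predictor returns the same kind of output (68 (x, y) points).
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed path

def basic_keypoints(img_bgr):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return []
    shape = predictor(gray, faces[0])
    # Conventional 68-point layout: jawline 0-16, brows 17-26,
    # nose 27-35, eyes 36-47, mouth 48-67.
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```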
S3, carrying out edge point identification on the basic key points to obtain a plurality of first edge key points, carrying out data marking and filtering on the first edge key points to obtain second edge key points, and introducing a plurality of second edge key points into the basic key points to obtain dense key points of the face;
as shown in fig. 2, the step S3 includes:
S31, using the two 3×3 convolution kernels of the Sobel operator, $G_x$ and $G_y$, to calculate the gradient $G$ of each pixel point outside the basic key points;
specifically, the Sobel operator is a gradient-based edge detection algorithm: in an actual image the gradient is larger where an image edge lies, and the Sobel operator convolves the image with two 3×3 convolution kernels, one for the x direction and one for the y direction,

$$G_x=\begin{bmatrix}-1&0&+1\\-2&0&+2\\-1&0&+1\end{bmatrix},\qquad G_y=\begin{bmatrix}-1&-2&-1\\0&0&0\\+1&+2&+1\end{bmatrix},\qquad G=\sqrt{G_x^2+G_y^2}.$$
S32, selecting the pixel points whose gradient $G$ is larger than a preset gradient as edge points;
specifically, the Sobel operator performs smoothing and gradient calculation on the image in one step, and the places where the Sobel response is large are edge places; therefore a preset gradient is set in this step, and the pixel points whose gradient exceeds it are taken as edge points.
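A minimal sketch of steps S31 and S32, assuming OpenCV; the preset gradient value is illustrative:

```python
# Sobel gradient magnitude followed by a preset-gradient threshold (S31/S32).
import cv2
import numpy as np

def sobel_edge_candidates(gray, preset_gradient=80.0):
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)   # convolution with Gx
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)   # convolution with Gy
    grad = np.sqrt(gx ** 2 + gy ** 2)                 # G = sqrt(Gx^2 + Gy^2)
    edge_mask = grad > preset_gradient                # edge points per S32
    return grad, gx, gy, edge_mask
```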
S33, introducing a Canny operator to carry out non-maximum suppression and screening on the edge points so as to obtain a plurality of first edge key points;
specifically, since the Sobel operator has poor locality and is sensitive to particular directions, a Canny operator needs to be introduced. The localization of the Sobel operator is not good enough: every pixel whose gradient intensity exceeds the threshold is treated as an edge, whereas in reality only the positions of local maxima should be judged as edges. Some points are not local maxima yet still exceed the preset gradient and are kept as edge points, so non-maximum suppression and screening are required to obtain the first edge key points.
As shown in fig. 3, the step S33 includes:
S331, performing non-maximum suppression on the edge points through a Canny operator and redetermining the edge points:

$$G_N=\begin{cases}G, & G\ge G_1\ \text{and}\ G\ge G_2\\ 0, & \text{otherwise}\end{cases}$$

where $G_N$ is the gradient after suppression, $p_1$ and $p_2$ are the points adjacent to point $p$ in the tangential direction of the edge, $G$ is the gradient of $p$, and $G_1$, $G_2$ are the gradients of points $p_1$ and $p_2$.
S332, setting a first gradient threshold value and a second gradient threshold value through a hysteresis threshold value method, taking pixel points with gradients larger than the second gradient threshold value as first strong edge points, taking pixel points with gradients larger than the first gradient threshold value and smaller than the second gradient threshold value as weak edge points, and eliminating pixel points with gradients smaller than the first gradient threshold value;
Specifically, the above judgment of edges uses a single threshold, i.e., a point whose suppressed gradient value is larger than the preset gradient is regarded as an edge. However, a single threshold is difficult to choose: a threshold that is too high easily causes discontinuous edges, while one that is too low introduces extra noise. Therefore two thresholds (a first, low gradient threshold and a second, high gradient threshold) are set through the hysteresis threshold method in this step, which solves the problem.
S333, taking a weak edge point connected with the strong edge point and a weak edge point connected with the strong edge point through other adjacent weak edge points as a second strong edge point, and obtaining a plurality of first edge key points based on the first strong edge point and the second strong edge point;
specifically, strong edge points can be directly regarded as edge key points, but some points in weak edge points can be selected as edge key points, and the conditions that the weak edge points are selected as edge key points are as follows: directly connected to the strong edge point and connected to the strong edge point by adjacent other weak edge points.
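A simplified sketch of S331 to S333 follows; directions are quantized to four bins and neighbours are compared along the gradient direction, as in the standard Canny formulation, and the two thresholds are illustrative values:

```python
# Non-maximum suppression plus double-threshold hysteresis linking (S331-S333).
from collections import deque
import numpy as np

def nms_and_hysteresis(grad, gx, gy, t1=50.0, t2=100.0):
    h, w = grad.shape
    angle = (np.rad2deg(np.arctan2(gy, gx)) + 180.0) % 180.0
    gn = np.zeros_like(grad)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:          # near-horizontal gradient
                n1, n2 = grad[y, x - 1], grad[y, x + 1]
            elif a < 67.5:                      # ~45 degree gradient
                n1, n2 = grad[y - 1, x + 1], grad[y + 1, x - 1]
            elif a < 112.5:                     # near-vertical gradient
                n1, n2 = grad[y - 1, x], grad[y + 1, x]
            else:                               # ~135 degree gradient
                n1, n2 = grad[y - 1, x - 1], grad[y + 1, x + 1]
            if grad[y, x] >= n1 and grad[y, x] >= n2:
                gn[y, x] = grad[y, x]           # keep local maxima, zero the rest

    strong = gn > t2                            # first strong edge points
    weak = (gn > t1) & ~strong                  # weak edge points
    keep = strong.copy()
    q = deque(zip(*np.nonzero(strong)))
    while q:                                    # grow strong edges through
        y, x = q.popleft()                      # 8-connected weak neighbours
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True         # second strong edge points
                    q.append((ny, nx))
    return keep                                 # mask of first edge key points
```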
It is worth noting that after the first edge key points are obtained, the first edge key points marked by the algorithm are classified and processed through manual data labeling, and the key points with practical significance are obtained by filtering, so as to obtain the second edge key points; the second edge key points are then added to the original 68 basic key points, so that more accurate face dense key points can be obtained.
S4, carrying out face feature analysis and description on the face dense key points based on the feature information of the face dense key points so as to obtain feature recognition results;
specifically, this step performs feature analysis on the dense key points of the human face, including analysis of the number, position, direction and distribution of the features, so as to determine the image features of the face, and then gives a recognition description of those image features according to the feature recognition result, covering aspects such as facial contour, features of the five sense organs, skin color and hairstyle.
As shown in fig. 4, the step S4 includes:
s41, obtaining feature point coordinates and RGB values of key parts in the dense key points of the face, and calculating color features of the face according to the RGB values and the feature point coordinates;
specifically, the key parts herein include skin, mouth, eyes, eyebrows, hair, etc.
As shown in fig. 5, the step S41 includes:
S411, calculating the brightness $V$ of the face based on the RGB values and the feature point coordinates:

$$V=\left(\frac{R^{2.2}+G^{2.2}+B^{2.2}}{3}\right)^{1/2.2}$$

where $R$, $G$ and $B$ respectively represent the red, green and blue color values;
the brightness lies in 0-1 and mainly reflects how light or dark each key part is, graded from white to dark black; the brightness value is multiplied by 10 and represented by the numbers 0-10;
note the power of 2.2 and the 2.2th root here: RGB color values cannot simply be added directly, but must first be converted to physical optical power with the exponent 2.2, because RGB values are not linearly related to power but follow a power function whose exponent is called the Gamma value, typically 2.2; this conversion process is called Gamma correction.
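Under the reconstruction above, the brightness computation can be sketched as follows, assuming RGB values normalized to [0, 1]:

```python
# Gamma-decode to linear light (exponent 2.2), average, re-encode (2.2th root),
# then map the 0-1 brightness onto the 0-10 integer scale described above.
def face_brightness(r, g, b):
    linear_mean = (r ** 2.2 + g ** 2.2 + b ** 2.2) / 3.0
    v = linear_mean ** (1.0 / 2.2)   # brightness V in [0, 1]
    return round(v * 10)             # represented by the numbers 0-10
```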
S412, calculating the color difference degree $D$ of the face based on the RGB values and the feature point coordinates:

$$D=\omega_1 d_1+\omega_2 d_2+\omega_3 d_3,\qquad \omega_1+\omega_2+\omega_3=1$$

where $\omega_1$, $\omega_2$ and $\omega_3$ respectively represent the first, second and third color difference weights, and $d_1$, $d_2$ and $d_3$ respectively represent the first, second and third color differences, namely the differences of the hair color $c_{hair}$, the pupil color $c_{pupil}$ and the lip color $c_{lip}$ from the skin color $c_{skin}$;
specifically, in the professional field of image design, a user's overall color difference impression is generally identified through the degree of difference among skin color, hair color, pupil color and lip color, so as to determine the user's overall color difference situation. The first, second and third color difference weights can be obtained by test fitting; in this embodiment they are 80%, 15% and 5% respectively.
S413, converting the RGB values into the HSV representation through a preset formula, and judging the hue characteristic of the skin color according to the hue $h$: if $h$ is smaller than a preset value the color is a cold tone, and if $h$ is not smaller than the preset value the color is a warm tone, so as to obtain the hue feature of the face.
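A sketch of S412 and S413 under the reconstructions above; taking the three color differences as Euclidean RGB distances of the hair, pupil and lip colors from the skin color is an assumption, as is the numeric hue split point between cold and warm tones:

```python
# Weighted color difference degree (S412) and hue feature (S413).
import colorsys
import math

def color_difference(skin, hair, pupil, lip, w=(0.80, 0.15, 0.05)):
    # Each color is an (R, G, B) tuple; the weights follow this embodiment.
    d = (math.dist(hair, skin), math.dist(pupil, skin), math.dist(lip, skin))
    return sum(wi * di for wi, di in zip(w, d))

def hue_feature(skin_rgb, preset_h=0.1):
    r, g, b = (c / 255.0 for c in skin_rgb)
    h, _, _ = colorsys.rgb_to_hsv(r, g, b)   # h in [0, 1)
    # Per the rule above: h below the preset value is judged a cold tone.
    return "cold" if h < preset_h else "warm"
```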
S414, obtaining the color features of the human face based on the brightness $V$, the color difference degree $D$ and the hue feature.
S42, acquiring pixel resolution of a face image in the face image data set and shooting parameters of corresponding shooting equipment, determining a scale factor between the face image and the actual size based on the resolution and the shooting parameters, and determining facial features of the face based on the scale factor and the face dense key points;
specifically, the facial features include facial detail parameters such as face shape, eyebrow shape and eye shape, as follows (a computational sketch follows the list):
(1) Face shape
L1 face length = k × (y coordinate of face-edge top point − y coordinate of face-edge bottom point);
L2 temporal bone width = k × (x coordinate of face upper-left point − x coordinate of face upper-right point);
L3 cheekbone width = k × (x coordinate of face middle-left point − x coordinate of face middle-right point);
L4 jaw width = k × (x coordinate of face lower-left point − x coordinate of face lower-right point);
A1 mandible angle = ∠(face lower-left point, face bottom point, face lower-right point);
Ha face global radian = the arc of the ellipse approximating the face edge key points;
the ellipse fitted to the face edge key points satisfies:

$$\frac{(x\cos\theta+y\sin\theta)^{2}}{(ka)^{2}}+\frac{(-x\sin\theta+y\cos\theta)^{2}}{(kb)^{2}}=1$$

wherein k is the scale factor, a and b are respectively the major and minor semi-axes of the ellipse, and θ is the rotation angle of the ellipse.
(2) Eyebrow shape
Lmw eyebrow width = k × (x coordinate of eyebrow right end M0 − x coordinate of eyebrow left end M1);
Lmh eyebrow height = k × (y coordinate of eyebrow top end − y coordinate of eyebrow bottom end);
Am eyebrow radian = ∠(eyebrow left end, eyebrow top end, eyebrow right end);
(3) Eye shape
Lyw eye width = k × (x coordinate of eye right endpoint Y0 − x coordinate of eye left endpoint Y1);
Lyh eye height = k × (y coordinate of eye top point − y coordinate of eye bottom point);
Ay eye radian = ∠(eye left endpoint, eye top point, eye right endpoint);
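A computational sketch of the measures listed above; the landmark names in the dictionary are hypothetical placeholders for entries of the face dense key point set:

```python
# Face-type, eyebrow-type and eye-type measures from landmark coordinates,
# with k the image-to-actual-size scale factor.
import math

def angle_deg(a, b, c):
    # Angle at vertex b of the triangle a-b-c, in degrees.
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def face_measures(p, k):
    # p maps landmark names to (x, y); abs() guards against axis direction.
    return {
        "L1_face_length": k * abs(p["edge_top"][1] - p["edge_bottom"][1]),
        "L2_temporal_width": k * abs(p["upper_left"][0] - p["upper_right"][0]),
        "L3_cheekbone_width": k * abs(p["mid_left"][0] - p["mid_right"][0]),
        "L4_jaw_width": k * abs(p["lower_left"][0] - p["lower_right"][0]),
        "A1_mandible_angle": angle_deg(p["lower_left"], p["edge_bottom"], p["lower_right"]),
        "Lmw_brow_width": k * abs(p["brow_right"][0] - p["brow_left"][0]),
        "Lmh_brow_height": k * abs(p["brow_top"][1] - p["brow_bottom"][1]),
        "Am_brow_radian": angle_deg(p["brow_left"], p["brow_top"], p["brow_right"]),
        "Lyw_eye_width": k * abs(p["eye_right"][0] - p["eye_left"][0]),
        "Lyh_eye_height": k * abs(p["eye_top"][1] - p["eye_bottom"][1]),
        "Ay_eye_radian": angle_deg(p["eye_left"], p["eye_top"], p["eye_right"]),
    }
```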
S43, determining the straight-curve features and volume-sense features of the face in sequence based on the facial features of the face, and outputting feature recognition results based on the color features, the facial features, the straight-curve features and the volume-sense features.
Specifically, the process of determining the straight-curve features and volume-sense features of the face is as follows:
according to a comprehensive comparison of the face global radian Ha, the eyebrow radian and the eye radian, it can be determined whether the overall impression of the user's image tends toward straight lines or curves, so as to obtain the straight-curve feature;
the volume-sense parameter mainly represents the proportion of the whole image occupied by the user's face and facial features, and is used for judging how clothing and accessories should be matched; the invention therefore judges it with the face length as the parameter, with the following value ranges (a classification sketch follows the list):
Male: large volume sense (more than 18.5 cm), medium volume sense (17-18.5 cm), small volume sense (less than 17 cm);
Female: large volume sense (more than 17.5 cm), medium volume sense (16-17.5 cm), small volume sense (less than 16 cm).
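A sketch of the volume-sense classification under the bands listed above (the female upper band is read here as 17.5 cm):

```python
# Map the scaled face length (in cm) to the volume-sense category.
def volume_sense(face_length_cm, gender):
    big, small = (18.5, 17.0) if gender == "male" else (17.5, 16.0)
    if face_length_cm > big:
        return "large volume sense"
    if face_length_cm < small:
        return "small volume sense"
    return "medium volume sense"
```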
S5, based on the feature recognition result, establishing a result threshold set and a corpus matching rule, and training a large language model according to the result threshold set and the corpus matching rule;
specifically, the invention can select mainstream large language models such as ChatGPT 3.5 and comparable domestic large models, and can carry out compatible docking and synchronization with the APIs of these large language models.
As shown in fig. 6, the step S5 includes:
s51, establishing a result threshold set according to the feature recognition result, establishing a corpus matching rule based on an application scene of the feature recognition result and the result threshold set, and determining a corpus based on the corpus matching rule;
specifically, the result threshold set in this step is shown in table 1 below:
TABLE 1
Specifically, the corpus in this step is shown in Table 2 below:
TABLE 2
S52, encoding the feature recognition result into feature vectors by data in the corpus to obtain a plurality of input sequences.
S53, calculating a first attention weight $A$ for each position in the input sequence:

$$A=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $Q$ is the query vector of the feature recognition results and corpus data, $K$ is the key vector of the feature recognition results and corpus data, $V$ is the value vector of the feature recognition results and corpus data, $K^{T}$ is the transposed matrix of $K$, and $d_k$ is the vector dimension.
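A minimal NumPy sketch of this first attention weight, i.e. standard scaled dot-product attention over the encoded vectors:

```python
# Scaled dot-product attention: A = softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]                                   # vector dimension
    scores = Q @ K.T / np.sqrt(d_k)                     # Q K^T / sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V
```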
S54, carrying out nonlinear transformation on an input sequence of each position, and recalculating a second attention weight of each position, and training a large language model based on the input sequence and the second attention weight to obtain a trained large language model;
specifically, before the user communicates with the large language model, the user's features such as image, skin, color, face shape, straight-curve tendency and emotion are analyzed in advance to obtain a feature recognition result, and the feature recognition result is converted into a Prompt that the large language model can understand, so that the large model can communicate with the user in a targeted manner. The auxiliary link is as follows: during communication, an NLP emotion recognition algorithm is synchronously introduced, the interactive question-answer texts of both sides are semantically processed, and a Prompt describing the emotion carried in the language is further added, so that the large language model can further understand the current user's emotional state. Through this technology the interaction between the user and the large language model becomes more intelligent and natural: according to the feature recognition results of the user's face and the questions the user raises, the trained large language model can reorganize language from the corpus training results in subdivided domains and, combined with the semantics of the whole-network large model, carry out personified question-answer interaction, product recommendation and the like with the user;
Meanwhile, the invention introduces a data filter, a semantic analyzer and a prompt optimizer in the communication layer between the user and the trained large language model, so as to realize concurrent multi-modal, multi-feature processing.
Data Filter (Data Filter): the data filter is mainly used for carrying out preliminary data filtering on the user questions and removing invalid data, and can screen and clean the user questions according to a predefined rule or algorithm so as to ensure the quality and accuracy of the data of subsequent processing. The data filter has the characteristics of high speed and high efficiency, and can improve the effect of subsequent processing.
Semantic analyzer (Semantic Analyzer): the semantic analyzer is used for performing deep semantic understanding and analysis on the user questions, and can convert the user questions into a machine-understandable form to identify important information such as keywords, entities, intentions and the like of the questions. The semantic analyzer is characterized by being capable of understanding the real intention of the user and matching the user questions with the related corpus and model scheme data.
Prompt optimizer (Question Optimizer): the main function of the prompt optimizer is to optimize the user's question so as to improve its quality and accuracy. It can reorganize, supplement or rewrite the question according to the semantics and context information of the user's question so as to better meet the user's needs; its characteristic is that it can optimize the user's question, making the question clearer and better able to interact with the large language model.
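To make the cooperation of the three components concrete, a toy pipeline sketch follows; every class, filtering rule and prompt format here is a hypothetical placeholder rather than an interface defined by the invention:

```python
# Data filter -> semantic analyzer -> prompt optimizer, chained ahead of the LLM.
import re

class DataFilter:
    def run(self, question: str) -> str:
        # Preliminary filtering: collapse whitespace, drop obvious noise.
        return re.sub(r"\s+", " ", question).strip()

class SemanticAnalyzer:
    def run(self, question: str) -> dict:
        # Toy intent detection standing in for real semantic understanding.
        intent = "recommendation" if "recommend" in question.lower() else "qa"
        return {"text": question, "intent": intent}

class PromptOptimizer:
    def run(self, analysis: dict, face_features: str) -> str:
        # Rewrite the question and prepend the face-feature Prompt.
        return (f"[User profile] {face_features}\n"
                f"[Intent] {analysis['intent']}\n"
                f"{analysis['text']}")

def build_prompt(raw_question: str, face_features: str) -> str:
    q = DataFilter().run(raw_question)
    return PromptOptimizer().run(SemanticAnalyzer().run(q), face_features)
```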
Through the synergy of the data filter, the semantic analyzer and the prompt optimizer, the invention can accurately understand and process the user's question and thus give more precise answers. For the user this means answers that meet their needs can be obtained more quickly, improving interactive experience and efficiency. At the same time, the user's true intention can be better understood and personalized answers can be given to the user's questions; such personified question-answer interaction increases the user's sense of participation and satisfaction and raises the user's acceptance of and trust in the artificial intelligence system. By better understanding the user's needs and preferences, products or services that meet those needs can be recommended according to the user's questions and context information, which improves the accuracy and personalization of product recommendation and increases the user's purchase intention and satisfaction.
The beneficial effects of this embodiment are: firstly, a face image data set is acquired and image preprocessing is carried out on the face images in it to obtain a processed image data set; the processed image data set is divided into a training sample set and a test sample set according to a preset proportion, the training sample set is input into a convolutional neural network CNN for training, and the test sample set is input into the trained convolutional neural network CNN for key point detection so as to obtain a plurality of basic key points; then edge point identification is carried out on the basic key points to obtain a plurality of first edge key points, data labeling and filtering are carried out on the first edge key points to obtain second edge key points, and the second edge key points are introduced into the basic key points to obtain dense key points of the face; face feature analysis and description are then carried out on the face dense key points based on their feature information so as to obtain feature recognition results; finally, a result threshold set and corpus matching rules are established based on the feature recognition results, and a large language model is trained according to them. The accuracy of face recognition is thereby improved, false recognition and missed recognition are avoided, and the expression and emotion of the face can be recognized, which helps to understand face information more comprehensively.
Example two
As shown in fig. 7, in a second embodiment of the present invention, there is provided a large language model face feature analysis system, the system including:
the acquisition module 1 is used for acquiring a face image data set, and carrying out image preprocessing on face images in the face image data set to obtain a processed image data set;
the first detection module 2 is configured to divide the processed image dataset into a training sample set and a test sample set according to a preset proportion, input the training sample set into a convolutional neural network CNN for training, and input the test sample set into the trained convolutional neural network CNN for key point detection, so as to obtain a plurality of basic key points;
the second detection module 3 is configured to identify edge points of the basic key points to obtain a plurality of first edge key points, perform data labeling and filtering on the first edge key points to obtain second edge key points, and introduce a plurality of second edge key points into a plurality of basic key points to obtain dense key points of the face;
the analysis module 4 is used for carrying out face feature analysis and description on the face dense key points based on the feature information of the face dense key points so as to obtain feature recognition results;
And the training module 5 is used for establishing a result threshold set and a corpus matching rule based on the feature recognition result, and training the large language model according to the result threshold set and the corpus matching rule.
The acquiring module 1 is specifically configured to:
and sequentially carrying out image denoising, image enhancement and image alignment on the face images in the face image data set to obtain a processed image data set.
The second detection module 3 includes:
a gradient computation sub-module, configured to use the two 3×3 convolution kernels of the Sobel operator, $G_x$ and $G_y$, to calculate the gradient $G$ of each pixel point outside the basic key points;
an edge point selection sub-module, configured to select the pixel points whose gradient $G$ is larger than a preset gradient as edge points;
and the suppression submodule is used for introducing a Canny operator to perform non-maximum suppression and screening on the edge points so as to obtain a plurality of first edge key points.
The suppression submodule includes:
an edge point determining unit, configured to perform non-maximum suppression on the edge points through a Canny operator and redetermine the edge points:

$$G_N=\begin{cases}G, & G\ge G_1\ \text{and}\ G\ge G_2\\ 0, & \text{otherwise}\end{cases}$$

where $G_N$ is the gradient after suppression, $p_1$ and $p_2$ are the points adjacent to point $p$ in the tangential direction of the edge, $G$ is the gradient of $p$, and $G_1$, $G_2$ are the gradients of points $p_1$ and $p_2$;
the rejecting unit is used for setting a first gradient threshold value and a second gradient threshold value through a hysteresis threshold value method, taking pixel points with gradients larger than the second gradient threshold value as first strong edge points, taking pixel points with gradients larger than the first gradient threshold value and smaller than the second gradient threshold value as weak edge points, and rejecting pixel points with gradients smaller than the first gradient threshold value;
and the edge key point determining unit is used for taking the weak edge points connected with the strong edge points and the weak edge points connected with the strong edge points through other adjacent weak edge points as second strong edge points, and obtaining a plurality of first edge key points based on the first strong edge points and the second strong edge points.
The analysis module 4 comprises:
the color feature calculation sub-module is used for acquiring feature point coordinates and RGB values of key parts in the dense key points of the human face and calculating color features of the human face according to the RGB values and the feature point coordinates;
the facial feature calculation sub-module is used for acquiring the pixel resolution of the face image in the face image data set and the shooting parameters of the corresponding shooting equipment, determining a scale factor between the face image and the actual size based on the resolution and the shooting parameters, and determining facial features of the face based on the scale factor and the dense key points of the face;
and a result output sub-module, configured to sequentially determine the straight-curve features and volume-sense features of the face based on the facial features of the face, and to output feature recognition results based on the color features, the facial features, the straight-curve features and the volume-sense features.
The color feature calculation submodule includes:
a brightness calculation unit, configured to calculate the brightness $V$ of the face based on the RGB values and the feature point coordinates:

$$V=\left(\frac{R^{2.2}+G^{2.2}+B^{2.2}}{3}\right)^{1/2.2}$$

where $R$, $G$ and $B$ respectively represent the red, green and blue color values;
a difference calculating unit, configured to calculate the color difference degree $D$ of the face based on the RGB values and the feature point coordinates:

$$D=\omega_1 d_1+\omega_2 d_2+\omega_3 d_3,\qquad \omega_1+\omega_2+\omega_3=1$$

where $\omega_1$, $\omega_2$ and $\omega_3$ respectively represent the first, second and third color difference weights, and $d_1$, $d_2$ and $d_3$ respectively represent the first, second and third color differences, namely the differences of the hair color $c_{hair}$, the pupil color $c_{pupil}$ and the lip color $c_{lip}$ from the skin color $c_{skin}$;
a hue calculation unit, configured to convert the RGB values into the HSV representation through a preset formula and judge the hue characteristic of the skin color according to the hue $h$: if $h$ is smaller than a preset value the color is a cold tone, and if $h$ is not smaller than the preset value the color is a warm tone, so as to obtain the hue feature of the face;
and a color feature determination unit, configured to obtain the color features of the human face based on the brightness $V$, the color difference degree $D$ and the hue feature.
The training module 5 comprises:
the corpus determining submodule is used for establishing a result threshold set according to the feature recognition result, establishing a corpus matching rule based on an application scene of the feature recognition result and the result threshold set, and determining a corpus based on the corpus matching rule;
a sequence determination submodule, configured to encode the feature recognition result into feature vectors by using data in the corpus, so as to obtain a plurality of input sequences;
a weight calculation sub-module, configured to calculate a first attention weight $A$ for each position in the input sequence:

$$A=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $Q$ is the query vector of the feature recognition results and corpus data, $K$ is the key vector of the feature recognition results and corpus data, $V$ is the value vector of the feature recognition results and corpus data, $K^{T}$ is the transposed matrix of $K$, and $d_k$ is the vector dimension;
and the training sub-module is used for carrying out nonlinear transformation on the input sequence of each position and recalculating the second attention weight of each position, and training the large language model based on the input sequence and the second attention weight so as to obtain the trained large language model.
In other embodiments of the present invention, an electronic device includes a memory 102, a processor 101, and a computer program stored in the memory 102 and executable on the processor 101, where the processor 101 implements the above-mentioned large language model face feature analysis method when executing the computer program.
In particular, the processor 101 may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.
Memory 102 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 102 may comprise a Hard Disk Drive (HDD), floppy Disk Drive, solid state Drive (Solid State Drive, SSD), flash memory, optical Disk, magneto-optical Disk, tape, or universal serial bus (Universal Serial Bus, USB) Drive, or a combination of two or more of the foregoing. Memory 102 may include removable or non-removable (or fixed) media, where appropriate. The memory 102 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 102 is a Non-Volatile (Non-Volatile) memory. In a particular embodiment, the Memory 102 includes Read-Only Memory (ROM) and random access Memory (Random Access Memory, RAM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (Programmable Read-Only Memory, abbreviated PROM), an erasable PROM (Erasable Programmable Read-Only Memory, abbreviated EPROM), an electrically erasable PROM (Electrically Erasable Programmable Read-Only Memory, abbreviated EEPROM), an electrically rewritable ROM (Electrically Alterable Read-Only Memory, abbreviated EAROM), or a FLASH Memory (FLASH), or a combination of two or more of these. The RAM may be Static Random-Access Memory (SRAM) or dynamic Random-Access Memory (Dynamic Random Access Memory DRAM), where the DRAM may be a fast page mode dynamic Random-Access Memory (Fast Page Mode Dynamic Random Access Memory FPMDRAM), extended data output dynamic Random-Access Memory (Extended Date Out Dynamic Random Access Memory EDODRAM), synchronous dynamic Random-Access Memory (Synchronous Dynamic Random-Access Memory SDRAM), or the like, as appropriate.
Memory 102 may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by processor 101.
The processor 101 reads and executes the computer program instructions stored in the memory 102 to implement the above-described large language model face feature analysis method.
In some of these embodiments, the electronic device may also include a communication interface 103 and a bus 100. As shown in fig. 8, the processor 101, the memory 102, and the communication interface 103 are connected to each other via the bus 100 and perform communication with each other.
The communication interface 103 is used to implement communication between modules, devices, units, and/or units in the embodiments of the present application. The communication interface 103 may also enable communication with other components such as: and the external equipment, the image/data acquisition equipment, the database, the external storage, the image/data processing workstation and the like are used for data communication.
Bus 100 includes hardware, software, or both, coupling components of a computer to each other. Bus 100 includes, but is not limited to, at least one of: data Bus (Data Bus), address Bus (Address Bus), control Bus (Control Bus), expansion Bus (Expansion Bus), local Bus (Local Bus). By way of example, and not limitation, bus 100 may include a graphics acceleration interface (Accelerated Graphics Port), abbreviated AGP, or other graphics Bus, an enhanced industry standard architecture (Extended Industry Standard Architecture, abbreviated EISA) Bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an industry standard architecture (Industry Standard Architecture, ISA) Bus, a wireless bandwidth (InfiniBand) interconnect, a Low Pin Count (LPC) Bus, a memory Bus, a micro channel architecture (Micro Channel Architecture, abbreviated MCa) Bus, a peripheral component interconnect (Peripheral Component Interconnect, abbreviated PCI) Bus, a PCI-Express (PCI-X) Bus, a serial advanced technology attachment (Serial Advanced Technology Attachment, abbreviated SATA) Bus, a video electronics standards association local (Video Electronics Standards Association Local Bus, abbreviated VLB) Bus, or other suitable Bus, or a combination of two or more of the foregoing. Bus 100 may include one or more buses, where appropriate. Although embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
The computer can execute the large language model face feature analysis method based on the obtained large language model face feature analysis system, so that analysis of the large language model face features is realized.
In still other embodiments of the present invention, in combination with the above-mentioned large language model face feature analysis method, the embodiments of the present invention provide a technical solution, a readable storage medium storing a computer program thereon, where the computer program when executed by a processor implements the above-mentioned large language model face feature analysis method.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, the steps may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; nevertheless, any combination of these technical features that involves no contradiction should be considered within the scope of this description.
The above examples merely represent a few embodiments of the present application; although they are described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (9)

1. A method for analyzing characteristics of a face of a large language model, the method comprising:
acquiring a face image data set, and performing image preprocessing on face images in the face image data set to obtain a processed image data set;
dividing the processed image data set into a training sample set and a test sample set according to a preset proportion, inputting the training sample set into a convolutional neural network CNN for training, and inputting the test sample set into the trained convolutional neural network CNN for key point detection, so as to obtain a plurality of basic key points;
performing edge point identification on the basic key points to obtain a plurality of first edge key points, performing data marking and filtering on the first edge key points to obtain second edge key points, and introducing the plurality of second edge key points into the plurality of basic key points to obtain dense key points of the face;
carrying out face feature analysis and description on the face dense key points based on the feature information of the face dense key points so as to obtain feature recognition results;
based on the feature recognition result, establishing a result threshold set and a corpus matching rule, and training a large language model according to the result threshold set and the corpus matching rule;
the step of establishing a result threshold set and a corpus matching rule based on the feature recognition result, and training a large language model according to the result threshold set and the corpus matching rule comprises the following steps:
establishing a result threshold set according to the feature recognition result, establishing a corpus matching rule based on an application scene of the feature recognition result and the result threshold set, and determining a corpus based on the corpus matching rule;
encoding the feature recognition result and the data in the corpus into feature vectors to obtain a plurality of input sequences;
calculating a first attention weight $\alpha$ for each position in the input sequence, wherein the first attention weight $\alpha$ is calculated as:

$\alpha = \mathrm{softmax}\left( \dfrac{Q K^{T}}{\sqrt{d_k}} \right) V$

where $Q$ is the query vector of the feature recognition result and corpus data, $K$ is the key vector of the feature recognition result and corpus data, $V$ is the value vector of the feature recognition result and corpus data, $K^{T}$ is the transposed matrix of $K$, and $d_k$ is the vector dimension (a NumPy sketch of this computation appears after claim 7, where the same formula recurs);
performing a nonlinear transformation on each position of the input sequence, recalculating the first attention weight of each position based on the first attention weight $\alpha$, and training the large language model based on the input sequence and the recalculated first attention weights, so as to obtain a trained large language model;
before the user communicates with the large language model, converting the feature recognition result into a Prompt that the large language model can understand, so that the large model can communicate with the user in a targeted manner, with the following auxiliary step: during the communication, an NLP emotion recognition algorithm is introduced synchronously to perform semantic processing on the interactive question-and-answer text, and a Prompt describing the language emotion is further added, so that the trained large language model also understands the emotion of the current user;
introducing a data filter, a semantic analyzer, and a prompt improver in the communication layer between the user and the trained large language model, so as to realize concurrent processing of multiple modalities and multiple features (one hypothetical arrangement of this pipeline is sketched after this claim);
wherein the data filter is mainly used to perform preliminary filtering on the user's questions and remove invalid data, and can screen and clean the questions according to predefined rules or algorithms;
the semantic analyzer is used to perform deep semantic understanding and analysis of the user's questions, converting them into a machine-understandable form and identifying their keywords, entities, and intents;
and the prompt improver is mainly used to optimize the user's question so as to improve its quality and accuracy, and can reorganize, supplement, or rewrite the question according to its semantics and context information.
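The three-stage communication layer at the end of claim 1 (data filter → semantic analyzer → prompt improver) can be pictured with a short Python sketch. This is a minimal illustration under loose assumptions: the filtering rules, keyword extraction, intent test, and prompt template below are all hypothetical, since the claim specifies only the responsibilities of each stage, not their implementations.

```python
import re
from typing import Optional

def data_filter(question: str) -> Optional[str]:
    """Stage 1: preliminary filtering; drop invalid input, clean the rest."""
    cleaned = question.strip()
    if not cleaned or len(cleaned) < 2:
        return None                        # invalid data is removed
    return re.sub(r"\s+", " ", cleaned)    # normalize whitespace

def semantic_analyzer(question: str) -> dict:
    """Stage 2: shallow stand-in for deep semantic analysis."""
    keywords = [w for w in re.findall(r"\w+", question.lower()) if len(w) > 3]
    intent = "query" if question.endswith("?") else "statement"
    return {"text": question, "keywords": keywords, "intent": intent}

def prompt_improver(analysis: dict, feature_result: str) -> str:
    """Stage 3: reorganize the question and attach feature-recognition context."""
    return (
        f"Facial feature context: {feature_result}\n"
        f"User intent: {analysis['intent']}; keywords: {', '.join(analysis['keywords'])}\n"
        f"Question: {analysis['text']}"
    )

question = data_filter("  What hairstyle suits my face shape?  ")
if question is not None:
    print(prompt_improver(semantic_analyzer(question), "oval face, warm skin tone"))
```

In a real deployment each stage would be far richer (for example, an entity recognizer in the analyzer), but the control flow — filter, analyze, then rewrite with the feature recognition context attached — follows the division of labor the claim describes.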
2. The method of claim 1, wherein the step of performing image preprocessing on the face images in the face image data set to obtain a processed image data set comprises:
sequentially carrying out image denoising, image enhancement, and image alignment on the face images in the face image data set, so as to obtain a processed image data set.
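As a concrete but purely illustrative reading of claim 2, the sketch below runs the three named stages in order with common OpenCV operators. The specific choices — Gaussian denoising, CLAHE enhancement, and eye-based rotation for alignment — are assumptions; the claim names only the stages.

```python
import cv2
import numpy as np

def preprocess_face(img: np.ndarray, left_eye, right_eye, size=(224, 224)) -> np.ndarray:
    """Denoise, enhance, and align one BGR face image.

    The concrete operators (Gaussian blur, CLAHE, eye-line rotation) are
    illustrative assumptions; eye coordinates are given as (x, y).
    """
    # 1. Image denoising
    denoised = cv2.GaussianBlur(img, (5, 5), 0)

    # 2. Image enhancement: CLAHE on the luminance channel
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    # 3. Image alignment: rotate so the line between the eyes is horizontal
    dy = right_eye[1] - left_eye[1]
    dx = right_eye[0] - left_eye[0]
    angle = float(np.degrees(np.arctan2(dy, dx)))
    center = ((left_eye[0] + right_eye[0]) / 2.0, (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    aligned = cv2.warpAffine(enhanced, M, (img.shape[1], img.shape[0]))
    return cv2.resize(aligned, size)
```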
3. The method of claim 1, wherein the step of performing edge point identification on the basic key points to obtain a plurality of first edge key points comprises:
calculating the gradient $G$ of each pixel point outside the basic key points with the two convolution kernels of the Sobel operator, $G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$ and $G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$, where $G = \sqrt{G_x^{2} + G_y^{2}}$;
selecting the pixel points whose gradient $G$ is larger than a preset gradient as edge points (a sketch of this step follows this claim);
and introducing a Canny operator to carry out non-maximum suppression and screening on the edge points so as to obtain a plurality of first edge key points.
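A minimal sketch of the Sobel step in claim 3, assuming a grayscale image held as a NumPy array; the threshold value and the helper names are illustrative, and masking out the basic key points is omitted for brevity.

```python
import cv2
import numpy as np

# Standard Sobel kernels; Gx responds to horizontal change, Gy to vertical change
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def sobel_gradient(gray: np.ndarray) -> np.ndarray:
    """Per-pixel gradient magnitude G = sqrt(Gx^2 + Gy^2)."""
    gx = cv2.filter2D(gray.astype(np.float64), -1, SOBEL_X)
    gy = cv2.filter2D(gray.astype(np.float64), -1, SOBEL_Y)
    return np.hypot(gx, gy)

def candidate_edge_points(gray: np.ndarray, preset_gradient: float):
    """(x, y) coordinates of pixels whose gradient exceeds the preset value."""
    magnitude = sobel_gradient(gray)
    ys, xs = np.nonzero(magnitude > preset_gradient)
    return list(zip(xs.tolist(), ys.tolist()))
```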
4. The large language model face feature analysis method according to claim 3, wherein the step of introducing a Canny operator to perform non-maximum suppression and screening on the edge points to obtain a plurality of first edge key points comprises:
performing non-maximum suppression on the edge points through the Canny operator and redefining the edge points:

$G'(p) = \begin{cases} G(p), & G(p) \ge G(p_1) \text{ and } G(p) \ge G(p_2) \\ 0, & \text{otherwise} \end{cases}$

where $G'(p)$ is the gradient after suppression, $p_1$ and $p_2$ are the points adjacent to point $p$ in the tangential direction of the edge, $G(p)$ is the gradient of point $p$, and $G(p_1)$, $G(p_2)$ are the gradients of points $p_1$ and $p_2$;
setting a first gradient threshold and a second gradient threshold by a hysteresis thresholding method, taking the pixel points whose gradient is larger than the second gradient threshold as first strong edge points, taking the pixel points whose gradient is larger than the first gradient threshold and smaller than the second gradient threshold as weak edge points, and eliminating the pixel points whose gradient is smaller than the first gradient threshold;
and taking the weak edge points directly connected to the first strong edge points, together with the weak edge points connected to the first strong edge points through other adjacent weak edge points, as second strong edge points, and obtaining a plurality of first edge key points based on the first strong edge points and the second strong edge points.
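The two-threshold step of claim 4 can be written as a small flood fill. The sketch below is one assumed realization: `low` and `high` stand for the first and second gradient thresholds, and 8-connectivity is an assumption the claim does not fix.

```python
from collections import deque

import numpy as np

def hysteresis(magnitude: np.ndarray, low: float, high: float) -> np.ndarray:
    """Keep strong edges plus any weak edges connected to them (directly or
    through chains of adjacent weak edges)."""
    strong = magnitude > high                      # first strong edge points
    weak = (magnitude > low) & ~strong             # candidate weak edge points
    keep = strong.copy()
    queue = deque(zip(*np.nonzero(strong)))
    while queue:                                   # flood-fill through weak points
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < magnitude.shape[0] and 0 <= nx < magnitude.shape[1]
                        and weak[ny, nx] and not keep[ny, nx]):
                    keep[ny, nx] = True            # promoted to second strong edge
                    queue.append((ny, nx))
    return keep
```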
5. The method according to claim 1, wherein the step of carrying out face feature analysis and description on the face dense key points based on the feature information of the face dense key points to obtain feature recognition results comprises:
acquiring feature point coordinates and RGB values of key parts in the dense key points of the face, and calculating color features of the face according to the RGB values and the feature point coordinates;
acquiring the pixel resolution of a face image in the face image data set and the shooting parameters of the corresponding shooting equipment, determining a scale factor between the face image and the actual size based on the resolution and the shooting parameters, and determining the facial features of the face based on the scale factor and the face dense key points (one way to form such a scale factor is sketched after this claim);
and sequentially determining the straight-curved features and the quantity-sensing features of the face based on the facial features of the face, and outputting a feature recognition result based on the color features, the facial features, the straight-curved features, and the quantity-sensing features.
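The scale factor in claim 5 (image pixels to real-world size) admits a simple pinhole-camera reading. The sketch below assumes the shooting parameters are focal length, subject distance, and sensor width; the pinhole model itself, the parameter names, and all numbers in the example are assumptions, since the claim does not fix which shooting parameters are used.

```python
import numpy as np

def scale_factor_mm_per_px(focal_length_mm: float, subject_distance_mm: float,
                           sensor_width_mm: float, image_width_px: int) -> float:
    """Millimetres of real-world width represented by one pixel.

    Similar triangles in the pinhole model: scene width covered by the frame
    equals distance * sensor_width / focal_length.
    """
    field_of_view_mm = subject_distance_mm * sensor_width_mm / focal_length_mm
    return field_of_view_mm / image_width_px

def facial_distance_mm(p1, p2, mm_per_px: float) -> float:
    """Actual distance between two dense key points, e.g. the eye corners."""
    return float(np.hypot(p2[0] - p1[0], p2[1] - p1[1])) * mm_per_px

# Hypothetical parameters: 50 mm lens, subject 600 mm away, 36 mm sensor, 4000 px wide
s = scale_factor_mm_per_px(50.0, 600.0, 36.0, 4000)
print(facial_distance_mm((1200, 900), (1500, 910), s))  # ~32.4 mm
```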
6. The method according to claim 5, wherein the step of acquiring the feature point coordinates and RGB values of key parts in the face dense key points and calculating the color features of the face according to the RGB values and the feature point coordinates comprises:
calculating the brightness $Lum$ of the face based on the RGB values and the feature point coordinates:

$Lum = w_R R + w_G G + w_B B$

where $R$, $G$, and $B$ represent the red, green, and blue color values respectively, and $w_R$, $w_G$, $w_B$ are preset brightness weights;
calculating the color difference degree $D$ of the face based on the RGB values and the feature point coordinates:

$D = w_1 d_1 + w_2 d_2 + w_3 d_3, \quad w_1 + w_2 + w_3 = 1$

where $w_1$, $w_2$, $w_3$ represent the first, second, and third color difference weights respectively, $d_1$, $d_2$, $d_3$ represent the first, second, and third color differences, computed from the differences among $C_{skin}$, $C_{hair}$, $C_{pupil}$, and $C_{lip}$, which represent the skin color, hair color, pupil color, and lip color respectively;
converting the RGB values into an HSV representation through a preset formula, and judging the hue characteristic of the skin color according to the hue h: if the hue h is smaller than a preset value, the color is determined to be a cool tone, and if the hue h is not smaller than the preset value, the color is determined to be a warm tone, so as to obtain the hue feature of the face;
and obtaining the color feature of the face based on the brightness $Lum$, the color difference degree $D$, and the hue feature.
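Putting claim 6 together, a minimal sketch: the BT.601 luma weights, Euclidean RGB distances from skin to the other three colors, the difference weights, and the hue cutoff are all assumed values standing in for the patent's preset weights, preset formula, and preset value.

```python
import colorsys

import numpy as np

def color_features(skin_rgb, hair_rgb, pupil_rgb, lip_rgb,
                   weights=(0.4, 0.3, 0.3), hue_cutoff=0.5):
    """Brightness, color-difference degree, and hue feature for one face."""
    r, g, b = skin_rgb
    # Brightness as a weighted luma of the skin color (BT.601 weights assumed)
    brightness = 0.299 * r + 0.587 * g + 0.114 * b

    # Color-difference degree: weighted distances from skin to hair/pupil/lip
    skin = np.array(skin_rgb, dtype=float)
    diffs = [float(np.linalg.norm(skin - np.array(c, dtype=float)))
             for c in (hair_rgb, pupil_rgb, lip_rgb)]
    color_difference = sum(w * d for w, d in zip(weights, diffs))

    # Hue feature: convert skin color to HSV and compare hue to a preset value
    h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_feature = "cool" if h < hue_cutoff else "warm"
    return brightness, color_difference, hue_feature

print(color_features((224, 172, 150), (40, 30, 25), (60, 40, 35), (180, 90, 95)))
```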
7. A large language model face feature analysis system, the system comprising:
the acquisition module is used for acquiring a face image data set, and carrying out image preprocessing on face images in the face image data set to obtain a processed image data set;
the first detection module is used for dividing the processed image data set into a training sample set and a test sample set according to a preset proportion, inputting the training sample set into a convolutional neural network CNN for training, and inputting the test sample set into the trained convolutional neural network CNN for key point detection, so as to obtain a plurality of basic key points;
the second detection module is used for carrying out edge point identification on the basic key points to obtain a plurality of first edge key points, carrying out data marking and filtering on the first edge key points to obtain second edge key points, and introducing a plurality of second edge key points into the basic key points to obtain dense key points of the face;
the analysis module is used for carrying out face feature analysis and description on the face dense key points based on the feature information of the face dense key points so as to obtain feature recognition results;
The training module is used for establishing a result threshold set and a corpus matching rule based on the feature recognition result, and training a large language model according to the result threshold set and the corpus matching rule;
the training module comprises:
the corpus determining submodule is used for establishing a result threshold set according to the feature recognition result, establishing a corpus matching rule based on an application scene of the feature recognition result and the result threshold set, and determining a corpus based on the corpus matching rule;
the sequence determination submodule is used for encoding the feature recognition result and the data in the corpus into feature vectors so as to obtain a plurality of input sequences;
a weight calculation sub-module for calculating a first attention weight $\alpha$ for each position in the input sequence, wherein the first attention weight $\alpha$ is calculated as:

$\alpha = \mathrm{softmax}\left( \dfrac{Q K^{T}}{\sqrt{d_k}} \right) V$

where $Q$ is the query vector of the feature recognition result and corpus data, $K$ is the key vector of the feature recognition result and corpus data, $V$ is the value vector of the feature recognition result and corpus data, $K^{T}$ is the transposed matrix of $K$, and $d_k$ is the vector dimension (see the sketch after this claim);
a training sub-module for performing a nonlinear transformation on each position of the input sequence, recalculating the first attention weight of each position based on the first attention weight $\alpha$, and training the large language model based on the input sequence and the recalculated first attention weights, so as to obtain the trained large language model.
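The scaled dot-product attention used in claims 1 and 7 can be checked with a few lines of NumPy. This is a minimal sketch: the matrix shapes and random inputs are hypothetical, and Q, K, and V are assumed to be already-encoded feature vectors, as the claims describe.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]                       # vector dimension d_k
    scores = Q @ K.T / np.sqrt(d_k)         # Q K^T / sqrt(d_k)
    weights = softmax(scores, axis=-1)      # attention distribution per query
    return weights @ V                      # weighted sum of value vectors

# Hypothetical example: 4 encoded positions of dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
alpha = attention(Q, K, V)
print(alpha.shape)  # (4, 8)
```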
8. The large language model face feature analysis system of claim 7, wherein the acquisition module is specifically configured to:
sequentially carry out image denoising, image enhancement, and image alignment on the face images in the face image data set, so as to obtain a processed image data set.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the large language model face feature analysis method of any one of claims 1 to 6.
CN202310861560.5A 2023-07-14 2023-07-14 Large language model face feature analysis method, system and electronic equipment Active CN116580445B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310861560.5A CN116580445B (en) 2023-07-14 2023-07-14 Large language model face feature analysis method, system and electronic equipment


Publications (2)

Publication Number Publication Date
CN116580445A CN116580445A (en) 2023-08-11
CN116580445B true CN116580445B (en) 2024-01-09

Family

ID=87541745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310861560.5A Active CN116580445B (en) 2023-07-14 2023-07-14 Large language model face feature analysis method, system and electronic equipment

Country Status (1)

Country Link
CN (1) CN116580445B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11526667B2 (en) * 2020-05-09 2022-12-13 International Business Machines Corporation Language-model-based data augmentation method for textual classification tasks with little data
US11508360B2 (en) * 2020-09-15 2022-11-22 Microsoft Technology Licensing, Llc Synthetic data generation for training of natural language understanding models

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015005009A (en) * 2013-06-19 2015-01-08 キヤノン株式会社 Partition mounting method, mount device, and program
CN108255307A (en) * 2018-02-08 2018-07-06 竹间智能科技(上海)有限公司 Man-machine interaction method, system based on multi-modal mood and face's Attribute Recognition
CN108629303A (en) * 2018-04-24 2018-10-09 杭州数为科技有限公司 A kind of shape of face defect identification method and system
CN110032959A (en) * 2019-03-29 2019-07-19 北京迈格威科技有限公司 A kind of face shape of face judgment method and device
CN111652978A (en) * 2019-06-26 2020-09-11 广州虎牙科技有限公司 Grid generation method and device, electronic equipment and storage medium
CN112560540A (en) * 2019-09-10 2021-03-26 Tcl集团股份有限公司 Beautiful makeup putting-on recommendation method and device
CN110909218A (en) * 2019-10-14 2020-03-24 平安科技(深圳)有限公司 Information prompting method and system in question-answering scene
CN111680588A (en) * 2020-05-26 2020-09-18 广州多益网络股份有限公司 Human face gate living body detection method based on visible light and infrared light
CN111694959A (en) * 2020-06-08 2020-09-22 谢沛然 Network public opinion multi-mode emotion recognition method and system based on facial expressions and text information
CN111930918A (en) * 2020-09-29 2020-11-13 湖南大学 Cross-modal bilateral personalized man-machine social interaction dialog generation method and system
CN112560639A (en) * 2020-12-11 2021-03-26 上海明略人工智能(集团)有限公司 Face key point number conversion method, system, electronic equipment and storage medium
CN115239861A (en) * 2021-04-23 2022-10-25 广州视源电子科技股份有限公司 Face data enhancement method and device, computer equipment and storage medium
CN113408619A (en) * 2021-06-21 2021-09-17 江苏苏云信息科技有限公司 Language model pre-training method and device
CN113591763A (en) * 2021-08-09 2021-11-02 平安科技(深圳)有限公司 Method and device for classifying and identifying face shape, storage medium and computer equipment
CN113989912A (en) * 2021-12-07 2022-01-28 杜青阳 Cognitive classification and prediction method and system based on eye movement track and deep learning
CN114416934A (en) * 2021-12-24 2022-04-29 北京百度网讯科技有限公司 Multi-modal dialog generation model training method and device and electronic equipment
CN114694008A (en) * 2021-12-30 2022-07-01 新疆中宏立达软件工程有限公司 Remote face recognition system
CN114708634A (en) * 2022-03-30 2022-07-05 清华大学 Relative weight analysis method and device based on face image and electronic equipment
CN115273196A (en) * 2022-07-29 2022-11-01 青岛海信智慧生活科技股份有限公司 Skin color identification method based on face key points and electronic equipment
CN115482577A (en) * 2022-10-18 2022-12-16 深圳市恩裳纺织品有限公司 Clothing style matching algorithm based on human face features
CN115937853A (en) * 2022-12-22 2023-04-07 吉利汽车研究院(宁波)有限公司 Document generation method, document generation device, electronic device, and storage medium
CN116166688A (en) * 2023-02-14 2023-05-26 深圳爱莫科技有限公司 Business data retrieval method, system and processing equipment based on natural language interaction
CN116343300A (en) * 2023-03-21 2023-06-27 平安科技(深圳)有限公司 Face feature labeling method, device, terminal and medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Zhao Chaoyang; Insights from ChatGPT for large language models and new development directions for multimodal large models; Data Analysis and Knowledge Discovery; Vol. 7, No. 3; 26-35 *
Andrej Karpathy et al.; Deep Visual-Semantic Alignments for Generating Image Descriptions; IEEE Transactions on Pattern Analysis and Machine Intelligence; Vol. 39, No. 3; 664-676 *
Dan Meng et al.; Supervised Feature Learning Network Based on the Improved LLE for Face Recognition; 2016 International Conference on Audio, Language and Image Processing (ICALIP); 306-311 *
Wu Chunmeng; Analysis and Research on Facial Attractiveness Based on Global Contour and Face-Shape Structural Features; China Masters' Theses Full-text Database (Philosophy and Humanities); Vol. 2019, No. 12; F102-76 *
Gao Jingpeng; Machine Learning; Beijing: China Machine Press; 2020; 179-180 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant