WO2023151289A1 - Emotion recognition method, training method, apparatus, device, storage medium and product

Emotion recognition method, training method, apparatus, device, storage medium and product

Info

Publication number
WO2023151289A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
emotion recognition
recognition model
current
emotion
Prior art date
Application number
PCT/CN2022/122788
Other languages
English (en)
French (fr)
Inventor
赵雅倩 (Zhao Yaqian)
王斌强 (Wang Binqiang)
董刚 (Dong Gang)
李仁刚 (Li Rengang)
Original Assignee
Suzhou Inspur Intelligent Technology Co., Ltd. (苏州浪潮智能科技有限公司)
Application filed by Suzhou Inspur Intelligent Technology Co., Ltd. (苏州浪潮智能科技有限公司)
Publication of WO2023151289A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Definitions

  • The embodiments of the present application relate to the technical field of emotion recognition, and in particular to an emotion recognition method, an emotion recognition model training method, a device, an electronic device, a computer non-volatile readable storage medium, and a computer program product.
  • ANN: Artificial Neural Network (plural: ANNs).
  • In the related art, inference with an ANN-based emotion recognition model consumes a large amount of energy on mobile devices, and this high-energy ANN mode of emotion recognition hinders the application of emotion recognition on embedded and mobile devices.
  • The low-power spiking neural network (SNN, Spiking Neural Network; plural: SNNs) is a potential solution for implementing emotion recognition algorithms suitable for embedded and mobile terminals.
  • the structure of a single neuron in an SNN is more similar to the structure of a neuron in the brain.
  • In the related art, SNNs are usually used to complete emotion recognition tasks that extract emotional information from speech, cross-modal signals, or EEG.
  • However, emotional information has not been extracted from video clips, which limits the available modes of emotion recognition; therefore, how to extract emotional information from video clips is a problem to be solved by those skilled in the art.
  • The purpose of the embodiments of the present application is to provide an emotion recognition method, an emotion recognition model training method, a device, an electronic device, a computer non-volatile readable storage medium, and a computer program product that can recognize emotion categories based on video information during use, increasing the available modes of emotion recognition, which is conducive to better emotion recognition.
  • In a first aspect, an embodiment of the present application provides an emotion recognition method, including: acquiring the pulse sequence to be recognized corresponding to video information; and using the spiking neural network emotion recognition model to recognize the pulse sequence to be recognized, obtaining the corresponding emotion category.
  • In some embodiments, before acquiring the pulse sequence to be recognized corresponding to the video information, the method further includes: training the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • In some embodiments, training the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model includes: obtaining a test set of multiple emotion categories, and using the test set to test and train the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • In some embodiments, training the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model includes: training the pre-established spiking neural network emotion recognition model using the dynamic visual data set to obtain the trained spiking neural network emotion recognition model.
  • In some embodiments, the process of pre-establishing a dynamic visual data set based on emotion recognition includes: acquiring raw visual data based on emotion recognition; simulating the raw visual data with a dynamic vision sensor simulation method, or directly acquiring pulse sequences with a dynamic vision camera; and establishing the dynamic visual data set from the obtained pulse sequences.
  • In some embodiments, the process of simulating the original visual data with the dynamic vision sensor simulation method to obtain multiple pulse sequences corresponding to the original visual data includes:
  • sequentially traversing the N video frame images in the original dynamic video data, where N represents the total number of video frame images contained in the original visual data;
  • when traversing to the current i-th frame, converting the i-th video frame image from the RGB color space to the grayscale space and using the converted video frame data as the current video frame data, where i ranges from 1 to N;
  • In some embodiments, the process of simulating the original visual data with the dynamic vision sensor simulation method to obtain the corresponding pulse sequence further includes: after the traversal of the N video frame images in the original dynamic video data is completed, obtaining a pulse sequence composed of the first output channel and the second output channel.
  • When the grayscale difference value is greater than the preset threshold, the position corresponding to the first output channel is assigned a value of 1; when the grayscale difference value is smaller than the preset threshold, the position corresponding to the second output channel is assigned a value of 1.
  • the spiking neural network includes groups of voting neurons
  • In some embodiments, the process of using the dynamic visual data set to train the pre-established spiking neural network emotion recognition model includes: when the network converges, ending the training and obtaining the trained spiking neural network emotion recognition model.
  • In some embodiments, the process of processing the original visual data with the dynamic vision sensor simulation method to obtain the pulse sequence corresponding to the original visual data includes: for each pixel, calculating the grayscale difference between the current video frame and the previous video frame, where N represents the total number of video frame images contained in the original visual data; when the grayscale difference value is greater than the preset threshold, assigning a value of 1 to the corresponding position of the first output channel; and when the grayscale difference value is smaller than the preset threshold, assigning a value of 1 to the corresponding position of the second output channel.
  • the spiking neural network includes a feature extraction module, a voting neuron group module, and an emotion mapping module;
  • In some embodiments, the process of obtaining the trained spiking neural network emotion recognition model includes: using the dynamic visual data set as the input of the current spiking neural network and obtaining, through forward propagation of the network, the output frequency of the voting neuron group for each emotion category; and, when the network has not yet converged, returning to the forward propagation step to obtain the output frequencies of the voting neuron groups for the next round of training.
  • In some embodiments, the spiking neural network further includes a feature extraction module, where the feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, and a network unit composed of two fully connected layers and PLIF layers arranged alternately.
  • In a second aspect, an embodiment of the present application provides a method for training an emotion recognition model, including: pre-establishing a dynamic visual data set based on emotion recognition, and using the dynamic visual data set to train the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • Alternatively, the method includes: obtaining a test set of multiple emotion categories, and using the test set to test and train the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • An embodiment of the present application also provides an emotion recognition device, including:
  • an acquisition module, configured to acquire the pulse sequence to be recognized corresponding to video information;
  • a recognition module, configured to recognize the pulse sequence to be recognized using the pre-established spiking neural network emotion recognition model to obtain the corresponding emotion category.
  • In some embodiments, the device further includes:
  • a training module, configured to train the spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • In some embodiments, the training module includes:
  • a test set acquisition module, configured to obtain test sets of multiple emotion categories after the first building module pre-establishes the spiking neural network emotion recognition model;
  • a first training module, configured to test and train the pre-established spiking neural network emotion recognition model using the test set obtained by the test set acquisition module, to obtain the trained spiking neural network emotion recognition model.
  • In some embodiments, the training module includes:
  • a second training module, configured to train the pre-established spiking neural network emotion recognition model using the dynamic visual data set to obtain the trained spiking neural network emotion recognition model.
  • In some embodiments, the establishment module includes: a data acquisition module, a data set establishment module, and at least one of a simulation processing module and a first pulse sequence acquisition module, wherein:
  • the data acquisition module is configured to acquire raw visual data based on emotion recognition;
  • the simulation processing module is configured to simulate the raw visual data with the dynamic vision sensor simulation method to obtain multiple pulse sequences corresponding to the raw visual data;
  • the first pulse sequence acquisition module is configured to directly acquire multiple pulse sequences corresponding to the raw visual data using a dynamic vision camera;
  • the data set establishment module is configured to establish the dynamic visual data set based on emotion recognition from the pulse sequences obtained by the simulation processing module or the first pulse sequence acquisition module.
  • the simulation processing module includes:
  • the traversal module is configured to sequentially traverse the N video frame images in the original dynamic video data, where N represents the total number of video frame images contained in the original visual data;
  • the first conversion module is configured to, when the traversal module reaches the current i-th frame, convert the i-th video frame image from the RGB color space to the grayscale space and use the converted video frame data as the current video frame data, where i ranges from 1 to N;
  • the first assignment module is configured to, when i equals 1, assign all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data to obtain a pulse sequence composed of the first output channel.
  • the simulation processing module also includes:
  • the second assignment module is configured to, when i is not equal to 1, assign values to the first output channel and the second output channel according to the grayscale difference between the current video frame and the previous video frame and the preset threshold, and to use the current video frame data as the previous video frame;
  • the first update module is configured to increment the value of i by 1;
  • the second conversion module is further configured to, when the i updated by the update module is less than N, convert the current i-th video frame image from the RGB color space to the grayscale space and use the converted video frame data as the current video frame data.
  • the simulation processing module also includes:
  • the second pulse sequence acquisition module is configured to, when the i updated by the update module is not less than N, complete the traversal of the N video frame images in the original dynamic video data and obtain the pulse sequence composed of the first output channel and the second output channel.
  • In some embodiments, the assignment module includes: a first calculation module, and at least one of a first position assignment module and a second position assignment module, wherein:
  • the first calculation module is used to calculate the gray scale difference value between the current video frame and the previous video frame at the pixel for each pixel;
  • the first position assignment module is used to assign a value of 1 at the position corresponding to the first output channel when the gray scale difference value is greater than the preset threshold;
  • the second position assignment module is configured to assign a value of 1 to the position corresponding to the second output channel when the gray level difference value is smaller than the preset threshold.
  • the spiking neural network includes groups of voting neurons
  • the second training module includes:
  • the initialization module is configured to initialize the parameter weights of the pre-established spiking neural network emotion recognition model;
  • the first propagation module is configured to use the dynamic visual data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtain, through forward propagation of the current spiking neural network, the output frequency of the voting neuron group for each emotion category;
  • the error calculation module is configured to calculate, for each emotion category, the error between the output frequency of the category's voting neuron group and the true label of the corresponding emotion category;
  • the gradient calculation module is configured to calculate the gradient corresponding to the parameter weights according to the error;
  • the second update module is configured to update the parameter weights of the current spiking neural network using the gradient calculated by the gradient calculation module;
  • the judgment module is configured to judge whether the current spiking neural network converges after the update module updates the parameter weights;
  • the model training determination module is configured to end the training when the judgment module determines that the current spiking neural network has converged after the parameter weight update, obtaining the trained spiking neural network emotion recognition model.
  • the second training module also includes:
  • the second propagation module is configured to, when the judgment module determines that the current spiking neural network has not converged after the parameter weight update, use the dynamic visual data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtain, through forward propagation of the current spiking neural network, the output frequency of the voting neuron group for each emotion category.
  • the judging module includes at least one of the following:
  • the first judgment module is configured to determine whether the current spiking neural network converges by judging whether the number of training iterations of the current spiking neural network after the parameter weight update has reached a preset number;
  • the second judgment module is configured to determine whether the current spiking neural network converges by judging whether the decrease in its error after the parameter weight update has stabilized within a preset range;
  • the third judgment module is configured to determine whether the current spiking neural network converges by judging whether its error after the parameter weight update is less than an error threshold;
  • the fourth judgment module is configured to judge whether the current spiking neural network after the parameter weight update converges according to the verification set in the dynamic visual data set.
  • In some embodiments, the spiking neural network further includes a feature extraction module, where the feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, and a network unit composed of two fully connected layers and PLIF layers arranged alternately.
  • An embodiment of the present application also provides an emotion recognition model training device, which includes:
  • a training module, configured to train the pre-established spiking neural network emotion recognition model using the dynamic visual data set to obtain the trained spiking neural network emotion recognition model.
  • An embodiment of the present application also provides another emotion recognition model training device, which includes:
  • an acquisition module, configured to obtain test sets of multiple emotion categories;
  • a training module, configured to test and train the pre-established spiking neural network emotion recognition model using the test set to obtain the trained spiking neural network emotion recognition model.
  • An embodiment of the present application also provides an emotion recognition device, including:
  • a processor, configured to implement the above emotion recognition method or the above emotion recognition model training method when executing a computer program.
  • An embodiment of the present application also provides an electronic device, including:
  • a processor, configured to implement the above emotion recognition method or the above emotion recognition model training method when executing a computer program.
  • An embodiment of the present application also provides a computer non-volatile readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above emotion recognition method or the above emotion recognition model training method is implemented.
  • The present application also provides a computer program product, including computer programs or instructions; when the computer programs or instructions are executed by a processor, the above emotion recognition method or the above emotion recognition model training method is implemented.
  • In the embodiment of the present application, the pulse sequence to be recognized corresponding to the video information is obtained first; the pulse sequence to be recognized is then recognized by the spiking neural network emotion recognition model to obtain the corresponding emotion category. That is, the present application can recognize emotion categories based on video information, which increases the available modes of emotion recognition and is conducive to better emotion recognition.
  • Alternatively, the spiking neural network can first be trained with the pre-established dynamic visual data set to obtain the spiking neural network emotion recognition model; the pulse sequence to be recognized corresponding to the video information is then obtained and input into the spiking neural network emotion recognition model, which recognizes it to obtain the corresponding emotion category. That is, in the embodiment of the present application, the spiking neural network is trained with the pre-established dynamic visual data set, and the trained spiking neural network emotion recognition model is used to recognize the pulse sequence to be recognized corresponding to the video information and obtain the corresponding emotion category, which increases the available modes of emotion recognition and is conducive to better emotion recognition of video information.
  • FIG. 1 is a schematic flow chart of an emotion recognition method provided in an embodiment of the present application;
  • FIG. 2 is a schematic flow chart of a method for converting original dynamic visual data into a pulse sequence provided in an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of a spiking neural network provided in an embodiment of the present application;
  • FIG. 4 is a schematic flow chart of a method for establishing a spiking neural network emotion recognition model provided in an embodiment of the present application;
  • FIG. 5 is a flow chart of an emotion recognition model training method provided in an embodiment of the present application;
  • FIG. 6 is another flow chart of an emotion recognition model training method provided in an embodiment of the present application;
  • FIG. 7 is a schematic structural diagram of an emotion recognition device provided in an embodiment of the present application;
  • FIG. 8 is another schematic structural diagram of an emotion recognition device provided in an embodiment of the present application;
  • FIG. 9 is a structural block diagram of an emotion recognition model training device provided in an embodiment of the present application;
  • FIG. 10 is another structural block diagram of an emotion recognition model training device provided in an embodiment of the present application;
  • FIG. 11 is a block diagram of an electronic device provided in an embodiment of the present application;
  • FIG. 12 is a block diagram of an apparatus for emotion recognition or emotion recognition model training provided in an embodiment of the present application.
  • The embodiments of the present application provide an emotion recognition method, an emotion recognition model training method, a device, a computer non-volatile readable storage medium, and a computer program product, which can recognize emotion categories based on video information during use, increasing the available modes of emotion recognition, which is conducive to better emotion recognition.
  • FIG. 1 is a schematic flowchart of an emotion recognition method provided in an embodiment of the present application.
  • The method includes:
  • S110: Acquire the pulse sequence to be recognized corresponding to the video information. In an embodiment, a dynamic vision camera can be used to directly acquire the pulse sequence to be recognized corresponding to the video information, or simulated data of the video information can be used. It should be noted that, because dynamic vision cameras are expensive, to reduce cost the embodiment of the present application may first obtain the video information and then simulate it to obtain the corresponding pulse sequence to be recognized.
  • S120: Use the spiking neural network emotion recognition model to recognize the pulse sequence to be recognized and obtain the corresponding emotion category.
  • the spiking neural network emotion recognition model is obtained by training the spiking neural network with a pre-established dynamic visual data set.
  • During actual application, the specific process of recognizing the pulse sequence to be recognized includes: inputting the pulse sequence to be recognized into the spiking neural network emotion recognition model, recognizing the pulse sequence to be recognized through the spiking neural network emotion recognition model, and obtaining the corresponding emotion category.
  • The spiking neural network emotion recognition model in this step is pre-established. Optionally, after the spiking neural network emotion recognition model is set up, and before the pulse sequence to be recognized is identified, the pre-established spiking neural network emotion recognition model is first trained to obtain the trained spiking neural network emotion recognition model.
  • The specific training process includes either of the following:
  • One training method: obtain a test set of multiple emotion categories; use the test set to test and train the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • Another training method: pre-establish a dynamic visual data set based on emotion recognition; use the dynamic visual data set to train the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • In the embodiment of the present application, a dynamic visual data set based on emotion recognition and a spiking neural network are pre-established, and the dynamic visual data set is then used to train the spiking neural network to obtain the trained spiking neural network emotion recognition model.
  • the above-mentioned process of pre-establishing the dynamic visual data set may specifically include:
  • the original visual data is simulated and processed by a dynamic visual sensor simulation method, and multiple pulse sequences corresponding to the original visual data are obtained;
  • It should be noted that a dynamic vision camera can be used to directly acquire the pulse sequence to be recognized corresponding to the video information, but the cost of a dynamic vision camera is relatively high.
  • In the embodiment of the present application, the original visual data based on emotion recognition can be collected with ordinary video acquisition equipment, and the dynamic vision sensor simulation method is then used to simulate the original visual data and obtain the pulse data corresponding to the original visual data, realizing the conversion of original visual data into pulse data and saving equipment cost.
  • It should be noted that the pulse sequence corresponding to one piece of original visual data is actually a pulse sequence array formed by the pulse sequences at each pixel position of each video picture in the entire original visual data; for brevity, this pulse sequence array is referred to as the pulse sequence corresponding to the original visual data. In practical application, multiple pieces of original visual data are simulated with the above dynamic vision sensor simulation method to obtain multiple pulse sequences, and the dynamic visual data set based on emotion recognition is established from the multiple pulse sequences.
  • the above-mentioned process of using the dynamic visual sensor simulation method to simulate the original visual data to obtain multiple pulse sequences corresponding to the original visual data may specifically include:
  • S200: Sequentially traverse the N video frame images in the original dynamic video data; when reaching the current i-th frame, convert the i-th video frame image from the RGB color space to the grayscale space and use the converted data as the current video frame data, where N represents the total number of video frame images contained in the original visual data and i ranges from 1 to N. That is, traversal starts from the first video frame image of the original visual data, the i-th video frame image is converted from the RGB color space to the grayscale space, and the converted current video frame data is obtained; here RGB refers to the three primary colors of an image, namely red, green, and blue.
  • S210: Judge whether i is equal to 1; if i equals 1, proceed to S220; otherwise, proceed to S230;
  • S220: Assign all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data, obtain a pulse sequence composed of the first output channel, and use the current video frame data as the previous video frame;
  • S230: Assign values to the first output channel and the second output channel according to the grayscale difference between the current video frame and the previous video frame and the preset threshold, and use the current video frame data as the previous video frame;
  • S240: Add 1 to the value of i and judge whether the updated i is less than N; if so, return to the step of converting the i-th video frame image from the RGB color space to the grayscale space, namely S200; otherwise, proceed to S250;
  • S250: Complete the traversal of the N video frame images in the original dynamic video data, end the operation, and obtain a pulse sequence composed of the first output channel and the second output channel.
  • It can be understood that the characteristic of dynamic vision is that the camera no longer captures all the information in the entire scene, which can greatly reduce the amount of data recorded and transmitted, especially when the scene changes little.
  • In the embodiment of the present application, the grayscale information of adjacent picture frames in the video data is differenced, and the difference result is compared against a preset threshold to decide whether to record data, thereby completing a simulation that conforms to the dynamic vision characteristics.
  • The recording characteristic of dynamic visual data is that only changes are recorded, defined by a formalized symbolic description generally expressed as E[x_i, y_i, t_i, p_i], where E represents an event, (x_i, y_i) represents the position in the scene where the event occurs, t_i represents the time at which the event occurs, and p_i represents the polarity of the event. For example, for the light intensity in the scene recorded by an event, the change of light intensity has two directions, from strong to weak or from weak to strong; both changes indicate the occurrence of an event, and the polarity dimension is defined to distinguish these two kinds of events.
  • The method provided by the embodiment of this application generates similar dynamic visual data through computer simulation. The continuous recording of a scene is represented here by video data, and this data is the original visual data for emotion recognition. A piece of original visual data contains N video frame images in total; these video frame images are the input of the dynamic vision sensor simulation method, and simulated dynamic visual data can be generated according to the following simulation steps:
  • First, an all-zero simulated visual data representation can be defined: E[x_i, y_i, t_i, p_i], where i ranges from 1 to N and the size of E is H × W × N × 2, with H and W the height and width of a video frame image respectively. An intermediate variable, denoted F_pre, is initialized to record the data of the previous frame, and the inter-frame sensitivity (that is, the preset threshold) is defined as Sens; an event is simulated when the difference between two frames exceeds the sensitivity.
  • Next, the N video frame images in the entire original dynamic video data are traversed starting from the first video frame image. Each video frame image is converted from the RGB color space to the grayscale space, denoted V_gray, the converted video frame data is used as the current video frame data, and the size of i is then judged.
  • When i equals 1, all floating-point data of the current video frame data can be assigned to the first output channel of the first time step of the simulated data (realizable by the code E[:, :, i, 0] ← V_gray), and the current video frame data is used as the previous video frame (realizable by the code F_pre ← V_gray).
  • When i is not equal to 1, for each pixel the grayscale difference between the current video frame and the previous video frame at that pixel is calculated and compared with the preset threshold, and the two types of events are assigned values according to the comparison result. When the grayscale difference value is greater than the preset threshold, the corresponding position of the first output channel is assigned a value of 1, realizable by the code E[:, :, i, 0] ← int(V_gray − F_pre > Sens); when the grayscale decrease exceeds the preset threshold, the corresponding position of the second output channel is assigned a value of 1, realizable by the code E[:, :, i, 1] ← int(F_pre − V_gray > Sens).
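  • To make the above steps concrete, the following is a minimal Python sketch of the simulation, under stated assumptions: the tensor layout H × W × N × 2 and the names V_gray, F_pre, and Sens follow the description above, frames are assumed to arrive as RGB arrays, OpenCV is assumed for the grayscale conversion, and the second channel is assumed to record the opposite polarity (grayscale decreases). Function and parameter names are illustrative, not the patent's reference implementation.

```python
import numpy as np
import cv2  # assumed dependency, used only for RGB-to-grayscale conversion

def simulate_dvs(frames, sens=0.1):
    """Convert a list of N RGB frames into a simulated event tensor E of
    shape (H, W, N, 2), per the dynamic vision sensor simulation above."""
    n = len(frames)
    h, w = frames[0].shape[:2]
    E = np.zeros((h, w, n, 2), dtype=np.float32)  # all-zero event tensor
    f_pre = None                                  # previous-frame record F_pre
    for i, frame in enumerate(frames):
        # Convert the current frame from the RGB color space to grayscale.
        v_gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY).astype(np.float32) / 255.0
        if i == 0:
            E[:, :, i, 0] = v_gray                # first frame: copy into channel 0
        else:
            # Channel 0: brightness increase beyond the sensitivity Sens;
            # channel 1: brightness decrease beyond Sens (opposite polarity).
            E[:, :, i, 0] = (v_gray - f_pre > sens).astype(np.float32)
            E[:, :, i, 1] = (f_pre - v_gray > sens).astype(np.float32)
        f_pre = v_gray                            # current frame becomes previous
    return E
```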
  • PLIF: Parametric Leaky-Integrate-and-Fire model, a spiking neuron model with a learnable membrane time constant. Jointly optimizing the time constant with the network is more comprehensive than setting it manually and can yield better synaptic weights through optimization.
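  • For reference, a common PLIF formulation from the literature can be written as follows; the exact parameterization used in this application is not spelled out in the text, so this is a sketch:

$$H[t] = V[t-1] + \frac{1}{\tau}\left(X[t] - \left(V[t-1] - V_{\text{reset}}\right)\right), \qquad \frac{1}{\tau} = \sigma(a),$$

$$S[t] = \Theta\left(H[t] - V_{\text{th}}\right), \qquad V[t] = H[t]\,(1 - S[t]) + V_{\text{reset}}\,S[t],$$

  where X[t] is the input current, Θ is the Heaviside step function (replaced by a surrogate gradient during training), σ is the sigmoid function, and a is a learnable parameter, so the membrane time constant τ is optimized jointly with the synaptic weights.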
  • PLIF is used as a layer in SNN to construct an emotion recognition SNN model, as follows:
  • the above-mentioned spiking neural network includes a feature extraction module, a voting neuron group module and an emotion mapping module;
  • The original video frames in FIG. 3 are converted into a pulse sequence by the dynamic vision simulation algorithm (that is, the dynamic vision sensor simulation method), and the pulse sequence is used as the input of the spiking neural network.
  • Feature extraction is performed from the input pulse sequence to obtain more expressive pulse features;
  • the function of the voting neuron group module is to simulate the working characteristics of group neurons in the brain, and use multiple neurons to represent a decision-making tendency;
  • the emotion mapping module determines the mapping result of the final emotion classification based on the frequency of firing pulses of neuron groups.
  • The feature extraction module in the embodiment of this application simulates the way brain neurons process information, abstracting the convolution and pooling operations, and in this embodiment of the application uses the spiking neuron model PLIF when transmitting information.
  • Specifically, the feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, and a network unit composed of two fully connected layers and PLIF layers arranged alternately.
  • During actual application, the operations of a single forward feature extraction include: a convolution with a 3×3 kernel (Conv 3x3 in FIG. 3), normalization (BatchNorm in FIG. 3), PLIF (PLIFNode in FIG. 3), and average pooling (AvgPool in FIG. 3). This calculation process can be repeated multiple times (for example, 3 times); the input pulses are compressed to a certain extent, reducing the number of pulse features while improving their discriminative properties. The average pooling window size can be 2×2.
  • In addition, a two-layer fully connected structure is used in the feature extraction module for further effective feature compression. Because the output of a traditional fully connected layer is a floating-point number representing the membrane potential, a PLIF layer must be added to convert the floating-point numbers into the transmission form of pulses; that is, two fully connected layers and PLIF layers are arranged alternately, in the specific order: fully connected layer 1, PLIF1, fully connected layer 2, PLIF2. The number of neurons in fully connected layer 1 and PLIF1 can be set flexibly but must be consistent between the two (for example, 1000); the number of neurons in fully connected layer 2 and PLIF2 is set according to the specific number of output emotion categories (for example, 20 for binary classification). The specific values can be determined according to actual needs, which this embodiment of the present application does not specifically limit. A sketch of this structure is given below.
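  • The following is a minimal PyTorch-style sketch of the structure just described, assuming an illustrative 32×32 two-channel event input, an assumed channel width of 64, and a simple sigmoid surrogate gradient; the PLIFNode class here is a hand-rolled stand-in written for this sketch, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class PLIFNode(nn.Module):
    """Minimal parametric leaky integrate-and-fire neuron (a sketch).
    1/tau = sigmoid(a) is learnable, so tau is trained with the weights."""
    def __init__(self, v_threshold: float = 1.0):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(1))
        self.v_threshold = v_threshold
        self.v = None  # membrane potential, carried across time steps

    def reset(self):
        self.v = None

    def forward(self, x):  # x: input current at a single time step
        if self.v is None:
            self.v = torch.zeros_like(x)
        self.v = self.v + torch.sigmoid(self.a) * (x - self.v)   # leaky integration
        hard = (self.v >= self.v_threshold).float()              # Heaviside spike
        soft = torch.sigmoid(4.0 * (self.v - self.v_threshold))  # surrogate gradient
        spike = hard.detach() + soft - soft.detach()             # straight-through trick
        self.v = self.v * (1.0 - hard)                           # hard reset to 0
        return spike

def make_snn(in_channels=2, num_classes=2, neurons_per_group=10):
    """(Conv3x3 -> BatchNorm -> PLIF -> AvgPool 2x2) x 3, then
    FC1(1000) -> PLIF1 -> FC2(num_classes*10) -> PLIF2, per the text."""
    c = 64  # assumed channel width
    layers = []
    for k in range(3):
        layers += [nn.Conv2d(in_channels if k == 0 else c, c, 3, padding=1),
                   nn.BatchNorm2d(c), PLIFNode(), nn.AvgPool2d(2)]
    layers += [nn.Flatten(),
               nn.Linear(c * 4 * 4, 1000), PLIFNode(),                         # FC1, PLIF1
               nn.Linear(1000, num_classes * neurons_per_group), PLIFNode()]   # FC2, PLIF2
    return nn.Sequential(*layers)
```

  • The network is applied once per time step; the PLIFNode instances carry the membrane potential across steps, and reset() is called between samples, so the pulse sequence is processed frame by frame.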
  • Voting neuron group module: decision-making by neurons in the brain is based on the collaborative work of multiple neurons, so in the embodiment of this application a group of multiple neurons is used to recognize each of the final emotion categories. Specifically, ten neurons can form the group corresponding to one category. The embodiment of this application is explained with an example of recognizing two emotion categories; that is, ten neurons jointly decide whether the emotion category corresponding to that group of neurons is activated. The total number of voting neurons is the number of emotion categories multiplied by ten, and the output of the voting neuron group module is a pulse sequence.
  • the emotion mapping module needs to map the pulse sequence output by the voting neuron group module to the final emotion category.
  • Specifically, the pulse sequence emitted by each neuron corresponds to a frequency, which can be used as one of the neuron's output mappings; the frequencies of the neurons in all neuron groups of the current category are then averaged, so that each category's neuron group corresponds to one final frequency. The larger the frequency, the more strongly the corresponding emotion category is activated, and the emotion category corresponding to the neuron group with the highest frequency is output; a short decoding sketch follows.
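  • Under the same assumptions as the previous sketch (hypothetical helper names, groups of ten voting neurons per category), the emotion mapping can be decoded as:

```python
import torch

def decode_emotion(spike_counts: torch.Tensor, neurons_per_group: int = 10):
    """spike_counts: [B, num_classes * neurons_per_group], spikes summed over
    all time steps. Averages the firing frequency within each voting-neuron
    group and returns the index of the most strongly activated category."""
    b = spike_counts.shape[0]
    groups = spike_counts.view(b, -1, neurons_per_group)  # [B, classes, 10]
    group_freq = groups.mean(dim=2)                       # mean frequency per group
    return group_freq.argmax(dim=1), group_freq
```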
  • the process may include:
  • S310: Initialize the parameter weights of the pre-established spiking neural network; specifically, initialize the parameter weights of the pre-established spiking neural network emotion recognition model.
  • It should be noted that the dynamic visual data set can be divided into three parts, namely a training set, a verification set, and a test set, and that the spiking neural network emotion recognition model is pre-built as described above, so the embodiments of the present application do not describe it again. Specifically, the parameter weights of the spiking neural network emotion recognition model are first initialized.
  • S320: Use the dynamic visual data set as the input of the current spiking neural network and obtain, through forward propagation of the network, the output frequency of the voting neuron group for each emotion category. Specifically, use the dynamic visual data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and propagate it forward through the current spiking neural network to obtain the output frequency of the voting neuron group of each emotion category.
  • During actual application, in each round of training the current spiking neural network is determined by the current parameter weights; the training set in the dynamic visual data set is used as the input of the current spiking neural network in the spiking neural network emotion recognition model, and forward propagation of the current spiking neural network yields the output frequency of the voting neuron group for each emotion category. The output frequency of a voting neuron group can be obtained by averaging the output frequencies of the individual voting neurons in the group.
  • S330: For each emotion category, calculate the error between the output frequency of that category's voting neuron group and the true label of the corresponding emotion category. During actual application, the error can be calculated as the mean squared error (MSE) between the output frequency of the voting neuron group and the true label of the corresponding emotion category.
  • S340: Calculate the gradient corresponding to the parameter weights according to the error, and use the gradient to update the parameter weights of the current spiking neural network. During actual application, the final average error can be calculated from the error corresponding to each voting neuron group, the gradient of the parameter weights can then be calculated from the average error, and the gradient is used to update the parameter weights of the current spiking neural network in the spiking neural network emotion recognition model.
  • When updating the parameter weights, the stochastic gradient descent (SGD) algorithm can be used, and other gradient-descent-based parameter optimization methods can also be used, including but not limited to RMSprop (Root Mean Square propagation), Adagrad (Adaptive Subgradient), Adam (Adaptive Moment Estimation), Adamax (an Adam variant based on the infinity norm), and ASGD (Averaged Stochastic Gradient Descent); which method to use can be chosen according to the actual situation, and this embodiment of the present application does not specifically limit it.
  • S350: Judge whether the current spiking neural network converges. After the parameter weights are updated, the current spiking neural network in the spiking neural network emotion recognition model is determined by the updated parameter weights, and its convergence can then be judged according to the verification set in the dynamic visual data set. When the current spiking neural network has converged, proceed to S360 to end the operation and obtain the spiking neural network emotion recognition model based on the latest parameter weights; the obtained test sets of the various emotion categories can also be used to test the spiking neural network emotion recognition model and output the corresponding emotion categories, that is, to obtain the trained spiking neural network emotion recognition model.
  • S360: End the training and obtain the trained spiking neural network emotion recognition model. Specifically, when it is determined that the current spiking neural network after the parameter weight update has converged, end the training and obtain the trained spiking neural network emotion recognition model. A sketch of this training procedure is given below.
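  • The following is a minimal sketch of the S310–S360 loop, assuming the hypothetical make_snn and PLIFNode helpers from the earlier sketch, PyTorch's SGD optimizer, and a data loader yielding event tensors of shape [T, B, 2, H, W]; dataset handling and the verification-set convergence test are elided, with a fixed epoch count standing in for the convergence check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_snn(net, train_loader, num_classes=2, epochs=50, lr=0.1):
    """Train per S310-S360: forward-propagate pulse sequences over time,
    compare voting-group firing frequencies with one-hot labels via MSE,
    back-propagate through the surrogate gradient, and update with SGD."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)  # S310: weights already initialized
    mse = nn.MSELoss()
    for epoch in range(epochs):                     # stand-in for convergence check (S350)
        for events, labels in train_loader:         # events: [T, B, 2, H, W]
            for m in net.modules():                 # reset membrane potentials per sample
                if hasattr(m, "reset"):
                    m.reset()
            counts = 0.0
            for t in range(events.shape[0]):        # S320: forward propagation over time
                counts = counts + net(events[t])
            freq = counts.view(counts.shape[0], num_classes, -1).mean(2) / events.shape[0]
            target = F.one_hot(labels, num_classes).float()
            loss = mse(freq, target)                # S330: MSE against the true labels
            opt.zero_grad()
            loss.backward()                         # S340: gradients of the parameter weights
            opt.step()                              # update; repeat until converged (S360)
```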
  • In the embodiment of the present application, the spiking neural network is trained with the pre-established dynamic visual data set to obtain the spiking neural network emotion recognition model; the pulse sequence to be recognized corresponding to the video information is then obtained and input into the spiking neural network emotion recognition model, which recognizes the pulse sequence to be recognized and obtains the corresponding emotion category. The present application can thus recognize emotion categories based on video information during use, increasing the available modes of emotion recognition, which is conducive to better emotion recognition.
  • FIG. 5 is a flow chart of an emotion recognition model training method provided in the embodiment of the present application, including:
  • S501 Pre-establish a dynamic visual data set based on emotion recognition
  • S502 Using the dynamic visual data set to train the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • FIG. 6 is another flow chart of an emotion recognition model training method provided in the embodiment of the present application, including:
  • S601: Obtain a test set of multiple emotion categories;
  • S602: Use the test set to test and train the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • an embodiment of the present application further provides an emotion recognition device, please refer to FIG. 7 for details.
  • The device includes:
  • an acquisition module 21, configured to acquire the pulse sequence to be recognized corresponding to video information;
  • a recognition module 22, configured to recognize the pulse sequence to be recognized using the spiking neural network emotion recognition model to obtain the corresponding emotion category.
  • In some embodiments, the device further includes a training module 81, a schematic diagram of which is shown in FIG. 8, wherein
  • the training module 81 is configured to train the spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model.
  • the training module includes: a test set acquisition module and a first training module, wherein,
  • the test set acquisition module is configured to obtain test sets of multiple emotion categories after the first building module pre-establishes the spiking neural network emotion recognition model;
  • the first training module is configured to test and train the pre-established spiking neural network emotion recognition model using the test set obtained by the test set acquisition module, to obtain the trained spiking neural network emotion recognition model.
  • the training module includes: an establishment module and a second training module, wherein,
  • the second training module is used to train the pre-established spiking neural network emotion recognition model by using the dynamic visual data set to obtain the trained spiking neural network emotion recognition model.
  • In some embodiments, the establishment module includes: a data acquisition module, a data set establishment module, and at least one of a simulation processing module and a first pulse sequence acquisition module, wherein:
  • a data acquisition module configured to acquire raw visual data based on emotion recognition
  • the simulation processing module is used to simulate and process the original visual data by means of a dynamic visual sensor simulation method to obtain a plurality of pulse sequences corresponding to the original visual data;
  • the first pulse sequence acquisition module is used to directly acquire a plurality of pulse sequences corresponding to the original visual data by using a dynamic vision camera;
  • the data set establishment module is configured to establish the dynamic visual data set based on emotion recognition from the multiple pulse sequences obtained by the simulation processing module or the first pulse sequence acquisition module.
  • the simulation processing module includes: a traversal module, a first conversion module and a first assignment module, wherein,
  • the traversal module is configured to sequentially traverse the N video frame images in the original dynamic video data, where N represents the total number of video frame images contained in the original visual data;
  • the first conversion module is configured to, when the traversal module reaches the current i-th frame, convert the i-th video frame image from the RGB color space to the grayscale space and use the converted video frame data as the current video frame data, where i ranges from 1 to N;
  • the first assignment module is configured to, when i equals 1, assign all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data to obtain a pulse sequence composed of the first output channel.
  • the simulation processing module further includes: a second assignment module, an update module and a second conversion module, wherein,
  • the second assignment module is configured to, when i is not equal to 1, assign values to the first output channel and the second output channel according to the grayscale difference between the current video frame and the previous video frame and the preset threshold, and to use the current video frame data as the previous video frame;
  • the first update module is configured to increment the value of i by 1;
  • the second conversion module is further configured to, when the i updated by the update module is less than N, convert the current i-th video frame image from the RGB color space to the grayscale space and use the converted video frame data as the current video frame data.
  • the simulation processing module further includes: a second pulse sequence acquisition module, wherein,
  • the second pulse sequence acquisition module is configured to, when the i updated by the update module is not less than N, complete the traversal of the N video frame images in the original dynamic video data and obtain the pulse sequence composed of the first output channel and the second output channel.
  • In some embodiments, the assignment module includes: a first calculation module, and at least one of a first position assignment module and a second position assignment module, wherein:
  • the first calculation module is used to calculate the gray scale difference value between the current video frame and the previous video frame at the pixel for each pixel;
  • the first position assignment module is used to assign a value of 1 to the position corresponding to the first output channel when the gray difference value is greater than the preset threshold;
  • the second position assignment module is configured to assign a value of 1 to the position corresponding to the second output channel when the gray level difference value is smaller than the preset threshold.
  • the spiking neural network includes voting neuron groups
  • the second training module includes: an initialization module, a first propagation module, an error calculation module, a gradient calculation module, a second update module, a judgment module and a model training determination module, wherein,
  • the initialization module is used to initialize the parameter weights of the pre-established spiking neural network emotion recognition model
  • the first propagation module is configured to use the dynamic visual data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtain, through forward propagation of the current spiking neural network, the output frequency of the voting neuron group for each emotion category;
  • the error calculation module is configured to calculate, for each emotion category, the error between the output frequency of the category's voting neuron group and the true label of the corresponding emotion category;
  • the gradient calculation module is configured to calculate the gradient corresponding to the parameter weights according to the error;
  • the second update module is configured to update the parameter weights of the current spiking neural network using the gradient calculated by the gradient calculation module;
  • the judgment module is configured to judge whether the current spiking neural network converges after the update module updates the parameter weights;
  • the model training determination module is configured to end the training when the judgment module determines that the current spiking neural network has converged after the parameter weight update, obtaining the trained spiking neural network emotion recognition model.
  • the second training module further includes: a second propagation module, wherein,
  • the second propagation module is configured to, when the judgment module determines that the current spiking neural network has not converged after the parameter weight update, use the dynamic visual data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtain, through forward propagation of the current spiking neural network, the output frequency of the voting neuron group for each emotion category.
  • the judging module includes at least one of the following:
  • the first judgment module is configured to determine whether the current spiking neural network converges by judging whether the number of training iterations of the current spiking neural network after the parameter weight update has reached a preset number;
  • the second judgment module is configured to determine whether the current spiking neural network converges by judging whether the decrease in its error after the parameter weight update has stabilized within a preset range;
  • the third judgment module is configured to determine whether the current spiking neural network converges by judging whether its error after the parameter weight update is less than an error threshold;
  • the fourth judgment module is configured to judge whether the current spiking neural network after the parameter weight update converges according to the verification set in the dynamic visual data set.
  • In some embodiments, the spiking neural network further includes a feature extraction module and an emotion mapping module, where the feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, and a network unit composed of two fully connected layers and PLIF layers arranged alternately; the emotion mapping module is configured to map the pulse sequence output by the voting neuron group to the final emotion category.
  • FIG. 9 is a structural block diagram of an emotion recognition model training device provided by the embodiment of the present application, including: a building module 91 and a training module 92, wherein,
  • the building module 91 is configured to pre-establish a dynamic visual data set based on emotion recognition;
  • the training module 92 is configured to train the pre-established spiking neural network emotion recognition model using the dynamic visual data set to obtain the trained spiking neural network emotion recognition model.
  • FIG. 10 is another structural block diagram of an emotion recognition model training device provided by the embodiment of the present application, including: an acquisition module 11 and a training module 12, wherein
  • the acquisition module 11 is configured to obtain test sets of multiple emotion categories, and the training module 12 is configured to test and train the pre-established spiking neural network emotion recognition model using the test set to obtain the trained spiking neural network emotion recognition model.
  • The emotion recognition device provided in the embodiment of the present application has the same beneficial effects as the emotion recognition method provided in the above embodiments; for a specific introduction to the emotion recognition method involved in the embodiment of the application, refer to the above embodiments, which are not repeated here.
  • Correspondingly, an embodiment of the present application further provides an emotion recognition device, including:
  • a processor, configured to implement the above emotion recognition method or the above emotion recognition model training method when executing a computer program.
  • The processor in the embodiment of the present application can be specifically configured to: obtain the pulse sequence to be recognized corresponding to the video information; and use the pre-established spiking neural network emotion recognition model to recognize the pulse sequence to be recognized and obtain the corresponding emotion category, where the spiking neural network emotion recognition model is obtained by training the spiking neural network with the pre-established dynamic visual data set.
  • the embodiment of the present application also provides an electronic device, including:
  • the processor is used for implementing the above emotion recognition method or the above emotion recognition model training method when executing the computer program.
  • The embodiment of the present application also provides a computer non-volatile readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above emotion recognition method or the above emotion recognition model training method is implemented.
  • the computer non-volatile readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • an embodiment of the present application further provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the above emotion recognition method or the above emotion recognition model training method.
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical modules, that is, they may be located in one place or distributed over multiple networks. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.
  • Fig. 11 is a block diagram of an electronic device 1100 provided by an embodiment of the present application.
  • the electronic device 1100 may be a mobile terminal or a server; the embodiment of the present application takes a mobile terminal as an example for illustration.
  • for example, the electronic device 1100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • the electronic device 1100 may include one or more of the following components: a processing component 1102, a memory 1104, a power component 1106, a multimedia component 1108, an audio component 1110, an input/output (I/O) interface 1112, a sensor component 1114, and a communication component 1116.
  • the processing component 1102 generally controls the overall operations of the electronic device 1100, such as those associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 1102 may include one or more processors 1120 to execute instructions to complete all or part of the steps of the above method. Additionally, the processing component 1102 may include one or more modules that facilitate interaction between the processing component 1102 and other components. For example, the processing component 1102 may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.
  • the memory 1104 is configured to store various types of data to support operations at the device 1100. Examples of such data include instructions for any application or method operating on the electronic device 1100, contact data, phonebook data, messages, pictures, videos, and the like.
  • the memory 1104 can be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power component 1106 provides power to the various components of the electronic device 1100 and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1100.
  • the multimedia component 1108 includes a screen providing an output interface between the electronic device 1100 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense a boundary of a touch or a swipe action, but also detect duration and pressure associated with the touch or swipe operation.
  • the multimedia component 1108 includes a front camera and/or a rear camera. When the device 1100 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
  • the audio component 1110 is configured to output and/or input audio signals.
  • the audio component 1110 includes a microphone (MIC), which is configured to receive an external audio signal when the electronic device 1100 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. Received audio signals may be further stored in the memory 1104 or sent via the communication component 1116.
  • the audio component 1110 also includes a speaker for outputting audio signals.
  • the I/O interface 1112 provides an interface between the processing component 1102 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 1114 includes one or more sensors for providing various aspects of status assessment for electronic device 1100 .
  • the sensor component 1114 can detect the open/closed state of the device 1100 and the relative positioning of components (for example, the display and keypad of the electronic device 1100); the sensor component 1114 can also detect a change in position of the electronic device 1100 or one of its components, the presence or absence of user contact with the electronic device 1100, the orientation or acceleration/deceleration of the electronic device 1100, and a change in its temperature.
  • Sensor assembly 1114 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • the sensor assembly 1114 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 1114 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 1116 is configured to facilitate wired or wireless communication between the electronic device 1100 and other devices.
  • the electronic device 1100 can access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G or 5G), or a combination thereof.
  • the communication component 1116 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 1116 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the electronic device 1100 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, so as to execute the above-described emotion recognition method or emotion recognition model training method.
  • a computer-readable storage medium, such as the memory 1104 including instructions, is also provided; the instructions can be executed by the processor 1120 of the electronic device 1100 to implement the above-described emotion recognition method or emotion recognition model training method.
  • the non-transitory computer readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • a computer program product is also provided; when the instructions in the computer program product are executed by the processor 1120 of the electronic device 1100, the electronic device 1100 executes the above-described emotion recognition method or emotion recognition model training method.
  • Fig. 12 is a block diagram of an apparatus 1200 for emotion recognition or emotion recognition model training provided by an embodiment of the present application.
  • the apparatus 1200 may be provided as a server.
  • the apparatus 1200 includes a processing component 1222, which further includes one or more processors, and a memory resource represented by a memory 1232 for storing instructions executable by the processing component 1222, such as application programs.
  • the application program stored in memory 1232 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1222 is configured to execute instructions to perform the above method.
  • the device 1200 may also include a power component 1226 configured to perform power management of the device 1200, a wired or wireless network interface 1250 configured to connect the device 1200 to a network, and an input/output (I/O) interface 1258.
  • the device 1200 can operate based on an operating system stored in the memory 1232, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
  • each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. For the devices disclosed in the embodiments, since they correspond to the methods disclosed therein, the description is relatively simple; for relevant details, please refer to the description of the method part.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An emotion recognition method, a training method, an apparatus, a device, a storage medium, and a product. The recognition method includes: acquiring a to-be-recognized spike sequence corresponding to video information (S110); and recognizing the to-be-recognized spike sequence by using a spiking neural network emotion recognition model to obtain a corresponding emotion category (S120). That is, the acquired to-be-recognized spike sequence can be recognized by the spiking neural network emotion recognition model to obtain the corresponding emotion category. This realizes recognition of the emotion category of video information, expands the ways of emotion recognition, and facilitates better emotion recognition.

Description

Emotion recognition method, training method, apparatus, device, storage medium, and product
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on February 9, 2022, with application number 202210119803.3 and entitled "Emotion recognition method, apparatus, system, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of emotion recognition, and in particular to an emotion recognition method, an emotion recognition model training method, an apparatus, an electronic device, a computer non-volatile readable storage medium, and a computer program product.
Background
At present, with the continuous development of cloud computing, big data, and artificial intelligence, applications including but not limited to face recognition and gait recognition have been widely deployed across industries. Artificial-intelligence customer-service dialogue is another important commercial scenario. This potential human-computer interaction application poses many challenges to the current state of the art; an important one is how, during human-computer interaction, to make the machine understand human emotion, that is, the task of emotion recognition. As a popular research topic in the field of affective computing, emotion recognition has attracted the attention of many researchers from fields such as computer vision, natural language processing, and human-computer interaction. Most methods use artificial neural networks (ANN/ANNs, Artificial Neural Networks) to perform emotion recognition; however, inference with an emotion recognition model consumes considerable energy on mobile devices, and this high-energy-consumption ANN-style emotion recognition hinders the application of emotion recognition on embedded and mobile devices.
As the third generation of neural networks, the low-power spiking neural network (SNN/SNNs, Spiking Neural Networks) is a potential solution for implementing emotion recognition algorithms suitable for embedded and mobile terminals. Compared with an ANN, the structure of a single neuron in an SNN bears a stronger similarity to the structure of a neuron in the brain.
At present, in the related art, SNNs are usually applied to emotion recognition tasks to extract emotional information from speech, cross-modal data, or electroencephalograms; extracting emotional information from video clips has not yet been achieved, which limits the ways of emotion recognition. Therefore, how to extract emotional information from video clips is a problem to be solved by those skilled in the art.
Summary
The purpose of the embodiments of the present application is to provide an emotion recognition method, an emotion recognition model training method, an apparatus, an electronic device, a computer non-volatile readable storage medium, and a computer program product, which can recognize emotion categories based on video information during use, increase the ways of emotion recognition, and facilitate better emotion recognition.
To solve the above technical problem, an embodiment of the present application provides an emotion recognition method, including:
acquiring a to-be-recognized spike sequence corresponding to video information;
recognizing the to-be-recognized spike sequence by using a spiking neural network emotion recognition model to obtain a corresponding emotion category.
Optionally, before acquiring the to-be-recognized spike sequence corresponding to the video information, the method further includes:
training a pre-established spiking neural network emotion recognition model to obtain a trained spiking neural network emotion recognition model.
Optionally, training the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model includes:
acquiring test sets of multiple emotion categories;
testing and training the pre-established spiking neural network emotion recognition model by using the test sets to obtain the trained spiking neural network emotion recognition model.
Optionally, training the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model includes:
establishing a dynamic vision data set based on emotion recognition in advance;
training the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain the trained spiking neural network emotion recognition model.
Optionally, the process of establishing the dynamic vision data set based on emotion recognition in advance includes:
acquiring raw visual data based on emotion recognition;
performing simulation processing on the raw visual data by using a dynamic vision sensor simulation method to obtain multiple spike sequences corresponding to the raw visual data; or directly acquiring the spike sequences corresponding to the raw visual data by using a dynamic vision camera;
establishing the dynamic vision data set based on emotion recognition from the multiple spike sequences.
Optionally, the process of performing simulation processing on the raw visual data by using the dynamic vision sensor simulation method to obtain the multiple spike sequences corresponding to the raw visual data includes:
traversing the N video frame images in the raw dynamic video data in sequence, where N represents the total number of video frame images contained in the raw visual data;
when the current i-th frame is reached, converting the video frame image of the current i-th frame from the RGB color space to the grayscale space, and taking the converted video frame data as the current video frame data, where i ranges from 1 to N;
when the value of i equals 1, assigning all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data to obtain a spike sequence formed by the first output channel.
Optionally, the process of performing simulation processing on the raw visual data by using the dynamic vision sensor simulation method to obtain the corresponding spike sequence further includes:
when i is not equal to 1, assigning values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and a preset threshold, and taking the current video frame data as the previous video frame;
incrementing the value of i by 1;
when the updated i is less than N, executing the step of converting the video frame image of the current i-th frame from the RGB color space to the grayscale space and taking the converted video frame data as the current video frame data.
Optionally, the process of performing simulation processing on the raw visual data by using the dynamic vision sensor simulation method to obtain the corresponding spike sequence further includes:
when the updated i is not less than N, completing the traversal of the N video frame images in the raw dynamic video data to obtain a spike sequence formed by the first output channel and the second output channel.
Optionally, assigning values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and the preset threshold includes:
for each pixel, calculating the grayscale difference between the current video frame and the previous video frame at the pixel;
when the grayscale difference is greater than the preset threshold, assigning 1 to the corresponding position of the first output channel; or
when the grayscale difference is less than the preset threshold, assigning 1 to the corresponding position of the second output channel.
Optionally, the spiking neural network includes voting neuron groups;
the process of training the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain the trained spiking neural network emotion recognition model includes:
initializing the parameter weights of the pre-established spiking neural network emotion recognition model;
taking the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model, and obtaining the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network;
for each emotion category, calculating the error between the output frequency of the voting neuron group of the emotion category and the true label of the corresponding emotion category;
calculating the gradient corresponding to the parameter weights according to the error, and updating the parameter weights of the current spiking neural network with the gradient;
judging whether the current spiking neural network with the updated parameter weights has converged;
when it is determined that the current spiking neural network with the updated parameter weights has converged, ending the training to obtain the trained spiking neural network emotion recognition model.
Optionally, the process of training the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain the trained spiking neural network emotion recognition model further includes:
when it is determined that the current spiking neural network with the updated parameter weights has not converged, returning to the step of taking the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtaining the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network.
Optionally, whether the current spiking neural network with the updated parameter weights has converged is judged in the following ways:
judging whether the current spiking neural network has converged by judging whether the current number of training iterations of the current spiking neural network with the updated parameter weights has reached a preset number; or
judging whether the current spiking neural network has converged by judging whether the error decrease of the current spiking neural network with the updated parameter weights has stabilized within a preset range; or
judging whether the current spiking neural network has converged by judging whether the error of the current spiking neural network with the updated parameter weights is less than an error threshold; or
judging whether the current spiking neural network with the updated parameter weights has converged according to the validation set in the dynamic vision data set.
Optionally, the process of processing the raw visual data by using the dynamic vision sensor simulation method to obtain the spike sequence corresponding to the raw visual data includes:
traversing from the first video frame image of the raw visual data, converting the i-th video frame image from the RGB color space to the grayscale space to obtain the converted current video frame data;
judging whether i equals 1;
if i equals 1, assigning all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data, and taking the current video frame data as the previous video frame;
if i is not equal to 1, assigning values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and the preset threshold, and taking the current video frame data as the previous video frame;
incrementing the value of i by 1, and judging whether the updated i is less than N;
if it is less than N, returning to the step of converting the i-th video frame image from the RGB color space to the grayscale space;
if it is not less than N, ending the operation to obtain the spike sequence formed by the first output channel and the second output channel, where N represents the total number of video frame images contained in the raw visual data.
Optionally, assigning values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and the preset threshold includes:
for each pixel, calculating the grayscale difference between the current video frame and the previous video frame at the pixel;
comparing the grayscale difference with the preset threshold; when the grayscale difference is greater than the preset threshold, assigning 1 to the corresponding position of the first output channel; when the grayscale difference is less than the preset threshold, assigning 1 to the corresponding position of the second output channel.
Optionally, the spiking neural network includes a feature extraction module, a voting neuron group module, and an emotion mapping module;
the process of training the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain the trained spiking neural network emotion recognition model includes:
initializing the parameter weights of the pre-established spiking neural network;
taking the dynamic vision data set as the input of the current spiking neural network, and obtaining the output frequency of the voting neuron group of each emotion category through forward propagation of the network;
for each emotion category, calculating the error between the output frequency and the true label of the corresponding emotion category;
calculating the gradient corresponding to the parameter weights according to the error, and updating the parameter weights of the current spiking neural network with the gradient;
judging whether the current spiking neural network with the updated parameter weights has converged; if so, ending the training to obtain the trained spiking neural network emotion recognition model; if not, returning to the step of taking the dynamic vision data set as the input of the current spiking neural network and obtaining the output frequency of the voting neuron group of each emotion category through forward propagation of the network, so as to perform the next round of training.
Optionally, the spiking neural network further includes a feature extraction module, wherein the feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, as well as a network unit composed of two alternately arranged fully connected layers and PLIF layers.
An embodiment of the present application further provides an emotion recognition model training method, including:
establishing a dynamic vision data set based on emotion recognition in advance, and training the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain a trained spiking neural network emotion recognition model; or
acquiring test sets of multiple emotion categories, and testing and training the pre-established spiking neural network emotion recognition model by using the test sets to obtain a trained spiking neural network emotion recognition model.
An embodiment of the present application further provides an emotion recognition apparatus, including:
an acquisition module, used to acquire a to-be-recognized spike sequence corresponding to video information;
a recognition module, used to recognize the to-be-recognized spike sequence by using a pre-established spiking neural network emotion recognition model to obtain a corresponding emotion category.
Optionally, the apparatus further includes:
a training module, used to train the spiking neural network emotion recognition model to obtain a trained spiking neural network emotion recognition model.
Optionally, the training module includes:
a test set acquisition module, used to acquire test sets of multiple emotion categories after the first establishing module pre-establishes the spiking neural network emotion recognition model;
a first training module, used to test and train the pre-established spiking neural network emotion recognition model by using the test sets acquired by the test set acquisition module to obtain the trained spiking neural network emotion recognition model.
Optionally, the training module includes:
an establishing module, used to establish a dynamic vision data set based on emotion recognition in advance;
a second training module, used to train the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain the trained spiking neural network emotion recognition model.
Optionally, the establishing module includes a data acquisition module, a data set establishing module, and at least one of a simulation processing module and a first spike sequence acquisition module, wherein,
the data acquisition module is used to acquire raw visual data based on emotion recognition;
the simulation processing module is used to perform simulation processing on the raw visual data by using a dynamic vision sensor simulation method to obtain multiple spike sequences corresponding to the raw visual data;
the first spike sequence acquisition module is used to directly acquire the multiple spike sequences corresponding to the raw visual data by using a dynamic vision camera;
the data set establishing module is used to establish the dynamic vision data set based on emotion recognition from the multiple spike sequences obtained by the simulation processing module or the first spike sequence acquisition module.
Optionally, the simulation processing module includes:
a traversal module, used to traverse the N video frame images in the raw dynamic video data in sequence, where N represents the total number of video frame images contained in the raw visual data;
a first conversion module, used to, when the traversal module reaches the current i-th frame, convert the video frame image of the current i-th frame from the RGB color space to the grayscale space and take the converted video frame data as the current video frame data, where i ranges from 1 to N;
a first assignment module, used to, when the value of i equals 1, assign all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data to obtain a spike sequence formed by the first output channel.
Optionally, the simulation processing module further includes:
a second assignment module, used to, when i is not equal to 1, assign values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and a preset threshold, and take the current video frame data as the previous video frame;
a first update module, used to increment the value of i by 1;
a second conversion module, further used to, when the i updated by the update module is less than N, convert the video frame image of the current i-th frame from the RGB color space to the grayscale space and take the converted video frame data as the current video frame data.
Optionally, the simulation processing module further includes:
a second spike sequence acquisition module, used to, when the i updated by the update module is not less than N, complete the traversal of the N video frame images in the raw dynamic video data to obtain the spike sequence formed by the first output channel and the second output channel.
Optionally, the assignment module includes a first calculation module and at least one of a first position assignment module and a second position assignment module, wherein,
the first calculation module is used to, for each pixel, calculate the grayscale difference between the current video frame and the previous video frame at the pixel;
the first position assignment module is used to assign 1 to the corresponding position of the first output channel when the grayscale difference is greater than the preset threshold;
the second position assignment module is used to assign 1 to the corresponding position of the second output channel when the grayscale difference is less than the preset threshold.
Optionally, the spiking neural network includes voting neuron groups;
the second training module includes:
an initialization module, used to initialize the parameter weights of the pre-established spiking neural network emotion recognition model;
a first propagation module, used to take the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtain the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network;
an error calculation module, used to, for each emotion category, calculate the error between the output frequency of the voting neuron group of the emotion category and the true label of the corresponding emotion category;
a gradient calculation module, used to calculate the gradient corresponding to the parameter weights according to the error;
a second update module, used to update the parameter weights of the current spiking neural network with the gradient calculated by the gradient calculation module;
a judgment module, used to judge whether the current spiking neural network with the parameter weights updated by the update module has converged;
a model training determination module, used to end the training and obtain the trained spiking neural network emotion recognition model when the judgment module determines that the current spiking neural network with the updated parameter weights has converged.
Optionally, the second training module further includes:
a second propagation module, used to, when the judgment module determines that the current spiking neural network with the updated parameter weights has not converged, take the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtain the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network.
Optionally, the judgment module includes at least one of the following:
a first judgment module, used to judge whether the current spiking neural network has converged by judging whether the current number of training iterations of the current spiking neural network with the updated parameter weights has reached a preset number;
a second judgment module, used to judge whether the current spiking neural network has converged by judging whether the error decrease of the current spiking neural network with the updated parameter weights has stabilized within a preset range;
a third judgment module, used to judge whether the current spiking neural network has converged by judging whether the error of the current spiking neural network with the updated parameter weights is less than an error threshold;
a fourth judgment module, used to judge whether the current spiking neural network with the updated parameter weights has converged according to the validation set in the dynamic vision data set.
Optionally, the spiking neural network further includes a feature extraction module, wherein the feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, as well as a network unit composed of two alternately arranged fully connected layers and PLIF layers.
An embodiment of the present application further provides an emotion recognition model training apparatus, including:
an establishing module, used to establish a dynamic vision data set based on emotion recognition in advance;
a training module, used to train the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain a trained spiking neural network emotion recognition model.
An embodiment of the present application further provides an emotion recognition model training apparatus, including:
an acquisition module, used to acquire test sets of multiple emotion categories;
a training module, used to test and train the pre-established spiking neural network emotion recognition model by using the test sets to obtain a trained spiking neural network emotion recognition model.
An embodiment of the present application further provides an emotion recognition device, including:
a memory, used to store a computer program;
a processor, used to implement the above emotion recognition method or the above emotion recognition model training method when executing the computer program.
An embodiment of the present application further provides an electronic device, including:
a memory, used to store a computer program;
a processor, used to implement the above emotion recognition method or the above emotion recognition model training method when executing the computer program.
An embodiment of the present application further provides a computer non-volatile readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the above emotion recognition method or the above emotion recognition model training method.
The present application further provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the above emotion recognition method or the above emotion recognition model training method.
The technical solutions provided by the embodiments of the present application bring at least the following beneficial effects:
In the embodiments of the present application, a to-be-recognized spike sequence corresponding to video information is first acquired; the to-be-recognized spike sequence is then recognized by using a spiking neural network emotion recognition model to obtain the corresponding emotion category. That is, in the embodiments of the present application, the acquired to-be-recognized spike sequence can be recognized by the spiking neural network emotion recognition model to obtain the corresponding emotion category. In other words, the present application can recognize emotion categories based on video information, which increases the ways of emotion recognition and facilitates better emotion recognition. Further, in the embodiments of the present application, the spiking neural network can also be trained with a pre-established dynamic vision data set to obtain the spiking neural network emotion recognition model; the to-be-recognized spike sequence corresponding to the video information is then acquired and input into the spiking neural network emotion recognition model, which recognizes it to obtain the corresponding emotion category. That is, in the embodiments of the present application, the spiking neural network can be trained with the pre-established dynamic vision data set, and the trained spiking neural network emotion recognition model can be used to recognize the to-be-recognized spike sequence corresponding to the video information to obtain the corresponding emotion category, which expands the ways of emotion recognition and facilitates better emotion recognition of video information.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or the related art more clearly, the accompanying drawings required in the embodiments of the present application or the related art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an emotion recognition method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for converting raw dynamic visual data into a spike sequence provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a spiking neural network provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a method for establishing a spiking neural network emotion recognition model provided by an embodiment of the present application;
FIG. 5 is a flowchart of an emotion recognition model training method provided by an embodiment of the present application;
FIG. 6 is another flowchart of an emotion recognition model training method provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an emotion recognition apparatus provided by an embodiment of the present application;
FIG. 8 is another schematic structural diagram of an emotion recognition apparatus provided by an embodiment of the present application;
FIG. 9 is a structural block diagram of an emotion recognition model training apparatus provided by an embodiment of the present application;
FIG. 10 is another structural block diagram of an emotion recognition model training apparatus provided by an embodiment of the present application;
FIG. 11 is a block diagram of an electronic device provided by an embodiment of the present application;
FIG. 12 is a block diagram of an apparatus for emotion recognition or emotion recognition model training provided by an embodiment of the present application.
Detailed Description
The embodiments of the present application provide an emotion recognition method, an emotion recognition model training method, an apparatus, a computer non-volatile readable storage medium, and a computer program product, which can recognize emotion categories based on video information during use, increase the ways of emotion recognition, and facilitate better emotion recognition.
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Please refer to FIG. 1, which is a schematic flowchart of an emotion recognition method provided by an embodiment of the present application. The method includes:
S110: acquiring a to-be-recognized spike sequence corresponding to video information;
In this embodiment, the to-be-recognized spike sequence corresponding to the video information may be acquired directly with a dynamic vision camera, or simulated data of the video information may be used. It should be noted that, since dynamic vision cameras are expensive, in the embodiments of the present application the video information may first be acquired and then simulated to obtain the corresponding to-be-recognized spike sequence, so as to reduce cost.
S120: recognizing the to-be-recognized spike sequence by using a spiking neural network emotion recognition model to obtain a corresponding emotion category, wherein the spiking neural network emotion recognition model is obtained by training a spiking neural network with a pre-established dynamic vision data set.
In this step, the specific process of recognizing the to-be-recognized spike sequence includes: inputting the to-be-recognized spike sequence into the spiking neural network emotion recognition model, and recognizing the to-be-recognized spike sequence with the model to obtain the corresponding emotion category.
The spiking neural network emotion recognition model in this step is pre-established. Optionally, after the model is established and before the to-be-recognized spike sequence is recognized, the pre-established spiking neural network emotion recognition model is first trained to obtain a trained model. The specific training process includes:
One training mode: acquiring test sets of multiple emotion categories, and testing and training the pre-established spiking neural network emotion recognition model with the test sets to obtain the trained spiking neural network emotion recognition model.
Another training mode: establishing a dynamic vision data set based on emotion recognition in advance, and training the pre-established spiking neural network emotion recognition model with the dynamic vision data set to obtain the trained spiking neural network emotion recognition model.
It can be understood that, in the embodiments of the present application, a dynamic vision data set based on emotion recognition and a spiking neural network are established in advance, and the spiking neural network is then trained with the dynamic vision data set to obtain the trained spiking neural network emotion recognition model. The above process of establishing the dynamic vision data set in advance may specifically include:
acquiring raw visual data based on emotion recognition;
performing simulation processing on the raw visual data with a dynamic vision sensor simulation method to obtain multiple spike sequences corresponding to the raw visual data;
establishing the dynamic vision data set based on emotion recognition from the multiple spike sequences.
It should be noted that, in practical applications, the to-be-recognized spike sequence corresponding to the video information can be acquired directly with a dynamic vision camera, but dynamic vision cameras are expensive. To further reduce cost, in the embodiments of the present application the raw visual data for emotion recognition may first be collected with an ordinary video capture device, and the raw visual data is then simulated with the dynamic vision sensor simulation method to obtain the corresponding spike data; this converts the raw visual data into spike data and saves equipment cost. It can be understood that the spike sequence corresponding to one piece of raw visual data is actually an array of spike sequences formed by the spike sequence at every pixel position of every video frame of the entire raw visual data; in the embodiments of the present application this spike sequence array is simply referred to as the spike sequence corresponding to the raw visual data. In practice, multiple pieces of raw visual data are all simulated with the above dynamic vision sensor simulation method to obtain multiple spike sequences, from which the dynamic vision data set based on emotion recognition is established.
Furthermore, please refer to FIG. 2; the above process of performing simulation processing on the raw visual data with the dynamic vision sensor simulation method to obtain the multiple spike sequences corresponding to the raw visual data may specifically include:
S200: traversing the N video frame images in the raw dynamic video data in sequence; when the current i-th frame is reached, converting the video frame image of the current i-th frame from the RGB color space to the grayscale space, and taking the converted video frame data as the current video frame data, where N is the total number of video frame images contained in the raw visual data and i ranges from 1 to N. That is, the traversal starts from the first video frame image of the raw visual data, and the i-th video frame image is converted from the RGB color space to the grayscale space to obtain the converted current video frame data; here, RGB denotes the three primary colors of an image: red, green, and blue.
S210: judging whether i equals 1; if so, proceeding to S220; if not, proceeding to S230;
S220: assigning all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data to obtain a spike sequence formed by the first output channel, and taking the current video frame data as the previous video frame;
S230: assigning values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and a preset threshold, and taking the current video frame data as the previous video frame;
S240: incrementing i by 1, and judging whether the updated i is less than N; if so, returning to the step of converting the i-th video frame image from the RGB color space to the grayscale space, i.e., S200; if not, proceeding to S250;
S250: completing the traversal of the N video frame images in the raw dynamic video data and ending the operation to obtain the spike sequence formed by the first output channel and the second output channel.
It should be noted that the characteristic of dynamic vision is that the camera no longer captures all the information in the scene; especially when the scene changes little, this can greatly reduce the amount of data recorded and transmitted. In the embodiments of the present application, the grayscale information of adjacent frames in the video data is differenced, and the difference result is judged against a preset threshold to decide whether data needs to be recorded, thereby completing a simulation that conforms to the characteristics of dynamic vision.
The recording characteristic of dynamic vision data is that only changes are recorded. Described with formal notation, it is generally represented as E[x_i, y_i, t_i, p_i], where E denotes an event; an event has only two states, occurring or not occurring; (x_i, y_i) is the position in the scene where the event occurs; t_i is the time of the event; and p_i is the polarity of the event. For example, for the change of light intensity in a recorded scene, the light intensity can change in two directions, from strong to weak or from weak to strong; both changes represent the occurrence of an event, and the polarity dimension is defined to distinguish these two kinds of events. The method provided by the embodiments of the present application produces formally similar dynamic vision data by computer simulation; a continuous recording of a scene is represented here by video data. Since the task faced by this system is emotion recognition, the data used here is raw visual data for emotion recognition. Assuming a piece of raw visual data contains N video frame images in total, these frames are the input of the dynamic vision sensor simulation method, and the simulated dynamic vision data can be produced by the following simulation steps:
In practical applications, all-zero simulated vision data E[x_i, y_i, t_i, p_i] can be defined, where i ranges from 1 to N and the size of E is H×W×N×2, H and W being the height and width of a video frame image; an intermediate variable recording the data of the previous frame is initialized and denoted F_pre; and the inter-frame sensitivity (i.e., the preset threshold) is defined as Sens. Specifically, an event is simulated when the difference between two frames exceeds the sensitivity.
Specifically, in the embodiments of the present application, in the process of converting the raw dynamic video data into a spike sequence, the N video frame images of the entire raw dynamic video data can be traversed starting from the first frame. For example, the current i-th video frame image is converted from the RGB color space to the grayscale space, denoted V_gray, the converted video frame data is taken as the current video frame data, and the value of i is then examined.
Specifically, when i equals 1, that is, for the current video frame data corresponding to the first video frame image, all floating-point data of the current video frame data can be assigned to the first output channel of the first time step of the simulated data (which can be implemented by the code E[:,:,i,0] ← V_gray), and the current video frame data is taken as the previous video frame (which can be implemented by the code F_pre ← V_gray).
When i is not equal to 1, the first output channel and the second output channel are assigned values respectively according to the grayscale difference between the current video frame and the previous video frame and the preset threshold, the current video frame data is taken as the previous video frame, and the step S240 of incrementing i by 1 is executed. This process can be implemented as follows:
for each pixel, calculating the grayscale difference between the current video frame and the previous video frame at that pixel;
comparing the grayscale difference with the preset threshold; when the grayscale difference is greater than the preset threshold, the corresponding position of the first output channel is assigned 1; when the grayscale difference is less than the preset threshold, the corresponding position of the second output channel is assigned 1.
Specifically, in the embodiments of the present application, for each pixel in the current video frame image, the grayscale difference between the current video frame and the previous video frame at that pixel is calculated and then compared with the preset threshold, and the two kinds of events are assigned according to the comparison result. Specifically, when the grayscale difference is greater than the preset threshold, the corresponding position of the first output channel is assigned 1, which can be implemented by the code E[:,:,i,0] ← int(V_gray - F_pre > Sens); when the grayscale difference is less than the preset threshold, the corresponding position of the second output channel is assigned 1, which can be implemented by the code E[:,:,i,1] ← int(F_pre - V_gray > Sens).
In addition, in the embodiments of the present application, after i is incremented by 1, whether the updated i is less than N is judged. If it is less than N, the process returns to the step of converting the i-th video frame image from the RGB color space to the grayscale space so as to continue processing the next video frame image; if it is not less than N, the operation ends, indicating that all N video frame images have been processed, and the spike sequence formed by the first output channel and the second output channel is obtained.
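For illustration, the following is a minimal NumPy sketch of the dynamic vision sensor simulation method described above (steps S200 to S250). The function name simulate_dvs and the default sensitivity are assumptions of this sketch, and the second output channel is implemented with the prose-consistent reading that it records grayscale decreases whose magnitude exceeds the sensitivity:

    import numpy as np

    def simulate_dvs(frames_rgb, sens=0.1):
        # frames_rgb: float array of shape (N, H, W, 3) with values in [0, 1].
        # Returns simulated dynamic vision data E of shape (H, W, N, 2).
        n, h, w, _ = frames_rgb.shape
        e = np.zeros((h, w, n, 2), dtype=np.float32)  # all-zero simulated data E
        f_pre = None                                   # previous frame F_pre

        for i in range(n):
            # Convert the current frame from RGB to grayscale (BT.601 weights).
            v_gray = frames_rgb[i] @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
            if i == 0:
                # First frame: copy the grayscale floats into channel 0, time step 0.
                e[:, :, i, 0] = v_gray
            else:
                diff = v_gray - f_pre
                # Channel 0: grayscale increase beyond Sens; channel 1: decrease.
                e[:, :, i, 0] = (diff > sens).astype(np.float32)
                e[:, :, i, 1] = (diff < -sens).astype(np.float32)
            f_pre = v_gray  # the current frame becomes the previous frame
        return e

For example, running simulate_dvs on a clip in which only the mouth region moves produces events almost exclusively at those pixels, which is exactly the data-reduction property of dynamic vision described above.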
It should also be noted that, because a spiking neural network transmits information in the form of spikes and the spike transmission process itself is non-differentiable, synaptic weights cannot be updated directly by gradient backpropagation. Moreover, to avoid manually setting some hyperparameters during optimization (for example, the membrane time constant τ of a neuron), practitioners in the field have recently proposed incorporating the membrane time constant τ into the joint update of the synaptic weights of the whole model; such a model is called PLIF (Parametric Leaky-Integrate and Fire model). Joint optimization is more convenient than manual setting and can yield better synaptic weights. The embodiments of the present application adopt PLIF as a layer in the SNN to construct the emotion recognition SNN model, as follows:
Please refer to FIG. 3; the above spiking neural network includes a feature extraction module, a voting neuron group module, and an emotion mapping module.
It should be noted that, in FIG. 3, the raw video frames pass through the dynamic vision simulation algorithm (i.e., the dynamic vision sensor simulation method) to obtain a spike sequence, which serves as the input of the spiking neural network. The function of the feature extraction module in the spiking neural network is to extract features from the input spike sequence to obtain more expressive spike features; the function of the voting neuron group module is to simulate the working characteristics of neuron groups in the brain, using multiple neurons to represent one decision tendency; the emotion mapping module decides the final emotion classification result based on the frequency at which the neuron groups emit spikes.
Specifically, the feature extraction module in the embodiments of the present application simulates the way brain neurons process information by abstracting convolution and pooling operations, and the spiking neuron model PLIF is used for information transmission. The feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, as well as a network unit composed of two alternately arranged fully connected layers and PLIF layers. Specifically, the single forward feature extraction operation includes: a convolution with a 3×3 kernel (Conv 3x3 in FIG. 3), a normalization operation (BatchNorm in FIG. 3), PLIF (PLIFNode in FIG. 3), and average pooling (AvgPool in FIG. 3). This computation can be repeated multiple times (for example, 3 times), compressing the input spikes to a certain extent, reducing the number of spike features and improving their discriminability; the window size of the average pooling may be 2×2. In particular, to further reduce the number of spike features, the feature extraction module also uses two fully connected layers for further effective feature compression. Because the output of a traditional fully connected layer is a floating-point number, here representing membrane potential, a PLIF layer needs to be added to convert the floating-point numbers into the spike form of transmission; that is, two alternately arranged fully connected layers and PLIF layers are used, in the order: fully connected layer 1, PLIF1, fully connected layer 2, PLIF2. The number of neurons in fully connected layer 1 and PLIF1 can be set flexibly, but the two must be equal, for example 1000; the number of neurons in fully connected layer 2 and PLIF2 needs to be set according to the specific number of output emotion categories, for example 20 for binary classification. The specific values can be determined according to actual needs, and the embodiments of the present application place no special limitation on them.
Voting neuron group module: decisions of neurons in the brain are made by multiple neurons working collaboratively, so in the embodiments of the present application, for the final number of emotion recognition categories, multiple neurons form a group to recognize a given emotion category. Specifically, ten neurons can form the group corresponding to one category; the embodiments of the present application are explained with an example of recognizing two emotion categories, that is, ten neurons jointly decide whether the input belongs to the emotion category corresponding to that group, and the total number of neurons is the number of emotion categories multiplied by ten. The output of the voting neuron group module is a spike sequence.
Emotion mapping module: the emotion mapping module maps the spike sequences output by the voting neuron group module to the final emotion category. Specifically, the spike sequence emitted by each neuron corresponds to a frequency, which can serve as one of the neuron's output mappings; the frequencies of the neurons within the neuron group of the current category are then averaged, so that each category's neuron group corresponds to one final frequency. The larger this frequency, the more the corresponding emotion category is activated, and the emotion category corresponding to the neuron group with the largest frequency is output.
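As one concrete, non-authoritative reading of FIG. 3, the following PyTorch sketch assembles the three modules. The PLIF node is hand-rolled with a sigmoid surrogate gradient because the text names only the model family; 64 convolution channels, three extraction units, 1000 hidden neurons, two categories, and ten voting neurons per category follow the examples given, while the firing threshold of 1, the surrogate slope, and the hard reset are assumptions of this sketch:

    import torch
    import torch.nn as nn

    class SurrogateSpike(torch.autograd.Function):
        # Heaviside spike in the forward pass, sigmoid surrogate in the backward pass.
        @staticmethod
        def forward(ctx, v):
            ctx.save_for_backward(v)
            return (v >= 1.0).float()
        @staticmethod
        def backward(ctx, grad_out):
            (v,) = ctx.saved_tensors
            sg = torch.sigmoid(4.0 * (v - 1.0))
            return grad_out * 4.0 * sg * (1.0 - sg)

    class PLIF(nn.Module):
        # Parametric LIF: the membrane time constant is learned jointly with the
        # synaptic weights through 1/tau = sigmoid(a).
        def __init__(self):
            super().__init__()
            self.a = nn.Parameter(torch.zeros(1))
            self.v = None
        def reset(self):
            self.v = None
        def forward(self, x):
            if self.v is None:
                self.v = torch.zeros_like(x)
            self.v = self.v + torch.sigmoid(self.a) * (x - self.v)  # leaky integration
            s = SurrogateSpike.apply(self.v)
            self.v = self.v * (1.0 - s)  # hard reset of the neurons that fired
            return s

    class EmotionSNN(nn.Module):
        def __init__(self, in_ch=2, classes=2, neurons_per_class=10):
            super().__init__()
            blocks, ch = [], in_ch
            for _ in range(3):  # three single forward extraction units
                blocks += [nn.Conv2d(ch, 64, 3, padding=1), nn.BatchNorm2d(64),
                           PLIF(), nn.AvgPool2d(2)]
                ch = 64
            self.features = nn.Sequential(*blocks)
            self.fc1, self.plif1 = nn.LazyLinear(1000), PLIF()
            self.fc2, self.plif2 = nn.Linear(1000, classes * neurons_per_class), PLIF()
            self.classes, self.npc = classes, neurons_per_class
        def forward(self, x):  # x: spike input of shape (T, B, 2, H, W)
            for m in self.modules():
                if isinstance(m, PLIF):
                    m.reset()  # clear membrane state before each sequence
            rate = 0.0
            for t in range(x.shape[0]):  # run the network over the T time steps
                h = self.features(x[t]).flatten(1)
                rate = rate + self.plif2(self.fc2(self.plif1(self.fc1(h))))
            rate = rate / x.shape[0]  # per-neuron firing frequency
            # Emotion mapping: average the frequency inside each voting group.
            return rate.view(-1, self.classes, self.npc).mean(dim=2)

The returned tensor holds one averaged frequency per emotion category, and the predicted emotion is the category whose voting group fires fastest (argmax along dimension 1).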
Please refer to FIG. 4; the process of training the pre-established spiking neural network emotion recognition model with the dynamic vision data set to obtain the trained spiking neural network emotion recognition model is described in detail below. The process can include:
S310: initializing the parameter weights of the pre-established spiking neural network, specifically, initializing the parameter weights of the pre-established spiking neural network emotion recognition model;
It should be noted that, in practical applications, the dynamic vision data set can be divided into three parts, namely a training set, a validation set, and a test set, and the spiking neural network emotion recognition model is built in advance as described above, which is not repeated here. Specifically, the parameter weights of the spiking neural network emotion recognition model are initialized first.
S320: taking the dynamic vision data set as the input of the current spiking neural network and obtaining, through forward propagation of the network, the output frequency of the voting neuron group of each emotion category; specifically, taking the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model, and obtaining the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network;
In this step, in each round of training the current spiking neural network is determined based on the current parameter weights, the training set of the dynamic vision data set is taken as the input of the current spiking neural network in the spiking neural network emotion recognition model, and the output frequency of the voting neuron group of each emotion category is obtained through forward propagation of the current spiking neural network. For a voting neuron group, its output frequency can be obtained by averaging the output frequencies of the individual voting neurons in the group.
S330: for each emotion category, calculating the error between the output frequency and the true label of the corresponding emotion category; specifically, for each emotion category, calculating the error between the output frequency of the voting neuron group of the emotion category and the true label of the corresponding emotion category;
In this step, since each voting neuron group corresponds to one emotion category, the error can be calculated from the output frequency of the voting neuron group and the true label of the corresponding emotion category; in the embodiments of the present application, the mean squared error (MSE) can specifically be calculated.
S340: calculating the gradient corresponding to the parameter weights according to the error, and updating the parameter weights of the current spiking neural network with the gradient;
Specifically, the final average error can be calculated from the errors corresponding to the individual voting neuron groups, the gradient corresponding to the parameter weights is then calculated from this average error, and the parameter weights of the current spiking neural network in the spiking neural network emotion recognition model are updated with the gradient.
It should be noted that, in practical applications, the stochastic gradient descent (SGD) algorithm can be used, and other gradient-descent-based parameter optimization methods can also be chosen to update the parameter weights, including but not limited to RMSprop (Root Mean Square propagation), Adagrad (Adaptive Subgradient), Adam (Adaptive Moment Estimation), Adamax (a variant of Adam based on the infinity norm), and ASGD (Averaged Stochastic Gradient Descent). The specific method can be determined according to the actual situation, and the embodiments of the present application place no special limitation on it.
S350: judging whether the current spiking neural network with the updated parameter weights has converged; if so, proceeding to S360; if not, returning to S320 for the next round of training until the trained spiking neural network emotion recognition model is obtained;
Specifically, after the parameter weights are updated, the current spiking neural network in the spiking neural network emotion recognition model is determined based on the updated parameter weights, and the convergence of the current spiking neural network can then be further judged according to the validation set in the dynamic vision data set. When the current spiking neural network has converged, the process proceeds to S360 to end the operation, and the spiking neural network emotion recognition model based on the latest parameter weights is obtained; the model can also be tested and trained with the acquired test sets of multiple emotion categories and outputs the corresponding emotion categories, yielding the trained spiking neural network emotion recognition model. When the current spiking neural network has not converged, the process can return to S320, where the training set is used again for the next round of training of the updated current spiking neural network, so that the parameter weights are updated again until the updated current spiking neural network converges.
S360: ending the training to obtain the trained spiking neural network emotion recognition model; specifically, when it is determined that the current spiking neural network with the updated parameter weights has converged, the training is ended and the trained spiking neural network emotion recognition model is obtained.
It should be noted that, in practical applications, there can be multiple methods for judging whether the current spiking neural network has converged. For example, it can be judged whether the current number of training iterations has reached a preset number; if so, the network has converged, and if not, it has not. It can also be judged whether the error decrease of the current spiking neural network has stabilized within a preset range; if so, the network has converged, and if not, it has not. It can further be judged whether the error of the current spiking neural network is less than an error threshold; the network has converged when the error is smaller than the threshold and has not converged otherwise.
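The following is a minimal training-loop sketch matching steps S310 to S360. The data-loader shapes, learning rate, epoch limit, and error threshold are assumptions, and the model is assumed to return per-category voting-group frequencies, as in the network sketch above:

    import torch
    import torch.nn as nn

    def train_emotion_snn(model, train_loader, epochs=100, err_threshold=1e-3, lr=0.1):
        # train_loader yields (spikes, labels): spikes of shape (T, B, 2, H, W)
        # and integer emotion labels of shape (B,).
        opt = torch.optim.SGD(model.parameters(), lr=lr)  # S340: SGD; Adam etc. also work
        mse = nn.MSELoss()
        for epoch in range(epochs):          # S350: stop after a preset number of rounds
            total_err, batches = 0.0, 0
            for spikes, labels in train_loader:
                freq = model(spikes)         # S320: voting-group output frequencies
                target = torch.nn.functional.one_hot(
                    labels, num_classes=freq.shape[1]).float()
                err = mse(freq, target)      # S330: MSE against the true labels
                opt.zero_grad()
                err.backward()               # S340: gradients via surrogate backprop
                opt.step()
                total_err += err.item()
                batches += 1
            if total_err / max(batches, 1) < err_threshold:
                break                        # S350: alternative convergence test
        return model

In practice, the convergence check of S350 would typically be run on the validation split rather than on the training error, as the text notes.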
It can be seen that, in the embodiments of the present application, the spiking neural network is trained with the pre-established dynamic vision data set to obtain the spiking neural network emotion recognition model; the to-be-recognized spike sequence corresponding to the video information is then acquired and input into the spiking neural network emotion recognition model, which recognizes it to obtain the corresponding emotion category. In use, the present application can recognize emotion categories based on video information, which increases the ways of emotion recognition and facilitates better emotion recognition.
Optionally, please also refer to FIG. 5, which is a flowchart of an emotion recognition model training method provided by an embodiment of the present application, including:
S501: establishing a dynamic vision data set based on emotion recognition in advance;
S502: training the pre-established spiking neural network emotion recognition model with the dynamic vision data set to obtain the trained spiking neural network emotion recognition model.
It should be noted that, for the specific implementation of each step in this embodiment, refer to the implementation of the corresponding steps in the above embodiments, which is not repeated here.
Optionally, please also refer to FIG. 6, which is another flowchart of an emotion recognition model training method provided by an embodiment of the present application, including:
S601: acquiring test sets of multiple emotion categories;
S602: testing and training the pre-established spiking neural network emotion recognition model with the test sets to obtain the trained spiking neural network emotion recognition model.
It should be noted that, for the specific implementation of each step in this embodiment, refer to the implementation of the corresponding steps in the above embodiments, which is not repeated here.
On the basis of the above embodiments, an embodiment of the present application further provides an emotion recognition apparatus; please refer to FIG. 7. The apparatus includes:
an acquisition module 21, used to acquire a to-be-recognized spike sequence corresponding to video information;
a recognition module 22, used to recognize the to-be-recognized spike sequence with a spiking neural network emotion recognition model to obtain a corresponding emotion category.
Optionally, in another embodiment, on the basis of the above embodiment, the apparatus further includes a training module 81, whose schematic structural diagram is shown in FIG. 8, wherein,
the training module 81 is used to train the spiking neural network emotion recognition model to obtain a trained spiking neural network emotion recognition model.
Optionally, in another embodiment, on the basis of the above embodiments, the training module includes a test set acquisition module and a first training module, wherein,
the test set acquisition module is used to acquire test sets of multiple emotion categories after the first establishing module pre-establishes the spiking neural network emotion recognition model;
the first training module is used to test and train the pre-established spiking neural network emotion recognition model with the test sets acquired by the test set acquisition module to obtain the trained spiking neural network emotion recognition model.
Optionally, in another embodiment, on the basis of the above embodiments, the training module includes an establishing module and a second training module, wherein,
the establishing module is used to establish a dynamic vision data set based on emotion recognition in advance;
the second training module is used to train the pre-established spiking neural network emotion recognition model with the dynamic vision data set to obtain the trained spiking neural network emotion recognition model.
Optionally, in another embodiment, on the basis of the above embodiments, the establishing module includes a data acquisition module, a data set establishing module, and at least one of a simulation processing module and a first spike sequence acquisition module, wherein,
the data acquisition module is used to acquire raw visual data based on emotion recognition;
the simulation processing module is used to perform simulation processing on the raw visual data with a dynamic vision sensor simulation method to obtain multiple spike sequences corresponding to the raw visual data;
the first spike sequence acquisition module is used to directly acquire the multiple spike sequences corresponding to the raw visual data with a dynamic vision camera;
the data set establishing module is used to establish the dynamic vision data set based on emotion recognition from the multiple spike sequences obtained by the simulation processing module or the first spike sequence acquisition module.
Optionally, in another embodiment, on the basis of the above embodiments, the simulation processing module includes a traversal module, a first conversion module, and a first assignment module, wherein,
the traversal module is used to traverse the N video frame images in the raw dynamic video data in sequence, where N represents the total number of video frame images contained in the raw visual data;
the first conversion module is used to, when the traversal module reaches the current i-th frame, convert the video frame image of the current i-th frame from the RGB color space to the grayscale space and take the converted video frame data as the current video frame data, where i ranges from 1 to N;
the first assignment module is used to, when the value of i equals 1, assign all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data to obtain a spike sequence formed by the first output channel.
Optionally, in another embodiment, on the basis of the above embodiments, the simulation processing module further includes a second assignment module, an update module, and a second conversion module, wherein,
the second assignment module is used to, when i is not equal to 1, assign values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and a preset threshold, and take the current video frame data as the previous video frame;
the first update module is used to increment the value of i by 1;
the second conversion module is further used to, when the i updated by the update module is less than N, convert the video frame image of the current i-th frame from the RGB color space to the grayscale space and take the converted video frame data as the current video frame data.
Optionally, in another embodiment, on the basis of the above embodiments, the simulation processing module further includes a second spike sequence acquisition module, wherein,
the second spike sequence acquisition module is used to, when the i updated by the update module is not less than N, complete the traversal of the N video frame images in the raw dynamic video data to obtain the spike sequence formed by the first output channel and the second output channel.
Optionally, in another embodiment, on the basis of the above embodiments, the assignment module includes a first calculation module and at least one of a first position assignment module and a second position assignment module, wherein,
the first calculation module is used to, for each pixel, calculate the grayscale difference between the current video frame and the previous video frame at the pixel;
the first position assignment module is used to assign 1 to the corresponding position of the first output channel when the grayscale difference is greater than the preset threshold;
the second position assignment module is used to assign 1 to the corresponding position of the second output channel when the grayscale difference is less than the preset threshold.
Optionally, in another embodiment, on the basis of the above embodiments, the spiking neural network includes voting neuron groups;
the second training module includes an initialization module, a first propagation module, an error calculation module, a gradient calculation module, a second update module, a judgment module, and a model training determination module, wherein,
the initialization module is used to initialize the parameter weights of the pre-established spiking neural network emotion recognition model;
the first propagation module is used to take the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtain the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network;
the error calculation module is used to, for each emotion category, calculate the error between the output frequency of the voting neuron group of the emotion category and the true label of the corresponding emotion category;
the gradient calculation module is used to calculate the gradient corresponding to the parameter weights according to the error;
the second update module is used to update the parameter weights of the current spiking neural network with the gradient calculated by the gradient calculation module;
the judgment module is used to judge whether the current spiking neural network with the parameter weights updated by the update module has converged;
the model training determination module is used to end the training and obtain the trained spiking neural network emotion recognition model when the judgment module determines that the current spiking neural network with the updated parameter weights has converged.
Optionally, in another embodiment, on the basis of the above embodiments, the second training module further includes a second propagation module, wherein,
the second propagation module is used to, when the judgment module determines that the current spiking neural network with the updated parameter weights has not converged, take the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtain the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network.
Optionally, in another embodiment, on the basis of the above embodiments, the judgment module includes at least one of the following:
a first judgment module, used to judge whether the current spiking neural network has converged by judging whether the current number of training iterations of the current spiking neural network with the updated parameter weights has reached a preset number;
a second judgment module, used to judge whether the current spiking neural network has converged by judging whether the error decrease of the current spiking neural network with the updated parameter weights has stabilized within a preset range;
a third judgment module, used to judge whether the current spiking neural network has converged by judging whether the error of the current spiking neural network with the updated parameter weights is less than an error threshold;
a fourth judgment module, used to judge whether the current spiking neural network with the updated parameter weights has converged according to the validation set in the dynamic vision data set.
Optionally, in another embodiment, on the basis of the above embodiments, the spiking neural network further includes a feature extraction module and an emotion mapping module, wherein the feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, as well as a network unit composed of two alternately arranged fully connected layers and PLIF layers; the emotion mapping module is used to map the spike sequences output by the voting neuron groups to the final emotion categories.
Please also refer to FIG. 9, which is a structural block diagram of an emotion recognition model training apparatus provided by an embodiment of the present application, including an establishing module 91 and a training module 92, wherein,
the establishing module 91 is used to establish a dynamic vision data set based on emotion recognition in advance;
the training module 92 is used to train the pre-established spiking neural network emotion recognition model with the dynamic vision data set to obtain the trained spiking neural network emotion recognition model.
Please also refer to FIG. 10, which is another block diagram of an emotion recognition model training apparatus provided by an embodiment of the present application, including an acquisition module 11 and a training module 12, wherein,
the acquisition module 11 is used to acquire test sets of multiple emotion categories;
the training module 12 is used to test and train the pre-established spiking neural network emotion recognition model with the test sets to obtain the trained spiking neural network emotion recognition model.
For the apparatuses in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the corresponding method embodiments, and will not be elaborated here.
It should be noted that the emotion recognition apparatus provided in the embodiments of the present application has the same beneficial effects as the emotion recognition method provided in the above embodiments; for a specific introduction to the emotion recognition method involved in the embodiments of the present application, please refer to the above embodiments, which is not repeated here.
On the basis of the above embodiments, an embodiment of the present application further provides an emotion recognition device, including:
a memory, used to store a computer program;
a processor, used to implement the above emotion recognition method or the above emotion recognition model training method when executing the computer program.
For example, the processor in the embodiments of the present application can be specifically used to acquire a to-be-recognized spike sequence corresponding to video information, and to recognize the to-be-recognized spike sequence with a pre-established spiking neural network emotion recognition model to obtain a corresponding emotion category, wherein the spiking neural network emotion recognition model is obtained by training a spiking neural network with a pre-established dynamic vision data set.
An embodiment of the present application further provides an electronic device, including:
a memory, used to store a computer program;
a processor, used to implement the above emotion recognition method or the above emotion recognition model training method when executing the computer program.
On the basis of the above embodiments, an embodiment of the present application further provides a computer non-volatile readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the above emotion recognition method or the above emotion recognition model training method.
The computer non-volatile readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Optionally, an embodiment of the present application further provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the above emotion recognition method or the above emotion recognition model training method.
The apparatus embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical modules, that is, they may be located in one place or distributed over multiple networks. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those of ordinary skill in the art without creative effort.
FIG. 11 is a block diagram of an electronic device 1100 provided by an embodiment of the present application. For example, the electronic device 1100 may be a mobile terminal or a server; the embodiments of the present application take a mobile terminal as an example for illustration. For example, the electronic device 1100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to FIG. 11, the electronic device 1100 may include one or more of the following components: a processing component 1102, a memory 1104, a power component 1106, a multimedia component 1108, an audio component 1110, an input/output (I/O) interface 1112, a sensor component 1114, and a communication component 1116.
The processing component 1102 generally controls the overall operations of the electronic device 1100, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 1102 may include one or more processors 1120 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 1102 may include one or more modules to facilitate interaction between the processing component 1102 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.
The memory 1104 is configured to store various types of data to support the operation of the device 1100. Examples of such data include instructions for any application or method operating on the electronic device 1100, contact data, phonebook data, messages, pictures, videos, and the like. The memory 1104 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 1106 provides power to the various components of the electronic device 1100 and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 1100.
The multimedia component 1108 includes a screen providing an output interface between the electronic device 1100 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel; the touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1108 includes a front camera and/or a rear camera; when the device 1100 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 1110 is configured to output and/or input audio signals. For example, the audio component 1110 includes a microphone (MIC) configured to receive external audio signals when the electronic device 1100 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 1104 or sent via the communication component 1116. In some embodiments, the audio component 1110 also includes a speaker for outputting audio signals.
The I/O interface 1112 provides an interface between the processing component 1102 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 1114 includes one or more sensors for providing status assessments of various aspects of the electronic device 1100. For example, the sensor component 1114 can detect the open/closed state of the device 1100 and the relative positioning of components, such as the display and keypad of the electronic device 1100; it can also detect a change in position of the electronic device 1100 or one of its components, the presence or absence of user contact with the electronic device 1100, the orientation or acceleration/deceleration of the electronic device 1100, and a change in its temperature. The sensor component 1114 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1114 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1116 is configured to facilitate wired or wireless communication between the electronic device 1100 and other devices. The electronic device 1100 can access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 1116 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1116 also includes a near field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an embodiment, the electronic device 1100 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, so as to execute the above-described emotion recognition method or emotion recognition model training method.
In an embodiment, a computer-readable storage medium is also provided, such as the memory 1104 including instructions executable by the processor 1120 of the electronic device 1100 to implement the above-described emotion recognition method or emotion recognition model training method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an embodiment, a computer program product is also provided; when the instructions in the computer program product are executed by the processor 1120 of the electronic device 1100, the electronic device 1100 executes the above-described emotion recognition method or emotion recognition model training method.
FIG. 12 is a block diagram of an apparatus 1200 for emotion recognition or emotion recognition model training provided by an embodiment of the present application. For example, the apparatus 1200 may be provided as a server. Referring to FIG. 12, the apparatus 1200 includes a processing component 1222, which further includes one or more processors, and a memory resource represented by a memory 1232 for storing instructions executable by the processing component 1222, such as application programs. The application programs stored in the memory 1232 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 1222 is configured to execute the instructions to perform the above method.
The apparatus 1200 may also include a power component 1226 configured to perform power management of the apparatus 1200, a wired or wireless network interface 1250 configured to connect the apparatus 1200 to a network, and an input/output (I/O) interface 1258. The apparatus 1200 can operate based on an operating system stored in the memory 1232, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, or the like.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Since the apparatuses disclosed in the embodiments correspond to the methods disclosed therein, their description is relatively brief; for relevant details, refer to the description of the method part.
It should also be noted that, in this specification, relational terms such as first and second are only used to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device including that element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but shall conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (21)

  1. An emotion recognition method, comprising:
    acquiring a to-be-recognized spike sequence corresponding to video information;
    recognizing the to-be-recognized spike sequence by using a spiking neural network emotion recognition model to obtain a corresponding emotion category.
  2. The emotion recognition method according to claim 1, wherein before the to-be-recognized spike sequence is recognized by using the spiking neural network emotion recognition model, the method further comprises:
    training a pre-established spiking neural network emotion recognition model to obtain a trained spiking neural network emotion recognition model.
  3. The emotion recognition method according to claim 2, wherein training the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model comprises:
    acquiring test sets of multiple emotion categories;
    testing and training the pre-established spiking neural network emotion recognition model by using the test sets to obtain the trained spiking neural network emotion recognition model.
  4. The emotion recognition method according to claim 2, wherein training the pre-established spiking neural network emotion recognition model to obtain the trained spiking neural network emotion recognition model comprises:
    establishing a dynamic vision data set based on emotion recognition in advance;
    training the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain the trained spiking neural network emotion recognition model.
  5. The emotion recognition method according to claim 4, wherein the process of establishing the dynamic vision data set based on emotion recognition in advance comprises:
    acquiring raw visual data based on emotion recognition;
    performing simulation processing on the raw visual data by using a dynamic vision sensor simulation method to obtain multiple spike sequences corresponding to the raw visual data; or directly acquiring the multiple spike sequences corresponding to the raw visual data by using a dynamic vision camera;
    establishing the dynamic vision data set based on emotion recognition from the multiple spike sequences.
  6. The emotion recognition method according to claim 5, wherein the process of performing simulation processing on the raw visual data by using the dynamic vision sensor simulation method to obtain the multiple spike sequences corresponding to the raw visual data comprises:
    traversing the N video frame images in the raw dynamic video data in sequence, wherein N represents the total number of video frame images contained in the raw visual data;
    when the current i-th frame is reached, converting the video frame image of the current i-th frame from the RGB color space to the grayscale space, and taking the converted video frame data as the current video frame data, wherein i ranges from 1 to N;
    when the value of i equals 1, assigning all floating-point data of the current video frame data to the first output channel of the first time step of the simulated data to obtain a spike sequence formed by the first output channel, and taking the current video frame data as the previous video frame.
  7. The emotion recognition method according to claim 6, wherein the process of performing simulation processing on the raw visual data by using the dynamic vision sensor simulation method to obtain the corresponding spike sequence further comprises:
    when i is not equal to 1, assigning values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and a preset threshold, and taking the current video frame data as the previous video frame;
    incrementing the value of i by 1;
    when the updated i is less than N, executing the step of converting the video frame image of the current i-th frame from the RGB color space to the grayscale space and taking the converted video frame data as the current video frame data.
  8. The emotion recognition method according to claim 7, wherein the process of performing simulation processing on the raw visual data by using the dynamic vision sensor simulation method to obtain the corresponding spike sequence further comprises:
    when the updated i is not less than N, completing the traversal of the N video frame images in the raw dynamic video data to obtain the spike sequence formed by the first output channel and the second output channel.
  9. The emotion recognition method according to claim 7, wherein assigning values to the first output channel and the second output channel respectively according to the grayscale difference between the current video frame and the previous video frame and the preset threshold comprises:
    for each pixel, calculating the grayscale difference between the current video frame and the previous video frame at the pixel;
    when the grayscale difference is greater than the preset threshold, assigning 1 to the corresponding position of the first output channel; or
    when the grayscale difference is less than the preset threshold, assigning 1 to the corresponding position of the second output channel.
  10. The emotion recognition method according to any one of claims 4 to 9, wherein the spiking neural network includes a voting neuron group module;
    the process of training the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain the trained spiking neural network emotion recognition model comprises:
    initializing parameter weights of the pre-established spiking neural network emotion recognition model;
    taking the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model, and obtaining the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network;
    for each emotion category, calculating the error between the output frequency of the voting neuron group of the emotion category and the true label of the corresponding emotion category;
    calculating the gradient corresponding to the parameter weights according to the error, and updating the parameter weights of the current spiking neural network with the gradient;
    judging whether the current spiking neural network with the updated parameter weights has converged;
    when it is determined that the current spiking neural network with the updated parameter weights has converged, ending the training to obtain the trained spiking neural network emotion recognition model.
  11. The emotion recognition method according to claim 10, wherein the process of training the pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain the trained spiking neural network emotion recognition model further comprises:
    when it is determined that the current spiking neural network with the updated parameter weights has not converged, returning to the step of taking the dynamic vision data set as the input of the current spiking neural network in the spiking neural network emotion recognition model and obtaining the output frequency of the voting neuron group of each emotion category through forward propagation of the current spiking neural network.
  12. The emotion recognition method according to claim 10, wherein whether the current spiking neural network with the updated parameter weights has converged is judged in the following ways:
    judging whether the current spiking neural network has converged by judging whether the current number of training iterations of the current spiking neural network with the updated parameter weights has reached a preset number; or
    judging whether the current spiking neural network has converged by judging whether the error decrease of the current spiking neural network with the updated parameter weights has stabilized within a preset range; or
    judging whether the current spiking neural network has converged by judging whether the error of the current spiking neural network with the updated parameter weights is less than an error threshold; or
    judging whether the current spiking neural network with the updated parameter weights has converged according to a validation set in the dynamic vision data set.
  13. The emotion recognition method according to claim 10, wherein the spiking neural network further includes a feature extraction module and an emotion mapping module, wherein the feature extraction module includes a single forward extraction unit composed of convolution, normalization, the parametric leaky integrate-and-fire model PLIF, and average pooling, as well as a network unit composed of two alternately arranged fully connected layers and PLIF layers; the emotion mapping module is used to map the spike sequences output by the voting neuron groups to the final emotion categories.
  14. An emotion recognition model training method, comprising:
    establishing a dynamic vision data set based on emotion recognition in advance, and training a pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain a trained spiking neural network emotion recognition model; or
    acquiring test sets of multiple emotion categories, and testing and training a pre-established spiking neural network emotion recognition model by using the test sets to obtain a trained spiking neural network emotion recognition model.
  15. An emotion recognition apparatus, comprising:
    an acquisition module, used to acquire a to-be-recognized spike sequence corresponding to video information;
    a recognition module, used to recognize the to-be-recognized spike sequence by using a spiking neural network emotion recognition model to obtain a corresponding emotion category.
  16. An emotion recognition model training apparatus, comprising:
    an establishing module, used to establish a dynamic vision data set based on emotion recognition in advance;
    a training module, used to train a pre-established spiking neural network emotion recognition model by using the dynamic vision data set to obtain a trained spiking neural network emotion recognition model.
  17. An emotion recognition model training apparatus, comprising:
    an acquisition module, used to acquire test sets of multiple emotion categories;
    a training module, used to test and train a pre-established spiking neural network emotion recognition model by using the test sets to obtain a trained spiking neural network emotion recognition model.
  18. An emotion recognition device, comprising:
    a memory, used to store a computer program;
    a processor, used to implement the emotion recognition method according to any one of claims 1 to 13 or the emotion recognition model training method according to claim 14 when executing the computer program.
  19. An electronic device, comprising:
    a memory, used to store a computer program;
    a processor, used to implement the emotion recognition method according to any one of claims 1 to 13 or the emotion recognition model training method according to claim 14 when executing the computer program.
  20. A computer non-volatile readable storage medium, wherein a computer program is stored on the computer non-volatile readable storage medium, and when the computer program is executed by a processor, it implements the emotion recognition method according to any one of claims 1 to 13 or the emotion recognition model training method according to claim 14.
  21. A computer program product, comprising a computer program or instructions, wherein when the computer program or instructions are executed by a processor, they implement the emotion recognition method according to any one of claims 1 to 13 or the emotion recognition model training method according to claim 14.
PCT/CN2022/122788 2022-02-09 2022-09-29 Emotion recognition method, training method, apparatus, device, storage medium, and product WO2023151289A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210119803.3 2022-02-09
CN202210119803.3A CN114155478B (zh) 2022-02-09 2022-02-09 Emotion recognition method, apparatus, system, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023151289A1 true WO2023151289A1 (zh) 2023-08-17

Family

ID=80450274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122788 WO2023151289A1 (zh) 2022-02-09 2022-09-29 Emotion recognition method, training method, apparatus, device, storage medium, and product

Country Status (2)

Country Link
CN (1) CN114155478B (zh)
WO (1) WO2023151289A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117110700A (zh) * 2023-08-23 2023-11-24 易集康健康科技(杭州)有限公司 Method and system for detecting pulse power of a radio-frequency power supply
CN117232638A (zh) * 2023-11-15 2023-12-15 常州检验检测标准认证研究院 Robot vibration detection method and system
CN117809381A (zh) * 2024-03-01 2024-04-02 鹏城实验室 Video action classification method, apparatus, device, and storage medium
CN118262184A (zh) * 2024-05-31 2024-06-28 苏州元脑智能科技有限公司 Image emotion recognition method and apparatus, storage medium, and electronic device

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155478B (zh) * 2022-02-09 2022-05-10 苏州浪潮智能科技有限公司 Emotion recognition method, apparatus, system, and computer-readable storage medium
CN114466153B (zh) * 2022-04-13 2022-09-09 深圳时识科技有限公司 Adaptive spike generation method and apparatus, brain-inspired chip, and electronic device
CN114913590B (zh) * 2022-07-15 2022-12-27 山东海量信息技术研究院 Emotion recognition method, apparatus, and device for data, and readable storage medium
CN115238835B (zh) * 2022-09-23 2023-04-07 华南理工大学 EEG emotion recognition method based on dual-space adaptive fusion, medium, and device
CN115578771A (zh) * 2022-10-24 2023-01-06 智慧眼科技股份有限公司 Liveness detection method and apparatus, computer device, and storage medium
CN116259310A (zh) * 2023-01-16 2023-06-13 之江实验室 Hardware-oriented deep spiking neural network speech recognition method and system
CN116882469B (zh) * 2023-09-06 2024-02-02 苏州浪潮智能科技有限公司 Spiking neural network deployment method, apparatus, and device for emotion recognition
CN117435917B (zh) * 2023-12-20 2024-03-08 苏州元脑智能科技有限公司 Emotion recognition method, system, apparatus, and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169409A (zh) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 Emotion recognition method and apparatus
CN108596039A (zh) * 2018-03-29 2018-09-28 南京邮电大学 Bimodal emotion recognition method and system based on 3D convolutional neural network
CN109815785A (zh) * 2018-12-05 2019-05-28 四川大学 Facial emotion recognition method based on two-stream convolutional neural network
CN110210563A (zh) * 2019-06-04 2019-09-06 北京大学 Spatio-temporal information learning and recognition method for image spike data based on Spike cube SNN
CN111310672A (zh) * 2020-02-19 2020-06-19 广州数锐智能科技有限公司 Video emotion recognition method, apparatus, and medium based on temporal multi-model fusion modeling
CN112580617A (zh) * 2021-03-01 2021-03-30 中国科学院自动化研究所 Facial expression recognition method and apparatus in natural scenes
US20210390288A1 (en) * 2020-06-16 2021-12-16 University Of Maryland, College Park Human emotion recognition in images or video
CN114155478A (zh) * 2022-02-09 2022-03-08 苏州浪潮智能科技有限公司 Emotion recognition method, apparatus, system, and computer-readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113056748A (zh) * 2018-11-28 2021-06-29 惠普发展公司，有限责任合伙企业 Event-based processing using the output of a deep neural network
CN110556129B (zh) * 2019-09-09 2022-04-19 北京大学深圳研究生院 Bimodal emotion recognition model training method and bimodal emotion recognition method
CN113257282B (zh) * 2021-07-15 2021-10-08 成都时识科技有限公司 Speech emotion recognition method and apparatus, electronic device, and storage medium


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117110700A (zh) * 2023-08-23 2023-11-24 易集康健康科技(杭州)有限公司 Method and system for detecting pulse power of a radio-frequency power supply
CN117110700B (zh) * 2023-08-23 2024-06-04 易集康健康科技(杭州)有限公司 Method and system for detecting pulse power of a radio-frequency power supply
CN117232638A (zh) * 2023-11-15 2023-12-15 常州检验检测标准认证研究院 Robot vibration detection method and system
CN117232638B (zh) * 2023-11-15 2024-02-20 常州检验检测标准认证研究院 Robot vibration detection method and system
CN117809381A (zh) * 2024-03-01 2024-04-02 鹏城实验室 Video action classification method, apparatus, device, and storage medium
CN117809381B (zh) * 2024-03-01 2024-05-14 鹏城实验室 Video action classification method, apparatus, device, and storage medium
CN118262184A (zh) * 2024-05-31 2024-06-28 苏州元脑智能科技有限公司 Image emotion recognition method and apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
CN114155478B (zh) 2022-05-10
CN114155478A (zh) 2022-03-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22925647

Country of ref document: EP

Kind code of ref document: A1