CN116631583A - Psychological dispersion method, device and server based on big data of Internet of things - Google Patents

Psychological dispersion method, device and server based on big data of Internet of things

Info

Publication number
CN116631583A
CN116631583A (application number CN202310619534.1A)
Authority
CN
China
Prior art keywords
emotion
image
vector
text
emotion vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310619534.1A
Other languages
Chinese (zh)
Inventor
黄振培
吴丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Brain Science Research Zhuhai Hengqin Co ltd
Original Assignee
China Brain Science Research Zhuhai Hengqin Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Brain Science Research Zhuhai Hengqin Co ltd filed Critical China Brain Science Research Zhuhai Hengqin Co ltd
Priority to CN202310619534.1A priority Critical patent/CN116631583A/en
Publication of CN116631583A publication Critical patent/CN116631583A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/70 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance, relating to mental therapies, e.g. psychological therapy or autogenous training
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/254 - Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 - Fusion techniques of classification results, e.g. of results related to same input data, of results relating to different input data, e.g. multimodal recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/09 - Supervised learning
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, for mining of medical data, e.g. analysing previous cases of other patients
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Primary Health Care (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychology (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • Developmental Disabilities (AREA)
  • Child & Adolescent Psychology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Social Psychology (AREA)
  • Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application is applicable to the technical field of speech processing and provides a psychological grooming method, device and server based on big data of the Internet of things. The psychological grooming method comprises the following steps: acquiring a text emotion vector of a user; acquiring an image emotion vector of the user; fusing the text emotion vector and the image emotion vector to obtain a multi-modal emotion vector; inputting the multi-modal emotion vector into a pre-trained emotion decoder to obtain an emotion label; and querying the emotion feedback sentence pattern corresponding to the emotion label and playing the sound signal corresponding to the emotion feedback sentence pattern to the user. The embodiment of the application can therefore actively identify a child's emotion label and generate the corresponding emotion feedback sentence pattern according to that label, thereby achieving the purpose of timely psychological grooming of the child.

Description

Psychological dispersion method, device and server based on big data of Internet of things
Technical Field
The application belongs to the technical field of voice processing, and particularly relates to a psychological grooming method, device and server based on big data of the Internet of things.
Background
Children's psychology is in a stage of rapid development, and their mental health during this period is vital: it has a lasting, even irreversible, influence on their character, self-confidence and outlook on life in adulthood. However, in today's fast-paced and busy society, parents often lack the time and energy to notice the subtle psychological changes of their children, so communication is neither timely nor adequate, and over time the healthy psychological growth of children is neglected.
Thus, there is a need for a method that actively discovers psychological changes in children and provides psychological grooming.
Disclosure of Invention
The embodiment of the application provides a psychological grooming method, device and server based on big data of the Internet of things, which can solve the technical problem that the prior art cannot actively discover psychological changes in children and provide timely psychological grooming.
In a first aspect, an embodiment of the present application provides a psychological grooming method based on big data of the internet of things, including:
acquiring a text emotion vector of a user;
acquiring an image emotion vector of a user;
fusing the text emotion vector and the image emotion vector to obtain a multi-mode emotion vector;
inputting the multi-mode emotion vector into a pre-trained emotion decoder to obtain an emotion label;
inquiring the emotion feedback sentence pattern corresponding to the emotion label, and playing a sound signal corresponding to the emotion feedback sentence pattern to the user.
In a possible implementation manner of the first aspect, obtaining a text emotion vector of a user includes:
acquiring text sequence information of a user;
and inputting the text sequence information to a pre-trained text sequence encoder to generate a text emotion vector.
In a possible implementation manner of the first aspect, the pre-trained text sequence encoder includes a text sequence encoding model and a semantic representation enhancement network; the text sequence coding model comprises an embedded layer, a preset number of coding blocks, a first pooling layer and an output layer;
inputting the text sequence information to a pre-trained text sequence encoder to generate text emotion vectors, comprising:
preprocessing the text sequence information according to an embedding layer to obtain an embedded representation vector with a fixed length;
extracting text features of the fixed-length vectors according to a preset number of coding blocks to obtain text feature vectors;
summarizing the text feature vectors according to the first pooling layer to obtain low-dimension semantic representation vectors;
inputting the low-dimensional semantic representation vector to a semantic representation enhancement network according to an output layer;
and carrying out hidden state capturing on the low-dimensional semantics according to a semantic representation enhancement network to obtain a high-dimensional semantic representation vector, and taking the high-dimensional semantic representation vector as a text emotion vector.
In a possible implementation manner of the first aspect, obtaining an image emotion vector of a user includes:
Acquiring image sequence information of a user;
and inputting the image sequence information to a pre-trained image sequence encoder to generate an image emotion vector.
In a possible implementation manner of the first aspect, the image sequence encoder includes an input layer, a convolution layer, a second pooling layer, a multi-channel convolution layer, a local connection layer, and a full connection layer;
inputting the image sequence information to a pre-trained image sequence encoder to generate an image emotion vector, comprising:
preprocessing the image sequence information according to an input layer;
carrying out convolution processing on the preprocessed image sequence information according to a convolution layer to obtain a characteristic image;
performing downsampling treatment on the feature image according to the second pooling layer to obtain a low-dimensional feature image;
performing attention calculation processing on the low-dimensional characteristic image according to the multichannel convolution layer to obtain a candidate image emotion vector;
carrying out local processing on the candidate image emotion vectors according to the local connection layer to obtain local image emotion vectors;
and carrying out full connection processing on the local image emotion vector according to the full connection layer to obtain the image emotion vector.
In a possible implementation manner of the first aspect, fusing the text emotion vector and the image emotion vector to obtain a multi-modal emotion vector includes:
Normalizing the text emotion vector and the image emotion vector;
calculating a similarity matrix between the text emotion vector and the image emotion vector according to the following formula:
A(i,j)=v_text(i)*v_image(j),
where A(i, j) denotes the similarity between the i-th text emotion vector and the j-th image emotion vector, v_text(i) denotes the text emotion vector, v_image(j) denotes the image emotion vector, and "*" denotes the dot product of the two vectors;
and carrying out bidirectional attention processing on the similarity matrix to generate a multi-mode emotion vector.
In one possible implementation of the first aspect, the pre-trained emotion decoder includes a generator and a discriminator;
inputting the multi-modal emotion vector into a pre-trained emotion decoder to obtain an emotion tag, including:
inputting the multi-modal emotion vector into the generator to generate candidate emotion labels;
and inputting the candidate emotion labels into the discriminator to obtain emotion labels.
In a second aspect, an embodiment of the present application provides a psychological grooming device based on big data of the internet of things, including:
the first acquisition module is used for acquiring text emotion vectors of users;
The second acquisition module is used for acquiring the image emotion vector of the user;
the fusion module is used for fusing the text emotion vector and the image emotion vector to obtain a multi-mode emotion vector;
the emotion decoding module is used for inputting the multi-mode emotion vector into a pre-trained emotion decoder to obtain an emotion label;
and the query module is used for querying the emotion feedback sentence pattern corresponding to the emotion label and playing the sound signal corresponding to the emotion feedback sentence pattern to the user.
In an optional implementation manner of the second aspect, the first obtaining module includes:
the first acquisition sub-module is used for acquiring text sequence information of a user;
the first generation sub-module is used for inputting the text sequence information to a pre-trained text sequence encoder and generating text emotion vectors.
In an optional implementation manner of the second aspect, the pre-trained text sequence encoder includes a text sequence encoding model and a semantic representation enhancement network; the text sequence coding model comprises an embedded layer, a preset number of coding blocks, a first pooling layer and an output layer;
a first generation sub-module comprising:
the preprocessing unit is used for preprocessing the text sequence information according to the embedding layer to obtain an embedded representation vector with a fixed length;
The text feature extraction unit is used for extracting text features of the fixed-length vectors according to a preset number of coding blocks to obtain text feature vectors;
the pooling unit is used for summarizing the text feature vectors according to the first pooling layer to obtain low-dimension semantic representation vectors;
the input unit is used for inputting the low-dimensional semantic representation vector into a semantic representation enhancement network according to an output layer;
the semantic representation enhancement unit is used for capturing the hidden state of the low-dimensional semantic according to a semantic representation enhancement network to obtain a high-dimensional semantic representation vector, and the high-dimensional semantic representation vector is used as a text emotion vector.
In an optional implementation manner of the second aspect, the second obtaining module includes:
the second acquisition sub-module is used for acquiring the image sequence information of the user;
and the second generation submodule is used for inputting the image sequence information into a pre-trained image sequence encoder and generating an image emotion vector.
In an alternative implementation manner of the second aspect, the image sequence encoder includes an input layer, a convolution layer, a second pooling layer, a multi-channel convolution layer, a local connection layer, and a full connection layer;
A second generation sub-module, comprising:
the preprocessing unit is used for preprocessing the image sequence information according to an input layer;
the convolution processing unit is used for carrying out convolution processing on the preprocessed image sequence information according to the convolution layer to obtain a characteristic image;
the sampling processing unit is used for carrying out downsampling processing on the characteristic image according to the second pooling layer to obtain a low-dimensional characteristic image;
the attention calculation processing unit is used for carrying out the attention calculation processing on the low-dimensional characteristic image according to the multi-channel convolution layer to obtain candidate image emotion vectors;
the local connection unit is used for carrying out local processing on the candidate image emotion vectors according to the local connection layer to obtain local image emotion vectors;
and the full-connection processing unit is used for carrying out full-connection processing on the local image emotion vector according to the full-connection layer to obtain the image emotion vector.
In a third aspect, an embodiment of the present application provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a method as described in the first aspect above.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
in the embodiment of the application, a text emotion vector of the user is acquired; an image emotion vector of the user is acquired; the text emotion vector and the image emotion vector are fused to obtain a multi-modal emotion vector; the multi-modal emotion vector is input into a pre-trained emotion decoder to obtain an emotion label; and the emotion feedback sentence pattern corresponding to the emotion label is queried and the sound signal corresponding to the emotion feedback sentence pattern is played to the user. Therefore, the embodiment of the application can actively identify the child's emotion label and generate the corresponding emotion feedback sentence pattern according to that label, thereby achieving the purpose of timely psychological grooming of the child.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a psychological grooming method based on big data of the internet of things according to an embodiment of the present application;
fig. 2 is a block diagram of a psychological grooming device based on big data of the internet of things according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The technical scheme provided by the embodiment of the application is described by a specific embodiment.
Referring to fig. 1, a flow chart of a psychological grooming method based on internet of things big data according to an embodiment of the present application is provided, and the method is applied to a server, and includes the following steps:
step S101, obtaining text emotion vectors of users.
Wherein the user refers to a child.
In a specific application, obtaining a text emotion vector of a user includes:
step S201, obtaining text sequence information of a user.
The text sequence information is text information representing the child's semantics, speech rate, intonation, volume and the like. It is obtained by collecting the user's voice through an audio collection device (such as a microphone) arranged in the child's study and activity areas, recognizing the user's semantic information with natural language processing (NLP) technology, extracting features such as the speech rate, intonation and volume of the child's speech through linear predictive coding (LPC), and normalizing the semantic, speech-rate, intonation and volume information.
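As an illustration of this preprocessing step, the sketch below assembles a normalized feature vector from a transcript plus speech-rate, intonation, volume and LPC features. It is only a minimal sketch of one possible implementation: the use of librosa, the pYIN pitch estimator, the LPC order and the normalization scheme are assumptions not taken from the patent, and the speech recognition (NLP) step is assumed to have already produced the transcript.

```python
# Minimal sketch of building the text-sequence feature vector described above.
# Assumptions (not from the patent): librosa supplies the acoustic features and the
# transcript text has already been produced by a separate ASR step.
import numpy as np
import librosa

def text_sequence_features(wav_path, transcript):
    y, sr = librosa.load(wav_path, sr=16000)

    # Speech rate: characters per second of audio (a rough proxy).
    duration = len(y) / sr
    speech_rate = len(transcript) / max(duration, 1e-6)

    # Intonation: fundamental-frequency statistics via pYIN.
    f0, _, _ = librosa.pyin(y, fmin=80, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]
    pitch_mean, pitch_std = (f0.mean(), f0.std()) if f0.size else (0.0, 0.0)

    # Volume: mean root-mean-square energy.
    volume = float(librosa.feature.rms(y=y).mean())

    # LPC coefficients as a compact spectral descriptor of the speech.
    lpc = librosa.lpc(y, order=12)[1:]

    feats = np.concatenate([[speech_rate, pitch_mean, pitch_std, volume], lpc])
    # Normalize to zero mean / unit variance so the features are comparable.
    return (feats - feats.mean()) / (feats.std() + 1e-8)
```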
Step S202, inputting the text sequence information into a pre-trained text sequence encoder to generate text emotion vectors.
Wherein the pre-trained text sequence encoder comprises a text sequence encoding model and a semantic representation enhancement network; the text sequence encoding model comprises an embedding layer, a preset number of encoding blocks, a first pooling layer and an output layer. By way of example, the text sequence encoding model may be a pre-trained BERT model, and the preset number of encoding blocks may be 12 Transformer encoder blocks, each consisting of a multi-head self-attention mechanism and a feed-forward neural network, which capture context-dependent feature representations; by stacking multiple Transformer encoders, BERT learns deeper and more expressive feature representations. The semantic representation enhancement network may be an LSTM network, specifically a bidirectional long short-term memory network (BiLSTM) consisting of a forward and a backward LSTM. Its inputs are the output of the BERT encoder, i.e. the hidden-state vector of each word, together with the corresponding position embedding and segment (paragraph) embedding vectors. These vectors provide information about the current word, its context and its paragraph, so that the semantic information of the text is better captured. The LSTM network processes these vectors and outputs a new, higher-level semantic representation.
Inputting the text sequence information to a pre-trained text sequence encoder to generate text emotion vectors, comprising:
step S301, preprocessing the text sequence information according to the embedding layer to obtain the embedded representation vector with fixed length.
Step S302, extracting text features of the fixed-length vectors according to a preset number of coding blocks to obtain text feature vectors.
And step S303, summarizing the text feature vectors according to the first pooling layer to obtain low-dimension semantic representation vectors.
Step S304, the low-dimensional semantic representation vector is input to a semantic representation enhancement network according to the output layer.
Step S305, carrying out hidden state capturing on the low-dimensional semantics according to the semantic representation enhancement network to obtain a high-dimensional semantic representation vector, and taking the high-dimensional semantic representation vector as a text emotion vector.
In a specific application, the LSTM network first projects the input vectors to a smaller dimension through a fully connected layer to reduce model complexity. The forward and backward LSTMs then process the projected vector sequence separately to obtain forward and backward hidden-state sequences; these hidden states contain semantic information at different levels and capture the dependencies and context among words. Finally, the outputs of the forward and backward LSTMs are spliced to obtain the text emotion vector.
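The following sketch shows how such a text sequence encoder could be wired together in PyTorch, with a pre-trained BERT model followed by a projection layer and a BiLSTM. The model name, dimensions and the mean-pooling over tokens are illustrative assumptions rather than values fixed by the patent.

```python
# Hedged sketch of the text sequence encoder (BERT encoder + BiLSTM enhancement
# network) outlined above; sizes and pooling are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class TextEmotionEncoder(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", proj_dim=256, lstm_dim=128):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)       # 12 Transformer blocks
        self.proj = nn.Linear(self.bert.config.hidden_size, proj_dim)
        self.bilstm = nn.LSTM(proj_dim, lstm_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        projected = self.proj(hidden)                           # reduce dimension
        out, _ = self.bilstm(projected)                         # forward + backward states
        # Mean-pool over tokens; the forward/backward halves are already concatenated.
        return out.mean(dim=1)                                  # text emotion vector

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
batch = tokenizer(["今天在学校被批评了"], return_tensors="pt", padding=True)
vec = TextEmotionEncoder()(batch["input_ids"], batch["attention_mask"])
```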
Step S102, obtaining an image emotion vector of a user.
In a specific application, acquiring an image emotion vector of a user comprises:
step S401, acquiring image sequence information of a user.
The image sequence information comprises information representing the user's facial expressions (involving features such as the eyes and mouth) and gesture actions (involving limb movements of the head, shoulders, hands, arms, legs, feet and the like). Specifically, images of the user are collected through cameras arranged in the child's learning and activity areas; facial key points, such as the positions of the eyes, mouth and nose, are identified with a facial key-point detection algorithm (such as dlib or OpenCV), and a facial expression vector is constructed from geometric information such as the distances and angles between the key points to serve as the facial expression information. Three-dimensional rotation is described using Euler angles, which specify three rotation angles around different axes; for example, the head pose can be described by rotation angles around the X, Y and Z axes and regarded as a vector in which each component represents the rotation angle around one axis. Similarly, the poses of the shoulders, hands, arms, legs and feet can be represented as vectors, and all the vectors are then connected to form a complete gesture motion vector serving as the gesture motion information. The facial expression information and the gesture motion information are normalized to obtain the image sequence information.
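Purely for illustration, the sketch below assembles a facial-expression vector from a few key-point distances and angles and concatenates it with per-part Euler-angle vectors. The key points are assumed to have been detected already (e.g. with dlib or OpenCV), and the particular distances, angles and body parts chosen are hypothetical examples, not the patent's definition.

```python
# Illustrative sketch of assembling the image-sequence feature vector from facial
# key points and joint Euler angles; the selected features are assumptions.
import numpy as np

def facial_expression_vector(landmarks):
    """landmarks: (N, 2) array of facial key points (eyes, mouth, nose, ...)."""
    left_eye, right_eye, nose, mouth_l, mouth_r = landmarks[:5]
    eye_dist = np.linalg.norm(left_eye - right_eye)
    mouth_w = np.linalg.norm(mouth_l - mouth_r)
    # Angle of the line between the mouth corners, as a smile/frown cue.
    mouth_ang = np.arctan2(*(mouth_r - mouth_l)[::-1])
    return np.array([eye_dist, mouth_w, mouth_ang])

def pose_vector(euler_angles_per_part):
    """euler_angles_per_part: dict mapping body part -> (rx, ry, rz) in radians."""
    parts = ["head", "shoulder", "hand", "arm", "leg", "foot"]
    return np.concatenate([euler_angles_per_part[p] for p in parts])

def image_sequence_features(landmarks, euler_angles_per_part):
    feats = np.concatenate([facial_expression_vector(landmarks),
                            pose_vector(euler_angles_per_part)])
    return (feats - feats.mean()) / (feats.std() + 1e-8)   # normalization step
```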
In step S402, the image sequence information is input to a pre-trained image sequence encoder, and an image emotion vector is generated.
The image sequence encoder comprises an input layer, a convolution layer, a second pooling layer, a multi-channel convolution layer, a local connection layer and a full connection layer.
Illustratively, the image sequence encoder is an ACNN model, a model based on a convolutional neural network (CNN) whose structure mainly consists of the following parts:
Input layer: accepts the input data and converts it into a format for subsequent processing, generally a tensor, i.e. a multi-dimensional array. In convolutional neural networks the input data is typically image data, so the image data needs to be converted into tensor format; this process is called preprocessing.
Convolution layer: features of input data are detected through convolution operation, and corresponding feature images are extracted.
A second pooling layer: each feature image is downsampled to reduce the dimensions of the feature image, thereby making the subsequent processing more efficient.
Multi-channel convolution layer: convolves different feature images with different convolution kernels to further extract deep feature information. ACNN uses a channel-based attention mechanism (channel attention mechanism): the output of each multi-channel convolution layer passes through an attention module that adaptively learns the importance of each channel and weights the outputs of the multi-channel convolution layer accordingly. This lets the model focus on the channels that have the greatest impact on the final result, improving the accuracy and robustness of the model. Specifically, in the ACNN attention module, the global average value and the global maximum value of each channel are computed from the output of the multi-channel convolution layer, the two values are passed through two fully connected layers to obtain an importance weight for each channel, and the weights are finally multiplied with the corresponding channels and summed to obtain the weighted output result (a code sketch of this module is given after this list).
Local connection layer: the outputs of the previous multi-channel convolution layers are locally connected to increase the nonlinear expression capability of the model.
Fully connected layer: fully connects the output of the previous local connection layer to generate the final output result.
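A minimal sketch of such a channel-attention module is given below: it computes per-channel global average and maximum values, passes both through a small shared fully connected network, and reweights the channels. The reduction ratio and layer sizes are assumptions made for illustration.

```python
# Minimal sketch of the channel-attention module described above (global average
# and max pooling, shared fully connected layers, weighted re-combination).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                        # x: (batch, channels, H, W)
        avg = x.mean(dim=(2, 3))                 # global average per channel
        mx, _ = x.flatten(2).max(dim=2)          # global maximum per channel
        weights = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        return x * weights[:, :, None, None]     # weight each channel's output

feat = torch.randn(1, 64, 28, 28)                # output of a multi-channel conv layer
weighted = ChannelAttention(64)(feat)
```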
The image sequence information is input to a pre-trained image sequence encoder to generate an image emotion vector, comprising:
step S501, preprocessing the image sequence information according to the input layer.
Step S502, carrying out convolution processing on the preprocessed image sequence information according to the convolution layer to obtain a characteristic image.
Step S503, performing downsampling processing on the feature image according to the second pooling layer to obtain a low-dimensional feature image.
And step S504, performing attention calculation processing on the low-dimensional feature image according to the multi-channel convolution layer to obtain candidate image emotion vectors.
Step S505, carrying out local processing on the candidate image emotion vectors according to the local connection layer to obtain local image emotion vectors.
And step S506, performing full connection processing on the local image emotion vector according to the full connection layer to obtain the image emotion vector.
And step S103, fusing the text emotion vector and the image emotion vector to obtain the multi-mode emotion vector.
Specifically, fusing text emotion vectors and image emotion vectors to obtain multi-modal emotion vectors, including:
step S601, normalizing the text emotion vector and the image emotion vector.
Illustratively, the text emotion vector is denoted h_text and the image emotion vector is denoted h_image, and each vector is normalized using L2 normalization to obtain unit vectors v_text and v_image.
Step S602, calculating a similarity matrix between the text emotion vector and the image emotion vector according to the following formula:
A(i,j)=v_text(i)*v_image(j),
where A(i, j) denotes the similarity between the i-th text emotion vector and the j-th image emotion vector, v_text(i) denotes the text emotion vector, v_image(j) denotes the image emotion vector, and "*" denotes the dot product of the two vectors;
step S603, performing bidirectional attention processing on the similarity matrix to generate a multi-mode emotion vector.
Illustratively, performing softmax operation on each row and each column of the similarity matrix respectively to obtain normalized similarity matrices P_text and P_image;
the process of carrying out bidirectional attention processing on the similarity matrix is as follows:
(1) The image representation vectors h_image are weighted and averaged by P_text to obtain the text-aware image representation vector m_text:
m_text = P_text * h_image, where "*" denotes a matrix multiplication operation.
(2) Similarly, the text representation vectors h_text are weighted and averaged by P_image to obtain the image-aware text representation vector m_image:
m_image = P_image * h_text, where "*" denotes a matrix multiplication operation.
(3) Then, the text-aware image representation vector m_text and the image-aware text representation vector m_image may be spliced together to obtain a global multi-modal representation vector m:
m = [m_text, m_image], and the global multi-modal representation vector is taken as the multi-modal emotion vector.
It can be appreciated that, by performing bidirectional attention processing on the similarity matrix, the embodiment of the application splices the two representation vectors along the feature dimension in both directions, forming a more comprehensive and richer multi-modal representation.
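The fusion procedure of steps S601 to S603 can be sketched as follows; the sequence lengths, the embedding dimension and the final mean-pooling over positions are assumptions made for the sake of a runnable example, not details fixed by the patent.

```python
# Hedged sketch of the fusion step: L2-normalize both emotion vectors, build the
# similarity matrix A(i, j) = v_text(i) . v_image(j), apply softmax over rows and
# columns, and splice the two attended representations.
import torch
import torch.nn.functional as F

def fuse(h_text, h_image):
    """h_text: (T, d) text emotion vectors, h_image: (I, d) image emotion vectors."""
    v_text = F.normalize(h_text, dim=-1)          # L2 normalization
    v_image = F.normalize(h_image, dim=-1)

    A = v_text @ v_image.T                        # similarity matrix, shape (T, I)
    P_text = F.softmax(A, dim=1)                  # row-wise softmax
    P_image = F.softmax(A, dim=0)                 # column-wise softmax

    m_text = P_text @ h_image                     # text-aware image representation
    m_image = P_image.T @ h_text                  # image-aware text representation

    # Splice along the feature dimension to form the multi-modal emotion vector.
    return torch.cat([m_text.mean(0), m_image.mean(0)], dim=-1)

m = fuse(torch.randn(8, 256), torch.randn(4, 256))
```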
Step S104, inputting the multi-mode emotion vector into a pre-trained emotion decoder to obtain an emotion label.
Wherein the emotion labels comprise anger, sadness, happiness, fear and neutrality, and the pre-trained emotion decoder comprises a generator and a discriminator;
illustratively, the emotion decoder may be a generative adversarial network (GAN), which is constructed as follows:
(1) Define the generator and discriminator:
the generator model takes the multi-modal emotion vector as input and outputs a vector over the emotion labels anger, sadness, happiness, fear and neutrality; the discriminator model takes an emotion label as input and outputs a binary value indicating whether it is a real emotion label.
(2) Train the generator and discriminator models:
the generator model generates emotion label vectors and passes them to the discriminator, which judges whether each label is real; the generator's parameters are updated through back-propagation so that the generated emotion labels become closer to the real ones. The discriminator model learns to distinguish real emotion labels from generated ones and likewise updates its parameters through back-propagation.
(3) Fine-tune the generator: the generator is used to generate a batch of emotion labels, which are mixed with real emotion labels to train the discriminator model again. The parameters of the generator model are then updated through back-propagation so that the generated emotion labels become still closer to the real emotion labels.
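A minimal sketch of such a generator/discriminator pair is shown below; the layer sizes, the softmax output and the inference usage (steps S701 and S702) are illustrative assumptions, and the adversarial training loop itself is omitted.

```python
# Hedged sketch of the GAN-style emotion decoder: a generator mapping the
# multi-modal emotion vector to five emotion labels, and a discriminator judging
# whether a label vector is real. Dimensions are assumptions for illustration.
import torch
import torch.nn as nn

EMOTIONS = ["anger", "sadness", "happiness", "fear", "neutral"]

class Generator(nn.Module):
    def __init__(self, in_dim=512, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, len(EMOTIONS)))

    def forward(self, m):                            # m: multi-modal emotion vector
        return torch.softmax(self.net(m), dim=-1)    # candidate emotion label vector

class Discriminator(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(len(EMOTIONS), hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, label_vec):                    # probability the label is real
        return self.net(label_vec)

gen, disc = Generator(), Discriminator()
candidate = gen(torch.randn(1, 512))                 # step S701: candidate emotion label
is_real = disc(candidate)                            # step S702: discriminator check
label = EMOTIONS[candidate.argmax(dim=-1).item()]
```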
Inputting the multi-modal emotion vector into a pre-trained emotion decoder to obtain an emotion tag, comprising:
step S701, inputting the multi-mode emotion vector into a generator to generate a candidate emotion label;
Step S702, inputting the candidate emotion labels into a discriminator to obtain emotion labels.
Step S105, inquiring emotion feedback sentence patterns corresponding to emotion labels, and playing sound signals corresponding to the emotion feedback sentence patterns to a user.
Specifically, a relational database (such as MySQL, PostgreSQL or Oracle) stores the emotion labels and the corresponding emotion-guiding sentence patterns in a table structure; the emotion feedback sentence pattern corresponding to an emotion label is queried with SQL, converted into the corresponding sound signal using text-to-speech (TTS) technology, and played to the child through a speaker arranged within the child's hearing range, thereby achieving the purpose of psychological grooming.
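The lookup-and-playback step might look like the following sketch, in which SQLite and pyttsx3 stand in for the relational database and TTS engine mentioned above; the table name, schema and example sentence are assumptions for illustration.

```python
# Hedged sketch of the last step: look up the feedback sentence for the predicted
# emotion label and speak it through the local TTS engine.
import sqlite3
import pyttsx3

conn = sqlite3.connect("grooming.db")
conn.execute("CREATE TABLE IF NOT EXISTS feedback (label TEXT, sentence TEXT)")
conn.execute("INSERT INTO feedback VALUES (?, ?)",
             ("sadness", "I know you feel down. Can you tell me what happened?"))
conn.commit()

def respond(label):
    row = conn.execute("SELECT sentence FROM feedback WHERE label = ?",
                       (label,)).fetchone()
    if row:
        engine = pyttsx3.init()        # text-to-speech playback via the speaker
        engine.say(row[0])
        engine.runAndWait()

respond("sadness")
```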
Illustratively, when the emotion label is anger, the corresponding emotion feedback sentences are: "I understand that you are angry right now. Can you tell me what happened?" and "I know how you feel; sometimes we get angry about things. Let's find a solution together so that you feel better.";
when the emotion label is sadness, the corresponding emotion feedback sentences are: "I know you are feeling down. Can you tell me what happened?" and "Sometimes we feel sad, and that is a normal emotion. Let's talk about some ways to help you get through it.";
when the emotion label is happiness, the corresponding emotion feedback sentences are: "You look so happy! Can you share your happiness with me?" and "Seeing you this happy makes me happy too. You did really well!";
when the emotion label is fear, the corresponding emotion feedback sentences are: "I know you feel scared. Can you tell me what you are afraid of?" and "When we feel afraid, we need to find ways to feel safe. We want you to feel safe.";
when the emotion label is neutral, the corresponding emotion feedback sentences are: "You seem calm. Is there anything you would like to share with me?" and "Sometimes our mood is calm, and that is normal too. If you need to talk about anything, I am happy to listen."
In the embodiment of the application, a text emotion vector of the user is acquired; an image emotion vector of the user is acquired; the text emotion vector and the image emotion vector are fused to obtain a multi-modal emotion vector; the multi-modal emotion vector is input into a pre-trained emotion decoder to obtain an emotion label; and the emotion feedback sentence pattern corresponding to the emotion label is queried and the sound signal corresponding to the emotion feedback sentence pattern is played to the user. Therefore, the embodiment of the application can actively identify the child's emotion label and generate the corresponding emotion feedback sentence pattern according to that label, thereby achieving the purpose of timely psychological grooming of the child.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the method described in the above embodiments, fig. 2 shows a block diagram of the psychological grooming device based on big data of the internet of things according to the embodiment of the present application, and for convenience of explanation, only the parts related to the embodiment of the present application are shown.
Referring to fig. 2, the apparatus includes:
a first obtaining module 21, configured to obtain a text emotion vector of a user;
a second obtaining module 22, configured to obtain an image emotion vector of a user;
the fusion module 23 is configured to fuse the text emotion vector and the image emotion vector to obtain a multimodal emotion vector;
the emotion decoding module 24 is configured to input the multi-modal emotion vector into a pre-trained emotion decoder to obtain an emotion tag;
and the query module 25 is used for querying the emotion feedback sentence pattern corresponding to the emotion label and playing the sound signal corresponding to the emotion feedback sentence pattern to the user.
In an alternative implementation manner, the first obtaining module includes:
The first acquisition sub-module is used for acquiring text sequence information of a user;
the first generation sub-module is used for inputting the text sequence information to a pre-trained text sequence encoder and generating text emotion vectors.
In an alternative implementation, the pre-trained text sequence encoder includes a text sequence encoding model and a semantic representation enhancement network; the text sequence coding model comprises an embedded layer, a preset number of coding blocks, a first pooling layer and an output layer;
a first generation sub-module comprising:
the preprocessing unit is used for preprocessing the text sequence information according to the embedding layer to obtain an embedded representation vector with a fixed length;
the text feature extraction unit is used for extracting text features of the fixed-length vectors according to a preset number of coding blocks to obtain text feature vectors;
the pooling unit is used for summarizing the text feature vectors according to the first pooling layer to obtain low-dimension semantic representation vectors;
the input unit is used for inputting the low-dimensional semantic representation vector into a semantic representation enhancement network according to an output layer;
the semantic representation enhancement unit is used for capturing the hidden state of the low-dimensional semantic according to a semantic representation enhancement network to obtain a high-dimensional semantic representation vector, and the high-dimensional semantic representation vector is used as a text emotion vector.
In an alternative implementation, the second obtaining module includes:
the second acquisition sub-module is used for acquiring the image sequence information of the user;
and the second generation submodule is used for inputting the image sequence information into a pre-trained image sequence encoder and generating an image emotion vector.
In an alternative implementation, the image sequence encoder includes an input layer, a convolution layer, a second pooling layer, a multi-channel convolution layer, a local connection layer, and a full connection layer;
a second generation sub-module, comprising:
the preprocessing unit is used for preprocessing the image sequence information according to an input layer;
the convolution processing unit is used for carrying out convolution processing on the preprocessed image sequence information according to the convolution layer to obtain a characteristic image;
the sampling processing unit is used for carrying out downsampling processing on the characteristic image according to the second pooling layer to obtain a low-dimensional characteristic image;
the attention calculation processing unit is used for carrying out the attention calculation processing on the low-dimensional characteristic image according to the multi-channel convolution layer to obtain candidate image emotion vectors;
the local connection unit is used for carrying out local processing on the candidate image emotion vectors according to the local connection layer to obtain local image emotion vectors;
And the full-connection processing unit is used for carrying out full-connection processing on the local image emotion vector according to the full-connection layer to obtain the image emotion vector.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 3 is a schematic structural diagram of a server according to an embodiment of the present application. As shown in fig. 3, the server 3 of this embodiment includes: at least one processor 30, a memory 31 and a computer program 32 stored in the memory 31 and executable on the at least one processor 30, the processor 30 implementing the steps of any of the various method embodiments described above when executing the computer program 32.
The server 3 may be a computing device such as a cloud server. The server may include, but is not limited to, a processor 30 and a memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the server 3 and does not constitute a limitation on the server 3, which may include more or fewer components than shown, combine certain components, or have different components; for example, it may also include input/output devices, network access devices, etc.
The processor 30 may be a central processing unit (Central Processing Unit, CPU); the processor 30 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may in some embodiments be an internal storage unit of the server 3, such as a hard disk or a memory of the server 3. The memory 31 may in other embodiments also be an external storage device of the server 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the server 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the server 3. The memory 31 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs etc., such as program codes of the computer program etc. The memory 31 may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a server, a recording medium, computer Memory, read-Only Memory (ROM), random access Memory (RAM, random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. The psychological dispersion method based on the big data of the Internet of things is characterized by comprising the following steps of:
acquiring a text emotion vector of a user;
acquiring an image emotion vector of a user;
fusing the text emotion vector and the image emotion vector to obtain a multi-mode emotion vector;
Inputting the multi-mode emotion vector into a pre-trained emotion decoder to obtain an emotion label;
inquiring the emotion feedback sentence pattern corresponding to the emotion label, and playing a sound signal corresponding to the emotion feedback sentence pattern to the user.
2. The psychological dispersion method based on big data of the internet of things according to claim 1, wherein the obtaining of the text emotion vector of the user comprises:
acquiring text sequence information of a user;
and inputting the text sequence information to a pre-trained text sequence encoder to generate a text emotion vector.
3. The internet of things big data based psychological grooming method as in claim 2, wherein the pre-trained text sequence encoder comprises a text sequence encoding model and a semantic representation enhancement network; the text sequence coding model comprises an embedded layer, a preset number of coding blocks, a first pooling layer and an output layer;
inputting the text sequence information to a pre-trained text sequence encoder to generate text emotion vectors, comprising:
preprocessing the text sequence information according to an embedding layer to obtain an embedded representation vector with a fixed length;
extracting text features of the fixed-length vectors according to a preset number of coding blocks to obtain text feature vectors;
Summarizing the text feature vectors according to the first pooling layer to obtain low-dimension semantic representation vectors;
inputting the low-dimensional semantic representation vector to a semantic representation enhancement network according to an output layer;
and carrying out hidden state capturing on the low-dimensional semantics according to a semantic representation enhancement network to obtain a high-dimensional semantic representation vector, and taking the high-dimensional semantic representation vector as a text emotion vector.
4. The psychological grooming method based on the big data of the internet of things as claimed in claim 1, wherein the obtaining the image emotion vector of the user comprises:
acquiring image sequence information of a user;
and inputting the image sequence information to a pre-trained image sequence encoder to generate an image emotion vector.
5. The psychological dispersion method based on Internet of Things big data according to claim 4, wherein the image sequence encoder comprises an input layer, a convolution layer, a second pooling layer, a multi-channel convolution layer, a local connection layer and a fully connected layer;
the inputting the image sequence information into a pre-trained image sequence encoder to generate the image emotion vector comprises:
preprocessing the image sequence information using the input layer;
performing convolution on the preprocessed image sequence information using the convolution layer to obtain a feature image;
downsampling the feature image using the second pooling layer to obtain a low-dimensional feature image;
performing attention computation on the low-dimensional feature image using the multi-channel convolution layer to obtain a candidate image emotion vector;
performing local processing on the candidate image emotion vector using the local connection layer to obtain a local image emotion vector;
performing full-connection processing on the local image emotion vector using the fully connected layer to obtain the image emotion vector.
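A non-authoritative PyTorch sketch of the image sequence encoder recited in claim 5 follows. The channel counts, the squeeze-and-excitation-style channel attention standing in for the "attention computation", and the depthwise convolution standing in for the "local connection layer" are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImageSequenceEncoder(nn.Module):
    """Sketch: input layer -> convolution -> pooling -> multi-channel
    convolution with attention -> local connection -> fully connected layer."""
    def __init__(self, out_dim=512):
        super().__init__()
        self.conv = nn.Conv2d(3, 32, kernel_size=3, padding=1)       # convolution layer
        self.pool = nn.MaxPool2d(2)                                   # second pooling layer
        self.multi = nn.Conv2d(32, 64, kernel_size=3, padding=1)      # multi-channel convolution
        self.attn = nn.Sequential(                                    # channel attention
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(64, 64, 1), nn.Sigmoid())
        self.local = nn.Conv2d(64, 64, kernel_size=3, padding=1,
                               groups=64)                             # stand-in for local connection
        self.fc = nn.Linear(64, out_dim)                              # fully connected layer

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        x = (images - images.mean()) / (images.std() + 1e-6)  # input-layer preprocessing
        x = torch.relu(self.conv(x))                  # feature image
        x = self.pool(x)                              # low-dimensional feature image
        x = torch.relu(self.multi(x))
        x = x * self.attn(x)                          # attention-weighted candidate features
        x = torch.relu(self.local(x))                 # local image emotion features
        x = x.mean(dim=(2, 3))                        # spatial pooling before the FC layer
        return self.fc(x)                             # image emotion vector
```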
6. The psychological dispersion method based on Internet of Things big data according to claim 1, wherein the fusing the text emotion vector and the image emotion vector to obtain a multi-modal emotion vector comprises:
normalizing the text emotion vector and the image emotion vector;
calculating a similarity matrix between the text emotion vector and the image emotion vector according to the following formula:
A(i, j) = v_text(i) · v_image(j),
where A(i, j) represents the similarity between the i-th text emotion vector and the j-th image emotion vector, v_text(i) represents the i-th text emotion vector, v_image(j) represents the j-th image emotion vector, and "·" represents the dot product of the vectors;
performing bidirectional attention processing on the similarity matrix to generate the multi-modal emotion vector.
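A non-authoritative NumPy sketch of the fusion step in claim 6 follows, assuming the text and image sides each contribute a set of emotion vectors stacked into (n, d) and (m, d) arrays; the final mean-pooling and concatenation are illustrative choices for collapsing the two attended sequences into a single multi-modal vector.

```python
import numpy as np

def fuse_emotion_vectors(V_text: np.ndarray, V_image: np.ndarray) -> np.ndarray:
    """Normalize, build A[i, j] = v_text(i) . v_image(j), then apply a simple
    bidirectional (text->image and image->text) attention."""
    def softmax(z, axis):
        e = np.exp(z - z.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # normalization to unit length along the feature dimension
    V_text = V_text / (np.linalg.norm(V_text, axis=1, keepdims=True) + 1e-8)
    V_image = V_image / (np.linalg.norm(V_image, axis=1, keepdims=True) + 1e-8)

    # similarity matrix of dot products between text and image emotion vectors
    A = V_text @ V_image.T                          # shape (n, m)

    # bidirectional attention over the similarity matrix
    text_to_image = softmax(A, axis=1) @ V_image    # text attends to image, (n, d)
    image_to_text = softmax(A, axis=0).T @ V_text   # image attends to text, (m, d)

    # pool and concatenate into one multi-modal emotion vector
    return np.concatenate([text_to_image.mean(axis=0), image_to_text.mean(axis=0)])
```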
7. The psychological dispersion method based on Internet of Things big data according to claim 1, wherein the pre-trained emotion decoder comprises a generator and a discriminator;
the inputting the multi-modal emotion vector into a pre-trained emotion decoder to obtain an emotion label comprises:
inputting the multi-modal emotion vector into the generator to generate candidate emotion labels;
inputting the candidate emotion labels into the discriminator to obtain the emotion label.
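A non-authoritative PyTorch sketch of the generator/discriminator decoder recited in claim 7 follows. The six emotion classes, the top-k candidate step, and the network widths are assumptions for illustration; the claim itself only states that the generator proposes candidate labels and the discriminator yields the final label.

```python
import torch
import torch.nn as nn

NUM_LABELS = 6  # illustrative assumption: six emotion classes

class EmotionGenerator(nn.Module):
    """Maps the multi-modal emotion vector to candidate emotion-label logits."""
    def __init__(self, in_dim=1024, num_labels=NUM_LABELS):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, num_labels))

    def forward(self, fused):
        return self.net(fused)

class EmotionDiscriminator(nn.Module):
    """Scores (vector, candidate-label) pairs; the best-scoring candidate wins."""
    def __init__(self, in_dim=1024, num_labels=NUM_LABELS):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + num_labels, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, fused, label_onehot):
        return self.net(torch.cat([fused, label_onehot], dim=-1))

def decode_emotion(fused, generator, discriminator, top_k=3):
    logits = generator(fused)                                   # candidate emotion labels
    candidates = logits.topk(top_k, dim=-1).indices.squeeze(0)  # top-k candidates
    scores = []
    for c in candidates:
        onehot = torch.zeros(1, NUM_LABELS)
        onehot[0, c] = 1.0
        scores.append(discriminator(fused, onehot))             # discriminator check
    return candidates[torch.stack(scores).argmax()].item()      # final emotion label
```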
8. A psychological dispersion device based on Internet of Things big data, characterized by comprising:
a first acquisition module, configured to acquire a text emotion vector of a user;
a second acquisition module, configured to acquire an image emotion vector of the user;
a fusion module, configured to fuse the text emotion vector and the image emotion vector to obtain a multi-modal emotion vector;
an emotion decoding module, configured to input the multi-modal emotion vector into a pre-trained emotion decoder to obtain an emotion label;
a query module, configured to query the emotion feedback sentence pattern corresponding to the emotion label and play a sound signal corresponding to the emotion feedback sentence pattern to the user.
9. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
CN202310619534.1A 2023-05-30 2023-05-30 Psychological dispersion method, device and server based on big data of Internet of things Pending CN116631583A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310619534.1A CN116631583A (en) 2023-05-30 2023-05-30 Psychological dispersion method, device and server based on big data of Internet of things

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310619534.1A CN116631583A (en) 2023-05-30 2023-05-30 Psychological dispersion method, device and server based on big data of Internet of things

Publications (1)

Publication Number Publication Date
CN116631583A true CN116631583A (en) 2023-08-22

Family

ID=87636389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310619534.1A Pending CN116631583A (en) 2023-05-30 2023-05-30 Psychological dispersion method, device and server based on big data of Internet of things

Country Status (1)

Country Link
CN (1) CN116631583A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110556129A (en) * 2019-09-09 2019-12-10 北京大学深圳研究生院 Bimodal emotion recognition model training method and bimodal emotion recognition method
CN111930940A (en) * 2020-07-30 2020-11-13 腾讯科技(深圳)有限公司 Text emotion classification method and device, electronic equipment and storage medium
WO2021114840A1 (en) * 2020-05-28 2021-06-17 平安科技(深圳)有限公司 Scoring method and apparatus based on semantic analysis, terminal device, and storage medium
CN114722202A (en) * 2022-04-08 2022-07-08 湖北工业大学 Multi-modal emotion classification method and system based on bidirectional double-layer attention LSTM network
CN116089853A (en) * 2022-12-19 2023-05-09 海信电子科技(武汉)有限公司 Electronic equipment and multi-mode emotion classification method based on multi-channel fusion


Similar Documents

Publication Publication Date Title
CN113420807A (en) Multi-mode fusion emotion recognition system and method based on multi-task learning and attention mechanism and experimental evaluation method
US11769018B2 (en) System and method for temporal attention behavioral analysis of multi-modal conversations in a question and answer system
CN112069309B (en) Information acquisition method, information acquisition device, computer equipment and storage medium
CN111914076B (en) User image construction method, system, terminal and storage medium based on man-machine conversation
CN110795549B (en) Short text conversation method, device, equipment and storage medium
CN111414506B (en) Emotion processing method and device based on artificial intelligence, electronic equipment and storage medium
CN114550057A (en) Video emotion recognition method based on multi-modal representation learning
CN113705315A (en) Video processing method, device, equipment and storage medium
CN116564338B (en) Voice animation generation method, device, electronic equipment and medium
CN113837299A (en) Network training method and device based on artificial intelligence and electronic equipment
CN114417097A (en) Emotion prediction method and system based on time convolution and self-attention
CN115394321A (en) Audio emotion recognition method, device, equipment, storage medium and product
CN114005446A (en) Emotion analysis method, related equipment and readable storage medium
CN117875395A (en) Training method, device and storage medium of multi-mode pre-training model
CN117453880A (en) Multi-mode data processing method and device, electronic equipment and storage medium
CN112131364A (en) Question answering method, device, electronic equipment and storage medium
CN116580691A (en) Speech synthesis method, speech synthesis device, electronic device, and storage medium
CN116721449A (en) Training method of video recognition model, video recognition method, device and equipment
CN116343747A (en) Speech synthesis method, speech synthesis device, electronic device, and storage medium
CN116631583A (en) Psychological dispersion method, device and server based on big data of Internet of things
CN115204181A (en) Text detection method and device, electronic equipment and computer readable storage medium
CN114974310A (en) Emotion recognition method and device based on artificial intelligence, computer equipment and medium
CN112002329B (en) Physical and mental health monitoring method, equipment and computer readable storage medium
CN114373443A (en) Speech synthesis method and apparatus, computing device, storage medium, and program product
Abdullah et al. Hierarchical attention approach in multimodal emotion recognition for human robot interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination