CN115063874A - Control method, device and equipment of intelligent household equipment and storage medium - Google Patents


Info

Publication number: CN115063874A
Application number: CN202210977815.XA
Authority: CN (China)
Other versions: CN115063874B (en)
Other languages: Chinese (zh)
Prior art keywords: track, face, training, user, image
Legal status: Granted; Active
Inventors: 蔡芳发, 周波, 苗瑞, 邹小刚, 莫少锋
Original assignee: Shenzhen HQVT Technology Co Ltd
Current assignee: Shenzhen Haiqing Zhiyuan Technology Co ltd
Application filed by Shenzhen HQVT Technology Co Ltd

Classifications

    • G06V40/168 Feature extraction; Face representation (human faces)
    • G05B15/02 Systems controlled by a computer, electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G06N3/08 Learning methods (neural networks)
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Automation & Control Theory (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a control method, apparatus, device and storage medium for smart home devices. The method comprises: inputting an acquired track video of a user into a track prediction model and/or an acquired face image into a face recognition model, and controlling the smart home device according to the track prediction result and/or the face recognition result. The track prediction model comprises a plurality of recurrent neural network modules, first attention mechanism modules and gate control units that are alternately connected in sequence; the first attention mechanism module extracts the features of a region of interest from the first track feature map output by the recurrent neural network module, and the gate control unit captures the user's track behavior habits in the track video. The face recognition model comprises a plurality of residual convolutional neural network modules and second attention mechanism modules that are alternately connected in sequence; the second attention mechanism module extracts the features of a region of interest from the first face feature map output by the residual convolutional neural network module.

Description

Control method, device and equipment of intelligent household equipment and storage medium
Technical Field
The present application relates to artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for controlling smart home devices.
Background
In recent years, artificial intelligence has been introduced into home scenarios to provide users with intelligent living services, bringing convenience and comfort to their daily lives.
The smart home system is the key to providing smart home services. By analyzing a user's behavior habits, the smart home system automatically controls smart home devices and thereby reduces the operations the user has to perform. How to make the smart home system more intelligent and provide users with better intelligent living services is a problem to be solved at present.
Disclosure of Invention
The application provides a control method, apparatus, device and storage medium for smart home devices, which are used to make a smart home system more intelligent and provide users with better intelligent living services.
In a first aspect, the present application provides a method for controlling smart home devices, including: acquiring a track video and/or a face image of a user, the track video comprising multiple frames of track images; inputting the track video into a pre-trained track prediction model to obtain a track prediction result of the user, and/or inputting the face image into a pre-trained face recognition model to obtain a face recognition result of the user; and controlling the smart home device according to the track prediction result and/or the face recognition result. The track prediction model comprises a plurality of recurrent neural network modules, first attention mechanism modules and gate control units that are alternately connected in sequence; the recurrent neural network module extracts a first track feature map from each frame of track image, the first attention mechanism module extracts the features of a region of interest from the first track feature map, and the gate control unit captures the user's track behavior habits in the track video. The face recognition model comprises a plurality of residual convolutional neural network modules and second attention mechanism modules that are alternately connected in sequence; the second attention mechanism module extracts the features of a region of interest from the first face feature map output by the residual convolutional neural network module.
In a second aspect, the present application provides a control apparatus for smart home devices, including: an acquisition module for acquiring a track video and/or a face image of a user, the track video comprising multiple frames of track images; an input module for inputting the track video into a pre-trained track prediction model to obtain a track prediction result of the user, and/or inputting the face image into a pre-trained face recognition model to obtain a face recognition result of the user; and a control module for controlling the smart home device according to the track prediction result and/or the face recognition result. The track prediction model and the face recognition model are structured as described in the first aspect.
In a third aspect, the present application provides an electronic device, comprising: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes computer-executable instructions stored by the memory to implement the method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the method according to the first aspect when executed by a processor.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
According to the control method, apparatus, device and storage medium for smart home devices provided by the present application, a track video and/or a face image of a user is acquired, the track video comprising multiple frames of track images; the track video is input into a pre-trained track prediction model to obtain a track prediction result of the user, and/or the face image is input into a pre-trained face recognition model to obtain a face recognition result of the user, the two models being structured as described above; and the smart home device is controlled accordingly. Because attention mechanism modules are added to both the track prediction model and the face recognition model, richer track feature information and face feature information can be extracted. This improves the accuracy of user track prediction and face recognition, makes the smart home system more intelligent, provides users with better smart home services, and improves the users' smart home experience. In addition, the gate control units alleviate the vanishing-gradient problem in the recurrent neural network modules, so that both the long-term and the short-term track behavior habits of the user in the track video are captured.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic architecture diagram of a smart home system according to an embodiment of the present application;
Fig. 2 is a flowchart of a control method for smart home devices according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a recurrent neural network with gate control units according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the principle of track prediction according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the principle of face recognition according to an embodiment of the present application;
Fig. 6 is a flowchart of a training method of a track prediction model according to an embodiment of the present application;
Fig. 7 is a schematic diagram of the principle of a training method of a track prediction model according to an embodiment of the present application;
Fig. 8 is a schematic architecture diagram of a smart home system according to an embodiment of the present application;
Fig. 9 is a flowchart of a training method of a face recognition model according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a control apparatus for smart home devices according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application, as recited in the appended claims.
A smart home system takes the home as its platform and integrates the facilities related to home life by means of integrated wiring, network communication, security, automatic control, and audio and video technologies. It builds an efficient management system for home facilities and family affairs, improves the safety, convenience and comfort of the home, and achieves an environmentally friendly, energy-saving living environment.
In a smart home scenario, a central control system is connected to the control systems of the individual smart home appliances, such as a lighting control system, a home theater system, an electric curtain system, a home audio system, a home security system, a face recognition access control system and an intelligent air-conditioning control system. By operating the control system of each smart home appliance, intelligent control of each appliance is realized.
Illustratively, multiple cameras can be distributed at arbitrary positions in residential spaces such as bedrooms, living rooms, kitchens and bathrooms to collect track videos of users, and more intelligent home services can be provided by mining and analyzing these track videos. For example, when the user's track behavior habits predicted from the track video indicate that the user is about to go to bed, the electric curtain system closes the curtains; when they indicate that the user is about to get up, the curtains are opened. As another example, if the track videos of several consecutive days predict that the user turns on the air conditioner when returning home at 19:00 every day, the air-conditioning control system turns the air conditioner on before the user returns home each day.
Fig. 1 is a schematic architecture diagram of a smart home system according to an embodiment of the present application. As shown in Fig. 1, the system includes a user track acquisition unit 11, a face image acquisition unit 12 and a server 13, where the user track acquisition unit 11 and the face image acquisition unit 12 are each communicatively connected to the server 13. Optionally, the user track acquisition unit 11 and the face image acquisition unit 12 may be cameras.
When the user is in the residential space, the user track acquisition units 11 distributed in the space capture a track video of the user and upload it to the server 13. The server 13 stores a track prediction model 131 and predicts the user's track from the track video through the track prediction model 131 to obtain a track prediction result.
In addition, the face image acquisition unit 12 acquires a face image and uploads the face image to the server 13. The server 13 stores a face recognition model 132, and performs face recognition on the user according to the face image through the face recognition model 132 to obtain a face recognition result.
The accuracy of the track prediction result and of the face recognition result affects the user's experience of the smart home system. Improving the accuracy of both, and thereby making the smart home system more intelligent and the user experience better, is the problem to be solved at present.
In view of the above technical problems, the inventors of the present application propose the following technical idea: add attention mechanism modules to the track prediction model of the smart home system, so that track behavior features are extracted by alternately connected recurrent neural network modules and attention mechanism modules and the dependency between the user's long-term and short-term track behavior habits can be captured; and add attention mechanism modules to the face recognition model, so that richer face feature information is extracted for face recognition. In this way both the track prediction accuracy and the face recognition accuracy are improved.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a control method for smart home devices according to an embodiment of the present application. As shown in fig. 2, the method for controlling the smart home device includes the following steps:
s201, acquiring a track video and/or a face image of a user; the track video comprises a plurality of frames of track images.
The execution subject of the method of the present embodiment may be a server as shown in fig. 1.
Wherein, step S201 includes: the server acquires a track video of the user from the user track acquisition unit 11 and/or acquires a face image of the user from the face image acquisition unit 12.
For example, when the user is in a home scenario, cameras distributed at arbitrary positions in residential spaces such as bedrooms, living rooms, kitchens and bathrooms can capture track videos of the user, and when the user scans his or her face at the access control system, the camera there can capture the user's face image. Both the track video and the face image serve as behavior data of the user: the track video is data expressing the user's track behavior habits, and the face image is image data generated by the user's face-scanning behavior.
S202, inputting the track video into a pre-trained track prediction model to obtain a track prediction result of the user, and/or inputting a face image into a pre-trained face recognition model to obtain a face recognition result of the user.
The track prediction model comprises a plurality of Recurrent Neural Network (RNN) modules, first attention mechanism modules and gate control units that are alternately connected in sequence. The first attention mechanism module extracts the features of a region of interest from the first track feature map by means of an attention mechanism to obtain a second track feature map. The second track feature map comprises the sub-feature maps of multiple channels, and the sub-feature maps of different channels have different degrees of importance in the track image. The gate control unit captures the short-term and long-term track behavior habits in the track video.
In this embodiment, a cooperative attention mechanism is introduced into the recurrent neural network modules, so that the dependency between the short-term and the long-term context information of the user data can be obtained. When the user data is a track video, this is the dependency between the user's long-term and short-term track behavior habits in the track video. Capturing this dependency amounts to mining the user's track behavior habits: the aim is to understand how a sudden short-term track behavior arises from the user's long-term track behavior habits, which helps the model predict the user's track behavior and improves the accuracy of the track prediction result.
The face recognition model comprises a plurality of residual convolutional neural network modules and second attention mechanism modules that are alternately connected in sequence. The residual convolutional neural network module extracts a first face feature map from the face image; the second attention mechanism module extracts the features of a region of interest from the first face feature map by means of an attention mechanism to obtain a second face feature map, which contains more face feature information than the first face feature map.
Optionally, inputting the track video into a pre-trained track prediction model to obtain the track prediction result of the user includes the following steps:
Step a1: through the ith recurrent neural network module, perform feature extraction on the (i-1)th output result, which the (i-1)th gate control unit outputs for the track image at time step t, to obtain the ith first track feature map; i is an integer greater than or equal to 1 and less than or equal to a first preset value.
Each frame of track image in the track video corresponds to a time, so each frame can be uniquely identified by its time; the track image at time step t is the track image at time t. Feature extraction on one frame of track image can be understood as extracting the user's position features in that frame, and feature extraction on multiple frames as extracting the user's track behavior habits across those frames.
Step a2: perform feature extraction on the ith first track feature map through the ith first attention mechanism module to obtain the ith second track feature map; the second track feature map contains more feature information than the first track feature map.
The first attention mechanism module attends to the features of the region of interest in the ith first track feature map. Specifically, the ith first track feature map can be understood as corresponding to multiple channels, each channel corresponding to part of the image area in the map and carrying its own weight value; passing the ith first track feature map through these channels yields the ith second track feature map. The different channel weights reflect how much attention the model pays to the different image areas of the ith first track feature map, i.e., the model concentrates its feature extraction on the image areas that are useful for track prediction.
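The text does not give a concrete form for this channel-weighting attention; the sketch below shows one common design that matches the description (squeeze-and-excitation style gating, written in Python/PyTorch). The class name, the reduction ratio and the SE-style structure are illustrative assumptions rather than the patent's definitive module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of a channel-attention module (assumption: SE-style gating).

    Each channel of the input track feature map receives a learned weight
    in [0, 1], so channels useful for track prediction are emphasized,
    matching the per-channel weighting described above.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global context per channel
        self.fc = nn.Sequential(                 # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                             # reweight the first feature map

# usage: turn a "first track feature map" into a "second" one
feat = torch.randn(2, 32, 16, 16)
second = ChannelAttention(32)(feat)
```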
Step a3: selectively retain the ith second track feature map through the ith gate control unit to obtain the ith output result; the ith output result includes the user's track behavior habits in the track images at time step t and the preceding time steps.
Illustratively, when at least two frames of track images are input, a track feature map sequence of at least two frames is obtained, which contains the user's track behavior habits across those frames. As the number of input track images grows, the RNN suffers from vanishing gradients, so part of the information is lost, and the lost information can be important for subsequent track prediction. The gate control unit therefore selectively retains the second track feature maps that contribute more to track prediction, so that the captured track behavior habits of the user are more accurate.
Fig. 3 is a schematic diagram of a recurrent neural network with gate control units according to an embodiment of the present application. As shown in Fig. 3, for an RNN comprising an input layer, a hidden layer, a recurrent layer and an output layer, x is the input, U is the weight matrix from the input layer to the hidden layer, V is the weight matrix from the hidden layer to the output layer, s is the hidden state, and o is the output. The value s of the hidden layer depends not only on the current input x but also on the previous value of the hidden layer, and W is the weight matrix from the previous hidden-layer value to the current one. Expanding the network along the time line gives the structure to the right of the arrow: at time t the RNN receives the input xt, the hidden-layer value is st, and the output value is ot; st depends not only on xt but also on st-1.
Both the reset gate and the update gate take as input the input xt of the current time step and the hidden state st-1 of the previous time step. The update gate helps the model determine how much past information is carried into the future, i.e., how much information from the previous time steps and the current time step needs to be passed on.
In a track prediction scenario, xt is the feature vector of the t-th frame of track image, i.e., the t-th component of the track video, and it undergoes a linear transformation (multiplication by a weight matrix). That is, the hidden-layer value st depends not only on the t-th frame track image xt but also on the hidden-layer value st-1 obtained for the previous frame xt-1. st-1 is the memory of the RNN and stores the information of the previous time step t-1; it likewise undergoes a linear transformation. The reset gate determines how the current frame track image xt is combined with the hidden-layer value st-1 of the previous time step, and the update gate adds the results of the two linear transformations and passes them through a Sigmoid activation function, thereby determining how much of the current frame xt and how much of the previous frame's hidden-layer value the hidden layer corresponding to the current frame should retain.
Optionally, the gate control unit comprises a reset gate and an update gate. Intuitively, the reset gate determines how new input information is combined with the previous memory, and the update gate defines how much of the previous memory is retained at the current time step. In this embodiment, the reset gate captures short-term dependencies in the track video, and the update gate preserves long-term dependencies. This embodiment therefore further includes: controlling, through the reset gate, how the ith second track feature map is combined with the second track feature maps memorized before it, so as to capture the short-term dependencies in the track video; and controlling, through the update gate, how much of the memory of earlier second track feature maps is carried over to the ith second track feature map, so as to preserve the long-term dependencies. A short-term dependency can be understood as the user's short-term track behavior habits, and a long-term dependency as the user's long-term track behavior habits.
Specifically, the reset-gate output value Rt and the update-gate output value Zt are calculated from the ith second track feature map Xt (the input at time step t), the hidden state Ht-1 of time step t-1 memorized before the ith second track feature map, and the weight matrices of the reset gate and the update gate; they can be expressed as the following formula (1) and formula (2), where σ denotes the Sigmoid activation function:

Rt = σ(XtWxr + Ht-1Whr + br) ;(1)

wherein Xt ∈ R^(n×d), n is the number of samples and d is the input dimension; Ht-1 ∈ R^(n×h) is the old hidden state, and h is the number of hidden nodes; Wxr represents the weight matrix between the input data and the reset gate; Whr represents the weight matrix between the old hidden state and the reset gate; br is the reset-gate bias, to which a broadcast mechanism is applied during the calculation.

Zt = σ(XtWxz + Ht-1Whz + bz) ;(2)

wherein Wxz represents the weight matrix between the input data and the update gate; Whz represents the weight matrix between the old hidden state and the update gate; bz is the update-gate bias, to which a broadcast mechanism is applied during the calculation.

Then, from the ith second track feature map Xt (the input at time step t), the reset-gate output value Rt, and the hidden state Ht-1 of time step t-1 memorized before the ith second track feature map, the candidate hidden state H̃t of the current time step t is determined, which can be expressed as the following formula (3):

H̃t = tanh(XtWxh + (Rt ⊙ Ht-1)Whh + bh) ;(3)

wherein ⊙ denotes element-wise multiplication; Wxh represents the weight matrix between the input data and the candidate hidden state; Whh represents the weight matrix between the old hidden state and the candidate hidden state; bh is the candidate-state bias, to which a broadcast mechanism is applied during the calculation.

As can be seen from formula (3), the elements of Rt close to 0 correspond to hidden-state components that are reset, while the elements close to 1 correspond to hidden-state components that are not reset.

Finally, the update gate determines to what extent the new hidden state is merely a copy of the old hidden state and how much of the candidate hidden state is used. The new hidden state Ht of the current time step t is determined according to the following formula (4):

Ht = Zt ⊙ Ht-1 + (1 - Zt) ⊙ H̃t ;(4)

In formula (4), when an element of the update gate Zt is close to 1, the corresponding component of the new hidden state keeps the old hidden state, i.e., the information (the second track feature map) from the input Xt of the current time step t is ignored for that component; when an element of Zt is close to 0, the corresponding component of the new hidden state approaches the candidate hidden state.
As can be seen from the above process, if the value of the update gate stays close to 1 over all time steps of an entire track video subsequence, the old hidden state Ht-1 of the initial time step is preserved and passed on to the hidden state Ht of subsequent time steps, regardless of the length of the subsequence. In this way, the reset gate and the update gate overcome the vanishing-gradient problem in the RNN and better capture dependencies with large time spans in the track video sequence.
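For concreteness, the following is a minimal Python/NumPy sketch that transcribes formulas (1) to (4) directly; the shapes follow the definitions above (Xt is n×d, Ht-1 is n×h), and the random parameter initialization is purely illustrative, not the patent's definitive implementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(Xt, Ht_prev, params):
    """One gate-control-unit step, transcribing formulas (1)-(4).

    Xt: (n, d) input at time step t; Ht_prev: (n, h) old hidden state.
    params holds Wxr, Whr, br, Wxz, Whz, bz, Wxh, Whh, bh; the biases
    broadcast over the batch dimension, as described in the text.
    """
    Rt = sigmoid(Xt @ params["Wxr"] + Ht_prev @ params["Whr"] + params["br"])  # (1) reset gate
    Zt = sigmoid(Xt @ params["Wxz"] + Ht_prev @ params["Whz"] + params["bz"])  # (2) update gate
    Ht_cand = np.tanh(Xt @ params["Wxh"]
                      + (Rt * Ht_prev) @ params["Whh"] + params["bh"])         # (3) candidate state
    Ht = Zt * Ht_prev + (1.0 - Zt) * Ht_cand                                   # (4) new hidden state
    return Ht

n, d, h = 4, 8, 16
rng = np.random.default_rng(0)
params = {k: rng.standard_normal(s) * 0.1 for k, s in {
    "Wxr": (d, h), "Whr": (h, h), "br": (h,),
    "Wxz": (d, h), "Whz": (h, h), "bz": (h,),
    "Wxh": (d, h), "Whh": (h, h), "bh": (h,),
}.items()}
Ht = gru_step(rng.standard_normal((n, d)), np.zeros((n, h)), params)
```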
Step a4: take the ith output result as the input of the (i+1)th recurrent neural network module, increase i by 1 (i.e., take i+1 as the new i), and return to step a1; when i reaches the first preset value, execute step a5.
Step a5: increase t by 1 (i.e., take t+1 as the new t) and return to step a1 until all frames of track images have been processed, obtaining the track prediction result of the user.
Steps a1 to a5 can be understood as follows: the track image at time step t passes in turn through recurrent neural network module 1, first attention mechanism module 1, gate control unit 1, recurrent neural network module 2, first attention mechanism module 2, gate control unit 2, ..., recurrent neural network module N, first attention mechanism module N and gate control unit N for feature extraction. The track image at time step t+1 then passes through the same sequence of modules, and the process repeats; after the track images of all time steps have been processed in this way, the track prediction result of the user is obtained, i.e., the user's track behavior is predicted from the multiple frames of track images.
It should be noted that, when processing the track feature map that its preceding first attention mechanism module outputs for the track image of the current time step, each gate control unit refers to the hidden state before time step t (i.e., the track prediction result before time step t); for the specific process, refer to the description of the gate control unit principle in the above embodiment.
The depth of the track prediction model is determined by the first preset value: the greater the depth, the better the track feature extraction and the richer the extracted track feature information, but also the greater the model complexity and the computational cost. The depth should therefore be neither too large nor too small, and those skilled in the art can set it according to actual requirements.
Fig. 4 is a schematic diagram of the principle of track prediction according to an embodiment of the present application. As shown in Fig. 4, feature vectorization is performed on the user's track video to obtain a track video feature vector, which comprises the feature vectors of multiple frames of track images. The track video feature vector is then processed in turn by recurrent neural network module 1, first attention mechanism module 1, recurrent neural network module 2, first attention mechanism module 2, ..., recurrent neural network module N and first attention mechanism module N, which are connected in sequence, to obtain the track prediction result of the user. Specifically, the track video feature vector is first input into recurrent neural network module 1 for track feature extraction; the first track feature map output by recurrent neural network module 1 is input into first attention mechanism module 1, which continues track feature extraction on it to obtain a second track feature map. The second track feature map output by first attention mechanism module 1 is input into gate control unit 1, which selectively retains it and passes the retained result in turn to recurrent neural network module 2, first attention mechanism module 2 and gate control unit 2; these perform the same steps as recurrent neural network module 1, first attention mechanism module 1 and gate control unit 1, differing only in their input data. By analogy, track features are extracted in each recurrent neural network module and first attention mechanism module in left-to-right order in the figure and selectively retained by the gate control units; when the last gate control unit N has been passed, the hidden state N output by gate control unit N is the track prediction result of the user.
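The data flow of Fig. 4 can be summarized with the following Python sketch; the module interfaces, the way hidden state is threaded through the gate units and all names are assumptions made for illustration, not the patent's definitive code.

```python
def predict_trajectory(frames, blocks):
    """Sketch of the Fig. 4 data flow under assumed interfaces.

    frames: iterable of per-frame feature vectors x_t.
    blocks: list of N (rnn, attention, gate) module triples; each gate
    unit keeps its own hidden state across time steps.
    """
    hidden = [None] * len(blocks)            # one hidden state per gate unit
    out = None
    for x_t in frames:                       # time steps t = 1..T
        h = x_t
        for i, (rnn, attention, gate) in enumerate(blocks):
            first_map = rnn(h)               # first track feature map
            second_map = attention(first_map)   # region-of-interest features
            hidden[i] = gate(second_map, hidden[i])  # selective retention
            h = hidden[i]                    # input of block i+1
        out = h
    return out                               # hidden state N = prediction result

# toy usage with stand-in modules
identity = lambda x: x
gate = lambda x, h_prev: x if h_prev is None else 0.5 * x + 0.5 * h_prev
blocks = [(identity, identity, gate)] * 3
print(predict_trajectory([1.0, 2.0, 3.0], blocks))
```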
Optionally, inputting the face image into a pre-trained face recognition model to obtain the face recognition result of the user includes:
Step b1: extract the features of the user's face image through the jth residual convolutional neural network module to obtain the jth first face feature map; j is an integer greater than or equal to 1 and less than or equal to a second preset value.
Step b2: perform feature extraction on the jth first face feature map through the jth second attention mechanism module to obtain the jth second face feature map; the second face feature map contains more feature information than the first face feature map.
Step b3: increase j by 1 (i.e., take j+1 as the new j) and return to the feature extraction step of the jth residual convolutional neural network module; when j reaches the second preset value, the target face feature map of the user is obtained.
The depth of the face recognition model is determined by the second preset value: the greater the depth, the better the face feature extraction and the richer the extracted face feature information, but also the greater the model complexity and the computational cost. The depth should therefore be neither too large nor too small, and those skilled in the art can set it according to actual requirements.
Step b4: perform face recognition according to the target face feature map of the user and output the face recognition result.
Optionally, performing face recognition according to the target face feature map of the user includes: determining the similarity between the target face feature map of the user and a pre-stored face feature map of the user; if the similarity is greater than or equal to a preset similarity, the face recognition result is that the user passes face recognition, i.e., the user is a legitimate user; otherwise, the face recognition result is that the user fails face recognition, i.e., the user is an illegitimate user.
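As an illustration of this comparison, the following Python sketch uses cosine similarity and a threshold of 0.8; both choices are assumptions, since the text only requires that some similarity measure reach a preset value.

```python
import numpy as np

def verify_face(target_feat, stored_feat, threshold=0.8):
    """Compare a target face feature vector with the pre-stored one.

    Cosine similarity and the 0.8 threshold are illustrative choices;
    the text only requires 'similarity >= preset similarity'.
    """
    sim = float(np.dot(target_feat, stored_feat) /
                (np.linalg.norm(target_feat) * np.linalg.norm(stored_feat)))
    return ("pass", sim) if sim >= threshold else ("fail", sim)

rng = np.random.default_rng(1)
stored = rng.standard_normal(128)
print(verify_face(stored + 0.05 * rng.standard_normal(128), stored))
```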
Fig. 5 is a schematic diagram of the principle of face recognition according to an embodiment of the present application. As shown in Fig. 5, feature vectorization is performed on the user's face image to obtain a face feature vector. The face feature vector is then processed in turn by residual convolutional neural network module 1, second attention mechanism module 1, residual convolutional neural network module 2, second attention mechanism module 2, ..., residual convolutional neural network module N and second attention mechanism module N, which are connected in sequence, to obtain the face recognition result of the user. Specifically, the face feature vector is first input into residual convolutional neural network module 1 for face feature extraction; face feature extraction result 1 output by residual convolutional neural network module 1 is input into second attention mechanism module 1, which continues feature extraction on it, and the result output by second attention mechanism module 1 is then input into residual convolutional neural network module 2. By analogy, face features are extracted in each residual convolutional neural network module and second attention mechanism module in left-to-right order in the figure, and face feature extraction result N output by second attention mechanism module N is used for the face recognition result.
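A hedged Python/PyTorch sketch of the Fig. 5 backbone follows, alternating plain residual convolution blocks with attention modules; it reuses the ChannelAttention class sketched earlier, and the block design, channel width and depth are illustrative assumptions.

```python
import torch
import torch.nn as nn
# reuses ChannelAttention from the earlier sketch

class ResidualBlock(nn.Module):
    """Plain residual convolution block (illustrative stand-in for the
    residual convolutional neural network module)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # skip connection

class FaceFeatureExtractor(nn.Module):
    """Alternates residual blocks with attention modules, as in Fig. 5.
    depth corresponds to the 'second preset value'."""
    def __init__(self, channels: int = 32, depth: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.stages = nn.ModuleList(
            nn.Sequential(ResidualBlock(channels), ChannelAttention(channels))
            for _ in range(depth)
        )

    def forward(self, x):
        x = self.stem(x)
        for stage in self.stages:            # j = 1..N
            x = stage(x)                     # first map -> second map
        return x                             # target face feature map

feat = FaceFeatureExtractor()(torch.randn(1, 3, 112, 112))
```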
And S203, controlling the intelligent household equipment according to the track prediction result and/or the face recognition result.
Illustratively, when the server predicts from the user's track video that the user is going to bed, it controls the electric curtain system to close the curtains, and when it predicts that the user is getting up, it controls the curtains to open. As another example, if it is predicted from several consecutive days of track videos that the user turns on the air conditioner when returning home at 19:00, the air-conditioning control system turns the air conditioner on before the user returns home each day.
Alternatively, the gate is controlled to open when the face recognition result identifies the user as a legitimate user, and to stay closed when the face recognition result identifies the user as an illegitimate user.
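A toy illustration of this control step might look as follows; every rule, label and device name here is hypothetical, since the text does not define a concrete command interface.

```python
def control_devices(trajectory_pred=None, face_result=None):
    """Toy dispatch for step S203; rules and device names are hypothetical."""
    commands = []
    if trajectory_pred == "going_to_bed":
        commands.append(("curtain_system", "close"))
    elif trajectory_pred == "getting_up":
        commands.append(("curtain_system", "open"))
    elif trajectory_pred == "arriving_home_19_00":
        commands.append(("air_conditioner", "on"))
    if face_result == "pass":
        commands.append(("entrance_gate", "open"))
    elif face_result == "fail":
        commands.append(("entrance_gate", "keep_closed"))
    return commands

print(control_devices("going_to_bed", "pass"))
```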
In this embodiment, a track video and/or a face image of the user is acquired, the track video comprising multiple frames of track images; the track video is input into a pre-trained track prediction model to obtain a track prediction result of the user, and/or the face image is input into a pre-trained face recognition model to obtain a face recognition result of the user, the two models being structured as described above; and the smart home device is controlled accordingly. Because attention mechanism modules are added to both the track prediction model and the face recognition model, richer track feature information and face feature information can be extracted. This improves the accuracy of user track prediction and face recognition, makes the smart home system more intelligent, provides users with better smart home services, and improves the users' smart home experience. The gate control units, in turn, alleviate the vanishing-gradient problem in the recurrent neural network modules, so that both the long-term and the short-term behavior habits in the track video are captured.
The above embodiments describe the application process of the trajectory prediction model. Before the trajectory prediction model is applied to perform trajectory prediction, the trajectory prediction model needs to be trained. The following describes the training process of the trajectory prediction model in detail with reference to the accompanying drawings:
fig. 6 is a flowchart of a training method of a trajectory prediction model according to an embodiment of the present disclosure. As shown in fig. 6, the method includes the steps of:
s601, obtaining a track video sample and a first label of a user, wherein the track video sample comprises multiple frames of track image samples, and the first label comprises a position of the user marked in the multiple frames of track image samples.
In this embodiment, the historical track video of the smart home user may be used as the track video sample, or the track video of another user obtained from the public original data set. The track video sample is a series of track image sample sequences including track behaviors of the user, and a situation that the user does not appear in a part of frame track image samples may exist in a plurality of frame track image samples of the track video sample, so that the first label in the step is the user marked in each frame track image sample in the part of frame track image samples or all frame track image samples in the plurality of frame track image samples and the position of the user in the track image samples.
The first label is the track behavior habit of the user, and the track behavior habit comprises the position of the user in the multi-frame track image sample. The track behavior habit of the user can be obtained by marking the positions of the user on part of or all of the frame track image samples in the multi-frame track image samples.
S602: train the plurality of recurrent neural network modules, first attention mechanism modules and gate control units, which are alternately connected in sequence, on the track video sample to obtain a track training result; the track training result comprises a prediction of the user's position in the frame following each track image sample.
It should be understood that, assuming the track image samples are frames 0 to T, the track training result comprises, for each of the first T-1 frames, a prediction of the user's position in the next frame. That is, the frame following each frame among the first T-1 frames serves as the first label of that frame; the T-th frame serves only as the first label of frame T-1 and is not itself input into the RNN, the first attention mechanism module or the gate control unit for feature extraction.
The gate control unit handles the dependency relationships of the track image samples whose features the RNN has extracted, i.e., it retains the track behavior habits with stronger correlation according to the correlation between long-term and short-term track behavior habits, thereby guiding the model to train more accurately and to predict tracks better from the existing track images.
S603: adjust the parameters of the plurality of alternately connected recurrent neural network modules, first attention mechanism modules and gate control units according to the difference between the track training result of the multiple frames of track images and the first label, until training ends, to obtain the track prediction model.
Specifically, if the track sample image currently participating in training is the t-th frame, the first label corresponding to the t-th frame is the (t+1)-th frame. In this step, the parameters are adjusted according to the difference between the track training result obtained from the first t frames (the user position in frame t+1 predicted from the first t frames) and the user's actual position in frame t+1.
As the number of track image sample frames participating in training increases, the user's long-term track behavior path (long-term track behavior habits) and short-term track behavior path (short-term track behavior habits) can be obtained by correlating the user's positions across the multiple frames of track image samples.
When training ends, the track prediction model is obtained from the parameters of each recurrent neural network module, first attention mechanism module and gate control unit at the end of training.
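A minimal Python/PyTorch sketch of this parameter-adjustment loop is given below, assuming a mean-squared-error loss between the predicted next-frame position and the labelled position and an Adam optimizer; neither choice is stated in the text, and the stand-in model is only a placeholder for the full alternating network.

```python
import torch
import torch.nn as nn

def train_trajectory_model(model, video_samples, epochs=10, lr=1e-3):
    """Sketch of S601-S603 (loss and optimizer are assumptions).

    video_samples: list of (frames, positions) pairs, where frames is a
    (T, d) tensor of per-frame feature vectors and positions is a (T, 2)
    tensor of labelled user positions (the first label).
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for frames, positions in video_samples:
            opt.zero_grad()
            # predict the position in frame t+1 from frames 0..t
            pred = model(frames[:-1])        # (T-1, 2) predicted next positions
            loss = mse(pred, positions[1:])  # frame T is label only, never input
            loss.backward()                  # adjust RNN/attention/gate parameters
            opt.step()
    return model

toy_model = nn.Linear(8, 2)                  # stand-in for the full model
data = [(torch.randn(12, 8), torch.randn(12, 2))]
train_trajectory_model(toy_model, data, epochs=2)
```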
Optionally, training a plurality of sequential and overlapping connected recurrent neural network modules, first attention mechanism modules and gate control units according to the trajectory video sample to obtain a trajectory training result, including:
Step c1, for the track image sample at time step t, train the i-th recurrent neural network module on the (i-1)-th training result output by the (i-1)-th gate control unit to obtain the i-th first track training result; the first track training result comprises the track behavior habit information of the user represented by all track image samples at time step t and earlier.
Step c2, train the i-th first attention mechanism module on the i-th first track training result to obtain the i-th second track training result; the i-th second track training result carries more track behavior habit information of the user than the i-th first track training result.
Step c3, train the i-th gate control unit on the i-th second track training result to obtain the i-th training result; the i-th training result comprises the track behavior habits of the user in the track image samples at time step t and earlier.
Step c4, take the i-th training result as the input of the (i+1)-th recurrent neural network module, add 1 to i (determine i+1 as the new i), and return to step c1 until the value of i reaches a first preset value, then execute step c5.
Step c5, continue training on the track feature image sample of the next time step until all of the multi-frame track image samples have been trained, obtaining the track training result of the user; the track training result comprises the track behavior habits of the user in the multi-frame track image samples.
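The sketch referenced above is given here. It is illustrative only: `nn.GRUCell` stands in for each recurrent neural network module, a simple softmax re-weighting stands in for each first attention mechanism module, and the `GateControlUnit` class from the earlier sketch stands in for each gate control unit; the feature dimension and the position head are assumptions.

```python
# Sketch of steps c1-c5: N stacked (recurrent cell, attention, gate) stages
# processed per time step, each stage feeding the next.
import torch
import torch.nn as nn

class SimpleAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, dim)

    def forward(self, x):
        # Re-weight features so the region of interest contributes more.
        return x * torch.softmax(self.score(x), dim=-1)

class StackedTrajectoryModel(nn.Module):
    def __init__(self, dim: int, num_stages: int):
        super().__init__()
        # GateControlUnit is the class defined in the earlier sketch.
        self.cells = nn.ModuleList(nn.GRUCell(dim, dim) for _ in range(num_stages))
        self.attns = nn.ModuleList(SimpleAttention(dim) for _ in range(num_stages))
        self.gates = nn.ModuleList(GateControlUnit(dim) for _ in range(num_stages))
        self.head = nn.Linear(dim, 2)                # predicted (x, y) position

    def forward(self, frame_features):
        # frame_features: (T, dim), one feature vector per track image sample
        T, dim = frame_features.shape
        states = [torch.zeros(1, dim) for _ in self.cells]
        out = torch.zeros(1, dim)
        for t in range(T):                           # step c5: next time step
            x = frame_features[t].unsqueeze(0)
            for i, (cell, attn, gate) in enumerate(
                    zip(self.cells, self.attns, self.gates)):
                h = cell(x, states[i])               # step c1: i-th recurrent module
                a = attn(h)                          # step c2: i-th attention module
                states[i] = gate(a, states[i])       # step c3: i-th gate control unit
                x = states[i]                        # step c4: input to stage i+1
            out = x
        return self.head(out)                        # position predicted for the next frame
```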
Optionally, before step S602, feature vectorization is performed on the track video sample to obtain track video feature vectors, which comprise a feature vector for each frame of the multi-frame track images; training is then performed on the track video feature vectors.
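A minimal sketch of this vectorization step, assuming each track frame is encoded to a fixed-length vector by a small convolutional encoder (the encoder design and output dimension are illustrative, not taken from the patent):

```python
# Encode each frame of a track video to a feature vector.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                  # (T, 32, 1, 1)
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, frames):                        # frames: (T, 3, H, W)
        return self.fc(self.conv(frames).flatten(1))  # (T, out_dim)
```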
Fig. 7 is a schematic diagram of the principle of the training method of the trajectory prediction model provided in the embodiment of the present application. As shown in Fig. 7, feature vectorization is performed on the track video samples to obtain track video feature vectors; the feature vector of each frame of track image sample is then processed in turn by recurrent neural network module 1, first attention mechanism module 1, recurrent neural network module 2, first attention mechanism module 2, ..., recurrent neural network module N and first attention mechanism module N, which are connected in sequence, to obtain the track prediction training result of the user. For the training process of the track video samples, reference may be made to the processing of the track video by these sequentially connected modules in Fig. 3; details are not repeated here.
In one or more embodiments of the present application, optionally, adjusting parameters of the plurality of sequentially and overlappingly connected recurrent neural network modules, first attention mechanism modules and gate control units according to the difference between the track training result of the multi-frame track images and the first label, until training is finished, to obtain the track prediction model, includes:
Step A1, adjust parameters of the plurality of sequentially and overlappingly connected recurrent neural network modules, first attention mechanism modules and gate control units according to the difference between the track training result of the multi-frame track images and the first label until a preset first training end condition is reached, obtaining an intermediate track prediction model.
The first training end condition may be that the objective function value is smaller than a preset value, or that the number of training iterations reaches a preset number.
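Expressed as a sketch (both thresholds are illustrative placeholders, not values from the patent):

```python
# First training end condition: objective below a preset value, or a preset
# number of training iterations reached.
MAX_STEPS = 10_000        # hypothetical preset number of iterations
LOSS_THRESHOLD = 1e-3     # hypothetical preset objective value

def training_finished(step: int, loss: float) -> bool:
    return loss < LOSS_THRESHOLD or step >= MAX_STEPS
```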
After the neural network is trained on the track video samples to obtain the intermediate track prediction model, the intermediate track prediction model can be optimized. To protect the privacy data of the user, the intermediate track prediction model can be trained and tested on the local server, with the test result uploaded to the cloud server; model optimization is performed in the cloud server, and the local server downloads the optimized model parameters and continues training, finally obtaining the track prediction model.
Fig. 8 is an architecture diagram of the smart home system provided in the embodiment of the present application. As shown in Fig. 8, the smart home system forms a point-edge-cloud architecture: each smart home device that generates user data is a point, the local server is the edge, and the cloud server is the cloud. The track video generated by each smart home device is analyzed and processed on the local server, and only the processing result is uploaded to the cloud server, reducing the possibility of leakage of the user's privacy data. The specific implementation comprises the following steps A2 and A3:
Step A2, test the intermediate track prediction model on the local server of the smart home system using the obtained track test data set, and upload the test result to the central cloud server, so that when the central cloud server determines from the test result that the performance parameters of the intermediate track prediction model do not meet the preset performance parameter requirements, it optimizes the model parameters of the intermediate track prediction model and returns the optimized model parameters to the local server; the track test data set comprises multiple frames of track images for model testing.
The track test data set in this embodiment is used to test the performance of the intermediate track prediction model, namely the accuracy of its track prediction on the track test data set. If the track prediction accuracy is less than a preset accuracy, the performance parameters of the intermediate track prediction model are determined not to meet the preset performance parameter requirements; otherwise, they are determined to meet them.
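A sketch of this check, under the assumption that a prediction counts as correct when it falls within a pixel tolerance of the labeled position (the tolerance and the preset accuracy are hypothetical):

```python
# Performance check on the track test data set: accuracy below the preset
# value means the preset performance parameter requirement is not met.
import torch

def meets_performance_requirement(model, test_sequences, test_positions,
                                  preset_accuracy=0.9, tolerance=10.0):
    hits = []
    with torch.no_grad():
        for frames, true_position in zip(test_sequences, test_positions):
            predicted = model(frames)                 # predicted next position
            hits.append((predicted - true_position).norm() <= tolerance)
    accuracy = torch.tensor(hits, dtype=torch.float32).mean().item()
    return accuracy >= preset_accuracy
```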
Step A3, receive the optimized model parameters and continue training the intermediate track prediction model according to the optimized model parameters until a preset second training end condition is reached, obtaining the track prediction model.
The second training end condition may be that the performance parameter of the intermediate trajectory prediction model meets a preset performance parameter requirement.
In this embodiment, the track prediction model is trained and tested on the local server, and only the test result is uploaded to the cloud server for model tuning, so the user's privacy data are well protected.
The above embodiments describe the application process of the face recognition model. Before the face recognition model can perform face recognition, it must be trained. The training process of the face recognition model is described in detail below with reference to the accompanying drawings.
Fig. 9 is a flowchart of the training method of the face recognition model provided in an embodiment of the present application. As shown in Fig. 9, the face recognition model is obtained by training through the following method steps:
s901, acquiring a plurality of face image samples and a second label of a user; the second label includes a face of the user annotated in the face image sample.
In this embodiment, a plurality of facial images generated by the historical face-scanning behavior of the smart home user may be used as the face image samples, or facial images of other users may be acquired from a public data set. When facial images are acquired from a public data set, facial images of different users may be acquired.
S902, for each face image sample in the plurality of face image samples, training a plurality of sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the face image sample to obtain a face training result of the face image.
Optionally, step S902 includes the following steps (a code sketch of these steps is given after the list):
Step d1, for each face image in the plurality of face images, perform feature extraction on the (j-1)-th second face feature map sample through the j-th group of residual convolutional neural networks to obtain the j-th first face feature map sample; j is an integer greater than or equal to 1.
Step d2, perform feature extraction on the j-th first face feature map through the j-th group of attention mechanism modules to obtain the j-th second face feature map sample; the second face feature map sample contains more face feature information than the first face feature map sample.
Step d3, add 1 to j (determine j+1 as the new j) and return to the feature extraction step of the j-th group of residual convolutional neural network modules until the value of j reaches a second preset value, obtaining the training result of each face image in the plurality of face images.
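The sketch referenced above is given here; it is a minimal PyTorch illustration, with the residual block design, attention form, channel counts and the second preset value all assumed rather than taken from the patent:

```python
# Sketch of steps d1-d3: j-th residual convolution group followed by a j-th
# attention module, repeated until j reaches the second preset value.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # The skip connection ties the upper-layer input to the lower-layer
        # output, mitigating gradient dissipation.
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class SpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        # Emphasize the face region; weaken attention on non-face regions.
        return x * torch.sigmoid(self.score(x))

class FaceFeatureExtractor(nn.Module):
    def __init__(self, channels: int = 32, second_preset_value: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.stages = nn.ModuleList(
            nn.Sequential(ResidualBlock(channels), SpatialAttention(channels))
            for _ in range(second_preset_value))

    def forward(self, face):                          # face: (B, 3, H, W)
        x = self.stem(face)
        for stage in self.stages:                     # j = 1 .. second preset value
            x = stage(x)                              # steps d1 and d2 per group
        return x                                      # target face feature map
```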
S903, adjusting model parameters of the plurality of sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the difference between the face training result of each face image and the second label corresponding to that face image, and repeating the training until training is finished, to obtain the face recognition model.
In this embodiment, residual convolutional neural networks are used for feature extraction. Skip connections tie the network input of an upper layer to the network output of a lower layer, which mitigates the gradient-vanishing problem and allows better feature extraction. The output of the residual network is further refined by an attention mechanism: the attention mechanism module makes feature extraction focus more on the region of the face image where the face is located and weakens attention on non-face regions, so richer feature information is obtained and a more accurate recognition result can be achieved.
In one or more embodiments of the present application, optionally, adjusting model parameters of the plurality of sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the difference between the face training result of the face image and the second label corresponding to the face image, and repeating the training until training is finished, to obtain the face recognition model, includes:
Step B1, adjust model parameters of the sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the face training result, and repeat the training until a preset third training end condition is reached, obtaining an intermediate face recognition model.
After the neural network is trained on the face image samples to obtain the intermediate face recognition model, the intermediate face recognition model can be optimized. To protect the privacy data of the user, the intermediate face recognition model can be trained and tested on the local server, with the test result uploaded to the cloud server; model optimization is performed in the cloud server, and the local server downloads the optimized model parameters and continues training, finally obtaining the face recognition model. Referring to Fig. 8, the face images generated by the smart home devices are analyzed and processed on the local server, and only the processing result is uploaded to the cloud server, reducing the possibility of leakage of the user's privacy data.
Step B2, test the intermediate face recognition model on the local server of the smart home system using the obtained face test data set, and upload the test result to the cloud server, so that when the cloud server determines from the test result that the performance parameters of the intermediate face recognition model do not meet the preset performance parameter requirements, it optimizes the model parameters of the intermediate face recognition model and returns the optimized model parameters to the local server; the face test data set comprises a plurality of face image test samples for model testing.
Step B3, receive the optimized model parameters and continue training the intermediate face recognition model according to the optimized model parameters until a preset fourth training end condition is reached, obtaining the face recognition model.
Optionally, the preset fourth training end condition may be that the performance parameters of the intermediate face recognition model meet the preset performance parameter requirements.
In addition, in one or more embodiments of the present application, optionally, the collected user data may be sent to the local server of the smart home system over both a 5G mobile data network and a wifi network. This embodiment uses a four-layer protocol stack model (application layer, transport layer, network layer and physical layer) to implement a multi-channel technique in which the 5G cellular channel and the wifi channel transmit data simultaneously, achieving the bandwidth of a dual physical link, improving the network transmission rate, reducing the transmission loss of user data between different devices, and further improving the user's end-to-end experience. In addition, the low latency of 5G allows different devices to interconnect well, avoiding the series of problems caused by high latency.
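A highly simplified sketch of the dual-link idea follows: the payload is split and the halves are pushed over the two transport channels in parallel. The channel send functions are hypothetical placeholders; actually binding sockets to the 5G and wifi interfaces is platform-specific and omitted here.

```python
# Split a payload across a 5G cellular channel and a wifi channel in parallel.
from concurrent.futures import ThreadPoolExecutor

def send_dual_channel(payload: bytes, cellular_send, wifi_send):
    half = len(payload) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(cellular_send, payload[:half])   # 5G cellular link
        f2 = pool.submit(wifi_send, payload[half:])       # wifi link
        f1.result()
        f2.result()                                       # wait for both links
```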
Similarly, the verification of the pre-trained behavior prediction model may also be performed on the local server according to the behavior verification data, with the verification result uploaded to the cloud server, so that when the cloud server determines from the verification result that the performance parameters of the pre-trained behavior prediction model do not meet the preset performance parameter requirements, it optimizes the model parameters and returns the optimized model parameters to the local server. The local server downloads the optimized model parameters from the cloud server and continues training the pre-trained behavior prediction model according to them until its performance parameters meet the preset performance parameter requirements, obtaining the behavior prediction model.
Optionally, the raw data used by the control method of the smart home device of the present application are collected according to specific task requirements. For example, if the smart home system includes a user behavior trajectory prediction model and a face recognition model, the collected data include user behavior trajectory data and face data. The relevant data may be obtained from raw data sets published on the web or from the user's historical data. Labels and ground-truth maps can be made with image processing tools such as LabelMe and Photoshop. The processed data are unified in size and pixel-normalized. Then, image data expansion methods such as image flipping and gray-value transformation are used to expand the original data at a ratio of 1:10 to enrich the sample size.
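As a sketch of the preprocessing and 1:10 expansion just described, assuming images arrive as uint8 numpy arrays (the patent names flipping and gray-value transformation but not their exact parameters, so the gain range below is an assumption):

```python
# Pixel normalization plus 1:10 expansion via flips and gray-value scaling.
import numpy as np

def normalize(img: np.ndarray) -> np.ndarray:
    return img.astype(np.float32) / 255.0             # pixel normalization

def expand(img: np.ndarray, factor: int = 10) -> list:
    samples = [normalize(img)]
    samples.append(normalize(np.fliplr(img)))         # horizontal flip
    samples.append(normalize(np.flipud(img)))         # vertical flip
    rng = np.random.default_rng(0)
    while len(samples) < factor:
        gain = rng.uniform(0.7, 1.3)                  # gray-value transformation
        samples.append(np.clip(normalize(img) * gain, 0.0, 1.0))
    return samples                                    # one original -> ten samples
```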
On the basis of the foregoing method embodiments, Fig. 10 is a schematic structural diagram of a control device of smart home equipment provided in an embodiment of the present application. As shown in Fig. 10, the control device of the smart home equipment includes an acquisition module 101, an input module 102 and a control module 103. The acquisition module 101 is configured to acquire a track video and/or a face image of a user, the track video comprising multiple frames of track images. The input module 102 is configured to input the track video into a pre-trained trajectory prediction model to obtain a trajectory prediction result of the user, and/or to input the face image into a pre-trained face recognition model to obtain a face recognition result of the user. The control module 103 is configured to control the smart home equipment according to the trajectory prediction result and/or the face recognition result. The trajectory prediction model comprises a plurality of recurrent neural network modules, first attention mechanism modules and gate control units which are sequentially and alternately connected; the recurrent neural network module is used for extracting a first track feature map from each frame of track image, the first attention mechanism module is used for extracting the features of the region of interest in the first track feature map, and the gate control unit is used for capturing the track behavior habit of the user in the track video. The face recognition model comprises a plurality of residual convolutional neural network modules and second attention mechanism modules which are sequentially and overlappingly connected; the second attention mechanism module is used for extracting the features of the region of interest in the first face feature map output by the residual convolutional neural network module.
Optionally, when the input module 102 inputs the track video into the pre-trained trajectory prediction model to obtain the trajectory prediction result of the user, it is specifically configured to: perform feature extraction, through the i-th recurrent neural network module, on the (i-1)-th output result produced by the (i-1)-th gate control unit for the track image at time step t, to obtain the i-th first track feature map, where i is an integer greater than or equal to 1 and less than or equal to a first preset value; perform feature extraction on the i-th first track feature map through the i-th first attention mechanism module to obtain the i-th second track feature map, the second track feature map containing more feature information than the first track feature map; selectively retain the i-th second track feature map through the i-th gate control unit to obtain the i-th output result, which comprises the track behavior habits of the user in the track images at time step t and earlier; and take the i-th output result as the input of the (i+1)-th recurrent neural network module, determine i+1 as the new i, and continue the feature extraction step on the track image at time step t until the value of i reaches the first preset value, then determine t+1 as the new t and perform feature extraction on the track image at the new time step t, until feature extraction is completed for all of the multi-frame track images, obtaining the trajectory prediction result of the user.
Optionally, the device further includes a first training module 104, configured to train the trajectory prediction model through the following method steps: acquire a track video sample of the user and a first label, the track video sample comprising a plurality of track image samples and the first label comprising the position of the user annotated in the plurality of track image samples; train a plurality of sequentially and overlappingly connected recurrent neural network modules, first attention mechanism modules and gate control units according to the track video sample to obtain a track training result, the track training result comprising a prediction result of the user position in the track image sample of the next frame; and adjust parameters of the plurality of sequentially and overlappingly connected recurrent neural network modules, first attention mechanism modules and gate control units according to the difference between the track training result of the multi-frame track images and the first label until training is finished, to obtain the trajectory prediction model.
Optionally, the first training module 104 is specifically configured to: for the track image sample at time step t, train the i-th recurrent neural network module on the (i-1)-th training result output by the (i-1)-th gate control unit to obtain the i-th first track training result, which comprises the track behavior habit information of the user represented by all track image samples at time step t and earlier; train the i-th first attention mechanism module on the i-th first track training result to obtain the i-th second track training result, which carries more track behavior habit information of the user than the i-th first track training result; train the i-th gate control unit on the i-th second track training result to obtain the i-th training result, which comprises the track behavior habits of the user in the track image samples at time step t and earlier; and take the i-th training result as the input of the (i+1)-th recurrent neural network module, determine i+1 as the new i, and continue feature extraction through the i-th recurrent neural network module on the (i-1)-th output result produced by the (i-1)-th gate control unit for the track image at time step t, until the value of i reaches the first preset value, then determine t+1 as the new t and train on the track feature map sample of the new time step t, until all of the multi-frame track image samples have been trained, obtaining the track training result of the user, which comprises the track behavior habits of the user in the multi-frame track image samples.
Optionally, the first training module 104 is specifically configured to: adjust parameters of the plurality of sequentially and overlappingly connected recurrent neural network modules, first attention mechanism modules and gate control units according to the difference between the track training result of the multi-frame track images and the first label until a preset first training end condition is reached, obtaining an intermediate track prediction model; test the intermediate track prediction model on the local server of the smart home system using the obtained track test data set, and upload the test result to the cloud server, so that when the cloud server determines from the test result that the performance parameters of the intermediate track prediction model do not meet the preset performance parameter requirements, it optimizes the model parameters of the intermediate track prediction model and returns the optimized model parameters to the local server, the track test data set comprising multiple frames of track image test samples for model testing; and receive the optimized model parameters and continue training the intermediate track prediction model according to the optimized model parameters until a preset second training end condition is reached, obtaining the trajectory prediction model.
Optionally, when the input module 102 inputs the face feature vector corresponding to the face image into the pre-trained face recognition model to obtain the face recognition result of the user, it is specifically configured to: perform feature extraction on the face image of the user through the j-th residual convolutional neural network to obtain the j-th first face feature map, where j is an integer greater than or equal to 1 and less than or equal to a second preset value; perform feature extraction on the j-th first face feature map through the j-th second attention mechanism module to obtain the j-th second face feature map, the second face feature map containing more feature information than the first face feature map; add 1 to j (determine j+1 as the new j) and return to the feature extraction step of the j-th residual convolutional neural network module until the value of j reaches the second preset value, obtaining the target face feature map of the user; and perform face recognition according to the target face feature map of the user and output the face recognition result.
Optionally, the device further includes a second training module 105, specifically configured to: acquire a plurality of face image samples of the user and a second label, the second label comprising the face of the user annotated in the face image samples; for each face image sample in the plurality of face image samples, train a plurality of sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the face image sample to obtain a face training result of the face image; and adjust model parameters of the plurality of sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the difference between the face training result of the face image and the second label corresponding to the face image, repeating the training until training is finished, to obtain the face recognition model.
Optionally, the second training module 105 is specifically configured to: for each face image in the plurality of face images, perform feature extraction on the (j-1)-th second face feature map sample through the j-th residual convolutional neural network to obtain the j-th first face feature map sample, where j is an integer greater than or equal to 1; perform feature extraction on the j-th first face feature map through the j-th second attention mechanism module to obtain the j-th second face feature map sample, the second face feature map sample containing more face feature information than the first face feature map sample; and add 1 to j (determine j+1 as the new j) and return to the feature extraction step of the j-th residual convolutional neural network module until the value of j reaches the second preset value, obtaining the training result of each face image in the plurality of face images.
Optionally, the second training module 105 is specifically configured to: adjust model parameters of the plurality of sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the face training result, repeating the training until a preset third training end condition is reached, obtaining an intermediate face recognition model; test the intermediate face recognition model on the local server of the smart home system using the obtained face test data set, and upload the test result to the cloud server, so that when the cloud server determines from the test result that the performance parameters of the intermediate face recognition model do not meet the preset performance parameter requirements, it optimizes the model parameters of the intermediate face recognition model and returns the optimized model parameters to the local server, the face test data set comprising a plurality of face image test samples for model testing; and receive the optimized model parameters and continue training the intermediate face recognition model according to the optimized model parameters until a preset fourth training end condition is reached, obtaining the face recognition model.
Optionally, the acquisition module 101 is specifically configured to: acquire the track video of the user from a track video acquisition unit through a 5G network and a wifi network; and/or acquire the face image of the user from a facial image acquisition unit through a 5G network and a wifi network.
The control device of the smart home equipment provided in this embodiment of the application can be used to execute the technical solution of the control method of the smart home equipment in the foregoing embodiments; the implementation principle and technical effect are similar and are not repeated here.
It should be noted that the division of the above device into modules is only a logical division; in actual implementation, all or some of the modules may be integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, all as hardware, or partly as software invoked by a processing element and partly as hardware. All or some of the modules may be integrated together or implemented independently. The processing element here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be implemented by an integrated logic circuit in hardware in a processor element or by instructions in the form of software.
Fig. 11 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in Fig. 11, the electronic device may include a transceiver 111, a processor 112 and a memory 113.
The processor 112 executes the computer-executable instructions stored in the memory, causing the processor 112 to carry out the solutions in the foregoing embodiments. The processor 112 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The memory 113 is connected to the processor 112 via the system bus, and the two communicate with each other; the memory 113 stores computer program instructions.
The transceiver 111 may be used to acquire the track video and/or face image of the user, to send test results to the cloud server, and to download the optimized parameters from the cloud server.
The system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus and a control bus; for ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The transceiver is used to enable communication between the database access device and other computers (e.g., clients, read-write libraries and read-only libraries). The memory may include random access memory (RAM) and may also include non-volatile memory.
An embodiment of the present application further provides a chip for running instructions, and the chip is configured to execute the technical solution of the control method of the smart home equipment in the foregoing embodiments.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the technical solution of the control method of the smart home equipment in the foregoing embodiments.
An embodiment of the present application further provides a computer program product comprising a computer program stored in a computer-readable storage medium; at least one processor can read the computer program from the computer-readable storage medium, and when the at least one processor executes the computer program, the technical solution of the control method of the smart home equipment in the foregoing embodiments is implemented.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses or adaptations of the invention that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (14)

1. A control method of intelligent household equipment, characterized by comprising the following steps:
acquiring a track video and/or a face image of a user; the track video comprises a plurality of frames of track images;
inputting the track video into a pre-trained track prediction model to obtain a track prediction result of the user, and/or inputting the face image into a pre-trained face recognition model to obtain a face recognition result of the user;
controlling the intelligent household equipment according to the track prediction result and/or the face recognition result;
the trajectory prediction model comprises a plurality of recurrent neural network modules, first attention mechanism modules and gate control units which are sequentially and alternately connected; the recurrent neural network module is used for extracting a first track feature map from each frame of track image, the first attention mechanism module is used for extracting the features of the region of interest in the first track feature map, and the gate control unit is used for capturing the track behavior habit of the user in the track video;
the face recognition model comprises a plurality of residual convolution neural network modules and a second attention mechanism module which are sequentially and in overlapped connection; the second attention mechanism module is used for extracting the characteristics of the interest region in the first face characteristic diagram output by the residual convolutional neural network module.
2. The method of claim 1, wherein inputting the trajectory video into a pre-trained trajectory prediction model to obtain a trajectory prediction result of the user comprises:
performing, through an ith recurrent neural network module, feature extraction on the (i-1)th output result output by the (i-1)th gate control unit for the track image at the t-th time step to obtain an ith first track feature map; i is an integer greater than or equal to 1 and less than or equal to a first preset value;
performing feature extraction on the ith first track feature map through an ith first attention mechanism module to obtain an ith second track feature map; the second track feature map contains more feature information than the first track feature map;
selectively retaining the ith second track feature map through an ith gate control unit to obtain an ith output result; the ith output result comprises the track behavior habits of the user in the track images at the t-th time step and earlier time steps;
and taking the ith output result as the input of an (i+1)th recurrent neural network module, determining i+1 as a new i, and continuing to execute the step of performing feature extraction on the track image at the t-th time step until the value of i reaches the first preset value; then determining t+1 as a new t and performing feature extraction on the track image at the new t-th time step, until the multi-frame track images complete feature extraction, to obtain the trajectory prediction result of the user.
3. The method of claim 2, wherein the trajectory prediction model is trained by the method steps of:
acquiring a track video sample and a first label of the user, wherein the track video sample comprises a plurality of track image samples, and the first label comprises the position of the user marked in the plurality of track image samples;
training a plurality of sequentially and overlappingly connected recurrent neural network modules, a first attention mechanism module and a gate control unit according to the track video sample to obtain a track training result; the track training result comprises a prediction result of the user position in a track image sample of the next frame of the track image sample;
and adjusting parameters of a plurality of sequentially and overlappingly connected recurrent neural network modules, a first attention mechanism module and a gate control unit according to the difference between the track training result of the multi-frame track image and the first label until the training is finished to obtain the track prediction model.
4. The method of claim 3, wherein the training a plurality of sequentially and overlappingly connected recurrent neural network modules, first attention mechanism modules and gate control units according to the track video sample to obtain a track training result comprises:
training, through an ith recurrent neural network module, on the (i-1)th training result output by the (i-1)th gate control unit for the track image sample at the t-th time step to obtain an ith first track training result; the first track training result comprises track behavior habit information of the user represented by all track image samples at the t-th time step and earlier time steps;
training an ith first attention mechanism module according to the ith first track training result to obtain an ith second track training result; the ith second track training result carries more track behavior habit information of the user than the ith first track training result;
training an ith gate control unit according to the ith second track training result to obtain an ith training result; the ith training result comprises the track behavior habits of the user in the track image samples at the t-th time step and earlier time steps;
taking the ith training result as the input of an (i+1)th recurrent neural network module, determining i+1 as a new i, and continuing to perform, through the ith recurrent neural network module, feature extraction on the (i-1)th output result output by the (i-1)th gate control unit for the track image at the t-th time step, until the value of i reaches the first preset value; then determining t+1 as a new t and training on the new track feature map sample at the t-th time step, until all of the multi-frame track image samples have been trained, to obtain the track training result of the user; the track training result comprises the track behavior habits of the user in the multi-frame track image samples.
5. The method according to claim 3, wherein the adjusting parameters of a plurality of sequentially and overlappingly connected recurrent neural network modules, first attention mechanism modules and gate control units according to the difference between the track training result of the plurality of frames of track images and the first label until the training is finished to obtain the trajectory prediction model comprises:
adjusting parameters of a plurality of sequentially and overlappingly connected recurrent neural network modules, first attention mechanism modules and gate control units according to the difference between the track training result of the multi-frame track images and the first label until a preset first training end condition is reached to obtain an intermediate track prediction model;
testing the intermediate track prediction model through a local server of the intelligent home system according to the obtained track test data set, and uploading a test result to a cloud server, so that the cloud server optimizes the model parameters of the intermediate track prediction model when determining that the performance parameters of the intermediate track prediction model do not meet the requirements of preset performance parameters according to the test result, and returns the optimized model parameters to the local server; the track test data set comprises a plurality of frames of track image test samples for model test;
and receiving the optimized model parameters, and continuing training the intermediate trajectory prediction model according to the optimized model parameters until a preset second training end condition is reached to obtain the trajectory prediction model.
6. The method according to claim 1, wherein the inputting the face feature vector corresponding to the face image into a pre-trained face recognition model to obtain the face recognition result of the user comprises:
performing feature extraction on the face image of the user through a jth residual convolution neural network to obtain a jth first face feature map, wherein j is an integer which is greater than or equal to 1 and less than or equal to a second preset value;
performing feature extraction on the jth first face feature map through a jth second attention mechanism module to obtain a jth second face feature map; the feature information of the second face feature map is larger than that of the first face feature map;
adding 1 to j, determining j +1 as new j, and returning to the step of extracting features through a jth residual convolutional neural network module until the value of j reaches the second preset value to obtain a target face feature map of the user;
and carrying out face recognition according to the target face feature image of the user and outputting a face recognition result.
7. The method of claim 6, wherein the face recognition model is trained by the following method steps:
acquiring a plurality of face image samples and a second label of the user; the second label comprises the face of the user marked in the face image sample;
aiming at each face image sample in the face image samples, training a plurality of sequentially and overlapped connected residual convolution neural network modules and a second attention mechanism module according to the face image sample to obtain a face training result of the face image;
and adjusting model parameters of a plurality of sequentially and overlapped connected residual convolution neural network modules and second attention mechanism modules according to the difference between the face training result of the face image and the second label corresponding to the face image, and repeating the training until the training is finished to obtain the face recognition model.
8. The method of claim 7, wherein the training a plurality of sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the face image sample for each face image sample in the plurality of face image samples to obtain a face training result of the face image comprises:
for each face image in the plurality of face images, performing feature extraction on a j-1 th second face feature map sample through a j-th residual convolutional neural network to obtain a j-th first face feature map sample; j is an integer greater than or equal to 1;
performing feature extraction on the jth first face feature map through a jth second attention mechanism module to obtain a jth second face feature map sample; the face feature information of the second face feature map sample is larger than the face feature information of the first face feature map sample;
and adding 1 to j, determining j +1 as new j, and returning to the step of extracting features through the jth residual convolutional neural network module until the value of j reaches the second preset value, so as to obtain the training result of each face image in the plurality of face images.
9. The method according to claim 7, wherein the adjusting model parameters of a plurality of sequentially and overlappingly connected residual convolutional neural network modules and second attention mechanism modules according to the difference between the face training result of the face image and the second label corresponding to the face image, and repeating the training until the training is finished to obtain the face recognition model comprises:
adjusting model parameters of a plurality of sequentially and overlapped connected residual convolution neural network modules and a second attention mechanism module according to the face training result, and repeating training until a preset third training end condition is reached to obtain an intermediate face recognition model;
testing the intermediate face recognition model according to the obtained face test data set through a local server of the intelligent home system, and uploading a test result to a cloud server, so that the cloud server optimizes model parameters of the intermediate face recognition model when determining that the performance parameters of the intermediate face recognition model do not meet the requirements of preset performance parameters according to the test result, and returns the optimized model parameters to the local server; the face test data set comprises a plurality of face image test samples for model test;
and receiving the optimized model parameters, and continuing training the intermediate face recognition model according to the optimized model parameters until a preset fourth training end condition is reached to obtain the face recognition model.
10. The method according to claim 1, wherein the acquiring a track video and/or a face image of a user comprises:
acquiring a track video of the user from a track video acquisition unit through a 5G network and a wifi network;
and/or,
and acquiring the facial image of the user from a facial image acquisition unit through a 5G network and a wifi network.
11. A control device of intelligent household equipment, characterized by comprising:
the acquisition module is used for acquiring a track video and/or a face image of a user; the track video comprises a plurality of frames of track images;
the input module is used for inputting the track video into a pre-trained track prediction model to obtain a track prediction result of the user, and/or inputting the face image into a pre-trained face recognition model to obtain a face recognition result of the user;
the control module is used for controlling the intelligent household equipment according to the track prediction result and/or the face recognition result;
the trajectory prediction model comprises a plurality of recurrent neural network modules, first attention mechanism modules and gate control units which are sequentially and alternately connected; the recurrent neural network module is used for extracting a first track feature map from each frame of track image, the first attention mechanism module is used for extracting the features of the region of interest in the first track feature map, and the gate control unit is used for capturing the track behavior habit of the user in the track video;
the face recognition model comprises a plurality of residual convolution neural network modules and a second attention mechanism module which are sequentially and in overlapped connection; the second attention mechanism module is used for extracting the characteristics of the interest region in the first face characteristic diagram output by the residual convolutional neural network module.
12. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1-10.
13. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the method of any one of claims 1-10.
14. A computer program product, characterized in that it comprises a computer program which, when being executed by a processor, carries out the method of any one of claims 1-10.
CN202210977815.XA 2022-08-16 2022-08-16 Control method, device and equipment of intelligent household equipment and storage medium Active CN115063874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210977815.XA CN115063874B (en) 2022-08-16 2022-08-16 Control method, device and equipment of intelligent household equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115063874A true CN115063874A (en) 2022-09-16
CN115063874B CN115063874B (en) 2023-01-06

Family

ID=83208001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210977815.XA Active CN115063874B (en) 2022-08-16 2022-08-16 Control method, device and equipment of intelligent household equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115063874B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068515A (en) * 2015-07-16 2015-11-18 华南理工大学 Intelligent household equipment voice control method based on self-learning algorithm
US20170270922A1 (en) * 2015-11-18 2017-09-21 Shenzhen Skyworth-Rgb Electronic Co., Ltd. Smart home control method based on emotion recognition and the system thereof
CN109542233A (en) * 2018-11-30 2019-03-29 哈尔滨拓博科技有限公司 A kind of lamp control system based on dynamic gesture and recognition of face
CN109947029A (en) * 2019-04-10 2019-06-28 广东工业大学 Control method, device and the equipment of smart home device
CN110233781A (en) * 2019-05-22 2019-09-13 安徽三品技术服务有限公司 The interconnected control systems and its control method of smart home device
CN112257578A (en) * 2020-10-21 2021-01-22 平安科技(深圳)有限公司 Face key point detection method and device, electronic equipment and storage medium
CN112861745A (en) * 2021-02-20 2021-05-28 珠海格力电器股份有限公司 Smart home, control method and module thereof, and computer readable medium
CN114120391A (en) * 2021-10-19 2022-03-01 哈尔滨理工大学 Multi-pose face recognition system and method thereof
CN114419105A (en) * 2022-03-14 2022-04-29 深圳市海清视讯科技有限公司 Multi-target pedestrian trajectory prediction model training method, prediction method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI Xun, "Face recognition method for mobile robots in smart homes", Journal of Xi'an Polytechnic University *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116165911A (en) * 2023-04-19 2023-05-26 深圳市吉方工控有限公司 Smart home control method and device, embedded industrial control equipment and medium
CN117237856A (en) * 2023-11-13 2023-12-15 腾讯科技(深圳)有限公司 Image recognition method, device, computer equipment and storage medium
CN117237856B (en) * 2023-11-13 2024-03-01 腾讯科技(深圳)有限公司 Image recognition method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115063874B (en) 2023-01-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 518100 Guangdong Shenzhen Baoan District Xixiang street, Wutong Development Zone, Taihua Indus Industrial Park 8, 3 floor.
Patentee after: Shenzhen Haiqing Zhiyuan Technology Co., Ltd.

Address before: 518100 Guangdong Shenzhen Baoan District Xixiang street, Wutong Development Zone, Taihua Indus Industrial Park 8, 3 floor.
Patentee before: Shenzhen HQVT Technology Co., Ltd.