WO2020181523A1 - Method and apparatus for waking up a screen - Google Patents

Method and apparatus for waking up a screen

Info

Publication number
WO2020181523A1
Authority
WO
WIPO (PCT)
Prior art keywords
facial image
image
screen
neural network
facial
Prior art date
Application number
PCT/CN2019/077991
Other languages
English (en)
French (fr)
Inventor
刘翠君
那柏林
吴学成
占望鹏
黄庚帅
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP19919494.5A (EP3910507A4)
Priority to CN201980010395.4A (CN111936990A)
Priority to PCT/CN2019/077991 (WO2020181523A1)
Publication of WO2020181523A1
Priority to US17/408,738 (US20210382542A1)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/08Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26Power supply means, e.g. regulation thereof
    • G06F1/32Means for saving power
    • G06F1/3203Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234Power saving characterised by the action undertaken
    • G06F1/325Power saving in peripheral device
    • G06F1/3265Power saving in display device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • This application relates to the field of electronic equipment, and more specifically, to a method and device for waking up a screen.
  • AI technology can be used to perceive people and user behavior at all times, improve user stickiness, and provide users with more intelligent services.
  • a standby wake-up technology is required.
  • the standby wake-up technology means that the terminal device in a sleep or standby state can be awakened in advance, that is, the screen of the terminal device needs to be awakened before subsequent functional operations can be performed.
  • In one prior-art approach, the user's iris information is obtained for gaze discrimination and for owner authentication.
  • However, this technology relies on the obtained pupil images for discrimination; when a general front camera is used to obtain the pupil image, the image quality is low, which seriously affects the accuracy of the wake-up result.
  • Alternatively, a special iris camera can be used to obtain high-quality pupil images, which makes the wake-up result more accurate but increases the production cost of the device. Therefore, how to improve the accuracy with which the device wakes up the screen without significantly increasing the cost has become an urgent problem to be solved.
  • the present application provides a method and device for waking up the screen, which can improve the accuracy of the device waking up the screen without significantly increasing the cost.
  • In a first aspect, a method for waking up a screen is provided, including: acquiring M image frames, where each image frame includes a first facial image and M is an integer greater than or equal to 1; determining, according to a pre-configured neural network, whether each first facial image matches a preset facial image and belongs to a user looking at the screen of the device; and when each first facial image matches the preset facial image and belongs to the user, switching the screen from the off-screen state to the on-screen state.
  • In the method for waking up the screen, the acquired first facial image of each of the M image frames is used to perform gaze discrimination and facial image discrimination through a pre-configured neural network, that is, to determine whether each first facial image matches the preset facial image and belongs to a user who is looking at the screen of the device. This avoids the problem that the low image quality of pupil images acquired by a front camera seriously affects the accuracy of the wake-up result, thereby improving the accuracy with which the device wakes up the screen without significantly increasing the cost of the device.
  • In some embodiments, a current image frame may be obtained, where the current image frame includes a first facial image, and it is determined, according to a pre-configured neural network, whether the first facial image in the current image frame matches a preset facial image and belongs to a user who is looking at the screen of the device; when the first facial image in the current image frame matches the preset facial image and belongs to the user who is looking at the screen of the device, the screen is switched from the off-screen state to the on-screen state.
  • The determining, according to the pre-configured neural network, whether each first facial image matches a preset facial image and belongs to a user who is looking at the screen of the device includes: determining, by using the pre-configured neural network, whether each first facial image belongs to the user; and when each first facial image belongs to the user, determining, by using the pre-configured neural network, whether each first facial image matches the preset facial image.
  • the method for awakening the screen may use a pre-configured neural network to first determine whether the user corresponding to a first facial image is looking at the screen.
  • a pre-configured neural network can be used to determine whether the first facial image is a preset facial image.
  • the screen can be switched from the off-screen state to the on-screen state, so as to wake up the device's screen.
  • The determining, according to the pre-configured neural network, whether each first facial image matches a preset facial image and belongs to a user looking at the screen of the device includes: determining, by using the pre-configured neural network, whether each first facial image matches the preset facial image; and when each first facial image matches the preset facial image, determining, by using the pre-configured neural network, whether each first facial image belongs to the user.
  • the method for awakening the screen may use a pre-configured neural network to first determine whether a first facial image matches a preset facial image.
  • Then a pre-configured neural network determines whether the corresponding user is looking at the screen, that is, whether the first facial image belongs to the user.
  • the screen can be switched from the off-screen state to the on-screen state, so as to wake up the device's screen.
  • The pre-configured neural network used to determine whether the first facial image belongs to the user looking at the screen of the device and the pre-configured neural network used to determine whether it matches the preset facial image can be the same pre-configured neural network, or two different pre-configured neural networks.
  • The determining, by using the pre-configured neural network, whether each first facial image belongs to the user includes: determining, by using the pre-configured neural network, a probability value of each first facial image belonging to the user; and when the probability value is greater than a preset threshold, determining that each first facial image belongs to the user.
  • The screen wake-up method provided by the embodiment of the present application can use a pre-configured neural network to determine the probability value that the user corresponding to a first facial image is looking at the screen of the device, thereby determining whether the user corresponding to the first facial image is a user looking at the screen of the device.
  • In a possible implementation, the method further includes: determining a first face frame in each image frame, where the first face frame is the face frame with the largest area among at least one face frame included in each image frame; and determining the first facial image according to a second facial image located in the first face frame.
  • That is, the first face frame with the largest area can be determined according to the area of the at least one face frame included in an image frame, the facial image area in the image frame can be located according to the first face frame, and the second facial image can thereby be determined.
  • In a possible implementation, the method further includes: acquiring facial direction information, where the facial direction information is used to indicate the direction of the second facial image; and the determining the first facial image according to the second facial image located in the first face frame includes: when the direction of the second facial image does not match a preset standard direction, rotating the second facial image to obtain the first facial image matching the preset standard direction.
  • In this way, facial direction information can be obtained; when the facial direction information does not match the preset standard direction, rotation processing can be performed on the second facial image, and the processed facial image can be input to the pre-configured neural network.
  • In a possible implementation, the method further includes: acquiring facial direction information, where the facial direction information is used to indicate the direction of the second facial image; and the determining the first facial image according to the second facial image located in the first face frame includes: when the direction of the second facial image matches a preset standard direction, using the second facial image as the first facial image.
  • the acquiring M image frames includes: acquiring the M image frames when the screen is in the off-screen state.
  • the method for waking up the screen provided in this application can periodically detect whether the screen of the device is in the off-screen state, and when it is detected that the screen is in the off-screen state, M image frames can be acquired. For example, one or more image frames may be acquired at a time, thereby acquiring M image frames.
  • the pre-configured neural network is a deep neural network.
  • the pre-configured neural network can be a fully connected neural network or a convolutional neural network.
  • A device for waking up a screen is provided, including: an acquiring unit, configured to acquire M image frames, where each image frame includes a first facial image and M is an integer greater than or equal to 1;
  • a processing unit, configured to determine, according to a pre-configured neural network, whether each first facial image matches a preset facial image and belongs to a user who is looking at the screen of the device; and
  • a wake-up unit, configured to switch the screen from the off-screen state to the on-screen state when each first facial image matches the preset facial image and belongs to the user.
  • In the device for waking up the screen provided by the present application, the first facial image of each of the M image frames is acquired, and gaze determination and facial image determination are performed through a pre-configured neural network, that is, it is determined whether each first facial image matches the preset facial image and belongs to a user who is looking at the screen of the device. This avoids the problem that the low image quality of pupil images acquired by a front camera seriously affects the accuracy of the wake-up result, thereby improving the accuracy with which the device wakes up the screen without significantly increasing the cost of the device.
  • In some embodiments, the acquiring unit may be used to acquire a current image frame, where the current image frame includes a first facial image; the processing unit may be used to determine, according to a pre-configured neural network, whether the first facial image in the current image frame matches the preset facial image and belongs to a user looking at the screen of the device; and the wake-up unit may be used to switch the screen from the off-screen state to the on-screen state when the first facial image in the current image frame matches the preset facial image and belongs to the user who is looking at the screen of the device.
  • In a possible implementation, the processing unit is specifically configured to: determine, by using the pre-configured neural network, whether each first facial image belongs to the user; and when each first facial image belongs to the user, determine, by using the pre-configured neural network, whether each first facial image matches the preset facial image.
  • In another possible implementation, the processing unit is specifically configured to: determine, by using the pre-configured neural network, whether each first facial image matches the preset facial image; and when each first facial image matches the preset facial image, determine, by using the pre-configured neural network, whether each first facial image belongs to the user.
  • The pre-configured neural network used to determine whether the first facial image belongs to the user looking at the screen of the device and the pre-configured neural network used to determine whether it matches the preset facial image can be the same pre-configured neural network, or two different pre-configured neural networks.
  • In a possible implementation, the processing unit is specifically configured to: determine, by using the pre-configured neural network, the probability value of each first facial image belonging to the user; and when the probability value is greater than a preset threshold, determine that each first facial image belongs to the user.
  • In a possible implementation, the processing unit is further configured to: determine a first face frame in each image frame, where the first face frame is the face frame with the largest area among at least one face frame included in each image frame; and determine the first facial image according to a second facial image located in the first face frame.
  • In a possible implementation, the acquisition unit is further configured to acquire facial direction information, where the facial direction information is used to indicate the direction of the second facial image; and the processing unit is specifically configured to: when the direction of the second facial image does not match the preset standard direction, rotate the second facial image to obtain the first facial image matching the preset standard direction.
  • In a possible implementation, the acquisition unit is further configured to acquire facial direction information, where the facial direction information is used to indicate the direction of the second facial image; and the processing unit is specifically configured to: when the direction of the second facial image matches the preset standard direction, use the second facial image as the first facial image.
  • the acquiring unit is specifically configured to acquire the M image frames when the screen is in the off-screen state.
  • the pre-configured neural network is a deep neural network.
  • A device for waking up a screen is provided, including a processor and a memory, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that the device for waking up the screen executes the method for waking up the screen in the first aspect and its various possible implementations.
  • There may be one or more processors and one or more memories.
  • the memory may be integrated with the processor, or the memory and the processor may be provided separately.
  • A computer program product is provided, including a computer program (also called code or instructions), which, when executed, causes a computer or at least one processor to execute the methods in the first aspect and its various implementations.
  • A computer-readable medium is provided, which stores a computer program (also called code or instructions); when the computer program runs on a computer or at least one processor, the computer or the processor is caused to execute the methods in the first aspect and its various implementations.
  • In a sixth aspect, a chip system is provided, which includes a processor for supporting a server in a computer in implementing the functions involved in the first aspect and its various implementation manners.
  • FIG. 1 is a schematic diagram of a model of a convolutional neural network provided by an embodiment of the present application
  • Fig. 2 is a schematic diagram of a method for waking up a screen according to an embodiment of the present application
  • Fig. 3 is a flowchart of a method for waking up a screen according to an embodiment of the present application
  • FIG. 4 is a schematic diagram of a device for waking up a screen according to an embodiment of the present application
  • Fig. 5 is a schematic diagram of a device for waking up a screen according to another embodiment of the present application.
  • Fig. 6 is a schematic diagram of a device for waking up a screen according to another embodiment of the present application.
  • FIG. 7 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • the method for controlling the screen state provided by the embodiments of the present application can be applied to scenes such as waking up the screen, looking at and taking pictures, identifying reading content that the user is interested in, and human-computer interaction.
  • the method for controlling the screen state in the embodiment of the present application can be applied to the standby and wake-up state scenarios. The following briefly introduces the standby and wake-up state scenarios.
  • the terminal device in the sleep/standby state needs to be awakened in advance, that is, the screen corresponding to the terminal device needs to be awakened (also referred to as activation) to perform subsequent functional operations.
  • Waking up the screen of the terminal device can mean controlling the state of the screen to switch from the off-screen state to the on-screen state, or can be regarded as waking the terminal device from the sleep state to the active state. This technology can be called standby wake-up technology.
  • The unlocking technology refers to certain security authentications that need to be performed after the terminal device wakes up, that is, after the terminal device is in the on-screen state, such as entering a password, entering a fingerprint, iris authentication, and face authentication; after unlocking, the complete functions of the terminal device can be obtained and used.
  • the standby wake-up technology is the first step for users to use the terminal device, and is the primary technology of the terminal device.
  • Commonly used standby wake-up technologies, that is, methods of waking up the screen, have the following problems:
  • key-press wake-up, double-click wake-up, and lift-up wake-up require manual operation by the user, which is inconvenient and unsafe to use;
  • voice wake-up requires the user to wake up the device by sound, which is not convenient enough to use, and the wake-up rate is not high.
  • In view of this, this application proposes a method for waking up the screen, which can complete the gaze judgment and the owner judgment according to the first facial image in an acquired current image frame; that is, based on the first facial image and a pre-trained deep neural network, it can be determined whether the user corresponding to the first facial image is looking at the screen, and it can be further determined whether the user is the owner.
  • When the first facial image satisfies both the above-mentioned gaze determination and the owner determination, that is, when the current image frame meets the wake-up condition,
  • the screen state of the device can be switched from the off-screen state to the on-screen state.
  • the device may be a terminal device with a screen, for example, it may be a user equipment, a mobile device, a user terminal, a terminal, a wireless communication device, or a user device.
  • The terminal device can also be a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with a wireless communication function, a computing device or another processing device connected to a wireless modem, an in-vehicle device, a wearable device, a terminal device in a future 5G network, or a terminal device in a future evolved public land mobile network (PLMN), etc.; this embodiment of the present application does not limit this.
  • the embodiments of the present application involve a large number of applications of neural networks. To facilitate understanding, the following first introduces related terms and neural networks and other related concepts involved in the embodiments of the present application.
  • the facial image can be rotated from one pose angle to another and the corresponding rotated image can be obtained.
  • A neural network can be composed of neural units. A neural unit can refer to an arithmetic unit that takes inputs x_s and an intercept b, and the output of the arithmetic unit can be:
  • h_{W,b}(x) = f(W^T x + b) = f(Σ_{s=1}^{n} W_s · x_s + b)
  • where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
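  • As a minimal illustration (not part of the original disclosure) of the neural unit described above, the computation f(Σ_s W_s·x_s + b) with a sigmoid activation can be sketched as follows; the input and weight values are arbitrary examples:

```python
import numpy as np

def sigmoid(z: float) -> float:
    # Sigmoid activation function f of the neural unit
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x: np.ndarray, w: np.ndarray, b: float) -> float:
    # Output of a single neural unit: f(sum_s W_s * x_s + b)
    return sigmoid(np.dot(w, x) + b)

# Example: a unit with three inputs (illustrative values)
x = np.array([0.2, 0.5, 0.1])
w = np.array([0.4, -0.3, 0.8])
print(neural_unit(x, w, b=0.1))
```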
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no special metric for "many" here.
  • The layers inside a DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated.
  • The coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer can be written as W^L_{jk}. It should be noted that the input layer has no W parameter.
  • more hidden layers make the network more capable of portraying complex situations in the real world. Theoretically speaking, a model with more parameters is more complex and has a greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is also a process of learning a weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by vectors W of many layers).
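  • The layer-by-layer view can be made concrete with the following sketch of a forward pass through a small fully connected DNN; the layer sizes, the ReLU activation, and the random weights are illustrative assumptions, and weights[l] plays the role of the coefficient matrix from layer l to layer l+1:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Forward pass of a fully connected DNN.
    weights[l] holds the coefficients from layer l to layer l+1;
    the input layer itself has no W parameter, as noted above."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
# Input layer of size 4, two hidden layers of size 8, output layer of size 2
sizes = [4, 8, 8, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(forward(rng.normal(size=4), weights, biases))
```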
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels. Weight sharing can be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistical information of one part of an image is the same as that of other parts, which means that the image information learned in one part can also be used in another part; therefore, the image information obtained by the same learning can be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information. Generally, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size. During the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
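  • The weight-sharing idea can be sketched as follows: the same convolution kernel (weight matrix) is applied at every position of the input image, so the way image information is extracted does not depend on location; the kernel values and the image size below are illustrative, not taken from this application:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Convolve a single-channel image with one shared kernel (no padding)."""
    kh, kw = kernel.shape
    h = (image.shape[0] - kh) // stride + 1
    w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # same weights at every position
    return out

image = np.random.rand(8, 8)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)   # one example 3x3 kernel
print(conv2d(image, edge_kernel, stride=1).shape)  # (6, 6)
```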
  • the pre-trained deep neural network may be a deep learning model, for example, it may be a fully connected neural network or a convolutional neural network (convolutional neural network, CNN).
  • The deep neural network can also be a fully connected neural network model.
  • the deep neural network can also be a CNN model.
  • The neural network training method involves computer vision processing, and can be specifically applied to data processing methods such as data training, machine learning, and deep learning, performing symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on training data, to finally obtain a trained neural network.
  • the pre-configured neural network may be a gaze discrimination network, that is, it can determine whether the user corresponding to the facial image is looking at the screen according to the acquired facial image and the pre-trained neural network.
  • The method for waking up the screen provided by the embodiment of the application can use the above-mentioned pre-configured neural network: input data (for example, the first facial image in this application) is input into the trained gaze discrimination network to obtain output data (such as the first probability in this application).
  • The pre-configured neural network training method and the method of waking up the screen provided in the embodiments of the present application can be based on the same idea, or can be understood as two parts in a system, or two stages of an overall process, such as a model training stage and a model application stage.
  • The model training stage can be performed in advance, and
  • the model application stage can be the method of waking up the screen in this application, that is, the application of a pre-trained deep neural network.
  • The training data in the embodiments of this application may include the user's characteristic images, that is, images that can reflect the current user's characteristics, for example, the user's gaze images, non-gaze images, head posture information, and gaze direction information; a target model/rule is obtained according to the training data, that is, the target model can be a pre-trained neural network for determining whether the user is looking at the screen.
  • the pre-configured neural network may be obtained by training according to the facial feature image of at least one user.
  • the facial feature image of at least one user may be the facial feature image of at least one user in a pre-collected and purchased database, or may be an image collected through a front camera after authorization by the user.
  • the facial feature image of at least one user may be a non-gazing facial image, a gazing facial image, head posture information, and gaze direction information of at least one user.
  • Facial images can directly reflect the head posture, and a person's head posture is often strongly related to the gaze direction.
  • In many cases, the head posture is basically consistent with the gaze information.
  • Therefore, by constructing a rich data set, the neural network can purposefully learn the gaze results of such special scenes, thereby improving the accuracy of the neural network's judgment.
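  • As one hedged sketch of how such a gaze discrimination network might be pre-configured (trained) from gaze/non-gaze facial images, the following PyTorch-style loop is illustrative only; the dataset object, the model, the loss function, and the hyperparameters are assumptions rather than details given in this application:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# `gaze_dataset` is assumed to yield (face_image_tensor, label) pairs, where
# label is 1 for "gazing at the screen" and 0 for "not gazing"; it stands in
# for the collected training data described above and is not defined here.
def train_gaze_network(model: nn.Module, gaze_dataset, epochs: int = 10) -> nn.Module:
    loader = DataLoader(gaze_dataset, batch_size=64, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()      # binary gaze / non-gaze objective
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            logits = model(images).squeeze(1)   # model assumed to output one logit per image
            loss = loss_fn(logits, labels.float())
            loss.backward()
            optimizer.step()
    return model
```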
  • The target model/rule in the embodiment of the present application can be used to implement the method for waking up the screen provided in the embodiment of the present application, that is, the first facial image in the obtained current image frame is input into the target model/rule after relevant preprocessing to obtain the output result, that is, to determine whether the user corresponding to the first facial image is looking at the screen.
  • the target model/rule in the embodiment of the present application may be a pre-configured neural network.
  • the obtained target model/rule can be applied to different devices.
  • For example, the obtained target model/rule can be applied to terminal devices, such as mobile phones, tablets, laptops, AR/VR devices, and vehicle-mounted terminals.
  • the pre-configured neural network in this application may be a convolutional neural network.
  • a convolutional neural network is a deep neural network with a convolutional structure and a deep learning architecture.
  • The deep learning architecture refers to multiple levels of learning carried out at different abstraction levels by using machine learning algorithms.
  • CNN is a feed-forward artificial neural network. Each neuron in the feed-forward artificial neural network can respond to the input image.
  • a convolutional neural network (CNN) 100 may include an input layer 110, a convolutional layer/pooling layer 120 (the pooling layer is optional), and a neural network layer 130.
  • the convolutional layer/pooling layer 120 may include layers 121-126 as shown in the examples.
  • In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 121 can include many convolution operators.
  • the convolution operator is also called a kernel. Its function in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator is essentially a weight matrix, and this weight matrix is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually slid along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract specific features from the image.
  • weight values in these weight matrices need to be obtained through a lot of training in practical applications.
  • Each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 100 can make correct predictions.
  • The initial convolutional layer (such as 121) often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network 100 increases, the features extracted by the subsequent convolutional layers (for example, 126) become more and more complex, such as high-level semantic features, and features with higher-level semantics are more suitable for the problem to be solved.
  • A convolutional layer can be followed by one pooling layer, or multiple convolutional layers can be followed by one or more pooling layers.
  • In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
  • After processing by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. In order to generate the final output information (the required information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate one output or a group of outputs whose number equals the number of required classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in FIG. 1) and an output layer 140. The parameters contained in the multiple hidden layers can be obtained based on the specific task type.
  • In this embodiment, the task type may include image recognition, that is, it can be judged whether a first facial image input to the neural network matches a preset facial image and whether the user corresponding to the first facial image is looking at the screen of the device.
  • The pre-configured neural network in this application can also be a fully connected neural network, that is, all neurons in each layer are connected to all neurons in the next layer (the weight w of each neuron in each layer is not 0).
  • the convolutional neural network 100 shown in FIG. 1 is only used as an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models.
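  • The structure of the convolutional neural network 100 described above (alternating convolutional/pooling layers 121-126, hidden layers 131 to 13n, and output layer 140) could be expressed, for illustration only, roughly as follows; the channel counts, kernel sizes, and the assumed 112x112 input size are not specified by this application:

```python
import torch.nn as nn

# Sketch of CNN 100: conv/pool block (120) followed by the neural network layer (130)
# and the output layer (140). All dimensions here are illustrative assumptions.
class Cnn100(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(          # layers 121-126 (conv/pool alternation)
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(        # hidden layers 131..13n + output layer 140
            nn.Flatten(),
            nn.Linear(64 * 14 * 14, 128), nn.ReLU(),   # assumes a 112x112 input image
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```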
  • Fig. 2 is a method 200 for waking up a screen provided by an embodiment of the present application.
  • the method 200 may include steps 210-230.
  • the steps 210-230 are described in detail below.
  • Step 210 Obtain M image frames, where each image frame includes a first facial image, and M is an integer greater than or equal to 1.
  • the state information of the screen can be detected.
  • the device can detect whether the current screen is off or on.
  • M image frames can be acquired.
  • the face in the first facial image may belong to the user, for example, the owner or another person.
  • The screen state can be periodically detected, that is, in a preset time period, it is periodically detected whether the current state of the device is the off-screen state; if the current state of the screen is the off-screen state, the latest image frame in the image storage module can be obtained.
  • acquiring M image frames may be acquiring one current image frame at a time, and acquiring M image frames through M times. It is also possible to acquire multiple image frames at once.
  • Acquiring each image frame can be acquiring the latest image frame in the memory of the device; the latest image frame can be the image collected by the front camera, and the latest image frame can be regarded as the stored image frame whose storage or acquisition time has the shortest time difference from the current time.
  • Obtaining M image frames may be acquiring the current image frame and the first M-1 image frames of the current image frame at one time, that is, acquiring M image frames at a time.
  • step 210 may further include: after acquiring M image frames, a first face frame may be determined in each image frame, for example, the first face frame is included in each image frame. At least one face frame with the largest area in the face frame; the first face image can be determined according to the second face image located in the first face frame.
  • one image frame collected by the front camera may include an image of one face, or may include images of multiple faces, and the number of facial images included in an image frame is related to the number of users in front of the camera when the image is collected.
  • Each acquired image frame may include at least one facial image, and the first facial image may be the facial image with the largest area among the at least one facial image included in the current image frame.
  • the first facial image may also be another facial image in at least one facial image included in the current image frame, for example, a facial image located at or close to the center of the current image frame.
  • the step of determining the first face frame in each image frame may further include: acquiring face direction information, which is used to indicate the direction of the second face image; when the direction of the second face image does not match When the standard direction is preset, the second facial image is rotated to obtain the first facial image matching the preset standard direction.
  • the second facial image may be used as the first facial image.
  • Taking one image frame as an example: after acquiring an image frame, it can be determined whether there is a facial image in the current image frame, that is, the facial image in the current image frame is distinguished from the background image; when there is a facial image in the current image frame, the face frame and face direction of at least one facial image included in the current image frame can be determined, where the face frame can be a rectangular frame, and the position of the face frame can be obtained, that is, the coordinates of the upper left corner and the lower right corner of the rectangular face frame;
  • the face direction can indicate that the face image in the face frame is upward, downward, leftward, rightward, etc., relative to the rectangular face frame.
  • the area of the face frame may be calculated according to the coordinates of at least one face frame, and the face frame with the largest area, that is, the first face frame, can be selected according to the area sorting.
  • According to the first face frame, the area of the first face frame can be located in the current image frame, and the face region image corresponding to the first face frame is intercepted, that is, the second facial image is acquired.
  • The first facial image can be determined according to the first face frame and the facial direction information, and the first facial image is a facial image that is vertically upward with respect to the screen. That is to say, the second facial image corresponding to the first face frame can be acquired; if the second facial image does not match the preset standard direction (for example, the preset standard direction can be a vertically upward direction relative to the screen), the facial direction information is used to perform rotation processing on the second facial image, and the second facial image is corrected to a facial image that is vertically upward with respect to the screen. If the second facial image matches the preset standard direction, the second facial image can be used as the first facial image.
  • the face in the image collected by the camera may have various directions.
  • the face in various directions is corrected to a vertical face, which is a face image that is vertically upward with respect to the screen.
  • the vertical direction in this embodiment may be a vertical direction relative to the ground plane.
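  • A sketch of this part of step 210 follows: selecting the face frame with the largest area, intercepting the second facial image, and rotating it to the preset standard direction. The detector output format, the direction labels, and the mapping of each direction to a rotation are illustrative assumptions, not an interface defined by this application:

```python
import numpy as np

def largest_face_crop(frame: np.ndarray, detections):
    """Pick the face frame with the largest area and cut out the second facial image.
    Each detection is assumed to be (x1, y1, x2, y2, direction)."""
    def area(d):
        x1, y1, x2, y2, _ = d
        return (x2 - x1) * (y2 - y1)

    x1, y1, x2, y2, direction = max(detections, key=area)   # the first face frame
    crop = frame[y1:y2, x1:x2]                               # the second facial image
    if direction != "up":                                    # preset standard direction
        # Rotate the crop so the face is vertically upward with respect to the screen;
        # the direction-to-rotation mapping below is an assumption for illustration.
        turns = {"left": -1, "down": 2, "right": 1}[direction]
        crop = np.rot90(crop, k=turns)
    return crop                                              # the first facial image
```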
  • Step 220 Determine whether each first facial image matches a preset facial image and belongs to a user looking at the screen of the device according to the pre-configured neural network.
  • the pre-configured neural network in this application may be a deep neural network, and the pre-configuration process of the deep neural network may be as shown in FIG. 1 above.
  • the pre-configured neural network may be obtained by training according to the facial feature image of at least one user.
  • the facial feature image of at least one user may be the facial feature image of at least one user in a pre-collected and purchased database, or may be an image collected through a front camera after authorization by the user.
  • the facial feature image of at least one user may be a non-gazing facial image, a gazing facial image, head posture information, and gaze direction information of at least one user.
  • Facial images can directly reflect the head posture, and a person's head posture is often strongly related to the gaze direction.
  • In many cases, the head posture is basically consistent with the gaze information.
  • Therefore, by constructing a rich data set, the neural network can purposefully learn the gaze results of such special scenes, thereby improving the accuracy of the neural network's judgment.
  • a pre-configured neural network can be used to determine whether each first facial image belongs to a user looking at the screen of the device; when each first facial image belongs to the user, the pre-configured The neural network determines whether each first facial image matches a preset facial image.
  • the gaze determination can be performed first, that is, whether the user corresponding to the first facial image in the current image frame is gazing at the screen.
  • Then the owner judgment is performed, that is, it is determined whether the user corresponding to the first facial image matches the preset facial image.
  • That is, the gaze discrimination can be performed before the owner judgment; alternatively, it is also possible to perform the owner judgment first and then the gaze judgment based on the first facial image.
  • a pre-configured neural network can be used to determine whether each first facial image matches a preset facial image; when each first facial image matches a preset facial image, the pre-configured neural network can be used to determine Whether each first facial image belongs to the user looking at the screen of the device.
  • a pre-configured neural network can be used to determine the probability value that each first facial image belongs to the user who is looking at the screen of the device; when the probability value is greater than a preset threshold, it is determined that each first facial image belongs to the user who is looking at the device. The user of the screen.
  • For example, each first facial image can be input to a pre-configured neural network, which outputs the feature vector of the first facial image, and the distance between this feature vector and the feature vector of the preset facial image on the device is calculated to determine whether they match.
  • the preset facial image may be a facial image of the owner of the device.
  • For example, the device can guide the owner to collect the owner's facial image, process it through the owner recognition network to obtain the corresponding feature vector, and store the feature vector in a fixed location (such as memory) as the feature vector of the owner's face (face ID), for use in owner identification and determination.
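  • The two determinations of step 220 can be sketched as follows; the two network objects, the thresholds, and the use of Euclidean distance between feature vectors are illustrative assumptions:

```python
import numpy as np

def matches_and_gazing(face_img, gaze_net, owner_net, owner_face_id,
                       gaze_threshold=0.5, distance_threshold=1.0) -> bool:
    # Gaze discrimination: probability that the user is looking at the screen
    gaze_prob = gaze_net(face_img)
    if gaze_prob <= gaze_threshold:
        return False
    # Owner discrimination: distance between the image's feature vector and the stored face ID
    feature = owner_net(face_img)
    distance = float(np.linalg.norm(feature - owner_face_id))
    return distance < distance_threshold
```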
  • Step 230 When each of the first facial images matches the preset facial image and belongs to the user, switch the screen from the off-screen state to the on-screen state.
  • In some embodiments, the current image frame can be obtained, where the current image frame includes the first facial image; according to the pre-configured neural network, it is determined whether the first facial image matches the preset facial image and belongs to the user who is looking at the screen of the device; and when the first facial image matches the preset facial image and belongs to the user, the screen is switched from the off-screen state to the on-screen state.
  • In other embodiments, it may be determined whether each first facial image in M consecutive image frames matches the preset facial image and belongs to a user who is looking at the screen of the device. Under normal circumstances, assuming that the owner intentionally wakes up the device, the owner will watch the screen for a certain interval. Therefore, considering the rationality of the scene while satisfying real-time wake-up, when it is detected that each first facial image in the M consecutive image frames matches the preset facial image and belongs to the user who is watching the screen, the screen is switched from the off-screen state to the on-screen state.
  • The execution flow of detecting that each first facial image in the M consecutive image frames matches the preset facial image and belongs to the user who is watching the screen may be to process and determine each image frame in turn.
  • That is, the first image frame may be acquired and determined; when its first facial image matches the preset facial image and belongs to the user, the second image frame is acquired and the same determination operation is performed, and so on, until the determination of the M-th image frame is completed.
  • When each first facial image in the M consecutive image frames matches the preset facial image and belongs to a user who is watching the screen, the screen of the device is switched from the off-screen state to the on-screen state.
  • The value of M can be set as needed, and it should be ensured that the value of M is neither too large nor too small: if the value of M is too large, the wake-up delay will be large and the wake-up rate will be low; if it is too small, it will cause serious wake-up jitter.
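  • The M-consecutive-frame condition can be tracked with a simple counter, as in the following sketch; the function names and the example value M=3 are placeholders, not values given by this application:

```python
def wake_screen_loop(get_latest_frame, frame_satisfies_wakeup, wake_up, M=3):
    """Switch the screen to the on-screen state only after M consecutive image
    frames each contain a first facial image that matches the preset facial image
    and belongs to a user looking at the screen. M=3 is an arbitrary example."""
    consecutive = 0
    while True:                        # runs as the periodic detection loop
        frame = get_latest_frame()
        if frame_satisfies_wakeup(frame):
            consecutive += 1
            if consecutive >= M:
                wake_up()              # switch from the off-screen to the on-screen state
                consecutive = 0
        else:
            consecutive = 0            # any non-matching frame resets the count
```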
  • In the method for waking up the screen, the acquired first facial image of each of the M image frames is used to perform gaze discrimination and facial image discrimination through a pre-configured neural network, that is, to determine whether each first facial image matches the preset facial image and belongs to a user who is looking at the screen of the device. This avoids the problem that the low image quality of pupil images acquired by the front camera seriously affects the accuracy of the wake-up result, thereby improving the accuracy with which the device wakes up the screen without significantly increasing the cost of the device.
  • FIG. 3 is a schematic flowchart of a method for waking up a screen provided by an embodiment of the application.
  • the method shown in FIG. 3 includes steps 301 to 315, and steps 301 to 315 are respectively described in detail below.
  • Step 301 The periodic screen-off detection module periodically detects whether the current state of the terminal device is in the screen-off state in a set period.
  • If the screen is in the off-screen state, step 302 is executed.
  • If the screen is not in the off-screen state, step 305 is executed: the image frame ends, the process ends, and the next cycle of off-screen detection is awaited.
  • Step 302 Obtain the latest image frame in the image storage module as an input image for subsequent gaze judgment or machine owner judgment.
  • the time difference between the time of storing or acquiring the image frame and the current time is the shortest.
  • Step 303 Determine whether there is a facial image in the latest acquired image frame. If there is a facial image, go to step 304; if there is no facial image, go to step 305, when the image frame ends, the process ends.
  • Step 304 Calculate the size of the face frame, and obtain the largest face frame and its face direction.
  • face detection can be performed on the image to determine the background area and the face area in the latest image frame.
  • For example, an end-to-end multi-task network structure can be used to complete the three tasks of face frame positioning, face classification, and face direction classification; that is, a fixed-size image can be input, and the face frame coordinates and face direction can be output.
  • the face direction is the relative position information of the face image in the face frame with respect to the face frame.
  • The face direction may be upward, leftward, rightward, or downward.
  • the face frame coordinates can be the coordinates of the upper left corner and the lower right corner of the rectangular frame, that is, it can be 4 values. Calculate the area of the face frame, sort according to the area, select the face with the largest area and output its face frame coordinates and face direction.
  • the image collected by the front camera may include one facial image, or may also include multiple facial images.
  • When the owner wants to wake up the device, he or she will often intentionally approach the device screen, so the facial image with the largest area in the image captured by the camera, or another facial image in the at least one facial image included in the current image frame (for example, a facial image located at or close to the center of the current image frame), can be taken as the face of the user most likely to want to wake up the screen.
  • Step 306 Locate the face frame from the latest acquired image frame according to the largest face frame, and intercept the facial region image in the face frame.
  • Step 307 Determine whether the captured facial region image is a vertical face according to the facial direction of the largest facial frame, that is, whether the captured facial region image is a vertical upward facial image relative to the screen. If the captured facial image is a vertical facial image (a vertically upward facial image), step 309 is performed; if the captured facial image is not a vertical facial image, step 308 is performed.
  • Step 308 Correct the facial image by flipping and transposing to obtain a corrected facial image, that is, obtaining a facial image that is vertically upward with respect to the screen of the terminal device.
  • the face in the image captured by the camera may have various orientations.
  • the captured facial region image can be corrected to the vertical direction Face image.
  • If the facial image obtained by interception is already a facial image in the vertically upward direction, correction processing may not be performed; if the intercepted facial image is a downward-facing facial image, it can be flipped vertically; if it is a leftward-facing facial image, it can be flipped vertically and then transposed; if it is a rightward-facing facial image, it can be flipped horizontally and then transposed; finally, the corrected vertical facial region image can be output.
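  • The flip/transpose corrections of steps 307 and 308 can be sketched as follows; the direction labels and the exact mapping of each direction to a flip/transpose combination are illustrative assumptions:

```python
import numpy as np

def correct_to_upright(face_img: np.ndarray, direction: str) -> np.ndarray:
    """Correct an intercepted face region to the vertically upward direction
    using flip/transpose operations in the spirit of steps 307-308.
    `direction` is assumed to be one of: "up", "down", "left", "right"."""
    if direction == "up":          # already vertical, no correction needed
        return face_img
    if direction == "down":        # downward face: flip vertically
        return np.flipud(face_img)
    if direction == "left":        # leftward face: flip vertically, then transpose
        return np.flipud(face_img).swapaxes(0, 1)
    if direction == "right":       # rightward face: flip horizontally, then transpose
        return np.fliplr(face_img).swapaxes(0, 1)
    raise ValueError(f"unknown face direction: {direction}")
```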
  • Step 309 The face image is used as input, the gaze discrimination network is executed, and the gaze probability is output.
  • the gaze discrimination network may be a pre-trained deep neural network. Inputting the acquired facial image to the pre-trained deep neural network can output the probability of gaze or non-gaze.
  • the input of the gaze determination network may be a facial image corrected to a vertical direction, or may be a facial image that has not been corrected.
  • When the input is a facial image that has been corrected to the vertical direction, the accuracy of the gaze result output by the network is higher.
  • the face correction process can also be eliminated, and the original facial image can be directly input to determine the corresponding gaze accuracy. Even if the original face image that is not corrected to the vertical direction is directly input, the gaze accuracy of the gaze discrimination network is higher than that of the traditional gaze discrimination method based on obtaining pupil images.
  • Step 310 Compare the output gaze probability with a preset gaze threshold, and output whether the user is looking at the screen. If the gaze probability is greater than the preset gaze threshold, it is determined that the face is looking at the screen, indicating that the face is trying to wake up the device, that is, it is determined that the user corresponding to the face input to the gaze discrimination network is looking at the screen, and step 311 is performed; if the gaze probability is less than the preset gaze threshold, it is determined that the face is not looking at the screen, indicating that the face has no intention of waking up the device, that is, it is determined that the user corresponding to the face input to the gaze discrimination network is not looking at the screen, and step 305 is performed, the image frame ends, and the process ends.
  • Step 311 The facial image is used as input, and the host discrimination network is executed to output the feature vector of the facial image.
  • the host discriminating network and the gaze discriminating network can be two different end-to-end lightweight neural network algorithms.
  • Step 312 Calculate the distance between the output feature vector and the preset feature vector of the host's face.
  • Step 313 Compare the calculated distance with the preset distance threshold, and output whether it is the owner.
  • Specifically, the distance between the feature vector corresponding to the current facial image and the feature vector of the owner's face is calculated and compared with the preset distance threshold. If it is less than the preset distance threshold, it is determined to be the owner's face, and step 314 is executed; otherwise, it is determined to be a non-owner face, and step 305 is executed, the image frame ends, that is, the process ends.
  • The terminal device can guide the owner to collect the owner's face in advance, obtain the corresponding feature vector through the owner identification network, and store it in a fixed location as the feature vector of the owner's face for use in owner recognition and judgment.
  • The gaze determination is performed first, namely step 309 and step 310; when the gaze probability is greater than the preset gaze threshold, the owner identification is then performed, that is, step 311 to step 313.
  • the owner determination can also be performed first.
  • the owner recognition and gaze determination can be performed at one time through the neural network, that is, the results of the owner recognition and gaze determination can be output at one time through the processing of the neural network.
  • Step 314 Determine whether M consecutive frames have been watched by the owner; that is, it is statistically determined whether the number of consecutive frames in which the owner is looking at the screen has reached the preset M frames. If M consecutive frames are reached, step 315 is executed; otherwise, step 305 is executed to exit the process.
  • Step 315 Wake up the bright screen, that is, adjust the screen state to the bright screen state.
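  • Putting steps 301 to 315 together, one possible shape of the overall flow is sketched below; every attribute and helper on the device object, the l2 distance via numpy, and the reuse of the correct_to_upright sketch above are illustrative placeholders rather than interfaces defined by this application:

```python
import numpy as np

def standby_wakeup_cycle(device, M=3):
    """One detection cycle of the flow in FIG. 3 (steps 301-315), sketched."""
    if not device.screen_is_off():                         # step 301
        return
    frame = device.latest_frame()                          # step 302
    faces = device.detect_faces(frame)                     # step 303: [(box, direction), ...]
    if not faces:
        device.consecutive = 0                             # step 305: end this frame
        return
    box, direction = max(faces, key=lambda f: f[0].area)   # step 304: largest face frame
    face = device.crop(frame, box)                         # step 306
    if direction != "up":                                  # steps 307-308
        face = correct_to_upright(face, direction)         # see the sketch above
    if device.gaze_net(face) <= device.gaze_threshold:     # steps 309-310
        device.consecutive = 0
        return
    feature = device.owner_net(face)                       # step 311
    distance = float(np.linalg.norm(feature - device.owner_face_id))  # step 312
    if distance >= device.distance_threshold:              # step 313
        device.consecutive = 0
        return
    device.consecutive += 1                                # step 314
    if device.consecutive >= M:
        device.wake_screen()                               # step 315
        device.consecutive = 0
```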
  • FIG. 3 is only to help those skilled in the art understand the embodiments of the present application, and is not intended to limit the embodiments of the present application to the specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the example of FIG. 3 given, and such modifications or changes also fall within the scope of the embodiments of the present application.
  • In addition, the size of the sequence numbers of the above-mentioned processes does not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • In the embodiments of the present application, gaze determination and facial image determination are performed through the pre-configured neural network on the acquired M image frames, that is, it is determined whether the first facial image in each image frame matches the preset facial image and belongs to a user who is looking at the screen of the device, which can improve the accuracy with which the device wakes up the screen without significantly increasing the cost of the device.
  • The apparatus for waking up the screen in the embodiments of the present application can execute the methods of the foregoing embodiments of the present application; for the specific working processes of the following products, reference may be made to the corresponding processes in the foregoing method embodiments.
  • FIG. 4 is a schematic block diagram of a device 400 for waking up a screen in an embodiment of the present application. It should be understood that the apparatus 400 can execute each step in the method of FIG. 2 or FIG. 3, and in order to avoid repetition, it will not be described in detail here.
  • the device 400 includes: an acquisition unit 410, a processing unit 420, and a wake-up unit 430.
  • the acquiring unit 410 is configured to acquire M image frames, where each of the M image frames includes a first facial image, and M is an integer greater than or equal to 1;
  • The processing unit 420 is configured to determine, according to the pre-configured neural network, whether each first facial image matches the preset facial image and belongs to a user who is looking at the screen of the device; the wake-up unit 430 is configured to switch the screen from the off-screen state to the on-screen state when each first facial image matches the preset facial image and belongs to the user. A schematic sketch of how the three units could cooperate is given below.
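  • The following is a schematic sketch of how the three units of the apparatus 400 could cooperate. The helper callables acquire_frames, matches_preset_and_gazing, and set_screen_on are hypothetical stand-ins for the acquisition unit 410, the processing unit 420 (the pre-configured neural network), and the wake-up unit 430, respectively.

```python
def wake_screen_if_owner_gazing(acquire_frames, matches_preset_and_gazing, set_screen_on, m: int = 3):
    """Wake the screen only when every one of M frames matches the preset face and is gazing."""
    frames = acquire_frames(m)                                     # acquisition unit 410
    if all(matches_preset_and_gazing(frame) for frame in frames):  # processing unit 420
        set_screen_on()                                            # wake-up unit 430

# Demo with trivial stand-ins (all hypothetical).
wake_screen_if_owner_gazing(
    acquire_frames=lambda m: [f"frame{i}" for i in range(m)],
    matches_preset_and_gazing=lambda frame: True,
    set_screen_on=lambda: print("screen switched from off-screen to on-screen state"),
)
```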
  • The specific implementation form of the acquisition unit 410 in FIG. 4 may be the image acquisition module shown in FIG. 5, where the image acquisition module is used to acquire the latest image frame in the image storage module as the input image for the subsequent networks.
  • The specific implementation form of the processing unit 420 in FIG. 4 may include the periodic screen-off detection module, the face detection + direction determination module, the gaze determination module, and the owner identification module shown in FIG. 5. The periodic screen-off detection module can be used to periodically detect, in a set cycle, whether the device is currently in the off-screen state; if it is in the off-screen state, the next step of the process continues, otherwise the process exits and off-screen detection continues in the next cycle, as sketched below.
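  • The following is a sketch of the periodic screen-off detection loop: the screen state is polled on a fixed period, and the rest of the pipeline runs only while the screen is off. The helpers screen_is_off and run_wakeup_pipeline are hypothetical, and the 200 ms period and the bounded cycle count are assumed values used only so the example terminates.

```python
import time

def periodic_off_screen_detection(screen_is_off, run_wakeup_pipeline,
                                  period_s: float = 0.2, cycles: int = 5):
    for _ in range(cycles):           # bounded here so the example terminates
        if screen_is_off():
            run_wakeup_pipeline()     # continue to image acquisition, face detection, ...
        time.sleep(period_s)          # wait for the next detection cycle

periodic_off_screen_detection(screen_is_off=lambda: True,
                              run_wakeup_pipeline=lambda: print("run wake-up pipeline"))
```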
  • The face detection + direction discrimination module can be used to run the multi-task neural network algorithm for face detection and direction discrimination, taking the latest image stored in the image storage module as the input image of the network.
  • The gaze discrimination module can be used to run the neural network algorithm for gaze discrimination, taking the output image (face region image) of the face detection + direction discrimination module as input; the algorithm directly outputs the gaze probability, which is compared with the preset probability threshold to output the gaze result.
  • The owner recognition module can be used to run the neural network algorithm for owner recognition, taking the output image (face region image) of the face detection + direction discrimination module as input; the algorithm directly outputs a facial feature vector, calculates its distance to the pre-stored feature vector of the owner's face, and determines, according to the preset distance threshold, whether the output face belongs to the owner.
  • The specific implementation form of the wake-up unit 430 in FIG. 4 may be the wake-up processing module shown in FIG. 5, where the wake-up processing module may be used to smooth the gaze information and owner information obtained by the above modules and to determine whether the owner has been looking at the screen for M consecutive image frames. If the condition is satisfied, the relevant wake-up unit is notified to wake up the screen; if it is not satisfied, nothing is reported.
  • The processing unit 420 is specifically configured to: use the pre-configured neural network to determine whether each first facial image belongs to the user; and, when each first facial image belongs to the user, use the pre-configured neural network to determine whether each first facial image matches the preset facial image.
  • The processing unit 420 is specifically configured to: use the pre-configured neural network to determine a probability value that each first facial image belongs to the user; and, when the probability value is greater than a preset threshold, determine that each first facial image belongs to the user.
  • The processing unit 420 is further configured to: determine a first face frame in each image frame, where the first face frame is the face frame with the largest area among the at least one face frame included in each image frame; and determine the first facial image according to a second facial image located in the first face frame, as sketched below.
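  • The following is a sketch of selecting the first face frame: among the face frames detected in an image frame, the one with the largest area is chosen. Representing each face frame by its top-left and bottom-right corner coordinates is an assumption for illustration.

```python
def largest_face_frame(face_frames):
    """face_frames: list of (x1, y1, x2, y2) rectangles; returns the one with the largest area."""
    return max(face_frames, key=lambda f: (f[2] - f[0]) * (f[3] - f[1]))

boxes = [(10, 10, 50, 60), (0, 0, 200, 180), (30, 40, 90, 100)]
print(largest_face_frame(boxes))  # (0, 0, 200, 180): the first face frame
```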
  • The acquisition unit 410 is further configured to acquire facial direction information, where the facial direction information is used to indicate the direction of the second facial image; the processing unit 420 is specifically configured to: when the direction of the second facial image does not match the preset standard direction, rotate the second facial image to obtain the first facial image matching the preset standard direction; a sketch of this rotation is given below.
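  • The following is a sketch of rotating the second facial image to the preset standard (upright) direction by rotating the pixel array. The mapping from direction labels to specific rotations is an assumption for illustration; only the idea of correcting the face to an upright orientation comes from this application.

```python
import numpy as np

def to_upright(face: np.ndarray, direction: str) -> np.ndarray:
    """Rotate a face image array so that it matches the preset standard (upright) direction."""
    if direction == "up":
        return face                      # already matches the standard direction
    if direction == "down":
        return np.rot90(face, k=2)       # rotate 180 degrees
    if direction == "left":
        return np.rot90(face, k=-1)      # rotate 90 degrees clockwise (assumed mapping)
    if direction == "right":
        return np.rot90(face, k=1)       # rotate 90 degrees counter-clockwise (assumed mapping)
    raise ValueError(f"unknown face direction: {direction}")

face = np.arange(6).reshape(2, 3)
print(to_upright(face, "left").shape)    # (3, 2): the image has been rotated upright
```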
  • the acquiring unit 410 is specifically configured to acquire the M image frames when the screen is in the off-screen state.
  • the pre-configured neural network is a deep neural network.
  • the device 400 for waking up the screen here is embodied in the form of a functional unit.
  • the term "unit” herein can be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit, or a combination of the two that realize the above-mentioned functions.
  • The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (such as a shared processor, a dedicated processor, or a group of processors) and memory, merged logic circuits, and/or other suitable components that support the described functions.
  • the units of the examples described in the embodiments of the present application can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the embodiments of the present application also provide a device, which may be a terminal device or a circuit device built in the terminal device.
  • the device can be used to execute the functions/steps in the above method embodiments.
  • the device 600 includes a processor 610 and a transceiver 620.
  • the device 600 may further include a memory 630.
  • the processor 610, the transceiver 620, and the memory 630 can communicate with each other through an internal connection path to transfer control and/or data signals.
  • The memory 630 is used to store a computer program, and the processor 610 is used to call and run the computer program from the memory 630.
  • the device 600 may further include an antenna 640 for transmitting the wireless signal output by the transceiver 620.
  • the above-mentioned processor 610 and the memory 630 may be integrated into a processing device, and more commonly, they are components independent of each other.
  • the processor 610 is configured to execute the program code stored in the memory 630 to implement the above-mentioned functions.
  • the memory 630 may also be integrated in the processor 610 or independent of the processor 610.
  • The processor 610 may correspond to the processing unit 420 in the apparatus 400 in FIG. 4, and may also correspond to the periodic detection module, the face detection + direction determination module, the gaze determination module, and the owner identification module in the apparatus 500 in FIG. 5.
  • the device 600 may also include one or more of an input unit 660, a display unit 670, an audio circuit 680, a camera 690, and a sensor 601.
  • The audio circuit may also include a speaker 682, a microphone 684, and so on.
  • the camera 690 may correspond to the acquisition unit 410 in the apparatus 400 in FIG. 4, or may correspond to the image acquisition module in the apparatus 500 in FIG. 5;
  • The display unit 670 may include a screen, and the display unit 670 may correspond to the wake-up unit 430 in the apparatus 400 in FIG. 4, and may also correspond to the wake-up processing module in the apparatus 500 shown in FIG. 5.
  • the camera 690 or an image processing channel corresponding to the camera 690 may be used to obtain M image frames.
  • the foregoing device 600 may further include a power supply 650, which is used to provide power to various devices or circuits in the terminal device.
  • the device 600 shown in FIG. 6 can implement each process of the method embodiments shown in FIG. 2 and FIG. 3.
  • the operation and/or function of each module in the device 600 is to implement the corresponding process in the foregoing method embodiment.
  • FIG. 7 is a chip hardware structure provided by an embodiment of the application, and the chip includes a neural network processor 70.
  • the chip may be set in the device 600 as shown in FIG. 6, for example, may be set in the processor 610 of the device 600.
  • the algorithms of each layer in the convolutional neural network as shown in Fig. 1 can be implemented in the chip as shown in Fig. 7.
  • The processor 610 in the device 600 shown in FIG. 6 may be a system on a chip (SOC), and the processor 610 may include a central processing unit (CPU) and the neural-network processing unit (NPU) 70 shown in FIG. 7, and may further include other types of processors, such as an image signal processor (ISP) corresponding to the camera 690; the ISP may include the image processing channel mentioned in the foregoing embodiment.
  • the CPU may be called the main CPU, and the neural network processor NPU 70 is mounted on the main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks.
  • Each part of the processor cooperates to implement the previous method flow, and each part of the processor can selectively execute a part of the software driver.
  • For example, step 210 in FIG. 2 may be executed by the ISP, step 220 may be executed by the NPU 70, and step 230 may be executed by the CPU. As another example, steps 301, 314, and 315 in FIG. 3 are executed by the CPU, step 302 is executed by the ISP, and steps 303 to 313 can be executed by the NPU; alternatively, step 303 may be executed by the CPU instead of the NPU.
  • each part of the processor or processing unit inside the processor 610 can cooperate to implement the previous method flow, and the corresponding software program of each part of the processor or processing unit can be stored in the memory 630.
  • The above NPU 70 is only used as an example, and the actual neural network function can be performed by a processing device other than the NPU 70; for example, a graphics processing unit (GPU) can also be used for neural network processing, which is not limited in this embodiment.
  • the core part of the NPU is the arithmetic circuit 703, and the controller 704 controls the arithmetic circuit 703 to extract the data in the weight memory 702 or the input memory 701 and perform operations.
  • the arithmetic circuit 703 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 703 is a two-dimensional array. The arithmetic circuit 703 may also be a one-dimensional array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • the arithmetic circuit 703 is a general-purpose matrix processor.
  • For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 702 and caches it on each PE in the arithmetic circuit.
  • The arithmetic circuit takes the matrix A data from the input memory 701 and performs a matrix operation with matrix B; the partial or final results of the obtained matrix are stored in the accumulator 708. A sketch of this accumulation is given below.
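  • The following is an illustrative sketch, in plain NumPy, of the matrix operation described above: matrix B comes from the weight memory, matrix A from the input memory, and partial results are accumulated into C. The tiling is only there to mimic partial-result accumulation; it is not a model of the actual hardware.

```python
import numpy as np

def tiled_matmul(A: np.ndarray, B: np.ndarray, tile: int = 2) -> np.ndarray:
    """Accumulate C = A @ B in column-slices of A / row-slices of B (partial results)."""
    C = np.zeros((A.shape[0], B.shape[1]))        # plays the role of the accumulator 708
    for k in range(0, A.shape[1], tile):
        C += A[:, k:k + tile] @ B[k:k + tile, :]  # accumulate one partial result
    return C

A = np.random.rand(4, 6)    # input matrix A (input memory 701)
B = np.random.rand(6, 3)    # weight matrix B (weight memory 702)
print(np.allclose(tiled_matmul(A, B), A @ B))     # True
```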
  • the vector calculation unit 707 can perform further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on.
  • The vector calculation unit 707 can be used for network calculations in the non-convolution/non-FC layers of the neural network, such as pooling, batch normalization, local response normalization (LRN), and the like.
  • the vector calculation unit 707 can store the processed output vector to the unified buffer 706.
  • the vector calculation unit 707 may apply a nonlinear function to the output of the arithmetic circuit 703, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 707 generates normalized values, combined values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 703, for example for use in subsequent layers in a neural network.
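  • The following is a sketch of the post-processing described above: a nonlinear function is applied to the accumulated outputs of the arithmetic circuit to generate activation values, which can then be used as inputs to subsequent layers. ReLU and the simple normalization here are assumed examples; this application does not fix the particular functions.

```python
import numpy as np

def postprocess(accumulated: np.ndarray) -> np.ndarray:
    """Apply a nonlinear function (ReLU, assumed) and a simple normalization to accumulated values."""
    activated = np.maximum(accumulated, 0.0)        # activation values
    norm = float(np.linalg.norm(activated)) or 1.0  # avoid division by zero
    return activated / norm                         # normalized values

print(postprocess(np.array([-1.0, 2.0, 3.0])))      # e.g. [0.  0.5547  0.8321]
```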
  • the unified memory 706 is used to store input data and output data.
  • The direct memory access controller (DMAC) 705 is used to transfer input data in the external memory to the input memory 701 and/or the unified memory 706, to store the weight data in the external memory into the weight memory 702, and to store the data in the unified memory 706 into the external memory.
  • the bus interface unit (BIU) 710 is used to implement interaction between the main CPU, the DMAC, and the fetch memory 709 through the bus.
  • The instruction fetch buffer 709 connected to the controller 704 is used to store the instructions used by the controller 704; the controller 704 is used to call the instructions cached in the instruction fetch buffer 709 to control the working process of the operation accelerator.
  • the operations of each layer in the convolutional neural network shown in FIG. 1 may be executed by the arithmetic circuit 703 or the vector calculation unit 707.
  • An embodiment of the present application further provides a method for training a neural network, in which training may be performed according to feature images of at least one user to determine the pre-trained deep neural network.
  • For example, training may be performed according to the non-gazing facial images, gazing facial images, head posture information, and gaze direction information of at least one user to determine the pre-trained deep neural network. A minimal training sketch is given below.
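  • The following is a minimal sketch of training a binary gaze classifier from labeled facial data (1 = gazing at the screen, 0 = not gazing). A single-layer logistic model on flattened feature vectors stands in for the deep neural network; the data, shapes, learning rate, and number of iterations are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 64))             # 200 flattened facial feature vectors (assumed 64-d)
y = (X[:, 0] > 0.5).astype(float)     # toy labels: gazing (1) / not gazing (0)

w, b, lr = np.zeros(64), 0.0, 0.1
for _ in range(500):                  # gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * float(np.mean(p - y))

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
print(f"training accuracy: {np.mean(pred == y):.2f}")   # high on this separable toy data
```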
  • the training method may be processed by the CPU, or jointly processed by the CPU and GPU, or GPU may not be used, and other processors suitable for neural network calculations may be used, which is not limited in this application.
  • The present application also provides a computer-readable storage medium storing instructions; when the instructions are run on a computer, the computer executes the steps of the method for waking up the screen shown in FIG. 2 and FIG. 3.
  • The present application also provides a computer program product containing instructions; when the computer program product runs on a computer or on any one of at least one processor, the computer executes the steps of the method for waking up the screen shown in FIG. 2 and FIG. 3.
  • This application also provides a chip including a processor.
  • the processor is used to read and run the computer program stored in the memory to execute the corresponding operation and/or process performed by the method for waking up the screen provided in this application.
  • The chip further includes a memory, the memory is connected to the processor through a circuit or a wire, and the processor is used to read and execute the computer program in the memory.
  • the chip further includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used to receive data and/or information that needs to be processed, and the processor obtains the data and/or information from the communication interface, and processes the data and/or information.
  • the communication interface can be an input and output interface.
  • the processor 610 involved may include, for example, a central processing unit (CPU), a microprocessor, a microcontroller, or a digital signal processor, and may also include a GPU, an NPU, and an ISP. It may also include necessary hardware accelerators or logic processing hardware circuits, such as application-specific integrated circuits (ASICs), or one or more integrated circuits used to control the execution of the technical solutions of the present application.
  • the processor may have a function of operating one or more software programs, and the software programs may be stored in the memory.
  • The memory may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or another magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • In the embodiments of the present application, "at least one" refers to one or more, and "multiple" refers to two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that both A and B exist, or that B exists alone, where A and B may be singular or plural.
  • the character “/” generally indicates that the associated objects are in an “or” relationship.
  • “The following at least one item” and similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • At least one of a, b, and c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can be single or multiple.
  • If any function is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solution of this application essentially, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

This application provides a method and an apparatus for waking up a screen. The method for waking up a screen includes: acquiring M image frames, where each image frame includes a first facial image and M is an integer greater than or equal to 1; determining, according to a pre-configured neural network, whether each first facial image matches a preset facial image and belongs to a user who is looking at the screen of a device; and, when each first facial image matches the preset facial image and belongs to the user, switching the screen from an off-screen state to an on-screen state. The technical solution provided by this application can improve the accuracy with which a device wakes up the screen without significantly increasing cost.

Description

唤醒屏幕的方法和装置 技术领域
本申请涉及电子设备领域,并且更具体地,涉及一种唤醒屏幕的方法和装置。
背景技术
随着人工智能(artificial intelligence,AI)技术在终端设备的发展,可以通过AI技术时刻感知人、感知用户行为,提高用户粘性,为用户提供更加智能化的服务。在使用终端设备之前,需要进行待机唤醒技术,待机唤醒技术是指可以预先唤醒睡眠或者待机状态下的终端设备,即需要唤醒终端设备的屏幕,进而才能够进行后续的功能性操作。
现有技术中,对于待机唤醒技术而言,例如,采用获取用户的虹膜信息进行注视判别和机主判别的认证,但是,该技术依赖于获取的瞳孔图像进行判别,采用一般的前置摄像头获取瞳孔图像时,图像画质较低,严重影响唤醒结果的准确性。还例如,采用特殊的虹膜摄像头,可以获取高质量的瞳孔头像,使得唤醒结果更加准确,但是提高了设备的生产成本。因此,如何在不显著增加成本的前提下提高设备唤醒屏幕的准确性,成为亟待解决的问题。
发明内容
本申请提供一种唤醒屏幕的方法和装置,能够在不显著增加成本的前提下,提高设备唤醒屏幕的准确性。
第一方面,提供了一种唤醒屏幕的方法,包括:获取M个图像帧,其中,每个图像帧包括第一面部图像,M为大于或等于1的整数;根据预先配置的神经网路,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户;当所述每个第一面部图像均匹配所述预设面部图像且属于所述用户时,将所述屏幕由灭屏状态切换至亮屏状态。
本申请提供的唤醒屏幕的方法,通过获取的M个图像帧中每个图像帧的第一面部图像,通过预先配置的神经网络进行注视判别和面部图像判别,即确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户,能够避免前置摄像头获取瞳孔图像时,图像画质较低,严重影响唤醒结果的准确性的问题,从而提高设备唤醒屏的准确性,不会显著增加设备的成本。
在一种可能的实现方式中,可以获取当前图像帧,所述当前图像帧中包括第一面部图像,根据预先配置的神经网络确定所述当前图像帧中的第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户;当所述当前图像帧中的第一面部图像匹配所述预设面部图像且属于所述注视设备的屏幕的用户时,将所述屏幕由灭屏状态切换至亮屏状态。
结合第一方面,在第一方面的某些实现方式中,所述根据所述预先配置的神经网络,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户,包括:利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户;当所述每个第一 面部图像属于所述用户时,利用所述预先配置的神经网络确定所述每个第一面部图像是否匹配所述预设面部图像。
本申请实施例提供的屏幕唤醒的方法,可以利用预先配置的神经网络,先确定一个第一面部图像对应的用户是否在注视屏幕,当第一面部图像对应的用户注视屏幕时,即第一面部图像属于所述用户时,可以利用预先配置的神经网络确定第一面部图像是否为预设面部图像。当同时满足上述条件时,可以将屏幕由灭屏状态切换至亮屏状态,从而实现唤醒设备的屏幕。
在一种可能的实现方式中,所述根据所述预先配置的神经网络,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户,包括:利用所述预先配置的神经网络确定每个第一面部图像是否匹配所述预设的面部图像;当所述每个第一面部图像匹配所述预设的面部图像时,利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户。
本申请实施例提供的屏幕唤醒的方法,可以利用预先配置的神经网络,先确定一个第一面部图像是否匹配预设面部图像,当第一面部图像匹配预设面部图像时,可以利用预先配置的神经网络确定对应的用户是否在注视屏幕,即第一面部图像属于所述用户。当同时满足上述条件时,可以将屏幕由灭屏状态切换至亮屏状态,从而实现唤醒设备的屏幕。
需要说明的是,本申请中确定第一面部图像是否属于注视设备的屏幕的用户以及是否匹配预设面部图像利用的预配置的神经网络可以是同一个预配置的神经网络,也可以是两个不同的预配置的神经网络。
结合第一方面,在第一方面的某些实现方式中,所述利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户,包括:利用所述预先配置的神经网络确定所述每个第一面部图像属于所述用户的概率值;当所述概率值大于预设阈值时,确定所述每个第一面部图像属于所述用户。
本申请实施例提供的屏幕唤醒的方法,可以利用预配置的神经网络确定一个第一面部图像对应的用户注视设备的屏幕的概率值,从而确定一个第一面部图像对应的用户是否属于注视设备的屏幕的用户。
结合第一方面,在第一方面的某些实现方式中,在获取M个图像帧后,还包括:在所述每个图像帧中确定第一面部框,所述第一面部框为所述每个图像帧中包括的至少一个面部框中面积最大的面部框;根据位于所述第一面部框中的第二面部图像确定所述第一面部图像。
本申请实施例提供的屏幕唤醒的方法,可以通过一个图像帧中包括的至少一个面部框的面积大小确定面积最大的第一面部框,根据第一面部框可以定位于图像帧中的面部图像区域,从而确定第二面部图像。
结合第一方面,在第一方面的某些实现方式中,还包括:获取面部方向信息,所述面部方向信息用于指示所述第二面部图像的方向;所述根据位于所述第一面部框中的第二面部图像确定所述第一面部图像包括:当所述第二面部图像的方向不匹配预设标准方向时,对所述第二面部图像进行旋转处理,以得到匹配于所述预设标准方向的所述第一面部图像。
本申请实施例提供的屏幕唤醒的方法,为了避免预配置的神经网络的算法复杂度较高 以及降低所述预配置的神经网络的功耗,可以获取面部方向信息,当方面部向信息不匹配预设标准方向时,可以进行旋转处理,将处理后的面部图像再输入至预配置的神经网络。
在一种可能的实现方式中,还包括:获取面部方向信息,所述面部方向信息用于指示所述第二面部图像的方向;所述根据位于所述第一面部框中的第二面部图像确定所述第一面部图像包括:当所述第二面部图像的方向匹配预设标准方向时,将所述第二面部图像作为所述第一面部图像。
结合第一方面,在第一方面的某些实现方式中,所述获取M个图像帧包括:在所述屏幕处于所述灭屏状态时,获取所述M个图像帧。
本申请提供的唤醒屏幕的方法,可以周期性地检测设备的屏幕是否处于灭屏状态,当检测到屏幕处于灭屏状态时,可以获取M个图像帧。例如,可以一次获取一个或多个图像帧,从而获取M个图像帧。
结合第一方面,在第一方面的某些实现方式中,所述预先配置的神经网络为深度神经网络。
例如,预先配置的神经网络可以是全连接神经网络,也可以是卷积神经网络。
第二方面,提供一种唤醒屏幕的装置,所述装置包括:获取单元,用于获取M个图像帧,其中,每个图像帧包括第一面部图像,M为大于或等于1的整数;处理单元,用于根据预先配置的神经网路,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户;唤醒单元,用于当所述每个第一面部图像均匹配所述预设面部图像且属于所述用户时,将所述屏幕由灭屏状态切换至亮屏状态。
根据本申请提供的唤醒屏幕的装置,通过获取的M个图像帧中每个图像帧的第一面部图像,通过预先配置的神经网络进行注视判别和面部图像判别,即确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户,能够避免前置摄像头获取瞳孔图像时,图像画质较低,严重影响唤醒结果的准确性的问题,从而提高设备唤醒屏的准确性,不会显著增加设备的成本。
在一种可能的实现方式中,获取单元,可以用于获取当前图像帧,所述当前图像帧中包括第一面部图像,处理单元,可以用于根据预先配置的神经网络确定所述当前图像帧中的第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户;唤醒单元,可以用于当所述当前图像帧中的第一面部图像匹配所述预设面部图像且属于所述注视设备的屏幕的用户时,将所述屏幕由灭屏状态切换至亮屏状态。
结合第二方面,在第二方面的某些实现方式中,所述处理单元具体用于:利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户;当所述每个第一面部图像属于所述用户时,利用所述预先配置的神经网络确定所述每个第一面部图像是否匹配所述预设面部图像。
在一种可能的实现方式中,处理单元具体用于:利用所述预先配置的神经网络确定每个第一面部图像是否匹配所述预设的面部图像;当所述每个第一面部图像匹配所述预设的面部图像时,利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户。
需要说明的是,本申请中确定第一面部图像是否属于注视设备的屏幕的用户以及是否匹配预设面部图像利用的预配置的神经网络可以是同一个预配置的神经网络,也可以是两 个不同的预配置的神经网络。
结合第二方面,在第二方面的某些实现方式中,所述处理单元具体用于:利用所述预先配置的神经网络确定所述每个第一面部图像属于所述用户的概率值;当所述概率值大于预设阈值时,确定所述每个第一面部图像属于所述用户。
结合第二方面,在第二方面的某些实现方式中,所述处理单元还用于:在所述每个图像帧中确定第一面部框,所述第一面部框为所述每个图像帧中包括的至少一个面部框中面积最大的面部框,;根据位于所述第一面部框中的第二面部图像确定所述第一面部图像。
结合第二方面,在第二方面的某些实现方式中,所述获取单元还用于:获取面部方向信息,所述面部方向信息用于指示所述第二面部图像的方向;所述处理单元具体用于:当所述第二面部图像的方向不匹配预设标准方向时,对所述第二面部图像进行旋转处理,以得到匹配于所述预设标准方向的所述第一面部图像。
在一种可能的实现方式中,所述获取单元还用于:获取面部方向信息,所述面部方向信息用于指示所述第二面部图像的方向;;所述处理单元具体用于:当所述第二面部图像的方向匹配预设标准方向时,将所述第二面部图像作为所述第一面部图像。
结合第二方面,在第二方面的某些实现方式中,所述获取单元具体用于:在所述屏幕处于所述灭屏状态时,获取所述M个图像帧。
结合第二方面,在第二方面的某些实现方式中,所述预先配置的神经网络为深度神经网络。
第三方面,提供了一种唤醒屏幕的装置,包括,处理器,存储器,该存储器用于存储计算机程序,该处理器用于从存储器中调用并运行该计算机程序,使得该唤醒屏幕的装置执行第一方面及其各种可能实现方式中的唤醒屏幕的方法。
可选地,所述处理器为一个或多个,所述存储器为一个或多个。
可选地,所述存储器可以与所述处理器集成在一起,或者所述存储器与处理器分离设置。
第四方面,提供了一种计算机程序产品,所述计算机程序产品包括:计算机程序(也可以称为代码,或指令),当所述计算机程序被运行时,使得计算机或任一至少一种处理器执行上述第一方面及其各种实现方式中的方法。
第五方面,提供了一种计算机可读介质,所述计算机可读介质存储有计算机程序(也可以称为代码,或指令)当其在计算机或任一至少一种处理器上运行时,使得计算机或该处理器执行上述第一方面及其各种实现方式中的方法。
第六方面,提供了一种芯片系统,该芯片系统包括处理器,用于支持计算机中的服务器实现上述第一方面及其各种实现方式中所涉及的功能。
附图说明
图1是本申请实施例提供的卷积神经网络的模型示意图;
图2是根据本申请一个实施例的唤醒屏幕的方法的示意图;
图3是根据本申请一个实施例的唤醒屏幕的方法的流程图;
图4是根据本申请一个实施例的唤醒屏幕的装置的示意图;
图5是根据本申请另一个实施例的唤醒屏幕的装置的示意图;
图6是根据本申请另一个实施例的唤醒屏幕的装置的示意图;
图7是本申请实施例提供的一种芯片硬件结构示意图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
本申请实施例提供的控制屏幕状态的方法能够应用在唤醒屏幕、注视拍照、识别用户感兴趣的阅读内容、人机交互等场景。具体而言,本申请实施例的控制屏幕状态的方法能够应用在待机唤醒状态的场景中,下面分别对待机唤醒状态的场景进行简单的介绍。
唤醒屏幕的场景
在使用终端设备的各种应用之前,需要预先唤醒睡眠/待机状态下的终端设备,即需要唤醒(又可以称为激活)该终端设备对应的屏幕,才可进行后续的功能性操作。唤醒终端设备的屏幕可以是控制屏幕的状态由灭屏状态切换至亮屏状态,也可以看作是将终端设备由睡眠状态唤醒为激活状态,该技术可称之为待机唤醒技术。
应理解,待机唤醒技术与解锁技术存在着一定的差异,解锁技术是指在终端设备设备唤醒之后,即终端设备处于亮屏状态之后,需要进行的一些安全认证,例如,输入密码、输入指纹、虹膜认证、面部认证等等,解锁技术可以获得并使用终端设备的完整功能。待机唤醒技术则是用户使用终端设备的第一步,是终端设备的首要技术。
目前,例如,常用的待机唤醒技术即唤醒屏幕的方法可以是按键唤醒、双击唤醒、抬起唤醒、语音唤醒等等。其中,按键唤醒、双击唤醒和抬起唤醒需要用户手动操作,使用不方便且不安全;语音唤醒需要用户发声唤醒,使用还是不够方便且唤醒率不高。这些唤醒方式都没有进行安全认证,即并没有限制只有机主才能够进行控制屏幕切换至亮屏状态,使得非机主可以轻而易举的唤醒屏幕,极易泄露屏幕上的短信信息,无法有效保护机主显示在屏幕上的隐私信息。
有鉴于此,本申请提出了一种唤醒屏幕的方法,可以通过获取当前图像帧,根据当前图像帧中的第一面部图像完成注视判别和机主判别,即可以根据第一面部图像和预先训练的深度神经网络确定第一面部图像对应的用户是否注视屏,并进一步地确定该用户是否为机主,当第一面部图像既满足上述注视判别又满足机主判别时,即当前图像帧可以满足唤醒条件,当连续的M图像帧满足上述唤醒条件时,可以将设备的屏幕状态由灭屏状态切换至亮屏状态。通过本申请中的控制屏幕状态的方法,通过获取当前图像帧中的面部图像和预先训练的深度神经网络,避免了采用一般的前置摄像头获取瞳孔图像进行注视判别时,图像画质较低,严重影响唤醒结果的准确性的问题,从而提高了控制屏幕状态的准确性。
在本申请中,设备可以是具有屏幕的终端设备,例如,可以是用户设备、移动设备、用户终端、终端、无线通信设备或用户装置。终端设备还可以是蜂窝电话、无绳电话、会话启动协议(session initiation protocol,SIP)电话、无线本地环路(wireless local loop,WLL)站、个人数字助理(personal digital assistant,PDA)、具有无线通信功能的手持设备、计算设备或连接到无线调制解调器的其它处理设备、车载设备、可穿戴设备,未来5G网络中的终端设备或者未来演进的公用陆地移动通信网络(public land mobile network,PLMN)中的终端设备等,本申请实施例对此并不限定。
本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例涉及的相关术语及神经网络等相关概念进行介绍。
(1)旋转处理(面部旋转)
利用图像处理、机器学习、计算机图形学等相关方法中的任一项,可以将面部图像从一个姿态(pose)角度旋转到另一个姿态角度并得到相应的旋转后图像。
(2)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以x s和截距b为输入的运算单元,该运算单元的输出可以为:
Figure PCTCN2019077991-appb-000001
其中,s=1、2、……n,n为大于1的自然数,W s为x s的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是sigmoid函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
(3)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有很多层隐含层的神经网络,这里的“很多”并没有特别的度量标准。从DNN按不同层的位置划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。例如,全连接神经网络中层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:
Figure PCTCN2019077991-appb-000002
其中,
Figure PCTCN2019077991-appb-000003
是输入向量,
Figure PCTCN2019077991-appb-000004
是输出向量,
Figure PCTCN2019077991-appb-000005
是偏移向量,W是权重矩阵(也称系数),α()是激活函数。每一层仅仅是对输入向量
Figure PCTCN2019077991-appb-000006
经过如此简单的操作得到输出向量
Figure PCTCN2019077991-appb-000007
由于DNN层数多,则系数W和偏移向量
Figure PCTCN2019077991-appb-000008
的数量也就很多了。这些参数在DNN中的定义如下所述:以系数W为例:假设在一个三层的DNN中,第二层的第4个神经元到第三层的第2个神经元的线性系数定义为
Figure PCTCN2019077991-appb-000009
上标3代表系数W所在的层数,而下标对应的是输出的第三层索引2和输入的第二层索引4。总结就是:第L-1层的第k个神经元到第L层的第j个神经元的系数定义为
Figure PCTCN2019077991-appb-000010
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(4)卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器。该特征抽取器可以看作是滤波器,卷积过程可以看作是使用一个可训练的滤波器与一个输入的图像或者卷积特征平面(feature map)做卷积。卷积层是指卷积神经网络中对输入信号进行卷积处理的 神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。这其中隐含的原理是:图像的某一部分的统计信息与其他部分是一样的。即意味着在某一部分学习的图像信息也能用在另一部分上。所以对于图像上的所有位置,都能使用同样的学习得到的图像信息。在同一卷积层中,可以使用多个卷积核来提取不同的图像信息,一般地,卷积核数量越多,卷积操作反映的图像信息越丰富。
卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
(5)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
在本申请中,预先训练的深度神经网络可以是深度学习模型,例如,可以是全连接的神经网络,也可以是卷积神经网络(convolutional neural network,CNN)。在每一层的所有神经元与下一层的所有神经元连接(每一层的每一个神经元的权重w均不为0)的情况下,该深度神经网络还可以是一个全连接的神经网络模型。在每一层的所有神经元不与下一层的所有神经元连接(每一层的每一个神经元上的权重w部分为0)的情况下,该深度神经网络还可以是一个CNN模型。
下面从模型训练侧和模型应用侧对本申请提供的方法进行描述:
本申请实施例提供的神经网络的训练方法,涉及计算机视觉的处理,具体可以应用于数据训练、机器学习、深度学习等数据处理方法,对训练数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等,最终得到训练好的神经网络。
示例性地,预先配置的神经网络可以是注视判别网络,即能够根据获取的面部图像和预先训练的神经网络,确定该面部图像对应的用户是否注视屏幕。本申请实施例提供的唤醒屏幕的方法可以运用上述预先配置的神经网络,将输入数据(例如,本申请中的第一面部图像)输入到所述训练好的注视判别网络中,得到输出数据(如本申请中的第一概率)。
需要说明的是,本申请实施例提供的预先配置的神经网络的训练方法和唤醒屏幕的方法可以是基于同一个构思,也可以理解为一个系统中的两个部分,或一个整体流程的两个阶段:如模型训练阶段和模型应用阶段。例如,模型训练阶段可以提前训练的,模型应用阶段可以是本申请中的唤醒屏幕的方法,即对预先训练的深度神经网络的应用。
首先,对本申请中预先配置的神经网络的训练过程进行简要说明。
本申请实施例中训练数据可以包括:用户的特征图像,即可以反映当前用户特征的图像,例如,可以是用户的注视图像、非注视图像、头部姿态信息以及注视方向信息;根据训练数据可以得到目标模型/规则,即目标模型可以是预先训练的神经网络,用于确定用户是否注视屏幕。
在本申请中,预先配置的神经网络可以是根据至少一个用户的面部特征图像进行训练得到。其中,至少一个用户的面部特征图像可以是预先采集购买的数据库中的至少一个用户的面部特征图像,或者,可以是经过用户授权后通过前置摄像头采集到的图像。
例如,至少一个用户的面部特征图像可以是至少一个用户的非注视面部图像、注视面部图像、头部姿态信息以及注视方向信息等。由于面部图像能直接反应头部姿态,而人的头部姿态与注视方向往往强相关,通常情况下头部姿态与注视信息基本一致,当用户侧脸角度、俯仰角度较大时,通常没有在注视屏幕;用户正脸时,则一般是在注视屏幕等等。为了避免现实场景中也存在少部分头部姿态与注视情况不一致的场景,通过构建丰富的数据集来让神经网络有目的的学习到此种特殊场景的注视结果,从而提高神经网络进行判别的准确性。
本申请的实施例中该目标模型/规则能够用于实现本申请实施例提供的唤醒屏幕的方法,即,将获取的当前图像帧中的第一面部图像通过相关预处理后输入该目标模型/规则,得到输出结果,即确定第一面部图像对应的用户是否注视屏幕。本申请实施例中的目标模型/规则可以为预先配置的神经网络。
本申请中,得到的目标模型/规则可以应用于不同的设备中,例如,可以将得到的目标模型/规则应用于是终端设备,如手机终端,平板电脑,笔记本电脑,AR/VR,车载终端等。
示例性地,本申请中的预先配置的神经网络可以是卷积神经网络。
如前文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。
如图1所示,卷积神经网络(CNN)100可以包括输入层110,卷积层/池化层120(其中池化层为可选的),以及神经网络层130。
如图1所示卷积层/池化层120可以包括如示例121-126层,举例来说:在一种实现中,121层为卷积层,122层为池化层,123层为卷积层,124层为池化层,125为卷积层,126为池化层;在另一种实现方式中,121、122为卷积层,123为池化层,124、125为卷积层,126为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。
下面将以卷积层121为例,介绍一层卷积层的内部工作原理。
卷积层121可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素,这取决于步长 stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以用来从输入图像中提取信息,从而使得卷积神经网络100进行正确的预测。
当卷积神经网络100有多个卷积层的时候,初始的卷积层(例如121)往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络100深度的加深,越往后的卷积层(例如126)提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,在如图1中120所示例的121-126各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在图像处理过程中,池化层的唯一目的就是减少图像的空间大小。另外,就像卷积层中用权重矩阵的大小应该与图像尺寸相关一样,池化层中的运算符也应该与图像的大小相关。通过池化层处理后输出的图像尺寸可以小于输入池化层的图像的尺寸,池化层输出的图像中每个像素点表示输入池化层的图像的对应子区域的平均值或最大值。
神经网络层130
在经过卷积层/池化层120的处理后,卷积神经网络100还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层120只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的信息或其他相关信息),卷积神经网络100需要利用神经网络层130来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层130中可以包括多层隐含层(如图1所示的131、132至13n)以及输出层140,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如,在本申请中该任务类型可以包括图像识别,即可以进行判断输入至神经网络中的一个第一面部图像是否匹配预设面部图像,以及该第一面部图像对应的用户是否在注视设备的屏幕。
示例性地,本申请中的预先配置的神经网络还可以是全连接的神经网络,即每一层的所有神经元与下一层的所有神经元连接(每一层的每一个神经元的权重w均不为0)。
需要说明的是,如图1所示的卷积神经网络100仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在。
图2是本申请实施例提供的一种唤醒屏幕的方法200,该方法200可以包括步骤210-230,下面对步骤210-230进行详细的描述。
步骤210:获M个图像帧,其中,每个图像帧包括第一面部图像,M为大于或等于1的整数。在一个可能的实现方式中,获取M个图像帧之前,可以检测屏幕的状态信息。
例如,可以检测当前屏幕处于灭屏状态还是亮屏状态。在屏幕处于灭屏状态时,可以获取M个图像帧。第一面部图像中的面部可能属于用户,例如,机主或其他人。
示例性地,可以周期性的检测屏幕状态,以预设的时间周期,周期性地检测设备当前状态是否在灭屏状态,若屏幕的当前状态为灭屏时,可以获取图像存储模块中的最新图像帧。
示例性地,获M个图像帧可以是一次获取一个当前图像帧,通过M次获取M个图像帧。也可以一次获取多个图像帧。
需要说明的是,获取每个图像帧可以是获取设备的存储器中的最新图像帧,最新图像帧可以是前置摄像头采集的图像,最新图像帧可以看作存储图像或者获取图像的时间距离当前为时间差最短的图像帧。获取M个图像帧,可以是一次获取当前图像帧,以及当前图像帧的前M-1个图像帧,即一次可以获取M个图像帧。
在本申请中,步骤210还可以包括:在获取M个图像帧后,可以在每个图像帧中确定第一面部框,例如,第一面部框为所述每个图像帧中包括的至少一个面部框中面积最大的面部框;根据位于第一面部框中的第二面部图像可以确定第一面部图像。
应理解,前置摄像头采集的一个图像帧中可以包括一个面部的图像,也可以包括多面部的图像,一图像帧中包括的面部图像的数量与采集图像时摄像头前用户的数量相关。考虑到唤醒设备时,往往会有意凑近设备屏幕,即一个图像帧中的面积最大的面部图像最有可能是唤醒屏幕的真实用户。因此,在本申请中,获取的每个图像帧中可以包括至少一个面部图像,即至少一个面部图像,第一面部图像可以是当前图像帧中包括的至少一个面部图像中面积最大的面部图像。可以理解,第一面部图像也可以是当前图像帧中包括的至少一个面部图像中其他的面部图像,例如,位于或接近于当前图像帧中央的面部图像。
在一个示例中,在每个图像帧中确定第一面部框的步骤还可以包括:获取面部方向信息,面部方向信息用于指示第二面部图像的方向;当第二面部图像的方向不匹配预设标准方向时,对第二面部图像进行旋转处理,以得到匹配于预设标准方向的所述第一面部图像。
例如,当第二面部图像的方向匹配预设标准方向时,可以将第二面部图像作为第一面部图像。
示例性地,以一个图像帧举例进行描述,在获取一个图像帧后,可以判断当前图像帧中是否存在面部图像,即对当前图像帧中的面部图像以及背景图像进行区分;在当前图像帧中存在面部图像时,可以确定当前图像帧中包括的至少一个面部图像的面部框和面部方向,其中,面部框可以是矩形框,可以获取面部框的定位即矩形面部框中左上角和右下角的坐标,面部方向即可以是面部框中的面部图像相对于矩形面部框为朝上、朝下、朝左、朝右等方向的。例如,可以根据至少一个面部框的坐标计算面部框的面积,依据面积排序,选出面积最大的面部框即第一面部框。根据第一面部框可以从当前图像帧中定位到第一面部框区域,并截取该第一面部框对应的面部区域图像,即获取第二面部图像。
进一步地,为了降低预先配置的神经网络的计算量以及计算复杂度,可以根据第一面部框和面部方向信息确定第一面部图像,该第一面部图像为相对于屏幕为竖直向上的面部图像。也就是说,可以在获取第一面部框对应的第二面部图像,若第二面部图像不匹配预设标准方向,例如,预设标准方向可以是相对于屏幕竖直向上的方向,根据面部方向信息对第二面部图像进行旋转处理,将第二面部图像校正为相对于屏幕为竖直向上的面部图像。若第二面部图像匹配预设标准方向,则可以将第二面部图像作为第一面部图像。
在本申请中,摄像头采集到的图像中的面部存在各种方向的可能,为了降低预先配置的神经网络的计算量以及计算复杂度,即为了满足后续预先训练的深度神经网络的输入约束,可以将各种方向的面部都校正为竖直面部,该竖直面部即为相对于屏幕为竖直向上的面部图像。本实施例的竖直向上可以是相对于地平面的竖直向上。
步骤220:根据预先配置的神经网路,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户。
应理解,本申请中预先配置的神经网络可以是深度神经网络,该深度神经网络的预先配置的过程可以如上述图1所示。
在本申请中,预先配置的神经网络可以是根据至少一个用户的面部特征图像进行训练得到。其中,至少一个用户的面部特征图像可以是预先采集购买的数据库中的至少一个用户的面部特征图像,或者,可以是经过用户授权后通过前置摄像头采集到的图像。
例如,至少一个用户的面部特征图像可以是至少一个用户的非注视面部图像、注视面部图像、头部姿态信息以及注视方向信息等。由于面部图像能直接反应头部姿态,而人的头部姿态与注视方向往往强相关,通常情况下头部姿态与注视信息基本一致,当用户侧脸角度、俯仰角度较大时,通常没有在注视屏幕;用户正脸时,则一般是在注视屏幕等等。为了避免现实场景中也存在少部分头部姿态与注视情况不一致的场景,通过构建丰富的数据集来让神经网络有目的的学习到此种特殊场景的注视结果,从而提高神经网络进行判别的准确性。
在一示例中,可以利用预先配置的神经网络确定每个第一面部图像是否属于注视设备的屏幕的用户;当每个第一面部图像属于所述用户时,可以利用所述预先配置的神经网络确定每个第一面部图像是否匹配预设面部图像。
在本申请中,可以先进行注视判别,即确定当前图像帧中的第一面部图像对应的用户是否在注视屏幕。在满足注视判别的情况下,再进行机主判别,即确定第一面部图像对应的用户是否匹配预设面部图像。其中,由于机主判别需要进行图像中多个特征向量的对比,从而机主判别的功耗是高于注视判别的功耗的,因此,为了节约系统功耗,可以先进行注视判别再进行机主判别;此外,也可以根据第一面部图像先进行机主判别再进行注视判别。
在一示例中,可以利用预先配置的神经网络确定每个第一面部图像是否匹配预设面部图像;当每个第一面部图像匹配预设面部图像时,可以利用预先配置的神经网络确定每个第一面部图像是否属于注视设备的屏幕的用户。
具体地,可以利用预先配置的神经网络确定每个第一面部图像属于注视设备的屏幕的用户的概率值;当概率值大于预设阈值时,确定每个第一面部图像属于注视设备的屏幕的用户。
具体地,可以将每个第一面部图像输入至预先配置的神经网路,输出该第一面部图像的特征向量,并与设备上的预设面部图像的特征向量计算距离,判断是否满足所述匹配。
在一个示例中,预设面部图像可以是设备的机主面部图像。机主在进行注册时,设备可以引导机主采集机主的面部图像,并通过该机主识别网络执行得到对应的特征向量,并存储在固定位置(如存储器)作为机主面部的特征向量(face ID),供机主识别判定使用。在判断当前面部是否匹配时,可以通过计算当前面部图像对应的特征向量与机主面部的特征向量的距离,并与预设的距离阈值比较大小,若小于预设的距离阈值,则可以确定当前面部图像匹配预设面部图像;若大于预设的距离阈值,则可以确定当前面部图像不匹配预设面部图像。
步骤230:当所述每个第一面部图像均匹配所述预设面部图像且属于所述用户时,将所述屏幕由灭屏状态切换至亮屏状态。
例如,以一个图像帧举例说明,可以获取当前图像帧,该当前图像帧包括第一面部图像;根据预先配置的神经网路,确定第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户;当该第一面部图像均匹配所述预设面部图像且属于所述用户时,将所述屏幕由灭屏状态切换至亮屏状态。
进一步地,根据当前一个图像帧已经判定为是机主在注视屏幕后,为了避免用户无意看一眼手机后,设备的屏幕即由灭屏状态切换至亮屏状态的问题,可以进一步判断连续的M个图像帧中,每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户。通常情况下,假设机主有意唤醒设备,机主会注视一定间隔,因此考虑到场景的合理性同时满足实时唤醒,可以检测在连续的M个图像帧中每个第一面部图像均匹配所述预设面部图像且属于注视屏幕的用户时,将屏幕由灭屏状态切换至亮屏状态。其中,检测在连续的M个图像帧中每个第一面部图像均匹配所述预设面部图像且属于注视屏幕的用户的具有执行流程可以是分别对每个图像帧依次进行处理判断,即可以是获取第一图像帧,当第一图像帧中的第一面部图像匹配所述预设面部图像且属于注视屏幕的用户时,再获取第二图像帧进行相同的判别操作,直至获取第M个图像帧时,第M个图像帧中的第一面部图像也满足匹配所述预设面部图像且属于注视屏幕的用户时,将设备的屏幕由灭屏状态切换至亮屏状态。或者,也可以是获取连续的M个图像帧,同时可以对M个图像帧中的全部或者部分进行处理,在M个图像帧中每个第一面部图像均匹配所述预设面部图像且属于注视屏幕的用户时,将屏幕由灭屏状态切换至亮屏状态。上述为举例说明,并不对本申请的流程执行顺序作出任何限定。
应理解,其中,M的数值可以自行设定,理应保证M值不应太大也不应太小,若M值太大,会使得唤醒存在时延较大且唤醒率低;若M值太小,会使得唤醒跳变严重。
本申请提供的唤醒屏幕的方法,通过获取的M个图像帧中每个图像帧的第一面部图像,通过预先配置的神经网络进行注视判别和面部图像判别,即确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户,能够避免前置摄像头获取瞳孔图像时,图像画质较低,严重影响唤醒结果的准确性的问题,从而提高设备唤醒屏的准确性,不会显著增加设备的成本。
下面结合图3,对本申请实施例中的唤醒屏幕的具体流程进行说明。
图3为本申请实施例提供的唤醒屏幕的方法的流程示意图。图3所示的方法包括步骤301至步骤315,下面分别对步骤301至步骤315进行详细描述。
步骤301:周期性灭屏检测模块,以设定的周期,周期性地检测终端设备当前状态是否在灭屏状态。
例如,若终端设备处于灭屏状态,即终端设备的屏幕处于灭屏状态,则执行步骤302。
例如,若终端设备处于亮屏状态,即终端设备的屏幕处于亮屏状态,则执行步骤305本图像帧结束即结束该流程,等待继续下个周期的灭屏检测。
步骤302:获取图像存储模块中的最新图像帧,作为后续注视判别或者机主判别的输入图像。
在本申请中,针对最新图像帧,存储该图像帧或者获取该图像帧的时间距离当前时间的时间差最短。
步骤303:判断获取的最新图像帧中是否存在面部图像。若存在面部图像,则执行步 骤304;若不存在面部图像,则执行步骤305本图像帧结束即结束该流程。
步骤304:计算面部框大小,获取最大面部框及其面部方向。
示例性地,获取最新的图像帧后,可以对图像进行面部检测,即可以确定最新图像帧中的背景区域和面部区域。
在一个示例中,可以采用一个端到端(end-to-end)的多任务网络结构,完成面部框定位、面部分类和面部方向分类三个任务,即可以输入固定尺寸的图像,获取输出的面部框坐标和面部方向。其中,面部方向为面部框中的面部图像相对于面部框的相对位置信息,例如,面部方向分可以为朝上、朝左、朝右、朝下。面部框坐标可以是矩形框的左上角和右下角的坐标值,即可以是4个数值。计算面部框的面积,依据面积排序,选出面积最大的面部输出其面部框坐标和及其面部方向。
应理解,前置摄像头采集的图像中可以包括一个面部图像,或者,也可以包括多个面部图像,考虑到机主想要唤醒设备时,往往会有意凑近设备屏幕,那么在摄像头采集的图像中最大的面部图像,或者,面部图像也可以是当前图像帧中包括的至少一个面部图像中其他的面部图像,例如,位于或接近于当前图像帧中央的面部图像,作为最有可能进行唤醒屏幕的用户。
步骤306:根据最大面部框从获取的最新图像帧中定位到该面部框并截取该面部框中的面部区域图像。
步骤307:根据最大面部框的面部方向确定截取的面部区域图像是否为竖直方向面部,即截取的面部区域图像相对于屏幕是否为竖直向上的面部图像。若截取的面部图像为竖直面部图像(竖直朝上的面部图像),则执行步骤309;若截取的面部图像不是竖直面部图像,则执行步骤308。
步骤308:通过翻转和转置校正面部图像,得到校正后的面部图像,即可以是得到相对于终端设备的屏幕为竖直向上的面部图像。
在本申请中,摄像头采集到的图像中的面部存在各种方向的可能,为了减小后续注视判别以及机主判别网络的负载以及计算复杂度,可以将截取的面部区域图像校正为竖直方向的面部图像。
示例性地,若截取获得的面部图像为竖直向上方向的面部图像,则可以不进行校正处理;若截取获得的面部图像为朝下方向的面部图像,则可以进行向上翻转处理;若截取获得的面部图像为朝左方向的面部图像,则可以先向上翻转然后转置处理;若截取获得的面部图像为朝右方向的面部图像,则可以先向左翻转然后转置处理;最终可以输出校正为竖直方向的面部区域图像。
步骤309:面部图像作为输入,执行注视判别网络,输出注视概率。
本申请中,注视判别网络可以是预先训练的深度神经网络,将获取的面部图像输入至预先训练的深度神经网络,可以输出注视或者非注视的概率。
应理解,注视判别网络的输入可以是校正为竖直方向的面部图像,也可以是输入未进行校正的面部图像。当输入是校正为竖直方向的面部图像比输入直接是面部图像时,网络输出的注视结果的准确率更高。此外,若考虑到校正过程可能会引入设备的功耗,也可以去掉面部校正过程,直接输入原始面部图像,确定相应的注视准确度。即使是直接输入未校正为竖直方向的原始面部图像,该注视判别网络的注视准确度也比传统的注视判别基于 获取瞳孔图像的方法的准确度更高。
步骤310:将输出的注视概率与预设注视阈值比较大小,输出是否注视屏幕。若注视概率大于预设注视阈值,则判定该面部在注视屏幕,说明该面部在试图唤醒设备,即确定输入注视判别网络的面部对应的用户在注视屏幕,执行步骤311;若注视概率小于预设注视阈值,则判定该面部未注视屏幕,说明该面部并无唤醒意图,即确定输入注视判别网络的面部对应的用户没有注视屏幕,则执行步骤305本图像帧结束即结束该流程。
步骤311:面部图像作为输入,执行机主判别网络,输出该面部图像的特征向量。需要说明的是,在本申请中机主判别网络和注视判别网络可以是两个不同的端到端的轻量级神经网络算法。
步骤312:计算输出的特征向量与预设的机主面部的特征向量的距离。
步骤313:将计算得到的距离与预设的距离阈值比较大小,输出是否为机主。
在一个示例中,通过计算当前面部图像对应的特征向量与机主面部的特征向量的距离,并与预设的距离阈值比较大小,若小于预设的距离阈值,则确定是机主面部,执行步骤314;否则,确定是非机主面部,则执行步骤305,本图像帧结束,即结束该流程。其中,机主在进行注册时,终端设备可以引导机主采集机主面部,并通过该机主识别网络执行得到对应的特征向量,并存储在固定位置作为机主面部的特征向量,供机主识别判定使用。
应理解,在图3中先执行注视判别,即步骤309和步骤310;在注视概率大于预设注视阈值时,再执行机主识别,即步骤311至步骤313。可以理解,图3所示的唤醒屏幕的方法的执行流程中,也可以先执行机主判别,在机主判别满足条件时即可以是输入的面部图像匹配机主的面部图像,再执行注视判别。或者,可以通过神经网络一次性执行机主识别和注视判别,即通过神经网络的处理,一次性地输出执行机主识别和注视判别的结果。上述为举例说明,并不对本申请的流程执行顺序作出任何限定。
步骤314:判断是否连续M帧为机主注视。即统计判定为机主在注视屏幕的连续帧数,是否达到预设的M帧,若达到连续M帧,则执行步骤315;否则执行步骤305退出流程。
步骤315:唤醒亮屏,即将屏幕状态调整至亮屏状态。
应注意,图3的例子仅仅是为了帮助本领域技术人员理解本申请实施例,而非要将本申请实施例限于所例示的具体场景。本领域技术人员根据所给出的图3的例子,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本申请实施例的范围内。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
上文详细描述了根据本申请实施例的唤醒屏幕的方法,在本申请中通过获取的M个图像帧,通过预先配置的神经网络进行注视判别和面部图像判别,即确定每个图像帧中的第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户,能够提高设备唤醒屏的准确性,不会显著增加设备的成本。应理解,本申请实施例的唤醒屏幕的装置可以执行前述本申请实施例的各种方法,即以下各种产品的具体工作过程,可以参考前述方法实施例中的对应过程。
图4是本申请实施例的唤醒屏幕的装置400的示意性框图。应理解,装置400能够执 行图2或图3的方法中的各个步骤,为了避免重复,此处不再详述。装置400包括:获取单元410、处理单元420和唤醒单元430。其中,获取单元410,用于获取M个图像帧,其中,所述M个图像帧中的每个图像帧包括第一面部图像,M为大于或等于1的整数;处理单元420,用于根据预先配置的神经网路,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户;唤醒单元430,用于当所述每个第一面部图像均匹配所述预设面部图像且属于所述用户时,将所述屏幕由灭屏状态切换至亮屏状态。
在一个示例中,图4中的获取单元410的具体实现形式可以具体是如图5所示的图像获取模块,其中,图像获取模块:用于获取图像存储模块中的最新图像帧,作为后续网络的输入图像。
在一个示例中,图4中的处理单元420具体实现形式可以包括如图5所示的周期性灭屏检测模块、面部检测+方向判别模块、注视判别模块以及机主识别模块,其中,周期性检测模块:可以用于以设定的周期,周期性地检测设备当前状态是否在灭屏状态,若在灭屏状态则继续下一流程,否则跳出流程继续下个周期的灭屏检测。面部检测+方向判别模块:可以用于运行面部检测+方向判别的多任务神经网络算法,获取图像存储模块的最新存储图像作为网络的输入图像,算法直接输出图像中的面部框坐标和面部方向信息,依据面部框的面积大小获取到最大面部框和其面部方向信息,从输入图像中截取到最大面部区域,并依据面部方向进行竖直方向校正,输出校正后的面部区域图像作为后续模块的输入。注视判别模块:可以用于运行注视判别的神经网络算法,以面部检测+方向判别模块的输出图像(面部区域图像)作为输入,算法直接输出注视概率,与预设的概率阈值比较大小,输出注视结果。机主识别模块:可以用于运行机主识别模块的神经网络算法,以面部检测+方向判别模块的输出图像(面部区域图像)作为输入,算法直接输出面部特征向量,计算与预存的机主面部特征向量的距离,根据预设的距离阈值判断输出是否为机主。
在一个示例中,图4中的唤醒单元430具体实现形式可以是如图5所示的唤醒处理模块,其中,唤醒处理模块可以用于依据上述模块得到的注视信息和机主信息,平滑处理模块判断当前是否满足已经连续M图像帧中存在机主注视屏幕,若满足,则上报相关唤醒单元唤醒亮屏;若不满足,则不上报。
可选地,作为一个实施例,所述处理单元420具体用于:利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户;当所述每个第一面部图像属于所述用户时,利用所述预先配置的神经网络确定所述每个第一面部图像是否匹配所述预设面部图像。
可选地,作为一个实施例,所述处理单元420具体用于:利用所述预先配置的神经网络确定所述每个第一面部图像属于所述用户的概率值;当所述概率值大于预设阈值时,确定所述每个第一面部图像属于所述用户。
可选地,作为一个实施例,所述处理单元420还用于:在所述每个图像帧中确定第一面部框,所述第一面部框为所述每个图像帧中包括的至少一个面部框中面积最大的面部框;根据位于所述第一面部框中的第二面部图像确定所述第一面部图像。
可选地,作为一个实施例,所述获取单元410还用于:获取面部方向信息,所述面部方向信息用于指示所述第二面部图像的方向;所述处理单元420具体用于:当所述第二面部图像的方向不匹配预设标准方向时,对所述第二面部图像进行旋转处理,以得到匹配于 所述预设标准方向的所述第一面部图像。
可选地,作为一个实施例,所述获取单元410具体用于:在所述屏幕处于所述灭屏状态时,获取所述M个图像帧。
可选地,作为一个实施例,所述预先配置的神经网络为深度神经网络。
应理解,这里的唤醒屏幕的装置400以功能单元的形式体现。这里的术语“单元”可以通过软件和/或硬件形式实现,对此不作具体限定。例如,“单元”可以是实现上述功能的软件程序、硬件电路或二者结合。所述硬件电路可能包括应用特有集成电路(application specific integrated circuit,ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。因此,在本申请的实施例中描述的各示例的单元,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例还提供一种设备,该设备可以是终端设备也可以是内置于所述终端设备的电路设备。该设备可以用于执行上述方法实施例中的功能/步骤。
如图6所示,设备600包括处理器610和收发器620。可选地,该设备600还可以包括存储器630。其中,处理器610、收发器620和存储器630之间可以通过内部连接通路互相通信,传递控制和/或数据信号,该存储器630用于存储计算机程序,该处理器610用于从该存储器630中调用并运行该计算机程序。可选地,设备600还可以包括天线640,用于将收发器620输出的无线信号发送出去。
上述处理器610可以和存储器630可以合成一个处理装置,更常见的是彼此独立的部件,处理器610用于执行存储器630中存储的程序代码来实现上述功能。具体实现时,该存储器630也可以集成在处理器610中,或者独立于处理器610。该处理器610可以与图4中装置400中的处理单元420对应,也可以与图5中装置500中的周期性检测模块、面部检测+方向判别模块、注视判别模块以及机主识别模块对应。
除此之外,为了使得设备600的功能更加完善,该设备600还可以包括输入单元660、显示单元670、音频电路680、摄像头690和传感器601等中的一个或多个,所述音频电路还可以包括扬声器682、麦克风684等。其中,摄像头690可以与图4中装置400中的获取单元410对应,也可以与图5中装置500中图像获取模块对应;显示单元870可以包括屏幕,显示单元870可以与图4中装置400中的唤醒单元430对应,也可以与图5所示装置500中的唤醒处理模块对应。
例如,在一种实现方式中,摄像头690或对应于摄像头690的图像处理通道可以用于获取M个图像帧。
可选地,上述设备600还可以包括电源650,用于给终端设备中的各种器件或电路提供电源。
应理解,图6所示的设备600能够实现图2以及图3所示方法实施例的各个过程。设备600中的各个模块的操作和/或功能,分别为了实现上述方法实施例中的相应流程。具体可参见上述方法实施例中的描述,为避免重复,此处适当省略详细描述。
图7为本申请实施例提供的一种芯片硬件结构,该芯片包括神经网络处理器70。该 芯片可以被设置在如图6所示的设备600中,例如可以设置在设备600的处理器610中。如图1所示的卷积神经网络中各层的算法均可在如图7所示的芯片中得以实现。
应理解,图6所示的设备600中的处理器610可以是片上系统(system on a chip,SOC),该处理器610中可以包括中央处理器(central processing unit,CPU)以及图7所示的(neural-network processing unit,NPU)70,还可进一步包括其他类型的处理器,例如对应于摄像头690的图像信号处理器(ISP),该ISP可包括之前实施例提到的图像处理通道。所述CPU可以叫主CPU,神经网络处理器NPU 70作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。各部分处理器配合工作实现之前的方法流程,并且每部分处理器可以选择性执行一部分软件驱动程序。
例如,图2中步骤210可以由ISP执行,220可以由NPU70执行,230可以由CPU执行。再例如,图3中301、314和315由CPU执行,302由ISP执行,303-313可以由NPU执行。或者可替换的,303可以由CPU而非NPU执行。总之,处理器610内部的各部分处理器或处理单元可以共同配合实现之前的方法流程,且各部分处理器或处理单元相应的软件程序可存储在存储器630中。以上NPU70仅用于举例,实际的神经网络功能可以由除了NPU70之外的处理设备代替,如图像处理器(graphics processing unit,GPU)也可用于神经网络的处理,本实施例对此不限定。
以图7为例,NPU的核心部分为运算电路703,控制器704控制运算电路703提取权重存储器702或者输入存储器701中的数据并进行运算。
在一些可能的实现方式中,运算电路703内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路703是二维阵列。运算电路703还可以是一维阵列或者能够执行例如,乘法和加法这样的数学运算的其它电子线路。
在一些实现中,运算电路703是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器702中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器701中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器708中。
向量计算单元707可以对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。例如,向量计算单元707可以用于神经网络中非卷积/非FC层的网络计算,如池化(pooling),批归一化(batch normalization),局部响应归一化(local response normalization,LRN)等。
在一些实现中,向量计算单元能707将经处理的输出的向量存储到统一缓存器706。例如,向量计算单元707可以将非线性函数应用到运算电路703的输出,例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元707生成归一化的值、合并值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路703的激活输入,例如用于在神经网络中的后续层中的使用。
统一存储器706用于存放输入数据以及输出数据。
权重数据直接通过存储单元访问控制器(direct memory access controller,DMAC)705将外部存储器中的输入数据搬运到输入存储器701和/或统一存储器706、将外部存储器中的权重数据存入权重存储器702,以及将统一存储器706中的数据存入外部存储器。
总线接口单元(bus interface unit,BIU)710,用于通过总线实现主CPU、DMAC和取指存储器709之间进行交互。与控制器704连接的取指存储器(instruction fetch buffer)709,用于存储控制器704使用的指令;控制器704,用于调用取指存储器709中缓存的指令,实现控制该运算加速器的工作过程。其中,图1所示的卷积神经网络中各层的运算可以由运算电路703或向量计算单元707执行。
示例性地,本申请实施例中,提供的一种神经网络的训练方法,可以根据至少一个用户的特征图像进行训练,确定预先训练的深度神经网络。例如,可以根据至少一个用户的非注视面部图像、注视面部图像、头部姿态信息以及注视方向信息进行训练,确定预先训练的深度神经网络。
可选的,训练方法可以由CPU处理,也可以由CPU和GPU共同处理,也可以不用GPU,而使用其他适合用于神经网络计算的处理器,本申请不做限制。
本申请还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当该指令在计算机上运行时,使得计算机执行上述如图2、图3所示的唤醒屏幕的方法中的各个步骤。
本申请还提供了一种包含指令的计算机程序产品,当该计算机程序产品在计算机或任一至少一种处理器上运行时,使得计算机执行如图2、图3所示的唤醒屏幕的方法中的各个步骤。
本申请还提供一种芯片,包括处理器。该处理器用于读取并运行存储器中存储的计算机程序,以执行本申请提供的唤醒屏幕的方法执行的相应操作和/或流程。
可选地,该芯片还包括存储器,该存储器与该处理器通过电路或电线与存储器连接,处理器用于读取并执行该存储器中的计算机程序。进一步可选地,该芯片还包括通信接口,处理器与该通信接口连接。通信接口用于接收需要处理的数据和/或信息,处理器从该通信接口获取该数据和/或信息,并对该数据和/或信息进行处理。该通信接口可以是输入输出接口。
以上各实施例中,涉及的处理器610可以例如包括中央处理器(central processing unit,CPU)、微处理器、微控制器或数字信号处理器,还可包括GPU、NPU和ISP,该处理器还可包括必要的硬件加速器或逻辑处理硬件电路,如特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本申请技术方案程序执行的集成电路等。此外,处理器可以具有操作一个或多个软件程序的功能,软件程序可以存储在存储器中。
存储器可以是只读存储器(read-only memory,ROM)、可存储静态信息和指令的其它类型的静态存储设备、随机存取存储器(random access memory,RAM)或可存储信息和指令的其它类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其它磁存储设备,或者还可以是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其它介质等。
本申请实施例中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以 表示单独存在A、同时存在A和B、单独存在B的情况。其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项”及其类似表达,是指的这些项中的任意组合,包括单项或复数项的任意组合。例如,a,b和c中的至少一项可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。
本领域普通技术人员可以意识到,本文中公开的实施例中描述的各单元及算法步骤,能够以电子硬件、计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,任一功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。本申请的保护范围应以所述权利要求的保护范围为准。

Claims (16)

  1. 一种唤醒屏幕的方法,其特征在于,包括:
    获取M个图像帧,其中,每个图像帧包括第一面部图像,M为大于或等于1的整数;
    根据预先配置的神经网路,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户;
    当所述每个第一面部图像均匹配所述预设面部图像且属于所述用户时,将所述屏幕由灭屏状态切换至亮屏状态。
  2. 如权利要求1所述的方法,其特征在于,所述根据所述预先配置的神经网络,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户,包括:
    利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户;
    当所述每个第一面部图像属于所述用户时,利用所述预先配置的神经网络确定所述每个第一面部图像是否匹配所述预设面部图像。
  3. 如权利要求2所述的方法,其特征在于,所述利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户,包括:
    利用所述预先配置的神经网络确定所述每个第一面部图像属于所述用户的概率值;
    当所述概率值大于预设阈值时,确定所述每个第一面部图像属于所述用户。
  4. 如权利要求1至3中任一项所述的方法,其特征在于,在获取M个图像帧后,还包括:
    在所述每个图像帧中确定第一面部框,所述第一面部框为所述每个图像帧中包括的至少一个面部框中面积最大的面部框;
    根据位于所述第一面部框中的第二面部图像确定所述第一面部图像。
  5. 如权利要求4所述的方法,其特征在于,还包括:
    获取面部方向信息,所述面部方向信息用于指示所述第二面部图像的方向;
    所述根据位于所述第一面部框中的第二面部图像确定所述第一面部图像包括:
    当所述第二面部图像的方向不匹配预设标准方向时,对所述第二面部图像进行旋转处理,以得到匹配于所述预设标准方向的所述第一面部图像。
  6. 如权利要求1至5中任一项所述的方法,其特征在于,所述获取M个图像帧包括:
    在所述屏幕处于所述灭屏状态时,获取所述M个图像帧。
  7. 如权利要求1至6中任一项所述的方法,其特征在于,所述预先配置的神经网络为深度神经网络。
  8. 一种唤醒屏幕的装置,其特征在于,所述装置包括:
    获取单元,用于获取M个图像帧,其中,每个图像帧包括第一面部图像,M为大于或等于1的整数;
    处理单元,用于根据预先配置的神经网路,确定每个第一面部图像是否匹配预设面部图像且属于注视设备的屏幕的用户;
    唤醒单元,用于当所述每个第一面部图像均匹配所述预设面部图像且属于所述用户时,将所述屏幕由灭屏状态切换至亮屏状态。
  9. 如权利要求8所述的装置,其特征在于,所述处理单元具体用于:
    利用所述预先配置的神经网络确定所述每个第一面部图像是否属于所述用户;
    当所述每个第一面部图像属于所述用户时,利用所述预先配置的神经网络确定所述每个第一面部图像是否匹配所述预设面部图像。
  10. 如权利要求9所述的装置,其特征在于,所述处理单元具体用于:
    利用所述预先配置的神经网络确定所述每个第一面部图像属于所述用户的概率值;
    当所述概率值大于预设阈值时,确定所述每个第一面部图像属于所述用户。
  11. 如权利要求8至10中任一项所述的装置,其特征在于,所述处理单元还用于:
    在所述每个图像帧中确定第一面部框,所述第一面部框为所述每个图像帧中包括的至少一个面部框中面积最大的面部框,;
    根据位于所述第一面部框中的第二面部图像确定所述第一面部图像。
  12. 如权利要求11所述的装置,其特征在于,所述获取单元还用于:
    获取面部方向信息,所述面部方向信息用于指示所述第二面部图像的方向;
    所述处理单元具体用于:
    当所述第二面部图像的方向不匹配预设标准方向时,对所述第二面部图像进行旋转处理,以得到匹配于所述预设标准方向的所述第一面部图像。
  13. 如权利要求8至12中任一项所述的装置,其特征在于,所述获取单元具体用于:
    在所述屏幕处于所述灭屏状态时,获取所述M个图像帧。
  14. 如权利要求8至13中任一项所述的装置,其特征在于,所述预先配置的神经网络为深度神经网络。
  15. 一种唤醒屏幕的装置,包括存储器和处理器,所述存储器用于存储计算机程序,所述处理器用于从所述存储器中调用并运行所述计算机程序,以执行如权利要求1至7中任一项所述的方法。
  16. 一种计算机可读存储介质,其特征在于,包括计算机程序,当其在计算机设备或处理器上运行时,使得所述计算机设备或处理器执行如权利要求1至7中任一项所述的方法。
PCT/CN2019/077991 2019-03-13 2019-03-13 唤醒屏幕的方法和装置 WO2020181523A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP19919494.5A EP3910507A4 (en) 2019-03-13 2019-03-13 METHOD AND DEVICE FOR WAKE-UP A SCREEN
CN201980010395.4A CN111936990A (zh) 2019-03-13 2019-03-13 唤醒屏幕的方法和装置
PCT/CN2019/077991 WO2020181523A1 (zh) 2019-03-13 2019-03-13 唤醒屏幕的方法和装置
US17/408,738 US20210382542A1 (en) 2019-03-13 2021-08-23 Screen wakeup method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/077991 WO2020181523A1 (zh) 2019-03-13 2019-03-13 唤醒屏幕的方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/408,738 Continuation US20210382542A1 (en) 2019-03-13 2021-08-23 Screen wakeup method and apparatus

Publications (1)

Publication Number Publication Date
WO2020181523A1 true WO2020181523A1 (zh) 2020-09-17

Family

ID=72427764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/077991 WO2020181523A1 (zh) 2019-03-13 2019-03-13 唤醒屏幕的方法和装置

Country Status (4)

Country Link
US (1) US20210382542A1 (zh)
EP (1) EP3910507A4 (zh)
CN (1) CN111936990A (zh)
WO (1) WO2020181523A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113628579A (zh) * 2021-08-09 2021-11-09 深圳市优聚显示技术有限公司 一种led节能显示方法、led显示屏系统及lcd显示设备
WO2022089216A1 (zh) * 2020-10-28 2022-05-05 华为技术有限公司 一种界面显示的方法和电子设备
US20220221932A1 (en) * 2021-01-12 2022-07-14 Microsoft Technology Licensing, Llc Controlling a function via gaze detection
US20220400228A1 (en) 2021-06-09 2022-12-15 Microsoft Technology Licensing, Llc Adjusting participant gaze in video conferences
EP4191992A4 (en) * 2021-08-09 2024-05-08 Honor Device Co., Ltd. PARAMETER SETTING METHOD, DISPLAY CONTROL METHOD, ELECTRONIC DEVICE AND MEDIUM

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111796874A (zh) * 2020-06-28 2020-10-20 北京百度网讯科技有限公司 一种设备唤醒的方法、装置、计算机设备和存储介质
WO2023113994A1 (en) * 2021-12-17 2023-06-22 Google Llc Human presence sensor for client devices
CN114779916B (zh) * 2022-03-29 2024-06-11 杭州海康威视数字技术股份有限公司 一种电子设备屏幕唤醒方法、门禁管理方法及装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912903A (zh) * 2016-04-06 2016-08-31 上海斐讯数据通信技术有限公司 一种移动终端的解锁方法及移动终端
CN106485123A (zh) * 2016-10-17 2017-03-08 信利光电股份有限公司 一种冷屏唤醒方法及智能终端
CN106878559A (zh) * 2017-02-17 2017-06-20 宇龙计算机通信科技(深圳)有限公司 一种屏幕状态调整方法和装置
CN108549802A (zh) * 2018-03-13 2018-09-18 维沃移动通信有限公司 一种基于人脸识别的解锁方法、装置以及移动终端
CN108830061A (zh) * 2018-05-25 2018-11-16 努比亚技术有限公司 基于人脸识别的终端解锁方法、移动终端及可读存储介质
CN108875333A (zh) * 2017-09-22 2018-11-23 北京旷视科技有限公司 终端解锁方法、终端和计算机可读存储介质
US20180367656A1 (en) * 2017-06-15 2018-12-20 Lg Electronics Inc. Mobile terminal and method for controlling the same

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8326001B2 (en) * 2010-06-29 2012-12-04 Apple Inc. Low threshold face recognition
CN104115485B (zh) * 2013-02-13 2018-12-28 华为技术有限公司 具有显示状态控制的移动电子装置
CN103677267A (zh) * 2013-12-09 2014-03-26 惠州Tcl移动通信有限公司 移动终端及其唤醒方法、装置
US10027883B1 (en) * 2014-06-18 2018-07-17 Amazon Technologies, Inc. Primary user selection for head tracking
US10127680B2 (en) * 2016-06-28 2018-11-13 Google Llc Eye gaze tracking using neural networks
WO2018132721A1 (en) * 2017-01-12 2018-07-19 The Regents Of The University Of Colorado, A Body Corporate Method and system for implementing three-dimensional facial modeling and visual speech synthesis
CN109212753B (zh) * 2017-07-06 2021-01-29 京东方科技集团股份有限公司 可变显示距离的抬头显示系统、抬头显示方法、驾驶设备
CN107633207B (zh) * 2017-08-17 2018-10-12 平安科技(深圳)有限公司 Au特征识别方法、装置及存储介质
CN107566650B (zh) * 2017-09-12 2020-01-31 Oppo广东移动通信有限公司 解锁控制方法及相关产品
CN108345780B (zh) * 2018-02-11 2020-06-02 维沃移动通信有限公司 一种解锁控制方法及移动终端
US10970571B2 (en) * 2018-06-04 2021-04-06 Shanghai Sensetime Intelligent Technology Co., Ltd. Vehicle control method and system, vehicle-mounted intelligent system, electronic device, and medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912903A (zh) * 2016-04-06 2016-08-31 上海斐讯数据通信技术有限公司 一种移动终端的解锁方法及移动终端
CN106485123A (zh) * 2016-10-17 2017-03-08 信利光电股份有限公司 一种冷屏唤醒方法及智能终端
CN106878559A (zh) * 2017-02-17 2017-06-20 宇龙计算机通信科技(深圳)有限公司 一种屏幕状态调整方法和装置
US20180367656A1 (en) * 2017-06-15 2018-12-20 Lg Electronics Inc. Mobile terminal and method for controlling the same
CN108875333A (zh) * 2017-09-22 2018-11-23 北京旷视科技有限公司 终端解锁方法、终端和计算机可读存储介质
CN108549802A (zh) * 2018-03-13 2018-09-18 维沃移动通信有限公司 一种基于人脸识别的解锁方法、装置以及移动终端
CN108830061A (zh) * 2018-05-25 2018-11-16 努比亚技术有限公司 基于人脸识别的终端解锁方法、移动终端及可读存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3910507A4 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022089216A1 (zh) * 2020-10-28 2022-05-05 华为技术有限公司 一种界面显示的方法和电子设备
CN114500732A (zh) * 2020-10-28 2022-05-13 华为技术有限公司 一种界面显示的方法和电子设备
CN114500732B (zh) * 2020-10-28 2023-03-31 华为技术有限公司 一种界面显示的方法和电子设备、存储介质
US20220221932A1 (en) * 2021-01-12 2022-07-14 Microsoft Technology Licensing, Llc Controlling a function via gaze detection
US20220400228A1 (en) 2021-06-09 2022-12-15 Microsoft Technology Licensing, Llc Adjusting participant gaze in video conferences
US11871147B2 (en) 2021-06-09 2024-01-09 Microsoft Technology Licensing, Llc Adjusting participant gaze in video conferences
CN113628579A (zh) * 2021-08-09 2021-11-09 深圳市优聚显示技术有限公司 一种led节能显示方法、led显示屏系统及lcd显示设备
EP4191992A4 (en) * 2021-08-09 2024-05-08 Honor Device Co., Ltd. PARAMETER SETTING METHOD, DISPLAY CONTROL METHOD, ELECTRONIC DEVICE AND MEDIUM

Also Published As

Publication number Publication date
EP3910507A1 (en) 2021-11-17
EP3910507A4 (en) 2022-01-26
CN111936990A (zh) 2020-11-13
US20210382542A1 (en) 2021-12-09

Similar Documents

Publication Publication Date Title
WO2020181523A1 (zh) 唤醒屏幕的方法和装置
CN111476306B (zh) 基于人工智能的物体检测方法、装置、设备及存储介质
US20220076000A1 (en) Image Processing Method And Apparatus
CN109902546B (zh) 人脸识别方法、装置及计算机可读介质
CN109299315B (zh) 多媒体资源分类方法、装置、计算机设备及存储介质
US11488293B1 (en) Method for processing images and electronic device
CN111274916B (zh) 人脸识别方法和人脸识别装置
CN110544272B (zh) 脸部跟踪方法、装置、计算机设备及存储介质
WO2020199611A1 (zh) 活体检测方法和装置、电子设备及存储介质
WO2021190296A1 (zh) 一种动态手势识别方法及设备
CN112036331B (zh) 活体检测模型的训练方法、装置、设备及存储介质
CN109543714A (zh) 数据特征的获取方法、装置、电子设备及存储介质
CN110147533B (zh) 编码方法、装置、设备及存储介质
WO2021047587A1 (zh) 手势识别方法、电子设备、计算机可读存储介质和芯片
CN111242273B (zh) 一种神经网络模型训练方法及电子设备
WO2020228181A1 (zh) 手掌图像裁剪方法、装置、计算机设备及存储介质
CN110765924A (zh) 一种活体检测方法、装置以及计算机可读存储介质
CN113269010B (zh) 一种人脸活体检测模型的训练方法和相关装置
CN112580472A (zh) 一种快速轻量的人脸识别方法、装置、机器可读介质及设备
CN114283448A (zh) 一种基于头部姿态估计的儿童坐姿提醒方法和系统
CN112818979B (zh) 文本识别方法、装置、设备及存储介质
CN111460416B (zh) 一种基于微信小程序平台的人脸特征与动态属性的认证方法
CN110232417B (zh) 图像识别方法、装置、计算机设备及计算机可读存储介质
CN112381064B (zh) 一种基于时空图卷积网络的人脸检测方法及装置
CN115660969A (zh) 图像处理方法、模型训练方法、装置、设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19919494

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019919494

Country of ref document: EP

Effective date: 20210812

NENP Non-entry into the national phase

Ref country code: DE