WO2021196721A1 - Method and apparatus for adjusting a cabin environment - Google Patents

Method and apparatus for adjusting a cabin environment (一种舱内环境的调整方法及装置)

Info

Publication number
WO2021196721A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample image
age
image
value
difference
Prior art date
Application number
PCT/CN2020/135500
Other languages
English (en)
French (fr)
Inventor
王飞
钱晨
Original Assignee
上海商汤临港智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤临港智能科技有限公司
Priority to KR1020227013199A (published as KR20220063256A)
Priority to JP2022524727A (published as JP2022553779A)
Publication of WO2021196721A1
Priority to US17/722,554 (published as US20220237943A1)

Classifications

    • G06V20/59: Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • B60W40/08: Estimation or calculation of non-directly measurable driving parameters related to drivers or passengers
    • B60W50/0098: Details of control systems ensuring comfort, safety or stability not otherwise provided for
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/08: Neural networks; Learning methods
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7747: Organisation of the process, e.g. bagging or boosting
    • G06V10/776: Validation; Performance evaluation
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V20/647: Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/169: Holistic features and representations, i.e. based on the facial image taken as a whole
    • G06V40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/172: Classification, e.g. identification
    • G06V40/174: Facial expression recognition
    • G06V40/176: Dynamic expression
    • G06V40/178: Estimating age from face image; using age information for improving recognition
    • G06V40/193: Eye characteristics: Preprocessing; Feature extraction
    • B60W2050/0005: Processor details or data handling, e.g. memory registers or chip architecture

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a method and apparatus for adjusting the cabin environment.
  • when setting the cabin environment, for example when adjusting the cabin temperature or the music played in the cabin, the adjustment is generally made manually by the user.
  • with the development of face recognition technology, the cabin environment can instead be configured in advance: a user sets the corresponding environment information, and after the user enters the vehicle, the user's identity is recognized through face recognition, the environment information corresponding to that identity is obtained, and the cabin environment is set accordingly.
  • the embodiments of the present disclosure provide at least a method and apparatus for adjusting the cabin environment.
  • in a first aspect, the embodiments of the present disclosure provide a method for adjusting the in-cabin environment, including: acquiring a face image of a person in the cabin; determining attribute information and state information of the person based on the face image; and adjusting the cabin environment based on the attribute information and state information.
  • the attribute information includes age information, and the age information is obtained through recognition by a first neural network.
  • the first neural network is obtained as follows: age prediction is performed, by the first neural network to be trained, on the sample images in a sample image set to obtain a predicted age value for each sample image; the network parameter values of the first neural network are then adjusted based on the difference between the predicted age value of each sample image and the age value of its age label, the differences between the predicted age values of sample images in the sample image set, and the differences between the age values of the age labels of sample images in the sample image set.
  • there are multiple sample image sets; adjusting the network parameter values of the first neural network includes: adjusting the network parameter values based on the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images.
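As a minimal sketch (not the patent's actual implementation), the objective above can be read as a loss combining the per-image age error with a pairwise term that matches predicted age gaps to labeled age gaps within one sample image set; every function and variable name here is an illustrative assumption:

```python
import numpy as np

def age_pair_loss(pred_ages, label_ages):
    """Illustrative loss: per-image L1 error plus, for every pair of
    images in the same sample image set, the gap between the predicted
    age difference and the labeled age difference."""
    pred = np.asarray(pred_ages, dtype=float)
    label = np.asarray(label_ages, dtype=float)
    # per-image difference between predicted age and age-label value
    per_image = np.abs(pred - label).mean()
    # pairwise: |(p_i - p_j) - (l_i - l_j)| over all pairs i < j
    n = len(pred)
    pair_terms = [
        abs((pred[i] - pred[j]) - (label[i] - label[j]))
        for i in range(n) for j in range(i + 1, n)
    ]
    pairwise = float(np.mean(pair_terms)) if pair_terms else 0.0
    return per_image + pairwise

loss = age_pair_loss([25.0, 31.0, 42.0], [24.0, 33.0, 40.0])
```

The pairwise term is zero when predictions are uniformly shifted gaps aside, so it specifically rewards getting relative ages within one capture device's image set right.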
  • the sample image set includes multiple initial sample images and an enhanced sample image corresponding to each initial sample image, where an enhanced sample image is an image obtained by performing information transformation processing on the initial sample image.
  • the network parameter values of the first neural network are adjusted based on the difference between the predicted age value of each sample image and the age value of its age label, and the difference between the predicted age value of an initial sample image and the predicted age value of the enhanced sample image corresponding to that initial sample image; wherein a sample image is an initial sample image or an enhanced sample image.
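The initial-vs-enhanced term above can be sketched as a simple consistency loss; the function name and mean-absolute form are assumptions for illustration:

```python
def enhancement_consistency_loss(pred_initial_ages, pred_enhanced_ages):
    """Illustrative second loss: mean absolute gap between the predicted
    age of each initial sample image and the predicted age of its
    corresponding enhanced sample image."""
    assert len(pred_initial_ages) == len(pred_enhanced_ages)
    gaps = [abs(a - b) for a, b in zip(pred_initial_ages, pred_enhanced_ages)]
    return sum(gaps) / len(gaps)

loss2 = enhancement_consistency_loss([30.0, 41.0], [32.0, 40.0])  # → 1.5
```

Since the enhanced image shows the same face under a different pose or lighting, pushing the two predictions together encourages the network to be invariant to those transformations.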
  • each sample image set includes multiple initial sample images and an enhanced sample image corresponding to each initial sample image, where an enhanced sample image is an image obtained by performing information transformation processing on the initial sample image, and the multiple initial sample images in the same sample image set are acquired by the same image acquisition device.
  • adjusting the network parameter values of the first neural network includes: calculating a loss value for the current training round based on the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of any two sample images in the same sample image set, the difference between the age values of the age labels of those two sample images, and the difference between the predicted age value of an initial sample image and the predicted age value of its corresponding enhanced sample image; and adjusting the network parameter values based on the calculated loss value; wherein a sample image is an initial sample image or an enhanced sample image.
  • calculating the loss value for the current training round includes: calculating a first loss value from the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images; calculating a second loss value from the difference between the predicted age value of an initial sample image and the predicted age value of its corresponding enhanced sample image; and taking the sum of the first loss value and the second loss value as the loss value for this training round.
  • the enhanced sample image corresponding to an initial sample image is determined as follows: a three-dimensional face model corresponding to the face region image in the initial sample image is generated, and the three-dimensional face model is rotated by different angles to obtain first enhanced sample images at the different angles; and the value of each pixel of the initial sample image on the RGB channels is added to different light influence values to obtain second enhanced sample images under the different light influence values; an enhanced sample image is a first enhanced sample image or a second enhanced sample image.
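The second kind of enhancement (adding light influence values on the RGB channels) can be sketched as below; the function name, offset format, and clipping to 0..255 are assumptions, since the patent does not specify them:

```python
import numpy as np

def light_augment(image_rgb, light_offsets):
    """Illustrative light augmentation: add a per-channel light influence
    value to every pixel's RGB value, clipping to the valid 0-255 range."""
    img = np.asarray(image_rgb, dtype=np.int16)  # widen to avoid uint8 overflow
    out = []
    for offset in light_offsets:  # e.g. one (dr, dg, db) tuple per augmentation
        shifted = np.clip(img + np.asarray(offset, dtype=np.int16), 0, 255)
        out.append(shifted.astype(np.uint8))
    return out

# one 2x2 RGB image, two different light influence values
augmented = light_augment(
    np.full((2, 2, 3), 200, dtype=np.uint8),
    [(30, 30, 30), (-50, 0, 0)],
)
```

The 3D-rotation enhancement would require an actual face reconstruction model, so it is not sketched here.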
  • the attribute information includes gender information
  • the gender information of the person in the cabin is determined as follows: the face image is input into a second neural network for gender information extraction to obtain a two-dimensional feature vector output by the second neural network, where the element value in the first dimension characterizes the probability that the face image is male, and the element value in the second dimension characterizes the probability that the face image is female; the two-dimensional feature vector is input into a classifier, and the gender whose probability is greater than a set threshold is determined as the gender of the face image.
  • the set threshold is determined as follows: multiple sample images collected in the cabin by the image acquisition device that collects the face image are acquired, together with the gender label corresponding to each sample image; the multiple sample images are input into the second neural network to obtain the predicted gender of each sample image under each of multiple candidate thresholds; for each candidate threshold, the prediction accuracy under that candidate threshold is determined from the predicted gender and gender label of each sample image under that candidate threshold; and the candidate threshold corresponding to the maximum prediction accuracy is determined as the set threshold.
  • the multiple candidate thresholds are determined as follows: the multiple candidate thresholds are selected from a preset value range according to a set step size.
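The threshold search described above amounts to a sweep over a preset range with a set step size, keeping the candidate with the highest accuracy; the parameter names and default range here are illustrative assumptions:

```python
def pick_threshold(male_probs, labels, lo=0.5, hi=0.95, step=0.05):
    """Illustrative sketch of the candidate-threshold search: sweep
    thresholds over [lo, hi) with a set step size and return the one
    with the highest prediction accuracy."""
    candidates = []
    t = lo
    while t < hi:
        candidates.append(round(t, 10))  # round away float accumulation error
        t += step
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        # predict "male" when the male probability exceeds the threshold
        preds = ["male" if p > t else "female" for p in male_probs]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

Because the validation images come from the same in-cabin camera as deployment-time images, the chosen threshold compensates for that camera's particular imaging conditions.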
  • the status information includes open and closed eyes information
  • the open/closed-eye information of the person in the cabin is determined as follows: feature extraction is performed on the face image to obtain a multi-dimensional feature vector, where the element value in each dimension characterizes the probability that the eyes in the face image are in the state corresponding to that dimension; the state corresponding to the dimension whose probability is greater than a preset value is determined as the open/closed-eye information of the person in the cabin.
  • the state of the eyes includes at least one of the following states: eyes not visible; eyes visible and open; eyes visible and closed.
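A minimal sketch of the state selection above; the state names follow the three states listed, while the function name and the preset value of 0.5 are assumptions:

```python
EYE_STATES = ("eyes not visible", "eyes visible and open", "eyes visible and closed")

def eye_state(probabilities, preset_value=0.5):
    """Illustrative selection: return the state whose probability in the
    multi-dimensional feature vector exceeds the preset value."""
    for state, p in zip(EYE_STATES, probabilities):
        if p > preset_value:
            return state
    return None  # no dimension exceeded the preset value

state = eye_state([0.05, 0.85, 0.10])
```

With a preset value above 0.5 at most one dimension can pass, so the result is unambiguous.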
  • the state information includes emotional information
  • the emotional information of the person in the cabin is determined as follows: the action of each organ among at least two organs on the face represented by the face image is recognized from the face image; and the emotional information of the person in the cabin is determined based on the recognized actions of the organs and a preset mapping relationship between facial actions and emotional information.
  • the actions of the organs on the human face include at least two of the following: frowning; staring; raising the corners of the mouth; raising the upper lip; lowering the corners of the mouth; opening the mouth.
  • the recognition, from the face image, of the action of each organ among the at least two organs on the face represented by the face image is performed by a third neural network, where the third neural network includes a backbone network and at least two classification branch networks, each classification branch network being used to recognize the action of one organ on the human face.
  • recognizing the actions includes: performing feature extraction on the face image using the backbone network to obtain a feature map of the face image; performing action recognition on the feature map using each classification branch network to obtain the occurrence probability of the action that the classification branch network recognizes; and determining an action whose occurrence probability is greater than a preset probability as an action of an organ on the face represented by the face image.
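The patent only states that a preset mapping from facial actions to emotional information exists; the concrete rules below are therefore pure assumptions, sketched to show how detected organ actions could be matched against such a mapping:

```python
# Hypothetical preset mapping; both the action names and the emotions
# are illustrative, not taken from the patent.
ACTION_TO_EMOTION = {
    frozenset({"corners of mouth raised"}): "happy",
    frozenset({"frowning", "corners of mouth lowered"}): "sad",
    frozenset({"frowning", "staring"}): "angry",
    frozenset({"mouth open", "staring"}): "surprised",
}

def infer_emotion(detected_actions):
    """Match the set of detected organ actions against the preset mapping,
    returning the first rule whose actions were all detected."""
    detected = set(detected_actions)
    for actions, emotion in ACTION_TO_EMOTION.items():
        if actions <= detected:  # every action of the rule was detected
            return emotion
    return "neutral"

emotion = infer_emotion(["frowning", "staring"])
```

Decomposing emotion recognition into per-organ action branches lets each branch train on a simpler binary task than whole-face emotion classification.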
  • adjusting the cabin environment includes at least one of the following types of adjustment: adjusting the music type; adjusting the temperature; adjusting the light type; adjusting the smell.
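Tying the method together, the recognized attribute and state information drives adjustments of the four listed items; every rule in this sketch is an assumption for illustration, since the patent does not prescribe specific mappings:

```python
def adjust_cabin(attributes, state):
    """Illustrative dispatch from attribute/state information to cabin
    adjustments (music type, temperature, light type, smell)."""
    actions = {}
    if attributes.get("age", 99) < 12:          # hypothetical child rule
        actions["music type"] = "children's songs"
    if state.get("eyes") == "eyes visible and closed":
        actions["light type"] = "dimmed"        # hypothetical rest rule
    if state.get("emotion") == "sad":
        actions["music type"] = "soothing"      # hypothetical mood rules
        actions["smell"] = "lavender"
    return actions

plan = adjust_cabin({"age": 30, "gender": "female"}, {"emotion": "sad"})
```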
  • the embodiments of the present disclosure also provide a device for adjusting the cabin environment, including:
  • the acquisition module is configured to acquire a face image of a person in the cabin;
  • the determining module is configured to determine the attribute information and state information of the person in the cabin based on the face image;
  • the adjustment module is configured to adjust the cabin environment based on the attribute information and state information of the person in the cabin.
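The three modules above can be sketched as a thin class; the class name and the callables standing in for the camera, the neural networks, and the environment controller are assumptions:

```python
class CabinEnvironmentAdjuster:
    """Illustrative skeleton of the device: acquisition, determining,
    and adjustment modules wired as injected callables."""

    def __init__(self, camera, analyzer, controller):
        self.camera = camera          # acquisition module
        self.analyzer = analyzer      # determining module
        self.controller = controller  # adjustment module

    def run_once(self):
        face_image = self.camera()                     # acquire face image
        attributes, state = self.analyzer(face_image)  # attribute + state info
        return self.controller(attributes, state)      # adjust the cabin
```

Injecting the three modules keeps each independently testable, mirroring how the claims describe them as separate components.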
  • the attribute information includes age information, and the age information is obtained through identification of the first neural network;
  • the device also includes a training module configured to obtain the first neural network as follows: performing age prediction on the sample images in a sample image set through the first neural network to be trained to obtain the predicted age value of each sample image; and adjusting the network parameter values of the first neural network based on the difference between the predicted age value of each sample image and the age value of its age label, the differences between the predicted age values of the sample images in the sample image set, and the differences between the age values of the age labels of the sample images in the sample image set.
  • the training module is further configured to: adjust the network parameter values of the first neural network based on the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images.
  • the sample image set includes multiple initial sample images and an enhanced sample image corresponding to each initial sample image, where an enhanced sample image is an image obtained by performing information transformation processing on the initial sample image.
  • the training module is further configured to: adjust the network parameter values of the first neural network based on the difference between the predicted age value of each sample image and the age value of its age label, and the difference between the predicted age value of an initial sample image and the predicted age value of the enhanced sample image corresponding to that initial sample image; wherein a sample image is an initial sample image or an enhanced sample image.
  • each sample image set includes multiple initial sample images, and an enhanced sample image corresponding to each initial sample image.
  • an enhanced sample image is an image obtained by performing information transformation processing on the initial sample image, and the multiple initial sample images in the same sample image set are acquired by the same image acquisition device;
  • the training module is further configured to: calculate a loss value for the current training round based on the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of any two sample images in the same sample image set, the difference between the age values of the age labels of those two sample images, and the difference between the predicted age value of an initial sample image and the predicted age value of its corresponding enhanced sample image; and adjust the network parameter values of the first neural network based on the calculated loss value; wherein a sample image is an initial sample image or an enhanced sample image.
  • the training module is further configured to: calculate a first loss value from the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images; calculate a second loss value from the difference between the predicted age value of an initial sample image and the predicted age value of its corresponding enhanced sample image; and take the sum of the first loss value and the second loss value as the loss value for this training round.
  • the training module is further configured to determine the enhanced sample image corresponding to an initial sample image as follows: generating a three-dimensional face model corresponding to the face region image in the initial sample image, and rotating the three-dimensional face model by different angles to obtain first enhanced sample images at the different angles; and adding the value of each pixel of the initial sample image on the RGB channels to different light influence values to obtain second enhanced sample images under the different light influence values; an enhanced sample image is a first enhanced sample image or a second enhanced sample image.
  • the attribute information includes gender information
  • the determining module is further configured to determine the gender information of the person in the cabin as follows: inputting the face image into the second neural network for gender information extraction to obtain the two-dimensional feature vector output by the second neural network, where the element value in the first dimension characterizes the probability that the face image is male and the element value in the second dimension characterizes the probability that the face image is female; inputting the two-dimensional feature vector into a classifier; and determining the gender whose probability is greater than the set threshold as the gender of the face image.
  • the determining module is further configured to determine the set threshold as follows: acquiring multiple sample images collected in the cabin by the image acquisition device that collects the face image, together with the gender label corresponding to each sample image; inputting the multiple sample images into the second neural network to obtain the predicted gender of each sample image under each of multiple candidate thresholds; for each candidate threshold, determining the prediction accuracy under that candidate threshold from the predicted gender and gender label of each sample image under that candidate threshold; and determining the candidate threshold corresponding to the maximum prediction accuracy as the set threshold.
  • the determining module is further configured to determine the multiple candidate thresholds as follows: selecting the multiple candidate thresholds from a preset value range according to a set step size.
  • the status information includes open and closed eyes information
  • the determining module is further configured to determine the open/closed-eye information of the person in the cabin as follows: performing feature extraction on the face image to obtain a multi-dimensional feature vector, where the element value in each dimension characterizes the probability that the eyes in the face image are in the state corresponding to that dimension; and determining the state corresponding to the dimension whose probability is greater than a preset value as the open/closed-eye information of the person in the cabin.
  • the state of the eyes includes at least one of the following states: eyes not visible; eyes visible and open; eyes visible and closed.
  • the state information includes emotional information
  • the determining module is further configured to determine the emotional information of the people in the cabin according to the following steps: recognize the facial image according to the facial image Represents the action of each of the at least two organs on the human face; based on the recognized action of each of the organs and the preset mapping relationship between facial actions and emotional information, determine the State the emotional information of the people in the cabin.
  • the actions of the organs on the human face include at least two of the following: frowning; staring; corners of the mouth raised; upper lip raised; corners of the mouth lowered; mouth open.
  • the action of each of the at least two organs on the face represented by the face image is recognized by a third neural network, and
  • the third neural network includes a backbone network and at least two classification branch networks, each of the classification branch networks is used to recognize an action of an organ on a human face;
  • the determining module is further configured to: use the backbone network to perform feature extraction on the face image to obtain a feature map of the face image; use each classification branch network to perform action recognition on the feature map to obtain the occurrence probability of the action recognizable by that classification branch network; and determine the action whose occurrence probability is greater than the preset probability as the action of the organ on the face represented by the face image.
  • the adjustment of the environment settings in the cabin includes at least one of the following types of adjustment: adjusting the music type; adjusting the temperature; adjusting the light type; adjusting the scent.
  • embodiments of the present disclosure also provide an electronic device, including a processor, a memory, and a bus.
  • the memory stores machine-readable instructions executable by the processor;
  • when the electronic device is running, the processor and the memory communicate through the bus, and when the machine-readable instructions are executed by the processor, the steps in the above-mentioned first aspect or any possible implementation manner of the first aspect are executed.
  • the embodiments of the present disclosure also provide a computer-readable storage medium with a computer program stored thereon; when the computer program is run by a processor, the steps in the above-mentioned first aspect or any possible implementation manner of the first aspect are executed.
  • the embodiments of the present disclosure also provide a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes the steps in the above-mentioned first aspect or any possible implementation manner of the first aspect.
  • FIG. 1 shows a schematic flowchart of a method for adjusting the cabin environment provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic flowchart of a first neural network training method provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic flowchart of a method for determining an enhanced sample image provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic flowchart of a method for determining gender information of cabin personnel provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic flowchart of a method for determining a setting threshold provided by an embodiment of the present disclosure
  • FIG. 6 shows a schematic flowchart of a method for determining information about opening and closing eyes of a cabin crew provided by an embodiment of the present disclosure
  • FIG. 7 shows a schematic flowchart of a method for determining attribute information provided by an embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of a network structure of an information extraction neural network provided by an embodiment of the present disclosure
  • FIG. 9 shows a schematic flowchart of a method for determining emotional information of cabin personnel provided by an embodiment of the present disclosure
  • FIG. 10 shows a schematic structural diagram of a device for adjusting an in-cabin environment provided by an embodiment of the present disclosure
  • FIG. 11 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • in the related technology, the environment settings in the cabin are adjusted in one of two ways: either manually, or by presetting the environment setting information corresponding to each user, identifying the identity information of the passengers in the cabin, and adjusting the environment settings according to the environment setting information corresponding to the identified identity; if a passenger in the cabin has not preset corresponding environment setting information, or does not want the cabin environment to be set according to the preset environment setting information, the passenger still needs to adjust the cabin environment settings manually.
  • the embodiments of the present disclosure provide a method for adjusting the cabin environment, which can acquire the face images of the cabin personnel in real time, determine the attribute information and emotional information of the cabin personnel based on the face images, and then adjust the environment settings in the cabin based on the attribute information and emotional information of the cabin personnel.
  • since the determined attribute information and emotional information can represent the current state of the cabin personnel, adjusting the environment settings in the cabin according to that current state makes it possible to adjust the cabin environment settings automatically and dynamically.
  • the execution subject of the method for adjusting the cabin environment provided by the embodiments of the present disclosure is generally an electronic device with certain computing capabilities.
  • the cabins may include, but are not limited to, car cabins, train cabins, boat cabins, etc., and the methods provided in the embodiments of the present disclosure are applicable to all of them.
  • FIG. 1 is a schematic flowchart of a method for adjusting the cabin environment provided by an embodiment of the present disclosure, which includes the following steps:
  • Step 101 Obtain face images of people in the cabin.
  • Step 102 Determine the attribute information and status information of the person in the cabin based on the face image.
  • Step 103 Adjust the environment settings in the cabin based on the attribute information and status information of the personnel in the cabin.
  • through the above process, the face image of the cabin personnel can be acquired in real time, the attribute information and emotional information of the cabin personnel can be determined according to the face image, and the environment settings in the cabin can then be adjusted based on that attribute information and emotional information.
  • since the determined attribute information and emotional information can represent the current state of the cabin personnel, adjusting the environment settings in the cabin according to that current state makes it possible to adjust the cabin environment settings automatically and dynamically.
  • the face image of the person in the cabin may be an image including the complete face of the person in the cabin.
  • the image to be detected may be captured in real time by a camera installed in the cabin and acquired in real time.
  • the face area information in the image to be detected includes the coordinates of the center point of the detection frame corresponding to the face area and the size information of the detection frame.
  • the size information of the detection frame can be enlarged according to a preset ratio to obtain the enlarged size information, and the face image is then intercepted from the image to be detected based on the center point coordinate information and the enlarged size information.
  • the area corresponding to the detection frame output by the face detection neural network may not contain all the face information of the cabin personnel; therefore, the detection frame can be enlarged so that the obtained face image includes all the face information.
  • the size information may include the length of the detection frame and the width of the detection frame.
  • the length of the detection frame and the width of the detection frame may each be enlarged according to a corresponding preset ratio, wherein the preset ratio corresponding to the length and the preset ratio corresponding to the width may be the same.
  • for example, if the length of the detection frame is a, the width is b, and the preset ratio is 1.1, the enlarged length of the detection frame is 1.1a and the enlarged width is 1.1b.
  • the point corresponding to the center point coordinate information can be used as the intersection of the diagonals, and the length and width in the enlarged size information are used as the length and width of the detection frame, so as to determine the position of the enlarged detection frame in the image to be detected;
  • the enlarged detection frame is then used as the boundary to intercept the image from the image to be detected, and the intercepted image is the face image.
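The enlarge-and-crop procedure above can be sketched in code. This is an illustrative example rather than the patent's implementation; the function name and the 1.1 default ratio come from the example in the text, and clamping the frame to the image bounds is an added assumption:

```python
def crop_face(image_h, image_w, cx, cy, w, h, ratio=1.1):
    """Enlarge a detection frame around its center point and clamp it to the image.

    (cx, cy) is the center point of the detection frame and (w, h) its size;
    the center point stays at the intersection of the diagonals of the
    enlarged frame. Returns the (left, top, right, bottom) crop rectangle.
    """
    new_w, new_h = w * ratio, h * ratio
    left = max(0, int(round(cx - new_w / 2)))
    top = max(0, int(round(cy - new_h / 2)))
    right = min(image_w, int(round(cx + new_w / 2)))
    bottom = min(image_h, int(round(cy + new_h / 2)))
    return left, top, right, bottom
```

For example, a 100 x 120 frame centered at (320, 240) in a 640 x 480 image is enlarged to 110 x 132 before cropping.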
  • the sample data of the face detection neural network can be sample images, each with corresponding label data; the label data of a sample image includes the center point coordinate information and the size information of the detection frame in the sample image.
  • after each sample image is input into the face detection neural network, the network outputs predicted center point coordinate information and predicted detection frame size information; a loss value for this round of training is then determined based on the predicted center point coordinate information, the predicted detection frame size information, and the label data of the sample image, and if the loss value does not meet a preset condition, the network parameter values of the face detection neural network are adjusted.
  • For step 102:
  • the attribute information of the cabin personnel may include at least one of the following information: age information; gender information; race information.
  • the status information of the cabin personnel may include the emotional information of the cabin personnel and the information of opening and closing their eyes. Among them, the information of opening and closing the eyes can be used to detect whether the cabin personnel are in a sleep state.
  • the emotional information may include, but is not limited to, any of the following expressions: angry, sad, calm, happy, depressed, etc.
  • attribute recognition can be performed on the cabin personnel based on the face image to determine their attribute information, and expression recognition and/or open and closed eye recognition can be performed based on the face image to determine their status information.
  • the age information can be obtained through recognition by the first neural network.
  • the training process of the first neural network may include the following steps according to the method shown in Figure 2:
  • Step 201 Perform age prediction on the sample images in the sample image set through the first neural network to be trained to obtain the predicted age value corresponding to the sample image.
  • Step 202 Based on the difference between the predicted age value corresponding to each of the sample images and the age value of the age label of the sample image, the difference between the predicted age values of the sample images in the sample image set, and the sample image The difference between the age values of the age labels of the sample images in the set is adjusted to the network parameter values of the first neural network.
  • the steps of adjusting the network parameters of the first neural network described above can be divided into the following situations:
  • Case 1: The sample image set is a single set containing multiple sample images.
  • in this case, the network parameter values of the first neural network may be adjusted based on the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of the sample images in the set, and the difference between the age values of the age labels of those sample images.
  • model loss value in the training process can be calculated by the following formula (1):
  • Age loss represents the loss value during this training process
  • N represents the number of sample images
  • predict n represents the predicted age value of the nth sample image
  • gt n represents the age value of the age label of the nth sample image
  • i traverses from 0 to N-1
  • j traverses from 0 to N-1
  • i and j are not equal.
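The body of formula (1) is not reproduced in this text (it appeared as an image in the original publication). From the symbol definitions above, a plausible reconstruction, combining the per-image age error with the consistency between pairwise prediction differences and pairwise label differences, is:

```latex
Age_{loss} = \sum_{n=0}^{N-1} \left| predict_{n} - gt_{n} \right|
  + \sum_{i=0}^{N-1} \sum_{\substack{j=0 \\ j \neq i}}^{N-1}
    \left| \left( predict_{i} - predict_{j} \right) - \left( gt_{i} - gt_{j} \right) \right|
```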
  • the network parameter value of the first neural network can be adjusted according to the calculated loss value.
  • for the first neural network trained by this method, the supervision data includes not only the difference between the predicted age value and the age value of the age label, but also the difference between the predicted age values of the sample images in the sample image set and the difference between the age values of their age labels; the first neural network trained in this way has higher accuracy in age recognition.
  • Case 2: The sample image set includes a plurality of initial sample images and an enhanced sample image corresponding to each initial sample image, wherein an enhanced sample image is an image obtained after information transformation processing is performed on the initial sample image.
  • Step 301 Generate a three-dimensional face model corresponding to the face area image in the initial sample image.
  • Step 302: Rotate the three-dimensional face model by different angles to obtain first enhanced sample images at different angles; and add different light influence values to the value of each pixel of the initial sample image on the RGB channels to obtain second enhanced sample images under different light influence values.
  • first enhanced sample image and the second enhanced sample image are both enhanced sample images corresponding to the initial sample image.
  • the value of each pixel in the initial sample image on the three RGB channels includes three values.
  • the values of all pixels of the initial sample image on the three channels can each be added to N, where N is the light influence value and is a three-dimensional vector; in one possible case, N may follow a Gaussian distribution.
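As a hedged sketch of the light-influence augmentation described above (the Gaussian standard deviation and the clipping back to the valid pixel range are assumptions not specified in the text):

```python
import numpy as np

def add_light_influence(image, rng=None, sigma=10.0):
    """Add a light influence value N to every pixel of an image.

    `image` is an H x W x 3 uint8 array. N is a three-dimensional vector
    drawn from a Gaussian distribution (one offset per RGB channel); it is
    added to all pixels, and the result is clipped to the valid [0, 255]
    range before converting back to uint8.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = rng.normal(loc=0.0, scale=sigma, size=3)  # the light influence value N
    augmented = image.astype(np.float64) + n
    return np.clip(augmented, 0, 255).astype(np.uint8)
```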
  • in this case, the network parameter values of the first neural network may be adjusted based on the difference between the predicted age value of each sample image and the age value of its age label, and the difference between the predicted age value of each initial sample image and the predicted age value of its corresponding enhanced sample image.
  • the loss value during the training of the first neural network can be calculated according to the following formula (2):
  • Age loss represents the loss value during this training process
  • N represents the number of sample images
  • predict n represents the predicted age value of the nth sample image
  • gt n represents the age value of the age label of the nth sample image
  • predict_aug n represents the predicted age value of the enhanced sample image corresponding to the nth sample image.
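The body of formula (2) is likewise missing from this text. From the symbol definitions above, a plausible reconstruction, adding an initial-versus-enhanced consistency term to the per-image age error, is:

```latex
Age_{loss} = \sum_{n=0}^{N-1} \left| predict_{n} - gt_{n} \right|
  + \sum_{n=0}^{N-1} \left| predict_{n} - predict\_aug_{n} \right|
```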
  • the enhanced sample images are the sample images obtained after adding angle and light influences to the initial sample images.
  • a neural network trained with both the initial sample images and the enhanced sample images can avoid the influence of angle and light on recognition accuracy, which improves the accuracy of age recognition.
  • Case 3: There are multiple sample image sets, each including initial sample images and an enhanced sample image corresponding to each initial sample image, and the multiple initial sample images in the same sample image set are collected by the same image acquisition device.
  • in this case, the loss value in this training process may be calculated based on the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of any two sample images in the same sample image set, the difference between the age values of the age labels of those two sample images, and the difference between the predicted age value of each initial sample image and the predicted age value of its corresponding enhanced sample image; the network parameter values of the first neural network are then adjusted based on the calculated loss value.
  • specifically, a first loss value may be calculated based on the difference between the predicted age value of each sample image and the age value of its age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images; a second loss value may be calculated based on the difference between the predicted age value of each initial sample image and the predicted age value of its corresponding enhanced sample image; the sum of the first loss value and the second loss value is then used as the loss value in this training process.
  • the first loss value in the training process of the first neural network can be calculated by the following formula (3):
  • Age loss1 represents the first loss value
  • M represents the number of sample image collections
  • N represents the number of sample images contained in each sample image collection
  • predict mn represents the predicted age value of the nth sample image in the mth sample image set
  • gt mn represents the age value of the age label of the nth sample image in the mth sample image set.
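The body of formula (3) is not reproduced here. A plausible reconstruction consistent with the symbol definitions above, summing the Case 1 loss over the M sample image sets, is:

```latex
Age_{loss1} = \sum_{m=0}^{M-1} \left[ \sum_{n=0}^{N-1} \left| predict_{mn} - gt_{mn} \right|
  + \sum_{i=0}^{N-1} \sum_{\substack{j=0 \\ j \neq i}}^{N-1}
    \left| \left( predict_{mi} - predict_{mj} \right) - \left( gt_{mi} - gt_{mj} \right) \right| \right]
```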
  • the second loss value in the training process of the first neural network is calculated by the following formula (4):
  • Age loss2 represents the second loss value
  • predict mn represents the predicted age value of the n-th sample image in the m-th sample image set
  • predict_aug mn represents the predicted age value of the enhanced sample image corresponding to the nth sample image in the mth sample image set.
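The body of formula (4) is also missing from this text. A plausible reconstruction from the symbol definitions above, measuring the consistency between each initial sample image and its enhanced counterpart across all M sets, is:

```latex
Age_{loss2} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \left| predict_{mn} - predict\_aug_{mn} \right|
```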
  • the number of sample images in each sample image set can also be greater than N, in which case N sample images are randomly selected from each sample image set during the training of the first neural network.
  • the network structure of the first neural network may include a feature extraction layer and an age information extraction layer; after the face image is input into the feature extraction layer, a feature map corresponding to the face image can be obtained, and the feature map is then input into the age information extraction layer, which outputs the predicted age value of the face image.
  • the initial sample images in the same sample image set are collected by the same image acquisition device, so when the neural network is trained with these sample images, the influence of errors caused by differences between image acquisition devices can be avoided; at the same time, training the neural network with both the initial sample images and the enhanced sample images can avoid the influence of errors caused by light and angle, so the trained neural network has higher accuracy.
  • in the case that the attribute information includes gender information, the method shown in FIG. 4 can be referred to, which includes the following steps:
  • Step 401 Input the face image into a second neural network for gender information extraction, to obtain a two-dimensional feature vector output by the second neural network, and elements in the first dimension in the two-dimensional feature vector The value is used to characterize the probability that the face image is male, and the element value in the second dimension is used to characterize the probability that the face image is female.
  • Step 402 Input the two-dimensional feature vector into a classifier, and determine a gender with a probability greater than a set threshold as the gender of the face image.
  • the set threshold can be determined according to the image acquisition device that acquires the face image and the acquisition environment.
  • for different image acquisition devices and acquisition environments, the recognition accuracy under the same set threshold may differ for the collected face images; therefore, in order to avoid the influence of the image acquisition device and the acquisition environment on recognition accuracy, the embodiment of the present disclosure provides a method for adaptively determining the set threshold.
  • in this case, the method for determining the set threshold shown in FIG. 5 can be referred to, which includes the following steps:
  • Step 501 Acquire a plurality of sample images collected in the cabin by the image collection device that collects the face image, and a gender label corresponding to each of the sample images.
  • the set threshold determined by these sample images can meet the requirements of the current environment.
  • Step 502 Input the multiple sample images into the second neural network to obtain the predicted gender corresponding to each of the sample images under each of the multiple candidate thresholds.
  • the network structure of the second neural network may include a feature extraction layer and a gender information extraction layer.
  • the sample image can first be input into the feature extraction layer to obtain the feature map corresponding to the sample image; the feature map is then input into the gender information extraction layer to output the two-dimensional feature vector, and the classifier is then used to determine the predicted gender corresponding to the sample image.
  • a plurality of candidate thresholds may be selected from a preset value range according to a set step size.
  • the preset value range can be 0 to 1
  • the set step size can be, for example, 0.001.
  • the candidate threshold can be determined by the following formula (5):
  • thrd represents the candidate threshold
  • k takes every integer from 0 to 1000.
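Given the 0-to-1 value range and 0.001 step size stated above, formula (5) can be written as:

```latex
thrd = \frac{k}{1000}, \qquad k = 0, 1, \ldots, 1000
```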
  • Step 503 For each candidate threshold, determine the prediction accuracy rate under the candidate threshold according to the predicted gender and gender label corresponding to each sample image under the candidate threshold.
  • the following method can be used to determine:
  • TP represents the number of samples whose gender label is male and whose predicted gender is male under the threshold thrd
  • TN represents the number of samples whose gender label is male and whose predicted gender is female under the threshold thrd
  • FP represents the number of samples whose gender label is female and whose predicted gender is male under the threshold thrd
  • FN represents the number of samples whose gender label is female and whose predicted gender is female under the threshold thrd.
  • the accuracy rate can be calculated by the following formula (6):
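The body of formula (6) is not reproduced in this text. Under the definitions above, where TP and FN count the correctly predicted male and female samples respectively, the accuracy rate is:

```latex
accuracy = \frac{TP + FN}{TP + TN + FP + FN}
```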
  • Step 504 Determine the candidate threshold corresponding to the maximum prediction accuracy rate as the set threshold.
  • the sample images are collected in the cabin by the same image acquisition device that collects the face image, which ensures that the influence of the acquisition device and the acquisition environment is reflected in the determined set threshold; determining the candidate threshold with the highest prediction accuracy as the set threshold allows the set threshold to be adjusted adaptively, thereby improving the accuracy of gender recognition.
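Steps 501 to 504 amount to a grid search over candidate thresholds. The following is a minimal sketch, not the patent's implementation; it assumes the network outputs a male probability per sample and reuses the TP/TN/FP/FN definitions and the accuracy formula from the text:

```python
def accuracy_at(thrd, male_probs, labels):
    """Prediction accuracy rate at one candidate threshold.

    `male_probs[i]` is the predicted probability that sample i is male;
    `labels[i]` is the gender label, 'male' or 'female'. A sample is
    predicted male when its male probability exceeds the threshold.
    """
    tp = tn = fp = fn = 0
    for p, label in zip(male_probs, labels):
        pred_male = p > thrd
        if label == 'male':
            tp += pred_male       # male predicted male
            tn += not pred_male   # male predicted female
        else:
            fp += pred_male       # female predicted male
            fn += not pred_male   # female predicted female
    return (tp + fn) / (tp + tn + fp + fn)

def best_threshold(male_probs, labels, step=0.001):
    """Return the candidate threshold with the maximum prediction accuracy."""
    candidates = [k * step for k in range(int(1 / step) + 1)]
    return max(candidates, key=lambda t: accuracy_at(t, male_probs, labels))
```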
  • the method shown in Figure 6 can be used to determine the eye open and closed information of the cabin personnel, including the following steps:
  • Step 601 Perform feature extraction on the face image to obtain a multi-dimensional feature vector.
  • the element value in each dimension of the multi-dimensional feature vector is used to characterize the probability that the eyes in the face image are in the state corresponding to that dimension.
  • the face image can be input into a pre-trained fourth neural network for detecting open and closed eye information.
  • the fourth neural network can include a feature extraction layer and an open and closed eye information extraction layer; after the face image is input into the fourth neural network, the face image is first input into the feature extraction layer to output the feature map corresponding to the face image, and the feature map is then input into the open and closed eye information extraction layer, which outputs the multi-dimensional feature vector.
  • the state of the eyes may include at least one of the following states: invisible to human eyes, visible to human eyes and open eyes, and visible to human eyes and closed eyes.
  • the left eye state may be any of the above states, and the right eye state may also be any of the above states, so there are nine possible combinations for the two eyes.
  • the output of the fourth neural network can therefore be a nine-dimensional feature vector, where the element value in each dimension represents the probability that the two eyes in the face image are in the pair of eye states corresponding to that dimension.
  • Step 602 Determine the state corresponding to the dimension whose probability is greater than the preset value as the eye open and closed information of the person in the cabin.
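As an illustrative sketch of decoding the nine-dimensional vector, the row-major pairing of left and right eye states below is an assumption rather than a layout given in the text:

```python
EYE_STATES = ('invisible', 'visible and open', 'visible and closed')

def decode_eye_state(probs, preset_value=0.5):
    """Map a nine-dimensional probability vector to (left, right) eye states.

    Dimension i is assumed to pair left-eye state i // 3 with right-eye
    state i % 3. Returns None when no dimension exceeds the preset value.
    """
    best = max(range(9), key=lambda i: probs[i])
    if probs[best] <= preset_value:
        return None
    return EYE_STATES[best // 3], EYE_STATES[best % 3]
```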
  • the face image can be input into the fifth neural network used for race information extraction.
  • the fifth neural network includes a feature extraction layer and a race information extraction layer; after the face image is input into the fifth neural network, the face image is first input into the feature extraction layer to obtain the feature map corresponding to the face image, and the feature map is then input into the race information extraction layer to obtain a three-dimensional feature vector; the element values in the different dimensions of the three-dimensional feature vector are respectively used to represent the probability that the face image is of the race corresponding to that dimension, where the races include "yellow race", "white race", and "black race".
  • FIG. 7 is a method for determining attribute information provided by an embodiment of the present disclosure, which includes the following steps:
  • Step 701 Input the face image into the feature extraction layer of the information extraction neural network used for attribute recognition to obtain a feature map corresponding to the face image.
  • the feature extraction layer is used to perform feature extraction on the input face image.
  • the feature extraction layer can use the inception network, the lightweight network mobilenet-v2, etc.
  • Step 702 Input the feature map to each attribute information extraction layer of the information extraction neural network to obtain attribute information output by each attribute information extraction layer, wherein different attribute information extraction layers are used to detect different attribute information.
  • each attribute information extraction layer in the information extraction neural network includes a first fully connected layer and a second fully connected layer; when the feature map is input into any attribute information extraction layer, the feature map is first input into the first fully connected layer of that attribute information extraction layer to obtain an M-dimensional vector corresponding to the feature map, where M is a preset positive integer corresponding to the attribute information.
  • the M-dimensional vector is then input into the second fully connected layer of the attribute information extraction layer to obtain an N-dimensional vector corresponding to the feature map, where N is a positive integer, M is greater than N, and N is the number of values of the attribute information corresponding to the attribute information extraction layer.
  • based on the N-dimensional vector, the attribute information corresponding to the attribute information extraction layer is determined.
  • N is the number of values corresponding to the attribute information extraction layer; for example, if the attribute information extracted by the attribute information extraction layer is gender, the values of the attribute information include "male" and "female", so the value of N corresponding to that attribute information extraction layer is 2.
  • the following will take the attribute information including age information, gender information, and race information as an example to illustrate the structure of the information extraction neural network.
  • the network structure of the information extraction neural network can be as shown in FIG. 8.
  • after the face image is input into the feature extraction layer, the feature map corresponding to the face image can be obtained, and the feature map is then input into the age information extraction layer, the gender information extraction layer, the race information extraction layer, and the open and closed eye information extraction layer respectively.
  • the age information extraction layer includes a first fully connected layer and a second fully connected layer; after the feature map is input into the first fully connected layer, a K1-dimensional feature vector can be obtained, and the K1-dimensional feature vector is then input into the second fully connected layer to obtain a one-dimensional vector output, where the element value in the one-dimensional vector is the predicted age value; in addition, considering that an age value should be an integer, the element value in the one-dimensional vector can be rounded to obtain the predicted age information, where K1 is greater than 1.
  • the gender information extraction layer includes a first fully connected layer and a second fully connected layer; after the feature map is input into the first fully connected layer, a K2-dimensional feature vector can be obtained, and the K2-dimensional feature vector is then input into the second fully connected layer to obtain a two-dimensional vector output, where the element values in the two-dimensional vector respectively represent the probability that the user in the input face image is male and the probability that the user is female; finally, the output of the second fully connected layer can be connected to a two-classification network, and the gender information of the input face image predicted by the gender information extraction layer is determined according to the two-classification result, where K2 is greater than 2.
  • for the race information extraction layer, the feature map is input into the first fully connected layer to obtain a K3-dimensional feature vector, and the K3-dimensional feature vector is then input into the second fully connected layer to obtain a three-dimensional vector output, where the element values in the three-dimensional vector respectively represent the probability that the user in the input face image is "yellow race", "black race", or "white race"; finally, the output of the second fully connected layer can be connected to a classification network, and the race information of the input face image predicted by the race information extraction layer is determined according to the classification result of the classification network, where K3 is greater than 3.
  • The open/closed-eye information in the state information can also be extracted using the above-mentioned information extraction neural network.
  • The extracted state is the state of the two eyes of the person in the cabin. Each eye has three possible states: "eye invisible" (the eye cannot be detected in the image, for example because the person in the cabin wears sunglasses), "eye visible and open", and "eye visible and closed", so for the two eyes together there are 9 possible combined states. Therefore, for the open/closed-eye information extraction layer, the output of the first fully connected layer is a K4-dimensional feature vector and the output of the second fully connected layer is a nine-dimensional feature vector.
  • The value of each element in the nine-dimensional vector represents the probability that the eye state of the person in the cabin in the face image is the state represented by that element.
  • The output of the second fully connected layer is connected to a classification network, and the open/closed-eye information of the input face image predicted by the open/closed-eye information extraction layer is determined according to the classification result, where K4 is greater than 9.
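The nine-way decoding described above can be sketched as follows. This is a hedged illustration rather than the patent's implementation: the function name, the per-eye state ordering, and the row/column index convention are assumptions; only the 9-dimensional combined-state output over two eyes, each with three states, is taken from the text.

```python
import math

# Three per-eye states described above; the ordering is an assumption.
EYE_STATES = ["invisible", "open", "closed"]

def decode_eye_state(logits):
    """Softmax over the 9 combined two-eye states output by the second fully
    connected layer, then map the most probable index back to a
    (first_eye, second_eye) state pair via a 3x3 row/column layout."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]          # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))                     # index in [0, 8]
    first, second = divmod(idx, 3)                    # row = first eye, col = second eye
    return EYE_STATES[first], EYE_STATES[second], probs[idx]
```

For example, index 4 (= 1 * 3 + 1) corresponds to both eyes being visible and open under this layout.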
  • During training, the attribute information extraction layers are trained together.
  • The loss value of each attribute information extraction layer is calculated separately, and the network parameter values of the corresponding attribute information extraction layer are adjusted according to its loss value; the loss values of all attribute information extraction layers are then summed as the total loss value, and the feature extraction layer is adjusted according to the total loss value.
  • The training process of the information extraction neural network is not further described here.
  • When determining the emotion information, the method shown in FIG. 9 can be used, which includes the following steps:
  • Step 901: According to the face image, recognize the action of each of at least two organs on the face represented by the face image.
  • Step 902: Determine the emotion information of the person in the cabin based on the recognized action of each organ and a preset mapping relationship between facial actions and emotion information.
  • The face image can be recognized through a third neural network, which includes a backbone network and at least two classification branch networks; each classification branch network is used to identify one action of one organ on the face.
  • The backbone network extracts features from the face image to obtain the feature map of the face image.
  • Each classification branch network then performs action recognition on the feature map of the face image to obtain the occurrence probability of the action that the branch network can recognize, and actions whose occurrence probability is greater than a preset probability are determined as the actions of the organs on the face represented by the face image.
  • Before the face image is input to the third neural network, the face image can also be preprocessed to enhance the key information in it, and the preprocessed face image is then input to the third neural network.
  • Preprocessing the face image may consist of first determining the position information of the key points in the face image, then performing an affine transformation on the face image based on the position information of the key points to obtain a corrected image, and finally normalizing the corrected image to obtain the processed face image.
  • Normalizing the corrected face image includes: calculating the mean of the pixel values of the pixels contained in the face image and the standard deviation of those pixel values; and normalizing the pixel value of each pixel in the face image based on this mean and standard deviation.
  • The normalization can be written as Z = (X − μ) / σ, where:
  • Z represents the pixel value after the pixel is normalized;
  • X represents the pixel value before the pixel is normalized;
  • μ represents the mean of the pixel values;
  • σ represents the standard deviation of the pixel values.
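The normalization step can be sketched in a few lines; the function name and the flat list-of-pixel-values representation are illustrative assumptions:

```python
import math

def normalize_pixels(pixels):
    """Per-image normalization described above: Z = (X - mean) / std, where the
    mean and standard deviation are computed over all pixel values of the
    face image."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in pixels) / n)
    return [(x - mean) / std for x in pixels]
```

After this transform the pixel values of one image have zero mean and unit variance, which is what makes the subsequent expression determination more stable.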
  • After this preprocessing, the face in the face image is corrected and normalized, which makes the determination of the facial expression more accurate.
  • The actions detected by the action units include at least one of the following:
  • Based on the detected actions and the preset mapping relationship, the emotion information of the people in the cabin can be determined.
  • For example, the emotion information of the person in the cabin may be determined to be calm; if it is detected that the facial actions of the person in the cabin are staring and opening the mouth, it can be determined that the emotion information of the person in the cabin is surprise.
  • In this way, the face image is used to recognize the actions of the organs; compared with directly recognizing facial expressions, this can improve accuracy.
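The preset mapping between facial actions and emotion information might be represented as a simple lookup table, as sketched below. Everything here is an illustrative assumption except the stare-plus-open-mouth-to-surprise pair, which the text states explicitly:

```python
# Hypothetical mapping from the set of detected facial actions to an emotion
# label; only the "stare + open mouth -> surprise" entry comes from the text.
EMOTION_MAP = {
    frozenset({"stare", "open_mouth"}): "surprise",
    frozenset({"frown"}): "worried",
    frozenset({"mouth_corner_up"}): "happy",
    frozenset({"mouth_corner_down"}): "depressed",
}

def infer_emotion(detected_actions):
    """Look up the emotion for the set of organ actions whose predicted
    probability exceeded the preset threshold; default to calm when the
    combination is not in the table."""
    return EMOTION_MAP.get(frozenset(detected_actions), "calm")
```

Keying on a frozenset makes the lookup independent of the order in which the branch networks report actions.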
  • Adjusting the environment settings in the cabin may include at least one of the following types of adjustment:
  • adjusting the music type; adjusting the temperature; adjusting the light type; adjusting the smell.
  • When adjusting the environment settings in the cabin according to the attribute information and emotion information of the people in the cabin, if there is only one person in the cabin, the corresponding adjustment information can be found directly from a preset mapping relationship based on that person's attribute information and emotion information, and the environment settings in the cabin are then adjusted according to the adjustment information; the mapping relationship indicates the correspondence between attribute information plus emotion information and adjustment information.
  • For example, if the detected emotion information is "sadness", the type of music played can be adjusted accordingly.
  • Since each kind of attribute information takes only a limited set of values, and the state information likewise takes a limited set of values, the adjustment information corresponding to each combination of attribute values and emotion values can be preset, and the corresponding adjustment information is then looked up according to the detected attribute information and emotion information of the people in the cabin.
  • In this way, the environment settings in the cabin can be adjusted in real time as the emotion information of the people in the cabin changes.
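Because both the attribute values and the emotion values are limited, the lookup described above can be preset as a full table. The sketch below is an assumption throughout: the table keys, the adjustment entries, and the function name are all illustrative, not taken from the patent.

```python
# Hypothetical preset table mapping (attribute value, emotion value) pairs to
# adjustment information; the concrete entries are invented for illustration.
ADJUSTMENT_MAP = {
    ("adult", "sad"): {"music": "soothing"},
    ("adult", "happy"): {"music": "upbeat", "light": "bright"},
    ("child", "calm"): {"music": "children"},
}

def lookup_adjustment(age_group, emotion):
    """Return the cabin adjustment for the detected attribute/emotion pair;
    an empty dict means no change to the current environment settings."""
    return ADJUSTMENT_MAP.get((age_group, emotion), {})
```

Re-running this lookup whenever the detected emotion changes gives the real-time adjustment behaviour described above.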
  • In the above methods, the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process.
  • The execution order of the steps should be determined by their functions and possible inherent logic.
  • An embodiment of the present disclosure also provides a device for adjusting the cabin environment corresponding to the method for adjusting the cabin environment.
  • Since the principle by which the device in the embodiment of the disclosure solves the problem is similar to the above-mentioned method for adjusting the cabin environment, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
  • Referring to FIG. 10, it is a schematic structural diagram of a device for adjusting an in-cabin environment provided by an embodiment of the present disclosure.
  • the device includes: an acquisition module 1001, a determination module 1002, an adjustment module 1003, and a training module 1004; wherein,
  • the obtaining module 1001 is configured to obtain face images of persons in the cabin;
  • the determining module 1002 is configured to determine the attribute information and status information of the person in the cabin based on the face image;
  • the adjustment module 1003 is configured to adjust the cabin environment based on the attribute information and status information of the cabin personnel.
  • the attribute information includes age information, and the age information is obtained through identification of the first neural network;
  • The device further includes a training module 1004, configured to obtain the first neural network according to the following method: performing age prediction on the sample images in a sample image set through the first neural network to be trained to obtain the predicted age values corresponding to the sample images; and adjusting the network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the differences between the predicted age values of the sample images in the sample image set, and the differences between the age values of the age labels of the sample images in the sample image set.
  • The training module 1004 is further configured to: adjust the network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images.
  • The sample image set includes a plurality of initial sample images and an enhanced sample image corresponding to each initial sample image, the enhanced sample image being an image obtained by performing information transformation processing on the initial sample image.
  • The training module 1004 is further configured to: adjust the network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, and the difference between the predicted age value of the initial sample image and the predicted age value of the enhanced sample image corresponding to the initial sample image; wherein the sample image is an initial sample image or an enhanced sample image.
  • There may be multiple sample image sets, each including multiple initial sample images and an enhanced sample image corresponding to each initial sample image.
  • The enhanced sample image is an image obtained by performing information transformation processing on the initial sample image, and the multiple initial sample images in the same sample image set are acquired by the same image acquisition device.
  • The training module 1004 is further configured to: calculate the loss value in the current training process based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, the difference between the age values of the age labels of those two sample images, and the difference between the predicted age value of the initial sample image and the predicted age value of its corresponding enhanced sample image; and adjust the network parameter values of the first neural network based on the calculated loss value; wherein the sample image is an initial sample image or an enhanced sample image.
  • The training module 1004 is further configured to: calculate a first loss value according to the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images; calculate a second loss value according to the difference between the predicted age value of the initial sample image and the predicted age value of the enhanced sample image corresponding to the initial sample image; and take the sum of the first loss value and the second loss value as the loss value in the current training process.
  • The training module 1004 is further configured to determine the enhanced sample image corresponding to the initial sample image according to the following method: generating a three-dimensional face model corresponding to the face region image in the initial sample image, and rotating the three-dimensional face model by different angles to obtain first enhanced sample images at different angles; and adding different light influence values to the value of each pixel of the initial sample image on the RGB channels to obtain second enhanced sample images under different light influence values; the enhanced sample image is the first enhanced sample image or the second enhanced sample image.
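The light-based augmentation (adding a light influence value to every pixel's RGB values) might look like the sketch below. The clamping to the [0, 255] range is an added assumption not stated in the text, which only mentions the addition:

```python
def add_light_offset(rgb_pixels, offset):
    """Second augmentation described above: add a light-influence value to the
    R, G and B channel values of every pixel. Clamping each channel to the
    valid [0, 255] byte range is an assumption of this sketch."""
    return [tuple(min(255, max(0, c + offset)) for c in px) for px in rgb_pixels]
```

Running this with several different offsets on one initial sample image yields the family of second enhanced sample images described above.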
  • The attribute information includes gender information.
  • The determining module 1002 is further configured to determine the gender information of the person in the cabin according to the following method: inputting the face image into the second neural network for gender information extraction to obtain the two-dimensional feature vector output by the second neural network, where the element value in the first dimension of the two-dimensional feature vector represents the probability that the face image is male and the element value in the second dimension represents the probability that the face image is female; and inputting the two-dimensional feature vector into a classifier, and determining the gender whose probability is greater than a set threshold as the gender of the face image.
  • The determining module 1002 is further configured to determine the set threshold according to the following method: acquiring multiple sample images collected in the cabin by the image acquisition device that collects the face image, and the gender label corresponding to each sample image; inputting the multiple sample images into the second neural network to obtain the predicted gender corresponding to each sample image under each of multiple candidate thresholds; for each candidate threshold, determining the prediction accuracy rate under the candidate threshold according to the predicted gender and gender label corresponding to each sample image under that threshold; and determining the candidate threshold corresponding to the maximum prediction accuracy rate as the set threshold.
  • the determining module 1002 is further configured to determine the multiple candidate thresholds according to the following method: selecting the multiple candidate thresholds from a preset value range according to a set step size .
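The candidate-threshold search described above (step through a preset value range at a set step size, keep the threshold with the highest prediction accuracy) can be sketched as follows; the function name and the default range and step values are assumptions:

```python
def best_threshold(probs_female, labels, lo=0.3, hi=0.7, step=0.05):
    """Sweep candidate thresholds selected from a preset range [lo, hi] at a
    set step size; for each candidate, predict 'female' when the female-class
    probability exceeds the threshold, score accuracy against the gender
    labels, and keep the best-scoring threshold."""
    best_t, best_acc = lo, -1.0
    t = lo
    while t <= hi + 1e-9:                 # tolerance guards float accumulation
        preds = ["female" if p > t else "male" for p in probs_female]
        acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
        if acc > best_acc:                # keep the first/highest accuracy
            best_t, best_acc = t, acc
        t += step
    return best_t, best_acc
```

Calibrating the threshold on images from the same in-cabin camera compensates for that camera's particular imaging conditions.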
  • The status information includes open/closed-eye information.
  • The determining module 1002 is configured to determine the open/closed-eye information of the person in the cabin according to the following method: performing feature extraction on the face image to obtain a multi-dimensional feature vector.
  • The element value in each dimension of the multi-dimensional feature vector represents the probability that the eyes in the face image are in the state corresponding to that dimension.
  • The state corresponding to the dimension whose probability is greater than a preset value is determined as the open/closed-eye information of the person in the cabin.
  • the state of the eye includes at least one of the following states: invisible to the human eye; visible to the human eye and open; and visible to the human eye and closed.
  • The state information includes emotion information.
  • The determining module 1002 is further configured to determine the emotion information of the people in the cabin according to the following steps: recognizing, according to the face image, the action of each of at least two organs on the face represented by the face image; and determining the emotion information of the people in the cabin based on the recognized action of each organ and the preset mapping relationship between facial actions and emotion information.
  • The actions of the organs on the human face include at least two of the following: frowning; staring; raising the corners of the mouth; raising the upper lip; turning the corners of the mouth down; opening the mouth.
  • Recognizing, according to the face image, the action of each of the at least two organs on the face represented by the face image is executed by a third neural network.
  • The third neural network includes a backbone network and at least two classification branch networks, each classification branch network being used to recognize one action of one organ on the human face.
  • The determining module 1002 is further configured to: use the backbone network to perform feature extraction on the face image to obtain a feature map of the face image; use each classification branch network to perform action recognition on the feature map to obtain the occurrence probability of the action that the branch network can recognize;
  • and determine the action whose occurrence probability is greater than the preset probability as the action of the organ on the face represented by the face image.
  • the environmental settings in the adjustment cabin include at least one of the following types of adjustments: adjusting the music type; adjusting the temperature; adjusting the light type; adjusting the smell.
  • An embodiment of the present application also provides an electronic device.
  • Referring to FIG. 11, a schematic structural diagram of an electronic device 1100 provided in an embodiment of this application includes a processor 1101, a memory 1102, and a bus 1103.
  • The memory 1102 is configured to store execution instructions and includes a memory 11021 and an external memory 11022; the memory 11021, also called internal memory, is configured to temporarily store arithmetic data in the processor 1101 and data exchanged with the external memory 11022, such as a hard disk.
  • The processor 1101 exchanges data with the external memory 11022 through the memory 11021.
  • When the electronic device 1100 is running, the processor 1101 and the memory 1102 communicate through the bus 1103, so that the processor 1101 executes the steps of the method for adjusting the cabin environment described in the above method embodiments.
  • the embodiment of the present disclosure also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is run by a processor, the method for adjusting the in-cabin environment described in the above method embodiment is executed. step.
  • the storage medium may be a volatile or non-volatile computer readable storage medium.
  • The computer program product of the method for adjusting the in-cabin environment provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code.
  • The instructions included in the program code can be configured to execute the steps of the method described in the foregoing method embodiments.
  • the embodiments of the present disclosure also provide a computer program, which, when executed by a processor, implements any one of the methods in the foregoing embodiments.
  • the computer program product can be implemented by hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • the computer program product is embodied as a software product, such as a software development kit (SDK) and so on.
  • the working process of the system and device described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division of the units is only a logical function division, and there may be other divisions in actual implementation.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be through some communication interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • Based on this understanding, the technical solutions of the embodiments of the present disclosure, in essence, or the parts contributing to the prior art, can be embodied in the form of a software product, and the computer software product is stored in a storage medium.
  • The software product includes several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.
  • In the embodiments of the present disclosure, the face image of the person in the cabin is obtained; the attribute information and status information of the person in the cabin are determined based on the face image; and the cabin environment is adjusted based on that attribute information and status information.
  • Since the face image is acquired in real time, the determined attribute information and status information of the people in the cabin can represent their current status.
  • The environment settings in the cabin can therefore be adjusted according to the current status of the people in the cabin, automatically and dynamically adjusting the cabin environment settings.


Abstract

A method and device for adjusting an in-cabin environment. The method includes: acquiring a face image of a person in the cabin; determining attribute information and state information of the person in the cabin based on the face image; and adjusting the in-cabin environment based on the attribute information and state information of the person in the cabin. The device includes: an acquisition module (1001), a determination module (1002), and an adjustment module (1003). Also provided are an electronic device (1100), a computer-readable storage medium, and a computer program capable of executing the method for adjusting the in-cabin environment.

Description

Method and device for adjusting an in-cabin environment

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure is based on, and claims priority to, Chinese patent application No. 202010237887.1 filed on March 30, 2020, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to the field of computer technology, and relates to a method and device for adjusting an in-cabin environment.

BACKGROUND

In the related art, when setting the in-cabin environment, for example when the cabin temperature or the music played in the cabin needs to be adjusted, the adjustment is generally performed manually by the user. With the development of face recognition technology, corresponding environment information can be preset for each user; after the user enters the vehicle, the user's identity is recognized through face recognition technology, the environment information corresponding to that identity is obtained, and the in-cabin environment is then set accordingly.
SUMMARY

Embodiments of the present disclosure provide at least a method and device for adjusting an in-cabin environment.

In a first aspect, an embodiment of the present disclosure provides a method for adjusting an in-cabin environment, including:

acquiring a face image of a person in the cabin;

determining attribute information and state information of the person in the cabin based on the face image;

adjusting the in-cabin environment based on the attribute information and state information of the person in the cabin.
In a possible implementation, the attribute information includes age information, and the age information is obtained through recognition by a first neural network; the first neural network is obtained according to the following method: performing age prediction on sample images in a sample image set through the first neural network to be trained to obtain predicted age values corresponding to the sample images; and adjusting network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the differences between the predicted age values of the sample images in the sample image set, and the differences between the age values of the age labels of the sample images in the sample image set.

In a possible implementation, there are multiple sample image sets; the above adjusting of the network parameter values of the first neural network includes: adjusting the network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images.

In a possible implementation, the sample image set includes multiple initial sample images and an enhanced sample image corresponding to each initial sample image, the enhanced sample image being an image obtained by performing information transformation processing on the initial sample image; the above adjusting of the network parameter values of the first neural network includes: adjusting the network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, and the difference between the predicted age value of the initial sample image and the predicted age value of the enhanced sample image corresponding to the initial sample image; wherein the sample image is an initial sample image or an enhanced sample image.

In a possible implementation, there are multiple sample image sets, each including multiple initial sample images and an enhanced sample image corresponding to each initial sample image; the enhanced sample image is an image obtained by performing information transformation processing on the initial sample image, and the multiple initial sample images in the same sample image set are acquired by the same image acquisition device; the above adjusting of the network parameter values of the first neural network includes: calculating the loss value in the current training process based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, the difference between the age values of the age labels of those two sample images, and the difference between the predicted age value of the initial sample image and the predicted age value of the enhanced sample image corresponding to the initial sample image, and adjusting the network parameter values of the first neural network based on the calculated loss value; wherein the sample image is an initial sample image or an enhanced sample image.

In a possible implementation, calculating the loss value in the current training process based on the above differences includes: calculating a first loss value according to the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images; calculating a second loss value according to the difference between the predicted age value of the initial sample image and the predicted age value of the enhanced sample image corresponding to the initial sample image; and taking the sum of the first loss value and the second loss value as the loss value in the current training process.

In a possible implementation, the enhanced sample image corresponding to the initial sample image is determined according to the following method: generating a three-dimensional face model corresponding to the face region image in the initial sample image, and rotating the three-dimensional face model by different angles to obtain first enhanced sample images at different angles; and adding different light influence values to the value of each pixel of the initial sample image on the RGB channels to obtain second enhanced sample images under different light influence values; the enhanced sample image is the first enhanced sample image or the second enhanced sample image.
In a possible implementation, the attribute information includes gender information, and the gender information of the person in the cabin is determined according to the following method: inputting the face image into a second neural network for gender information extraction to obtain a two-dimensional feature vector output by the second neural network, where the element value in the first dimension of the two-dimensional feature vector represents the probability that the face image is male and the element value in the second dimension represents the probability that the face image is female; and inputting the two-dimensional feature vector into a classifier, and determining the gender whose probability is greater than a set threshold as the gender of the face image.

In a possible implementation, the set threshold is determined according to the following method: acquiring multiple sample images collected in the cabin by the image acquisition device that collects the face image, and a gender label corresponding to each sample image; inputting the multiple sample images into the second neural network to obtain the predicted gender corresponding to each sample image under each of multiple candidate thresholds; for each candidate threshold, determining the prediction accuracy rate under the candidate threshold according to the predicted gender and the gender label corresponding to each sample image under that candidate threshold; and determining the candidate threshold corresponding to the maximum prediction accuracy rate as the set threshold.

In a possible implementation, the multiple candidate thresholds are determined according to the following method: selecting the multiple candidate thresholds from a preset value range according to a set step size.

In a possible implementation, the state information includes open/closed-eye information, and the open/closed-eye information of the person in the cabin is determined according to the following method: performing feature extraction on the face image to obtain a multi-dimensional feature vector, where the element value in each dimension of the multi-dimensional feature vector represents the probability that the eyes in the face image are in the state corresponding to that dimension; and determining the state corresponding to the dimension whose probability is greater than a preset value as the open/closed-eye information of the person in the cabin.

In a possible implementation, the state of an eye includes at least one of the following: eye invisible; eye visible and open; eye visible and closed.
In a possible implementation, the state information includes emotion information, and the emotion information of the person in the cabin is determined according to the following steps: recognizing, according to the face image, the action of each of at least two organs on the face represented by the face image; and determining the emotion information of the person in the cabin based on the recognized action of each organ and a preset mapping relationship between facial actions and emotion information.

In a possible implementation, the actions of the organs on the face include at least two of the following: frowning; staring; raising the corners of the mouth; raising the upper lip; turning the corners of the mouth down; opening the mouth.

In a possible implementation, recognizing, according to the face image, the action of each of at least two organs on the face represented by the face image is executed by a third neural network; the third neural network includes a backbone network and at least two classification branch networks, each classification branch network being used to recognize one action of one organ on the face; the recognizing includes: using the backbone network to perform feature extraction on the face image to obtain a feature map of the face image; using each classification branch network to perform action recognition on the feature map of the face image to obtain the occurrence probability of the action that the classification branch network can recognize; and determining the action whose occurrence probability is greater than a preset probability as the action of an organ on the face represented by the face image.

In a possible implementation, adjusting the environment settings in the cabin includes at least one of the following types of adjustment: adjusting the music type; adjusting the temperature; adjusting the light type; adjusting the smell.
In a second aspect, an embodiment of the present disclosure also provides a device for adjusting an in-cabin environment, including:

an acquisition module configured to acquire a face image of a person in the cabin;

a determination module configured to determine attribute information and state information of the person in the cabin based on the face image;

an adjustment module configured to adjust the in-cabin environment based on the attribute information and state information of the person in the cabin.
In a possible implementation, the attribute information includes age information, and the age information is obtained through recognition by a first neural network;

the device further includes a training module configured to obtain the first neural network according to the following method: performing age prediction on the sample images in a sample image set through the first neural network to be trained to obtain the predicted age values corresponding to the sample images; and adjusting the network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the differences between the predicted age values of the sample images in the sample image set, and the differences between the age values of the age labels of the sample images in the sample image set.

In a possible implementation, there are multiple sample image sets, and the training module is further configured to: adjust the network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images.

In a possible implementation, the sample image set includes multiple initial sample images and an enhanced sample image corresponding to each initial sample image, the enhanced sample image being an image obtained by performing information transformation processing on the initial sample image; the training module is further configured to: adjust the network parameter values of the first neural network based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, and the difference between the predicted age value of the initial sample image and the predicted age value of the enhanced sample image corresponding to the initial sample image; wherein the sample image is an initial sample image or an enhanced sample image.

In a possible implementation, there are multiple sample image sets, each including multiple initial sample images and an enhanced sample image corresponding to each initial sample image; the enhanced sample image is an image obtained by performing information transformation processing on the initial sample image, and the multiple initial sample images in the same sample image set are acquired by the same image acquisition device; the training module is further configured to: calculate the loss value in the current training process based on the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, the difference between the age values of the age labels of those two sample images, and the difference between the predicted age value of the initial sample image and the predicted age value of the enhanced sample image corresponding to the initial sample image, and adjust the network parameter values of the first neural network based on the calculated loss value; wherein the sample image is an initial sample image or an enhanced sample image.

In a possible implementation, the training module is further configured to: calculate a first loss value according to the difference between the predicted age value corresponding to each sample image and the age value of the sample image's age label, the difference between the predicted age values of any two sample images in the same sample image set, and the difference between the age values of the age labels of those two sample images; calculate a second loss value according to the difference between the predicted age value of the initial sample image and the predicted age value of the enhanced sample image corresponding to the initial sample image; and take the sum of the first loss value and the second loss value as the loss value in the current training process.

In a possible implementation, the training module is further configured to determine the enhanced sample image corresponding to the initial sample image according to the following method: generating a three-dimensional face model corresponding to the face region image in the initial sample image, and rotating the three-dimensional face model by different angles to obtain first enhanced sample images at different angles; and adding different light influence values to the value of each pixel of the initial sample image on the RGB channels to obtain second enhanced sample images under different light influence values; the enhanced sample image is the first enhanced sample image or the second enhanced sample image.

In a possible implementation, the attribute information includes gender information, and the determination module is further configured to determine the gender information of the person in the cabin according to the following method: inputting the face image into a second neural network for gender information extraction to obtain a two-dimensional feature vector output by the second neural network, where the element value in the first dimension of the two-dimensional feature vector represents the probability that the face image is male and the element value in the second dimension represents the probability that the face image is female; and inputting the two-dimensional feature vector into a classifier, and determining the gender whose probability is greater than a set threshold as the gender of the face image.

In a possible implementation, the determination module is further configured to determine the set threshold according to the following method: acquiring multiple sample images collected in the cabin by the image acquisition device that collects the face image, and a gender label corresponding to each sample image; inputting the multiple sample images into the second neural network to obtain the predicted gender corresponding to each sample image under each of multiple candidate thresholds; for each candidate threshold, determining the prediction accuracy rate under the candidate threshold according to the predicted gender and gender label corresponding to each sample image under that candidate threshold; and determining the candidate threshold corresponding to the maximum prediction accuracy rate as the set threshold.

In a possible implementation, the determination module is further configured to determine the multiple candidate thresholds according to the following method: selecting the multiple candidate thresholds from a preset value range according to a set step size.
In a possible implementation, the state information includes open/closed-eye information, and the determination module is further configured to determine the open/closed-eye information of the person in the cabin according to the following method: performing feature extraction on the face image to obtain a multi-dimensional feature vector, where the element value in each dimension of the multi-dimensional feature vector represents the probability that the eyes in the face image are in the state corresponding to that dimension; and determining the state corresponding to the dimension whose probability is greater than a preset value as the open/closed-eye information of the person in the cabin.

In a possible implementation, the state of an eye includes at least one of the following: eye invisible; eye visible and open; eye visible and closed.

In a possible implementation, the state information includes emotion information, and the determination module is further configured to determine the emotion information of the person in the cabin according to the following steps: recognizing, according to the face image, the action of each of at least two organs on the face represented by the face image; and determining the emotion information of the person in the cabin based on the recognized action of each organ and a preset mapping relationship between facial actions and emotion information.

In a possible implementation, the actions of the organs on the face include at least two of the following: frowning; staring; raising the corners of the mouth; raising the upper lip; turning the corners of the mouth down; opening the mouth.

In a possible implementation, recognizing, according to the face image, the action of each of at least two organs on the face represented by the face image is executed by a third neural network; the third neural network includes a backbone network and at least two classification branch networks, each classification branch network being used to recognize one action of one organ on the face;

the determination module is further configured to: use the backbone network to perform feature extraction on the face image to obtain a feature map of the face image; use each classification branch network to perform action recognition on the feature map of the face image to obtain the occurrence probability of the action that the classification branch network can recognize; and determine the action whose occurrence probability is greater than a preset probability as the action of an organ on the face represented by the face image.

In a possible implementation, adjusting the environment settings in the cabin includes at least one of the following types of adjustment: adjusting the music type; adjusting the temperature; adjusting the light type; adjusting the smell.
In a third aspect, an embodiment of the present disclosure also provides an electronic device, including a processor, a memory, and a bus; the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.

In a fourth aspect, an embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.

In a fifth aspect, an embodiment of the present disclosure also provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method of the first aspect or any possible implementation thereof.

For descriptions of the effects of the above device for adjusting the in-cabin environment, electronic device, and computer-readable storage medium, refer to the description of the above method for adjusting the in-cabin environment, which is not repeated here.

To make the above objects, features, and advantages of the embodiments of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS

To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and are used together with the specification to explain the technical solutions of the embodiments. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 shows a schematic flowchart of a method for adjusting an in-cabin environment provided by an embodiment of the present disclosure;

FIG. 2 shows a schematic flowchart of a first neural network training method provided by an embodiment of the present disclosure;

FIG. 3 shows a schematic flowchart of a method for determining an enhanced sample image provided by an embodiment of the present disclosure;

FIG. 4 shows a schematic flowchart of a method for determining gender information of a person in the cabin provided by an embodiment of the present disclosure;

FIG. 5 shows a schematic flowchart of a method for determining a set threshold provided by an embodiment of the present disclosure;

FIG. 6 shows a schematic flowchart of a method for determining open/closed-eye information of a person in the cabin provided by an embodiment of the present disclosure;

FIG. 7 shows a schematic flowchart of a method for determining attribute information provided by an embodiment of the present disclosure;

FIG. 8 shows a schematic diagram of the network structure of an information extraction neural network provided by an embodiment of the present disclosure;

FIG. 9 shows a schematic flowchart of a method for determining emotion information of a person in the cabin provided by an embodiment of the present disclosure;

FIG. 10 shows a schematic architectural diagram of a device for adjusting an in-cabin environment provided by an embodiment of the present disclosure;

FIG. 11 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
DETAILED DESCRIPTION

To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings here, can be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the drawings is not intended to limit the claimed scope of the present disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In the related art, when adjusting the environment settings in the vehicle cabin, one approach is manual adjustment; another is to preset environment setting information corresponding to each user, recognize the identity information of the passenger in the cabin, and then adjust the environment settings based on the recognized identity according to the environment setting information corresponding to that identity. If a passenger in the cabin has not preset corresponding environment setting information, or does not want the cabin environment set according to the preset information, the passenger still needs to adjust the cabin environment settings manually.

On this basis, an embodiment of the present disclosure provides a method for adjusting an in-cabin environment that can acquire face images of the people in the cabin in real time, determine their attribute information and emotion information from the face images, and then adjust the environment settings in the cabin based on that attribute information and emotion information. With this method, because the face images are acquired in real time, the determined attribute information and emotion information can represent the current state of the people in the cabin; adjusting the environment settings in the cabin according to this current state makes it possible to adjust the cabin environment settings automatically and dynamically.

The defects of the above solutions are results obtained by the inventor after practice and careful study; therefore, the discovery process of the above problems, as well as the solutions proposed below in the present disclosure for the above problems, should all fall within the protection scope of the present disclosure.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.

To facilitate understanding of this embodiment, a method for adjusting an in-cabin environment disclosed in an embodiment of the present disclosure is first introduced in detail. The execution subject of the method provided by the embodiments of the present disclosure is generally an electronic device with certain computing capability. The cabin may include, but is not limited to, a car cabin, a train cabin, a ship cabin, and the like; the method provided by the embodiments of the present disclosure is also applicable to other devices whose environment can be adjusted.
Referring to FIG. 1, which is a schematic flowchart of a method for adjusting an in-cabin environment provided by an embodiment of the present disclosure, the method includes the following steps:

Step 101: acquire a face image of a person in the cabin.

Step 102: determine attribute information and state information of the person in the cabin based on the face image.

Step 103: adjust the environment settings in the cabin based on the attribute information and state information of the person in the cabin.

Through the above method, face images of the people in the cabin can be acquired in real time, their attribute information and emotion information can be determined from the face images, and the environment settings in the cabin can then be adjusted based on that attribute information and emotion information. Because the face images are acquired in real time, the determined attribute information and emotion information represent the current state of the people in the cabin, and the cabin environment settings can be adjusted automatically and dynamically according to that current state.

Steps 101 to 103 are described in detail below.
针对步骤101:
其中,舱内人员的人脸图像可以是包括舱内人员完整人脸的图像。在获取舱内人员的人脸图像的过程中,可以先获取采集的待检测图像,然后基于训练的用于进行人脸检测的人脸检测神经网络,确定待检测图像中的人脸区域信息,最后基于人脸区域信息,确定人脸图像。
待检测图像可以是实时采集、并实时获取的,在一种可能的实现方式中,可以通过安装在舱内的摄像头实时拍摄待检测图像。
待检测图像中的人脸区域信息包括人脸区域对应的检测框的中心点坐标和该检测框的尺寸信息。在基于人脸区域信息,确定人脸图像的过程中,可以先将检测框的尺寸信息按照预设比例进行放大处理,得到放大后的尺寸信息,然后基于中心点坐标信息和放大后的尺寸信息,从待检测图像中截取人脸图像。
通过人脸检测神经网络输出的检测框所对应的区域中可能并未包含所有的舱内人员的人脸信息,因此,可以对检测框进行放大处理,以使得获得的人脸图像中包括所有的人脸信息。
在一种可能的实现方式中,尺寸信息中可以包括检测框的长和检测框的宽,在将检测框的尺寸信息按照预设比例进行放大处理的过程中,可以是分别将检测框的长和检测框的宽按照对应的预设比例进行放大处理,其中,检测框的长所对应的预设比例和检测框的宽对应的预设比例可以相同。
示例性的,若检测框的长和检测框的宽对应的预设比例均为10%,检测框的长为a,宽为b,则经过放大处理后,检测框的长为1.1a,检测框的宽为1.1b。
在基于中心点坐标信息和放大后的尺寸信息,从待检测图像中截取人脸图像的过程中,可以以中心点坐标信息对应的点作为对角线的交点,然后分别以放大后的尺寸信息中的长和宽作为检测框的长和宽,确定检测框在待检测图像中的位置,最后以检测框为分割线,从待检测图像中截取图像,截取出的图像即为人脸图像。
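上述“放大检测框并从待检测图像中截取人脸图像”的步骤可以用如下Python示意代码表示。需要说明的是,函数名、放大比例的默认值以及“坐标裁剪到图像边界内”的处理均为此处为便于说明所做的假设,并非专利限定的实现:

```python
def crop_face_box(image_h, image_w, cx, cy, box_w, box_h, ratio=0.1):
    """按预设比例放大检测框, 并以中心点坐标确定截取范围.

    cx, cy: 检测框中心点坐标; box_w, box_h: 检测框的宽和长;
    ratio: 预设放大比例(此处假设长和宽使用相同比例).
    返回 (left, top, right, bottom), 并裁剪到待检测图像边界内.
    """
    new_w = box_w * (1 + ratio)  # 宽按预设比例放大
    new_h = box_h * (1 + ratio)  # 长按预设比例放大
    left = max(0.0, cx - new_w / 2)
    top = max(0.0, cy - new_h / 2)
    right = min(float(image_w), cx + new_w / 2)
    bottom = min(float(image_h), cy + new_h / 2)
    # 像素坐标取整后返回, 按该区域从待检测图像中截取即得人脸图像
    return int(round(left)), int(round(top)), int(round(right)), int(round(bottom))
```

例如,中心点为(320, 240)、宽100、长120的检测框按10%放大后,截取区域为(265, 174, 375, 306)。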
人脸检测神经网络在训练的过程中,该人脸检测神经网络的样本数据可以是样本图像,每一样本图像有对应的标签数据,样本图像对应的标签数据包括样本图像中的中心点坐标信息和检测框对应的尺寸信息,在将各样本图像输入至人脸检测神经网络之后,人脸检测神经网络可以得到预测的中心点坐标信息和预测的检测框的尺寸信息,然后基于预测的中心点坐标信息、预测的检测框的尺寸信息、样本图像对应的标签数据,确定本次训练过程中的损失值,并在损失值不满足预设条件的情况下,调整本次训练过程中人脸检测神经网络的网络参数值。
针对步骤102:
舱内人员的属性信息可以包括以下信息中的至少一种:年龄信息;性别信息;种族信息。舱内人员的状态信息可以包括舱内人员的情绪信息和睁闭眼信息,其中,睁闭眼信息可以用来检测舱内人员是否处于睡眠状态,情绪信息可以包括但不限于以下表情中的任意一种:生气、忧愁、平静、开心、沮丧等。
在一种可能的实现方式中,可以基于人脸图像,对舱内人员进行属性识别,确定舱内人员的属性信息,以及,基于人脸图像,对舱内人员进行表情识别和/或睁闭眼识别,确定舱内人员的状态信息。
在一种可能的实现方式中,在属性信息包括年龄信息的情况下,可以通过第一神经网络识别得到年龄信息。
其中,第一神经网络可以根据如图2所示的方法训练得到,包括以下几个步骤:
步骤201、通过待训练的第一神经网络对样本图像集合中的样本图像进行年龄预测,得到所述样本图像对应的预测年龄值。
步骤202、基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值。
在一种可能的实现方式中,根据样本图像集合的不同,上述调整第一神经网络的网络参数值的步骤可以分为以下几种情况:
情况一、样本图像集合为多个。
在这种情况下,在基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值时,可以基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、以及所述任意两个样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值。
在一种可能的实现方式中,可以通过如下公式(1)计算训练过程中的模型损失值:
Age_loss = Σ_{n=0}^{N-1}|predict_n-gt_n| + Σ_{i=0}^{N-1}Σ_{j=0,j≠i}^{N-1}|(predict_i-predict_j)-(gt_i-gt_j)|        公式(1)
其中,Age_loss表示本次训练过程中的损失值,N表示样本图像的个数,predict_n表示第n个样本图像的预测年龄值,gt_n表示第n个样本图像的年龄标签的年龄值,i遍历从0到N-1,j遍历从0到N-1,i和j不相等。
在通过上述公式计算出损失值之后,可以根据计算出的损失值去调整第一神经网络的网络参数值。
通过这种方法训练出的第一神经网络,其对应的监督数据除了预测年龄值与年龄标签的年龄值之差外,还将样本图像集合中样本图像的预测年龄值之差和年龄标签的年龄值之差也作为监督数据,由此训练出的第一神经网络,在进行年龄识别时精度更高。
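公式(1)所述的损失可以用如下Python示意代码计算。其中以绝对值作为“差”的度量、以列表表示一批样本图像,均为此处为说明所做的假设写法:

```python
def age_loss(predict, gt):
    """公式(1)的示意实现: 逐样本年龄误差 + 任意两样本的年龄差一致性误差.

    predict: 各样本图像的预测年龄值; gt: 对应的年龄标签的年龄值.
    """
    n = len(predict)
    # 第一项: 每一样本图像的预测年龄值与年龄标签的年龄值之差
    loss = sum(abs(predict[k] - gt[k]) for k in range(n))
    # 第二项: 任意两个样本图像 (i != j) 的预测年龄值之差与年龄标签的年龄值之差的差
    for i in range(n):
        for j in range(n):
            if i != j:
                loss += abs((predict[i] - predict[j]) - (gt[i] - gt[j]))
    return loss
```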
情况二、样本图像集合中包括多个初始样本图像,以及每一样本图像对应的增强样本图像,其中,增强样本图像为对初始样本图像进行信息变换处理后的图像。
在确定初始样本图像对应的增强样本图像时,可以通过如图3所示的方法,包括以下几个步骤:
步骤301、生成所述初始样本图像中人脸区域图像对应的三维人脸模型。
步骤302、对所述三维人脸模型进行不同角度的旋转,得到不同角度下的第一增强样本图像;以及,将所述初始样本图像中每一像素点在RGB通道上的取值,与不同的光线影响值相加,得到在不同的光线影响值下的第二增强样本图像。
需要说明的是,第一增强样本图像和第二增强样本图像均为初始样本图像对应的增强样本图像。
在确定第二增强样本图像时,初始样本图像中每一像素点在RGB三通道上的取值包括三个值,在确定在光线影响值下的第二增强图像时,可以将初始样本图像中所有像素点在三通道上的取值均与N相加,N为光线影响值,其数值上为三维向量。在一种可能的情况下,N可以遵从高斯分布。
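上述“将每一像素点在RGB通道上的取值与光线影响值相加”得到第二增强样本图像的过程可示意如下;光线影响值N遵从高斯分布为文中所述的一种可能情况,均值0、标准差sigma以及像素值裁剪到[0, 255]为此处假设:

```python
import random

def light_augment(image, sigma=10.0, seed=None):
    """对图像整体叠加一个三维光线影响值 N (分别对应 R, G, B 三个通道).

    image: 形如 [[[r, g, b], ...], ...] 的嵌套列表表示的图像.
    """
    rng = random.Random(seed)
    # 光线影响值 N: 三维向量, 每一分量遵从高斯分布
    n = [rng.gauss(0.0, sigma) for _ in range(3)]
    return [
        [[min(255, max(0, round(c + n[k]))) for k, c in enumerate(px)] for px in row]
        for row in image
    ]
```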
在这种情况下,在基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值时,可以基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、以及所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,调整第一神经网络的网络参数值。
在一种可能的实现方式中,可以根据以下公式(2)计算第一神经网络训练过程中的损失值:
Age_loss = Σ_{n=0}^{N-1}|predict_n-gt_n| + Σ_{n=0}^{N-1}|predict_n-predict_aug_n|        公式(2)
其中,Age_loss表示本次训练过程中的损失值,N表示样本图像的个数,predict_n表示第n个样本图像的预测年龄值,gt_n表示第n个样本图像的年龄标签的年龄值,predict_aug_n表示第n个样本图像对应的增强样本图像的预测年龄值。
上述方法中,增强样本图像为将初始样本图像增加角度和光线的影响下的样本图像,通过初始样本图像和增强样本图像所训练出的神经网络,在进行年龄识别的过程中,可以避免角度和光线对于神经网络识别精度的影响,提高了年龄识别的精度。
情况三、样本图像集合为多个,每一样本图像集合中包括初始样本图像,以及每一初始样本图像对应的增强样本图像,同一样本图像集合中的多个初始样本图像为通过同一图像采集设备采集得到。
在这种情况下,在基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值时,可以基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、所述任意两个样本图像的年龄标签的年龄值之差、以及所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,计算本次训练过程中的损失值,并基于计算出的损失值,调整第一神经网络的网络参数值。
在一种可能的实现方式中,可以根据每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、以及所述任意两个样本图像的年龄标签的年龄值之差,计算第一损失值;以及,根据所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,计算第二损失值;然后将所述第一损失值和所述第二损失值之和作为本次训练过程中的损失值。
在一种可能的实现方式中,可以通过如下公式(3)计算第一神经网络训练过程中的第一损失值:
Age_loss1 = Σ_{m=0}^{M-1}Σ_{n=0}^{N-1}|predict_mn-gt_mn| + Σ_{m=0}^{M-1}Σ_{i=0}^{N-1}Σ_{j=0,j≠i}^{N-1}|(predict_mi-predict_mj)-(gt_mi-gt_mj)|        公式(3)
其中,Age_loss1表示第一损失值,M表示样本图像集合的个数,N表示每一样本图像集合中所包含的样本图像的个数,predict_mn表示第m个样本图像集合中的第n个样本图像的预测年龄值,gt_mn表示第m个样本图像集合中的第n个样本图像的年龄标签的年龄值。
通过如下公式(4)计算第一神经网络训练过程中的第二损失值:
Age_loss2 = Σ_{m=0}^{M-1}Σ_{n=0}^{N-1}|predict_mn-predict_aug_mn|        公式(4)
其中,Age_loss2表示第二损失值,predict_mn表示第m个样本图像集合中的第n个样本图像的预测年龄值,predict_aug_mn表示第m个样本图像集合中第n个样本图像对应的增强样本图像的预测年龄值。
这里,需要说明的是,每一样本图像集合中所包含的样本图像的个数还可以大于N,但是在第一神经网络的训练过程中,从每一样本图像集合中随机抽取N个样本图像。
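情况三中“第一损失值与第二损失值之和作为本次训练的损失值”的计算可示意如下(即公式(3)与公式(4)之和;嵌套列表的组织方式与绝对值度量为此处假设):

```python
def total_age_loss(predict, gt, predict_aug):
    """predict[m][n]: 第m个样本图像集合中第n个样本图像的预测年龄值;
    gt 为对应的年龄标签的年龄值; predict_aug 为对应增强样本图像的预测年龄值."""
    loss1 = 0  # 第一损失值: 逐样本误差 + 同一集合内任意两样本的年龄差一致性误差
    for m in range(len(predict)):
        n_m = len(predict[m])
        for n in range(n_m):
            loss1 += abs(predict[m][n] - gt[m][n])
        for i in range(n_m):
            for j in range(n_m):
                if i != j:
                    loss1 += abs((predict[m][i] - predict[m][j]) - (gt[m][i] - gt[m][j]))
    # 第二损失值: 初始样本图像与其增强样本图像的预测年龄值之差
    loss2 = sum(
        abs(predict[m][n] - predict_aug[m][n])
        for m in range(len(predict))
        for n in range(len(predict[m]))
    )
    return loss1 + loss2  # 本次训练过程中的损失值
```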
在一种可能的实现方式中,第一神经网络的网络结构可以包括特征提取层和年龄信息提取层,在将人脸图像输入至特征提取层之后,可以得到人脸图像对应的特征图,然后再将特征图输入至年龄信息提取层,输出得到人脸图像的预测年龄值。
这里,同一个样本图像集合中的初始样本图像是通过同一图像采集设备采集得到的,因此在通过样本图像训练神经网络时,可以避免图像采集设备的不同,所带来的误差影响;同时又利用初始样本图像和增强样本图像训练神经网络,由此又可以避免光线和角度所带来的误差影响,因此训练出的神经网络精度更高。
在属性信息包括性别信息的情况下,在确定舱内人员的性别信息时,可以参照如图4所示的方法,包括以下几个步骤:
步骤401、将所述人脸图像输入用于进行性别信息提取的第二神经网络中,得到所述第二神经网络输出的二维特征向量,所述二维特征向量中第一维度上的元素值用于表征所述人脸图像为男性的概率,第二维度上的元素值用于表征所述人脸图像为女性的概率。
步骤402、将所述二维特征向量输入至分类器中,将概率大于设定阈值的性别确定为所述人脸图像的性别。
其中,设定的阈值可以根据采集人脸图像的图像采集设备和采集环境确定。
其中,由于不同的图像采集设备和采集环境的影响,设定阈值对于不同的图像采集设备和采集环境下的采集的人脸图像的识别准确率可能不同,因此,为避免图像采集设备和采集环境的影响,本公开实施例提供了一种自适应确定设定阈值的方法。
在一种可能的实现方式中,可以参照图5所示的设定阈值的确定方法,包括以下几个步骤:
步骤501、获取采集所述人脸图像的图像采集设备在所述舱内采集的多张样本图像,以及每一所述样本图像对应的性别标签。
由于样本图像与人脸图像的图像采集设备和采集环境相同,因此,通过这些样本图像所确定的设定阈值可以满足当前环境的需求。
步骤502、将所述多张样本图像输入至所述第二神经网络中,得到每一所述样本图像分别在多个候选阈值中每一所述候选阈值下对应的预测性别。
在一种可能的实现方式中,第二神经网络的网络结构可以包括特征提取层和性别信息提取层,在将样本图像输入至第二神经网络之后,可以先将样本图像输入至特征提取层,得到样本图像对应的特征图,再将特征图输入至性别信息提取层,输出得到二维特征向量,再通过分类器去确定样本图像对应的预测性别。
在一种可能的实现方式中,在确定候选阈值时,可以按照设定步长,从预设取值范围内选取多个候选阈值。实际应用中,由于第二神经网络所输出的二维向量中不同维度上的值表示的是概率,因此,预设取值范围可以是0到1,设定步长例如可以为0.001,示例性的可以通过如下公式(5)确定候选阈值:
thrd=0+0.001k                                            公式(5);
其中,thrd表示候选阈值,k取遍0至1000中的每一整数。
步骤503、针对每一所述候选阈值,根据所述候选阈值下的每一所述样本图像对应的预测性别和性别标签,确定所述候选阈值下的预测准确率。
在根据候选阈值下的样本图像的预测性别、以及样本图像的性别标签,确定该候选阈值下的预测准确率时,可以通过如下方法确定:
确定P张样本图像中,以下分类中每一类的取值,如下表1所示:
表1
              预测性别为男性    预测性别为女性
性别标签为男性      TP              TN
性别标签为女性      FP              FN
其中,TP表示性别标签为男性且在thrd阈值下预测性别为男性的数量,TN表示性别标签为男性且在thrd阈值下预测性别为女性的数量,FP表示性别标签为女性且在thrd阈值下预测性别为男性的数量,FN表示性别标签为女性且在thrd阈值下预测性别为女性的数量。
在确定上表1中每一类的取值之后,可以通过如下公式(6)计算准确率:
accuracy = (TP+FN)/P        公式(6)
其中,
P = TP+TN+FP+FN
步骤504、将最大的预测准确率对应的候选阈值确定为所述设定阈值。
由于在确定设定阈值的过程中,所采集的样本图像为采集人脸图像的图像采集设备在舱内采集的,由此可以减小采集设备和采集环境对于设定阈值的影响,且在确定设定阈值的过程中,是将预测准确率最大的候选阈值作为设定阈值,由此可以做到自适应调节设定阈值,从而提高性别识别的精度。
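步骤501至步骤504的自适应阈值选取过程可以串成如下示意代码。其中“为男性的概率”以给定列表代替第二神经网络的输出,正确预测的统计口径与表1一致(标签为男且判为男、标签为女且判为女),步长默认值取文中示例的0.001,其余写法均为此处假设:

```python
def select_threshold(male_probs, labels, step=0.001):
    """male_probs: 每张样本图像为男性的概率; labels: 性别标签 ('M' 或 'F').

    按设定步长从 [0, 1] 中生成候选阈值 (公式(5)), 统计每一候选阈值下的
    预测准确率, 返回预测准确率最大的候选阈值及其准确率.
    """
    n_steps = int(round(1 / step))
    candidates = [step * k for k in range(n_steps + 1)]  # thrd = 0 + step * k
    best_thrd, best_acc = 0.0, -1.0
    for thrd in candidates:
        correct = sum(
            1 for p, y in zip(male_probs, labels)
            if (p > thrd and y == 'M') or (p <= thrd and y == 'F')
        )
        acc = correct / len(labels)
        if acc > best_acc:  # 保留预测准确率最大的候选阈值
            best_thrd, best_acc = thrd, acc
    return best_thrd, best_acc
```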
在状态信息包括睁闭眼信息的情况下,可以根据如图6所示的方法确定舱内人员的睁闭眼信息,包括以下几个步骤:
步骤601、对所述人脸图像进行特征提取,得到多维特征向量,所述多维特征向量中每一维度上的元素值用于表征所述人脸图像中的眼睛处于所述维度对应的状态的概率。
在一种可能的实现方式中,可以将人脸图像输入至预先训练好的用于进行睁闭眼信息检测的第四神经网络中,第四神经网络可以包括特征提取层和睁闭眼信息提取层,在将人脸图像输入至第四神经网络之后,可以是将人脸图像输入至特征提取层,输出得到人脸图像对应的特征图,然后将人脸图像对应的特征图输入至睁闭眼信息提取层,输出得到多维特征向量。
眼睛的状态可以包括以下状态中的至少之一:人眼不可见、人眼可见且睁眼、人眼可见且闭眼。
在一种可能的实现方式中,左眼状态可能是以上状态中的任意一种,右眼状态也可以是以上状态中的任意一种,则两只眼睛可能的状态有9种,因此,第四神经网络的输出可以为九维特征向量,九维特征向量中每一维度上的元素值表示人脸图像中的两只眼睛处于该维度对应的两只眼睛的状态的概率。
步骤602、将概率大于预设值的维度对应的状态,确定为所述舱内人员的睁闭眼信息。
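步骤601至步骤602的判定逻辑可示意如下。九个维度与“左右眼状态组合”的对应顺序以及预设值0.5均为此处假设:

```python
# 单眼的三种状态, 两只眼睛共 3 x 3 = 9 种组合, 对应九维特征向量的各维度
EYE_STATES = ["人眼不可见", "人眼可见且睁眼", "人眼可见且闭眼"]
PAIR_STATES = [(l, r) for l in EYE_STATES for r in EYE_STATES]

def eye_state(probs, threshold=0.5):
    """probs: 九维特征向量; 返回概率大于预设值的维度对应的双眼状态, 否则返回 None."""
    idx = max(range(len(probs)), key=lambda k: probs[k])
    return PAIR_STATES[idx] if probs[idx] > threshold else None
```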
通过这种方式,在确定舱内人员的睁闭眼信息时,无需对人脸图像进行分割,直接通过人脸图像便可确定人脸图像中的睁闭眼信息,提高了睁闭眼信息检测的效率。
在属性信息包括种族信息的情况下,可以将人脸图像输入用于进行种族信息提取的第五神经网络中,第五神经网络包括特征提取层和种族信息提取层,在将人脸图像输入第五神经网络中之后,可以是先将人脸图像输入至特征提取层,得到人脸图像对应的特征图,然后将特征图输入至种族信息提取层,得到三维特征向量,三维特征向量中不同维度上的元素值分别用于表征所述人脸图像为该维度对应的种族的概率,所述种族包括“黄种人”、“白种人”、以及“黑种人”。
由以上内容可知,用于进行年龄信息提取的第一神经网络、用于进行性别信息提取的第二神经网络、用于进行睁闭眼信息提取的第四神经网络、以及用于进行种族信息提取的第五神经网络中,均包括特征提取层,因此,这四个神经网络可以共用特征提取层。
示例性的,可以参照图7所示,图7为本公开实施例提供的一种属性信息确定的方法,包括以下几个步骤:
步骤701、将所述人脸图像输入至用于进行属性识别的信息提取神经网络中的特征提取层,得到所述人脸图像对应的特征图。
其中,特征提取层用于对输入的人脸特征进行特征提取,示例性的,特征提取层可以采用inception网络、轻量化网络mobilenet-v2等。
步骤702、将所述特征图分别输入至信息提取神经网络的各个属性信息提取层,得到每一属性信息提取层输出的属性信息,其中,不同属性信息提取层用于检测不同的属性信息。
在一种可能的实现方式中,信息提取神经网络中的每一属性信息提取层均包括第一全连接层和第二全连接层,在将特征图输入至信息提取神经网络的属性信息提取层之后,可以先将特征图输入属性信息提取层的第一全连接层,得到特征图对应的M维向量,M为与该属性信息对应的预设正整数;然后将M维向量输入至该属性信息提取层的第二全连接层,得到特征图对应的N维向量,其中N为正整数,且M大于N,N为该属性信息提取层所对应的属性信息的取值个数;最后基于得到的N维向量,确定与该N维向量对应的属性信息。
其中,N为该属性信息提取层所对应的取值个数,示例性的可以理解为,若属性信息提取层提取的属性信息为性别,则该属性信息的取值包括“男”和“女”两个,因此该属性信息提取层所对应的N的取值为2。
下面将以属性信息包括年龄信息、性别信息、种族信息为例,对上述信息提取神经网络的结构做出说明,信息提取神经网络的网络结构可以如图8所示。
在将人脸图像输入至特征提取层之后,可以得到人脸图像对应的特征图,然后将特征图分别输入年龄信息提取层、性别信息提取层、种族信息提取层、以及睁闭眼信息提取层。
年龄信息提取层中包括第一全连接层和第二全连接层,在将特征图输入至第一全连接层之后,可以得到K1维的特征向量,然后将K1维的特征向量输入至第二全连接层,得到一维向量输出,该一维向量中的元素值即为预测的年龄的取值。另外,考虑到年龄的取值应为整数,则可以对该一维向量中的元素值进行四舍五入的取值,最终得到预测的年龄信息,其中,K1大于1。
性别信息提取层中包括第一全连接层和第二全连接层,在将特征图输入至第一全连接层之后,可以得到K2维的特征向量,然后将K2维的特征向量输入至第二全连接层,得到二维向量输出,该二维向量中的元素值分别表示输入的人脸图像中用户为男性的概率和女性的概率,最后,在第二全连接层的输出可以接一个二分类网络,根据二分类结果确定性别信息提取层预测的输入的人脸图像的性别信息,其中,K2大于2。
种族信息提取层中包括第一全连接层和第二全连接层,在将特征图输入至第一全连接层之后,可以得到K3维的特征向量,然后将K3维的特征向量输入至第二全连接层,得到三维向量输出,该三维向量中的元素值分别表示输入的人脸图像中用户为“黄种人”的概率、“黑种人”的概率以及“白种人”的概率,最后,在第二全连接层的输出可以接一个分类网络,根据分类网络的分类结果确定种族信息提取层预测的输入的人脸图像的种族信息,其中,K3大于3。
另外,状态信息中的睁闭眼信息也可以利用上述的信息提取神经网络提取,对于睁闭眼信息提取层,所提取的为舱内人员的两只眼睛的状态,其中,眼睛的状态包括“人眼不可见”(人眼不可见即为在图片中无法检测出眼睛,例如舱内人员戴墨镜)、“人眼可见且睁眼”、以及“人眼可见且闭眼”三种,因此对于两只眼睛来说,共有9种可选的状态。对于睁闭眼信息提取层来说,第一全连接层的输出是K4维的特征向量,第二全连接层的输出是九维的特征向量,向量中每一元素值用于表征所述人脸图像中的舱内人员的眼睛状态为该元素值表示的状态的概率,在第二全连接层的输出接一个分类网络,可以根据分类网络的分类结果确定睁闭眼信息提取层预测的输入的人脸图像的睁闭眼信息,其中,K4大于9。
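图8所示“共享特征提取层 + 多个属性信息提取层(各含第一、第二全连接层)”的数据流向可以用如下极简Python代码示意。这里用纯Python的矩阵乘法代替真实的卷积与全连接运算,各层权重与维度均为假设,仅用于说明结构:

```python
def linear(x, weight):
    """全连接层示意: weight 为 [输出维度][输入维度] 的嵌套列表."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weight]

class InfoExtractor:
    """共享特征提取层 + 各属性信息提取层(第一全连接层 -> 第二全连接层)."""

    def __init__(self, backbone, heads):
        self.backbone = backbone  # 特征提取层 (此处以单个线性变换示意)
        self.heads = heads        # {属性名: (第一全连接层权重, 第二全连接层权重)}

    def forward(self, face):
        feat = linear(face, self.backbone)  # 人脸图像 -> 特征图
        # 特征图分别输入各属性信息提取层: 特征图 -> M维向量 -> N维向量
        return {name: linear(linear(feat, fc1), fc2)
                for name, (fc1, fc2) in self.heads.items()}
```

例如年龄信息提取层最终输出一维向量、性别信息提取层输出二维向量,与正文描述的维度一致。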
信息提取神经网络在训练过程中,可以通过带有属性信息标签的样本图像进行训练,各个属性信息提取层一起训练,在计算损失值时,分别计算每一属性信息提取层的损失值,然后根据各个属性信息提取层的损失值去调整对应的属性信息提取层的网络参数值,并将各个属性信息提取层的损失值进行求和运算,作为总损失值,然后根据总损失值,调整特征提取层的网络参数值,信息提取神经网络的具体训练过程在此不再展开介绍。
在一种可能的实现方式中,在确定舱内人员的情绪信息时,可以根据如图9所示的方法,包括以下几个步骤:
步骤901、根据所述人脸图像,识别所述人脸图像代表的人脸上的至少两个器官中每一所述器官的动作。
步骤902、基于识别到的所述每一所述器官的动作、以及预先设置的面部动作与情绪信息之间的映射关系,确定所述舱内人员的情绪信息。
在识别人脸图像代表的人脸上的至少两个器官中每一器官的动作时,可以通过第三神经网络对人脸图像进行识别,第三神经网络包括主干网络和至少两个分类分支网络,每一分类分支网络用于识别人脸上的一个器官的一种动作。
在一种可能的实现方式中,在利用第三神经网络对人脸图像进行识别时,可以先利用主干网络对人脸图像进行特征提取,得到人脸图像的特征图,然后分别利用每一分类分支网络根据人脸图像的特征图,进行动作识别,得到每一分类分支网络能够识别的动作的发生概率,然后将发生概率大于预设概率的动作确定为人脸图像代表的人脸上的器官的动作。
在一种可能的实现方式中,在将人脸图像输入至第三神经网络之前,还可以先对人脸图像进行预处理,以增强人脸图像中的关键信息,然后将经过预处理的人脸图像输入至第三神经网络中。
其中,所述对人脸图像进行预处理,可以是先确定人脸图像中的关键点的位置信息,然后基于关键点的位置信息,对人脸图像进行仿射变换,得到人脸图像对应的转正后图像,再对转正后的人脸图像进行归一化处理,得到处理后的人脸图像。
所述对转正后的人脸图像进行归一化处理,包括:计算人脸图像中所包含的各个像素点的像素值均值、以及人脸图像中所包含的各个像素点的像素值标准差;基于所述像素值均值、以及所述像素值标准差,对人脸图像中的每一像素点的像素值进行归一化处理。
在一种可能的实现方式中,在基于像素值均值、以及像素值标准差,对人脸图像中的每一像素点的像素值进行归一化处理时,可以参照下述公式(7):
Z = (X-μ)/σ        公式(7)
其中,Z表示像素点进行归一化处理后的像素值,X表示像素点进行归一化处理前的像素值,μ表示像素值均值,σ表示像素值标准差。
通过上述处理,可以将人脸图像中的人脸进行转正处理,在确定人脸表情时更加精确。
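公式(7)的归一化处理可示意如下(对一组像素值先求像素值均值与像素值标准差,再逐点归一化;此处采用总体标准差,属于一种假设写法):

```python
import math

def normalize_pixels(pixels):
    """pixels: 像素值列表; 返回按公式(7) Z = (X - mu) / sigma 归一化后的结果."""
    mu = sum(pixels) / len(pixels)  # 像素值均值
    sigma = math.sqrt(sum((x - mu) ** 2 for x in pixels) / len(pixels))  # 像素值标准差
    return [(x - mu) / sigma for x in pixels]
```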
其中,各分类分支网络所能识别的动作包括以下至少一种:
皱眉、瞪眼、嘴角上扬、上唇上抬、嘴角向下、张嘴。
根据人脸的面部动作检测结果、以及预先设置的面部动作与情绪信息之间的映射关系,可以确定出舱内人员的情绪信息,示例性的,若未检测出任何一个面部动作,则可以确定舱内人员的情绪信息是平静,若检测出舱内人员的面部动作是瞪眼和张嘴,则可以确定舱内人员的情绪信息是惊讶等。
基于这种方式,不需要用户针对人脸图像进行表情状态的主观定义,另外,由于人脸上的器官的动作可以专注于某些特定的人脸特征,对人脸图像进行器官的动作的识别,相比直接进行表情姿态的识别,可以提升准确性。
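步骤901至步骤902中“识别到的器官动作 -> 情绪信息”的查表过程可示意如下。映射表中“未检测出动作 -> 平静”“瞪眼+张嘴 -> 惊讶”取自正文示例,其余条目为此处补充的假设,实际映射关系由预先设置决定:

```python
# 预先设置的面部动作组合与情绪信息之间的映射关系 (部分条目为示例假设)
ACTION_EMOTION_MAP = {
    frozenset(): "平静",                  # 未检测出任何面部动作
    frozenset({"瞪眼", "张嘴"}): "惊讶",
    frozenset({"皱眉"}): "生气",
    frozenset({"嘴角上扬"}): "开心",
    frozenset({"嘴角向下"}): "沮丧",
}

def emotion_from_actions(detected_actions):
    """detected_actions: 各分类分支网络中发生概率大于预设概率的动作集合."""
    return ACTION_EMOTION_MAP.get(frozenset(detected_actions), "平静")
```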
针对步骤103:
在调整舱内的环境设置时,可以包括以下类型的调整中的至少之一:
调整音乐类型;调整温度;调整灯光类型;调整气味。
在一种可能的实现方式中,在根据舱内人员的属性信息和情绪信息,调整舱内的环境设置时,若舱内人员仅有一人,则可以直接根据该舱内人员的属性信息和情绪信息,从预先设置好的映射关系中,查找对应的调整信息,然后根据调整信息调整舱内的环境设置,其中,所述映射关系用于表示属性信息和情绪信息与调整信息之间的映射关系。
若舱内人员有多人,则可以确定不同舱内人员的属性信息取值中优先级较高的取值,以及不同舱内人员的情绪信息的取值中优先级较高的取值,然后根据优先级较高的属性信息取值和优先级较高的情绪信息的取值,调整舱内的环境设置。
示例性的,若舱内人员有两个,一个人的情绪信息为平静,一个人的情绪信息为伤心,则可以根据“伤心”来调整播放的音乐类型。
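多名舱内人员时“取情绪信息中优先级较高的取值”的策略可示意如下;优先级的具体排序为此处假设,实际可按需求预先设置:

```python
# 情绪优先级, 数值越大优先级越高 (排序为示例假设)
EMOTION_PRIORITY = {"平静": 0, "开心": 1, "沮丧": 2, "忧愁": 3, "伤心": 4, "生气": 5}

def dominant_emotion(emotions):
    """返回多名舱内人员的情绪信息中优先级最高的取值, 用于查找对应的调整信息."""
    return max(emotions, key=lambda e: EMOTION_PRIORITY.get(e, 0))
```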
在另一种可能的实现方式中,由于属性信息是有限的,每种属性信息的取值也是有限的,状态信息的取值也是有限的,因此,可以预先设置好每种属性信息的取值和情绪信息的取值对应的调整信息,然后根据检测出的舱内人员的属性信息和情绪信息,去查找对应的调整信息。
这里,由于舱内人员的情绪信息可能是实时变化的,因此,可以随时根据舱内人员的情绪信息的变化情况,实时的对舱内的环境设置进行调整。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的执行顺序应当以其功能和可能的内在逻辑确定。
基于同一发明构思,本公开实施例中还提供了与舱内环境的调整方法对应的舱内环境的调整装置,由于本公开实施例中的装置解决问题的原理与本公开实施例上述舱内环境的调整方法相似,因此装置的实施可以参见方法的实施,重复之处不再赘述。
参照图10所示,为本公开实施例提供的一种舱内环境的调整装置的架构示意图,所述装置包括:获取模块1001、确定模块1002、调整模块1003、以及训练模块1004;其中,
获取模块1001,被配置为获取舱内人员的人脸图像;
确定模块1002,被配置为基于人脸图像,确定所述舱内人员的属性信息和状态信息;
调整模块1003,被配置为基于所述舱内人员的属性信息和状态信息,调整舱内环境。
在一种可能的实现方式中,所述属性信息包括年龄信息,所述年龄信息通过第一神经网络识别得到;
所述装置还包括训练模块1004,所述训练模块1004,被配置为根据以下方法得到所述第一神经网络:通过待训练的第一神经网络对样本图像集合中的样本图像进行年龄预测,得到所述样本图像对应的预测年龄值;基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值。
在一种可能的实现方式中,所述样本图像集合为多个,所述训练模块1004,被进一步配置为:基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、以及所述任意两个样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值。
在一种可能的实现方式中,所述样本图像集合中包括多个初始样本图像,以及每一所述初始样本图像对应的增强样本图像,所述增强样本图像为对所述初始样本图像进行信息变换处理后的图像;所述训练模块1004,被进一步配置为:基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、以及所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,调整第一神经网络的网络参数值;其中,所述样本图像为初始样本图像或者增强样本图像。
在一种可能的实现方式中,所述样本图像集合为多个,每一所述样本图像集合中包括多个初始样本图像,以及每一所述初始样本图像对应的增强样本图像,所述增强样本图像为对所述初始样本图像进行信息变换处理后的图像,同一样本图像集合中的多个初始样本图像为通过同一图像采集设备采集得到;所述训练模块1004,被进一步配置为:基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、所述任意两个样本图像的年龄标签的年龄值之差、以及所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,计算本次训练过程中的损失值,并基于计算出的损失值,调整第一神经网络的网络参数值;其中,所述样本图像为初始样本图像或者增强样本图像。
在一种可能的实现方式中,所述训练模块1004,被进一步配置为:根据每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、以及所述任意两个样本图像的年龄标签的年龄值之差,计算第一损失值;以及,根据所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,计算第二损失值;将所述第一损失值和所述第二损失值之和作为本次训练过程中的损失值。
在一种可能的实现方式中,所述训练模块1004,被进一步配置为根据以下方法确定初始样本图像对应的增强样本图像:生成所述初始样本图像中人脸区域图像对应的三维人脸模型;对所述三维人脸模型进行不同角度的旋转,得到不同角度下的第一增强样本图像;以及,将所述初始样本图像中每一像素点在RGB通道上的取值,与不同的光线影响值相加,得到在不同的光线影响值下的第二增强样本图像;所述增强样本图像为所述第一增强样本图像或所述第二增强样本图像。
在一种可能的实现方式中,所述属性信息包括性别信息,所述确定模块1002,被进一步配置为根据以下方法确定所述舱内人员的性别信息:将所述人脸图像输入用于进行性别信息提取的第二神经网络中,得到所述第二神经网络输出的二维特征向量,所述二维特征向量中第一维度上的元素值用于表征所述人脸图像为男性的概率,第二维度上的元素值用于表征所述人脸图像为女性的概率;将所述二维特征向量输入至分类器中,将概率大于设定阈值的性别确定为所述人脸图像的性别。
在一种可能的实现方式中,所述确定模块1002,被进一步配置为根据以下方法确定所述设定阈值:获取采集所述人脸图像的图像采集设备在所述舱内采集的多张样本图像,以及每一所述样本图像对应的性别标签;将所述多张样本图像输入至所述第二神经网络中,得到每一所述样本图像分别在多个候选阈值中每一所述候选阈值下对应的预测性别;针对每一所述候选阈值,根据所述候选阈值下的每一所述样本图像对应的预测性别和性别标签,确定所述候选阈值下的预测准确率;将最大的预测准确率对应的候选阈值确定为所述设定阈值。
在一种可能的实现方式中,所述确定模块1002,被进一步配置为根据以下方法确定所述多个候选阈值:按照设定步长,从预设取值范围内选取所述多个候选阈值。
在一种可能的实现方式中,所述状态信息包括睁闭眼信息,所述确定模块1002,被配置为根据以下方法确定所述舱内人员的睁闭眼信息:对所述人脸图像进行特征提取,得到多维特征向量,所述多维特征向量中每一维度上的元素值用于表征所述人脸图像中的眼睛处于所述维度对应的状态的概率;将概率大于预设值的维度对应的状态,确定为所述舱内人员的睁闭眼信息。
在一种可能的实现方式中,眼睛的状态包括以下状态中的至少之一:人眼不可见;人眼可见且睁眼;人眼可见且闭眼。
在一种可能的实现方式中,所述状态信息包括情绪信息,所述确定模块1002,被进一步配置为根据以下步骤确定舱内人员的情绪信息:根据所述人脸图像,识别所述人脸图像代表的人脸上的至少两个器官中每一所述器官的动作;基于识别到的所述每一所述器官的动作、以及预先设置的面部动作与情绪信息之间的映射关系,确定所述舱内人员的情绪信息。
在一种可能的实现方式中,人脸上的器官的动作包括以下动作中的至少两种:皱眉;瞪眼;嘴角上扬;上唇上抬;嘴角向下;张嘴。
在一种可能的实现方式中,根据所述人脸图像识别所述人脸图像代表的人脸上的至少两个器官中每一所述器官的动作是由第三神经网络执行的,所述第三神经网络包括主干网络和至少两个分类分支网络,每一所述分类分支网络用于识别人脸上的一个器官的一种动作;
所述确定模块1002,被进一步配置为:利用主干网络对所述人脸图像进行特征提取,得到所述人脸图像的特征图;分别利用每一所述分类分支网络对所述人脸图像的特征图进行动作识别,得到每一所述分类分支网络能够识别的动作的发生概率;将发生概率大于预设概率的动作确定为所述人脸图像代表的人脸上的器官的动作。
在一种可能的实现方式中,所述调整舱内的环境设置,包括以下类型的调整中的至少一种:调整音乐类型;调整温度;调整灯光类型;调整气味。
基于同一技术构思,本申请实施例还提供了一种电子设备。参照图11所示,为本申请实施例提供的电子设备1100的结构示意图,包括处理器1101、存储器1102和总线1103。其中,存储器1102被配置为存储执行指令,包括内存11021和外部存储器11022;这里的内存11021也称内存储器,被配置为暂时存放处理器1101中的运算数据,以及与硬盘等外部存储器11022交换的数据,处理器1101通过内存11021与外部存储器11022进行数据交换,当电子设备1100运行时,处理器1101与存储器1102之间通过总线1103通信,使得处理器1101执行上述方法实施例中所述的舱内环境的调整方法的步骤。
本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法实施例中所述的舱内环境的调整方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。
本公开实施例所提供的舱内环境的调整方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令可被配置为执行上述方法实施例中所述的舱内环境的调整方法的步骤,可参见上述方法实施例,在此不再赘述。
本公开实施例还提供一种计算机程序,该计算机程序被处理器执行时实现前述实施例的任意一种方法。该计算机程序产品可以通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品体现为计算机存储介质,在另一个可选实施例中,计算机程序产品体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统和装置的工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。在本公开所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解,本公开实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上所述实施例,仅为本公开的具体实施方式,用以说明本公开的技术方案,而非对其限制,本公开的保护范围并不局限于此,尽管参照前述实施例对本公开进行了详细的说明,本领域的普通技术人员应当理解:任何熟悉本技术领域的技术人员在本公开实施例揭露的技术范围内,其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化,或者对其中部分技术特征进行等同替换;而这些修改、变化或者替换,并不使相应技术方案的本质脱离本公开实施例技术方案的精神和范围,都应涵盖在本公开的保护范围之内。因此,本公开实施例的保护范围应以权利要求的保护范围为准。
工业实用性
本公开实施例通过获取舱内人员的人脸图像;基于所述人脸图像,确定所述舱内人员的属性信息和状态信息;基于所述舱内人员的属性信息和状态信息,调整舱内环境。这样,由于人脸图像是实时获取的,因此所确定出的舱内人员的属性信息和状态信息就可以代表舱内人员当前的状态,根据舱内人员当前的状态调整舱内的环境设置,可以自动对于舱内环境设置进行动态调整。

Claims (20)

  1. 一种舱内环境的调整方法,包括:
    获取舱内人员的人脸图像;
    基于所述人脸图像,确定所述舱内人员的属性信息和状态信息;
    基于所述舱内人员的属性信息和状态信息,调整舱内环境。
  2. 根据权利要求1所述的方法,其中,所述属性信息包括年龄信息,所述年龄信息通过第一神经网络识别得到;
    根据以下方法得到所述第一神经网络:
    通过待训练的第一神经网络对样本图像集合中的样本图像进行年龄预测,得到所述样本图像对应的预测年龄值;
    基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值。
  3. 根据权利要求2所述的方法,其中,所述样本图像集合为多个;
    所述基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值,包括:
    基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、以及所述任意两个样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值。
  4. 根据权利要求2所述的方法,其中,所述样本图像集合中包括多个初始样本图像,以及每一所述初始样本图像对应的增强样本图像,所述增强样本图像为对所述初始样本图像进行信息变换处理后的图像;
    所述基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值,包括:
    基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、以及所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,调整第一神经网络的网络参数值;
    其中,所述样本图像为初始样本图像或者增强样本图像。
  5. 根据权利要求2所述的方法,其中,所述样本图像集合为多个,每一所述样本图像集合中包括多个初始样本图像,以及每一所述初始样本图像对应的增强样本图像,所述增强样本图像为对所述初始样本图像进行信息变换处理后的图像,同一样本图像集合中的多个初始样本图像为通过同一图像采集设备采集得到;
    所述基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、所述样本图像集合中的样本图像的预测年龄值之差、以及所述样本图像集合中的样本图像的年龄标签的年龄值之差,调整第一神经网络的网络参数值,包括:
    基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、所述任意两个样本图像的年龄标签的年龄值之差、以及所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,计算本次训练过程中的损失值,并基于计算出的损失值,调整第一神经网络的网络参数值;
    其中,所述样本图像为初始样本图像或者增强样本图像。
  6. 根据权利要求5所述的方法,其中,所述基于每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、所述任意两个样本图像的年龄标签的年龄值之差、以及所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,计算本次训练过程中的损失值,包括:
    根据每一所述样本图像对应的预测年龄值与所述样本图像的年龄标签的年龄值之差、同一样本图像集合中任意两个样本图像的预测年龄值之差、以及所述任意两个样本图像的年龄标签的年龄值之差,计算第一损失值;以及,
    根据所述初始样本图像的预测年龄值与所述初始样本图像对应的增强样本图像的预测年龄值之差,计算第二损失值;
    将所述第一损失值和所述第二损失值之和作为本次训练过程中的损失值。
  7. 根据权利要求4至6任一项所述的方法,其中,根据以下方法确定所述初始样本图像对应的增强样本图像:
    生成所述初始样本图像中人脸区域图像对应的三维人脸模型;
    对所述三维人脸模型进行不同角度的旋转,得到不同角度下的第一增强样本图像;以及,
    将所述初始样本图像中每一像素点在RGB通道上的取值,与不同的光线影响值相加,得到在不同的光线影响值下的第二增强样本图像;
    所述增强样本图像为所述第一增强样本图像或所述第二增强样本图像。
  8. 根据权利要求1所述的方法,其中,所述属性信息包括性别信息,根据以下方法确定所述舱内人员的性别信息:
    将所述人脸图像输入用于进行性别信息提取的第二神经网络中,得到所述第二神经网络输出的二维特征向量,所述二维特征向量中第一维度上的元素值用于表征所述人脸图像为男性的概率,第二维度上的元素值用于表征所述人脸图像为女性的概率;
    将所述二维特征向量输入至分类器中,将概率大于设定阈值的性别确定为所述人脸图像的性别。
  9. 根据权利要求8所述的方法,其中,根据以下方法确定所述设定阈值:
    获取采集所述人脸图像的图像采集设备在所述舱内采集的多张样本图像,以及每一所述样本图像对应的性别标签;
    将所述多张样本图像输入至所述第二神经网络中,得到每一所述样本图像分别在多个候选阈值中每一所述候选阈值下对应的预测性别;
    针对每一所述候选阈值,根据所述候选阈值下的每一所述样本图像对应的预测性别和性别标签,确定所述候选阈值下的预测准确率;
    将最大的预测准确率对应的候选阈值确定为所述设定阈值。
  10. 根据权利要求9所述的方法,其中,根据以下方法确定所述多个候选阈值:
    按照设定步长,从预设取值范围内选取所述多个候选阈值。
  11. 根据权利要求1所述的方法,其中,所述状态信息包括睁闭眼信息,根据以下方法确定所述舱内人员的睁闭眼信息:
    对所述人脸图像进行特征提取,得到多维特征向量,所述多维特征向量中每一维度上的元素值用于表征所述人脸图像中的眼睛处于所述维度对应的状态的概率;
    将概率大于预设值的维度对应的状态,确定为所述舱内人员的睁闭眼信息。
  12. 根据权利要求11所述的方法,其中,眼睛的状态包括以下状态中的至少之一:
    人眼不可见;人眼可见且睁眼;人眼可见且闭眼。
  13. 根据权利要求1所述的方法,其中,所述状态信息包括情绪信息,根据以下步骤确定舱内人员的情绪信息:
    根据所述人脸图像,识别所述人脸图像代表的人脸上的至少两个器官中每一所述器官的动作;
    基于识别到的所述每一所述器官的动作、以及预先设置的面部动作与情绪信息之间的映射关系,确定所述舱内人员的情绪信息。
  14. 根据权利要求13所述的方法,其中,人脸上的器官的动作包括以下动作中的至少两种:
    皱眉;瞪眼;嘴角上扬;上唇上抬;嘴角向下;张嘴。
  15. 根据权利要求13所述的方法,其中,根据所述人脸图像识别所述人脸图像代表的人脸上的至少两个器官中每一所述器官的动作是由第三神经网络执行的,所述第三神经网络包括主干网络和至少两个分类分支网络,每一所述分类分支网络用于识别人脸上的一个器官的一种动作;
    根据所述人脸图像识别所述人脸图像代表的人脸上的至少两个器官中每一所述器官的动作,包括:
    利用所述主干网络对所述人脸图像进行特征提取,得到所述人脸图像的特征图;
    分别利用每一所述分类分支网络对所述人脸图像的特征图进行动作识别,得到每一所述分类分支网络能够识别的动作的发生概率;
    将发生概率大于预设概率的动作确定为所述人脸图像代表的人脸上的器官的动作。
  16. 根据权利要求1至15任一项所述的方法,其中,所述调整舱内的环境设置,包括以下类型的调整中的至少之一:
    调整音乐类型;调整温度;调整灯光类型;调整气味。
  17. 一种舱内环境的调整装置,包括:
    获取模块,被配置为获取舱内人员的人脸图像;
    确定模块,被配置为基于所述人脸图像,确定所述舱内人员的属性信息和状态信息;
    调整模块,被配置为基于所述舱内人员的属性信息和状态信息,调整舱内环境。
  18. 一种电子设备,其中,包括:处理器、存储器和总线,所述存储器存储有所述处理器可执行的机器可读指令,当所述电子设备运行时,所述处理器与所述存储器之间通过总线通信,所述机器可读指令被所述处理器执行时执行如权利要求1至16任一项所述的舱内环境的调整方法的步骤。
  19. 一种计算机可读存储介质,其中,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行如权利要求1至16任一项所述的舱内环境的调整方法的步骤。
  20. 一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现权利要求1至16中任一项所述的舱内环境的调整方法的步骤。
PCT/CN2020/135500 2020-03-30 2020-12-10 一种舱内环境的调整方法及装置 WO2021196721A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020227013199A KR20220063256A (ko) 2020-03-30 2020-12-10 캐빈 내부 환경의 조절 방법 및 장치
JP2022524727A JP2022553779A (ja) 2020-03-30 2020-12-10 キャビン内の環境の調整方法及び装置
US17/722,554 US20220237943A1 (en) 2020-03-30 2022-04-18 Method and apparatus for adjusting cabin environment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010237887.1A CN111439267B (zh) 2020-03-30 2020-03-30 一种舱内环境的调整方法及装置
CN202010237887.1 2020-03-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/722,554 Continuation US20220237943A1 (en) 2020-03-30 2022-04-18 Method and apparatus for adjusting cabin environment

Publications (1)

Publication Number Publication Date
WO2021196721A1 true WO2021196721A1 (zh) 2021-10-07

Family

ID=71649308

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135500 WO2021196721A1 (zh) 2020-03-30 2020-12-10 一种舱内环境的调整方法及装置

Country Status (5)

Country Link
US (1) US20220237943A1 (zh)
JP (1) JP2022553779A (zh)
KR (1) KR20220063256A (zh)
CN (1) CN111439267B (zh)
WO (1) WO2021196721A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114132328A (zh) * 2021-12-10 2022-03-04 智己汽车科技有限公司 一种自动调节驾乘环境的辅助驾驶系统及方法、存储介质
CN114925806A (zh) * 2022-03-30 2022-08-19 北京达佳互联信息技术有限公司 信息处理方法、信息处理模型训练方法及装置

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111439267B (zh) * 2020-03-30 2021-12-07 上海商汤临港智能科技有限公司 一种舱内环境的调整方法及装置
CN112085701B (zh) * 2020-08-05 2024-06-11 深圳市优必选科技股份有限公司 一种人脸模糊度检测方法、装置、终端设备及存储介质
CN112329665B (zh) * 2020-11-10 2022-05-17 上海大学 一种人脸抓拍系统
TWI755318B (zh) * 2021-04-26 2022-02-11 和碩聯合科技股份有限公司 分類方法及電子裝置
CN113850243A (zh) * 2021-11-29 2021-12-28 北京的卢深视科技有限公司 模型训练、人脸识别方法、电子设备及存储介质

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069400A (zh) * 2015-07-16 2015-11-18 北京工业大学 基于栈式稀疏自编码的人脸图像性别识别系统
CN107194347A (zh) * 2017-05-19 2017-09-22 深圳市唯特视科技有限公司 一种基于面部动作编码系统进行微表情检测的方法
CN108528371A (zh) * 2018-03-07 2018-09-14 北汽福田汽车股份有限公司 车辆的控制方法、系统及车辆
CN109131167A (zh) * 2018-08-03 2019-01-04 百度在线网络技术(北京)有限公司 用于控制车辆的方法和装置
CN109308519A (zh) * 2018-09-29 2019-02-05 广州博通信息技术有限公司 一种基于神经网络的制冷设备故障预测方法
CN109711309A (zh) * 2018-12-20 2019-05-03 北京邮电大学 一种自动识别人像图片是否闭眼的方法
CN109766840A (zh) * 2019-01-10 2019-05-17 腾讯科技(深圳)有限公司 人脸表情识别方法、装置、终端及存储介质
CN110175501A (zh) * 2019-03-28 2019-08-27 重庆电政信息科技有限公司 基于人脸识别的多人场景专注度识别方法
US20200019759A1 (en) * 2018-07-11 2020-01-16 Samsung Electronics Co., Ltd. Simultaneous recognition of facial attributes and identity in organizing photo albums
CN111439267A (zh) * 2020-03-30 2020-07-24 上海商汤临港智能科技有限公司 一种舱内环境的调整方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000010993U (ko) * 1998-11-28 2000-06-26 윤종용 방수수단을 갖는 키 입력장치
KR20200010993A (ko) * 2018-07-11 2020-01-31 삼성전자주식회사 보완된 cnn을 통해 이미지 속 얼굴의 속성 및 신원을 인식하는 전자 장치.
CN109686050A (zh) * 2019-01-18 2019-04-26 桂林电子科技大学 基于云服务与深度神经网络的车内环境监测预警方法


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114132328A (zh) * 2021-12-10 2022-03-04 智己汽车科技有限公司 一种自动调节驾乘环境的辅助驾驶系统及方法、存储介质
CN114132328B (zh) * 2021-12-10 2024-05-14 智己汽车科技有限公司 一种自动调节驾乘环境的辅助驾驶系统及方法、存储介质
CN114925806A (zh) * 2022-03-30 2022-08-19 北京达佳互联信息技术有限公司 信息处理方法、信息处理模型训练方法及装置

Also Published As

Publication number Publication date
KR20220063256A (ko) 2022-05-17
US20220237943A1 (en) 2022-07-28
CN111439267A (zh) 2020-07-24
CN111439267B (zh) 2021-12-07
JP2022553779A (ja) 2022-12-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928353

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20227013199

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2022524727

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20928353

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/07/2023)
