WO2021098855A1 - Method and system for detecting user information, and electronic device

Method and system for detecting user information, and electronic device

Info

Publication number
WO2021098855A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
detection
user
information
detection object
Prior art date
Application number
PCT/CN2020/130631
Other languages
English (en)
French (fr)
Inventor
方三勇
甄海洋
王进
Original Assignee
虹软科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 虹软科技股份有限公司 filed Critical 虹软科技股份有限公司
Priority to EP20888865.1A priority Critical patent/EP4064113A4/en
Publication of WO2021098855A1 publication Critical patent/WO2021098855A1/zh

Classifications

    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners; connectivity analysis
    • G06V10/50: Feature extraction within image blocks; histograms, e.g. histogram of oriented gradients [HoG]; projection analysis
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Image or video recognition using neural networks
    • G06V10/993: Evaluation of the quality of the acquired pattern
    • G06V20/593: Recognising seat occupancy inside a vehicle
    • G06V20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06V40/171: Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06V40/172: Classification of human faces, e.g. identification
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/048: Activation functions
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06N3/09: Supervised learning

Definitions

  • the present invention relates to the technical field of information processing, and in particular to a method and system for detecting user information, and electronic equipment.
  • Private cars have become one of the most convenient means of transportation for ordinary families: commuting to and from work on weekdays, taking children to and from school, family outings on weekends, and so on. However, news of children being forgotten in cars is not uncommon. If a child is not found in time, the child may faint from lack of oxygen in the car, and the situation can even become life-threatening. In hot summer weather the temperature inside the car is very high, and the danger of suffocation arises easily, causing irreparable harm to the family.
  • the current way of detecting children in the car is mostly through infrared sensors, whose signal is then transmitted through digital-to-analog conversion to the controller of a single-chip microcomputer for the early-warning operation.
  • the sensor itself is easily interfered with by various heat sources and light sources, and the reliability of this traditional approach is poor.
  • the infrared radiation of the human body is easily blocked and then difficult for the probe to receive.
  • when the ambient temperature is close to human body temperature in summer, detection sensitivity drops significantly and the device sometimes malfunctions, so that timely help cannot be provided when a child is in danger, leading to serious consequences.
  • this way of detecting through infrared sensors also places high requirements on the installation position and sensitivity of the sensor; installation can only be completed at considerable cost, and the early-warning effect is poor.
  • the embodiments of the present disclosure provide a method and system for detecting user information, and electronic equipment, to at least solve the technical problem in the related art that detecting the state of a child inside a vehicle with an infrared sensor is easily affected by the environment and causes the equipment to malfunction.
  • a method for detecting user information, including: acquiring a first image; inputting the first image into a first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user facial expression information, user body shape information, and user clothing information; and outputting a detection result based on the user information.
  • the detection method further includes: extracting image information in the first image and determining whether a first detection object exists in the first image, where the first detection object includes at least one of the following: human face, human head, torso, limbs, human body; if the first detection object does not exist in the first image, deleting the first image; or, if the first detection object exists in the first image, intercepting the region of interest in the first image where the first detection object is located; and performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
  • performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object includes: performing an initial quality evaluation on the region of interest to obtain an image quality evaluation result, where the evaluation content of the initial quality evaluation includes at least one of the following: image blur evaluation, angle evaluation, position evaluation, and light intensity evaluation; if the image quality evaluation result indicates that the quality of the first detection object is unqualified, stopping detection of the first image; if the image quality evaluation result indicates that the quality of the first detection object is qualified, taking the subregion where the first detection object is located in the region of interest as the first detection image.
  • the detection method further includes: extracting coordinates of multiple feature points in the first detection image; determining the center point coordinates of the first detection object among the feature point coordinates; and, based on the center point coordinates of the first detection object, mapping the center point of the first detection object in the first detection image to the center point indicating the first detection object in a preset first standard image, so as to align the first detection image with the first standard image.
  • mapping the center point of the first detection object in the first detection image to the center point indicating the first detection object in the preset first standard image includes: moving the first detection image over the first standard image with the center point of the first detection object of the first standard image as a reference; with the center point of the first detection object as a reference, reducing or enlarging the first detection image according to the scale of the first standard image so that the first detection image and the first standard image have the same size; if the detection object of the first detection image and the detection object of the first standard image differ in orientation, rotating the first detection image so that the two orientations match; and, after the orientations match, determining to map the center point in the first detection image to the center point indicating the first detection object in the first standard image.
  • the detection method further includes: acquiring multiple user images containing different image factors, where the image factors include at least one of the following: image scene, light intensity, resolution, and user decoration; filtering the multiple user images to obtain multiple image sample sets corresponding to different user categories, where each user image in each image sample set is associated with an attribute label of the first detection object and a category identifier of the category to which the user belongs; cropping each image in the multiple image sample sets to obtain multiple first standard images; and inputting the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into the initial network model to train the initial network model and obtain the first detection model.
  • the initial network model includes at least: a data layer, a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and an output layer.
  • the step of training the initial network model to obtain the first detection model includes: inputting the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into the training network through the data layer of the initial network model; training the convolutional layer of the initial network model, extracting the data features of the first standard images through preset convolution parameters to obtain a first data network feature map, where the convolution parameters include at least: a first extraction step size, a convolution kernel size, and a number of convolution kernels; training the pooling layer of the initial network model, down-sampling the first data network feature map through preset pooling parameters to obtain a second data network feature map, where the pooling parameters include at least: a second extraction step size and a pooling size; training the activation layer of the initial network model, performing non-linear change processing on the second data network feature map, where the non-linear change processing uses at least one of the following activation functions: relu activation function, prelu activation function, relu6 activation function; and training the fully connected layer of the initial network model, connecting the first data network feature map and the second data network feature map and mapping the feature space in the feature maps to the identification space through a linear transformation with preset feature weights, where the identification space is set to record the attribute labels and category identifiers.
  • the step of inputting the first image into a first detection model to determine user information of active users in the target area includes: inputting the first image into the data network of the first detection model through the data layer of the first detection model; performing image feature extraction on the first image using the convolutional layer, pooling layer, and activation layer of the first detection model to obtain a multi-dimensional image output vector; inputting the multi-dimensional image output vector into different fully connected layers to obtain user information evaluation results, and inputting the user information evaluation results into an output layer; and using the output layer of the first detection model to output the user information.
  • the detection method further includes: inputting the first image into a second detection model to determine detection information of a second detection object in the first image, where the second detection object is a reference object of the first detection object.
  • the detection result is obtained and output based on the detection information of the second detection object and the user information.
  • the user information further includes: user gender, user activity posture, facial expression, and degree of fatigue of the detected object.
  • the application scenario of the detection method includes at least one of the following: vehicle internal personnel monitoring, elevator personnel monitoring.
  • a system for detecting user information, including: an image capturing device configured to acquire a first image; an analysis device configured to input the first image into the first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, and user clothing information; and a result output device configured to output a detection result based on the user information.
  • the image capture device is an independent camera device or a camera device integrated with the result output device in one device.
  • the detection system further includes: a location judging device configured to extract image information in the first image after the first image is acquired and to use a detection object detector to determine whether a first detection object exists in the first image, where the first detection object includes at least one of the following: a human face, a human head, a torso, limbs, and a human body; a deletion unit configured to delete the first image when the first detection object does not exist in the first image; or an intercepting unit configured to intercept the region of interest in the first image where the first detection object is located when the first detection object exists in the first image; and an image processing device configured to perform image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
  • the image processing device includes: a quality evaluation unit configured to perform an initial quality evaluation of the region of interest to obtain an image quality evaluation result, where the evaluation content of the initial quality evaluation includes at least one of the following: image blur evaluation, angle evaluation, position evaluation, and light intensity evaluation; a stop unit configured to stop detecting the first image when the image quality evaluation result indicates that the quality of the first detection object is unqualified; and a first determination unit configured to take the subregion where the first detection object is located in the region of interest as the first detection image when the image quality evaluation result indicates that the quality of the first detection object is qualified.
  • the detection system further includes: a coordinate extraction device configured to extract a plurality of feature point coordinates from the first detection image after the subregion where the first detection object is located in the region of interest is taken as the first detection image; a second determining unit configured to determine the center point coordinates of the first detection object among the feature point coordinates; and a first mapping unit configured to map, based on the center point coordinates of the first detection object, the center point of the first detection object in the first detection image to a center point indicating the first detection object in a preset first standard image, so as to align the first detection image with the first standard image.
  • the first mapping unit includes: a first movement module configured to move the first detection image over the first standard image based on the center point of the first detection object of the first standard image; an alignment module configured to reduce or enlarge the first detection image, with the center point of the first detection object as a reference and according to the scale of the first standard image, so that the first detection image and the first standard image have the same size; a rotation module configured to rotate the first detection image when the detection object of the first detection image and the detection object of the first standard image differ in orientation, so that the two orientations match; and a first determining module configured to determine, after the orientations match, that the center point in the first detection image is mapped to the center point indicating the first detection object in the first standard image.
  • the detection system further includes: an image acquisition device configured to acquire multiple user images containing different image factors before the first image is acquired, where the image factors include at least one of the following: image scene, illuminance, resolution, and user decoration; an image filtering device configured to filter the multiple user images to obtain multiple image sample sets corresponding to different user categories, where each user image in each image sample set is associated with the attribute label of the first detection object and the category identifier of the category to which the user belongs; a cropping unit configured to crop each image in the multiple image sample sets to obtain multiple first standard images; and a training device configured to input the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into the initial network model to train the initial network model and obtain the first detection model.
  • the initial network model includes at least: a data layer, a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and an output layer.
  • the training device includes: a first input unit configured to input the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into the training network through the data layer of the initial network model; a first training unit configured to train the convolutional layer of the initial network model and extract, through preset convolution parameters, the data features of the first standard images of the multiple image sample sets to obtain the first data network feature map, where the convolution parameters include at least: the first extraction step size, the convolution kernel size, and the number of convolution kernels; a second training unit configured to train the pooling layer of the initial network model and down-sample the first data network feature map through preset pooling parameters to obtain a second data network feature map, where the pooling parameters include at least: a second extraction step size and a pooling size; and a third training unit configured to train the activation layer of the initial network model and perform non-linear change processing on the second data network feature map.
  • the analysis device includes: an image processing module configured to input the first image into the data network of the first detection model through the data layer of the first detection model; a feature extraction unit configured to perform image feature extraction on the first image using the convolutional layer, pooling layer, and activation layer of the first detection model to obtain a multi-dimensional image output vector; a third input unit configured to input the multi-dimensional image output vector into different fully connected layers to obtain user information evaluation results and input the user information evaluation results into the output layer; and a user information output unit configured to output user information using the output layer of the first detection model.
  • the detection system further includes: an image input unit configured to input the first image into a second detection model to determine detection information of a second detection object in the first image, where the second detection object is a reference object of the first detection object.
  • the detection result is obtained and output based on the detection information of the second detection object and the user information.
  • the user information further includes: user gender, user activity posture, facial expression, and degree of fatigue of the detected object.
  • the application scenario of the detection system includes at least one of the following: vehicle internal personnel monitoring, elevator personnel monitoring.
  • an electronic device, including: a processor; and a memory configured to store instructions executable by the processor; where the processor is configured to execute the executable instructions to perform the user information detection method described in any one of the above.
  • a storage medium including a stored program, where, when the program runs, the device on which the storage medium is located is controlled to execute the user information detection method described in any one of the above.
  • the first image is acquired first; the first image is then input into the first detection model to determine the user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user facial expression information, user body shape information, and user clothing information; finally, the detection result is output based on the user information.
  • when the detection method is applied to monitoring the internal area of a vehicle, the first image taken inside the vehicle can be used to analyze the user information of the people in the car, covering their age, gender, behavior, expression, body shape, clothing, and other information, and to obtain detection results for them. For example, when a child is left in the car, the detection result can be used to raise an alarm in time, preventing the child from being forgotten in the car after the owner gets out.
  • this method of analyzing image content to obtain user information is not affected by the environment; it only requires that the image capturing device work normally. The probability of equipment malfunction is significantly reduced and the detection results are stable, thereby solving the technical problem in the related art that detecting the state of a child inside a vehicle with an infrared sensor is easily affected by the environment and causes the equipment to malfunction.
  • Fig. 1 is a flowchart of an optional user information detection method according to an embodiment of the present invention.
  • Fig. 2 is a schematic diagram of an optional user information detection system according to an embodiment of the present invention.
  • an embodiment of a method for detecting user information is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system, such as by a set of computer-executable instructions, and, although a logical sequence is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one here.
  • the first image of the area to be detected (for example, the interior of a vehicle, a shopping mall elevator, a conference room, etc.) is acquired through the image capturing device; the first image is processed to identify the detection object information and analyze user information such as the user's age, and a judgment is made on the user's age, so that an alarm prompt is issued in time when a person in the area is in a dangerous state.
  • Fig. 1 is a flowchart of an optional user information detection method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
  • Step S102: Acquire a first image.
  • Step S104: Input the first image into the first detection model to determine the user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, and user clothing information.
  • the types of the first detection model and of the second detection model described below, as used in the embodiments of the present invention, include but are not limited to Convolutional Neural Networks (CNN), which can recognize two-dimensional patterns that have undergone displacement, scaling, and other forms of distortion. Since the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when a CNN is used, and learning takes place implicitly from the training data. In addition, because the neurons on the same feature mapping surface share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks whose neurons are connected to each other. With its special structure of locally shared weights, a convolutional neural network has unique advantages in image processing; its layout is closer to an actual biological neural network, and weight sharing reduces the complexity of the network. In particular, images with multi-dimensional input vectors can be input directly into the network, which avoids the complexity of data reconstruction during feature extraction and classification.
  • Step S106: Output the detection result based on the user information.
  • through the above steps, the first image can be acquired first, the first image is then input into the first detection model to determine the user information of users active in the target area, and finally the detection result is output based on the user information.
  • when the detection method is applied to monitoring the internal area of a vehicle, the first image taken inside the vehicle can be used to analyze the user information of the people in the car, covering their age, gender, behavior, expression, body shape, clothing, and other information, and to obtain detection results for them. For example, in the case of a child left in the car, the detection result can be used to raise an alarm in time, preventing the child from being forgotten in the car after the owner gets out.
  • this method of analyzing image content to obtain user information is not affected by the environment; it only requires that the image capturing device work normally. The probability of equipment malfunction is significantly reduced and the detection results are stable, thereby solving the technical problem in the related art that detecting the state of a child inside a vehicle with an infrared sensor is easily affected by the environment and causes the equipment to malfunction.
  • the first detection model can be used first to detect the user, and the second detection model can be used to assist the user detection.
  • the first detection model will be described.
  • a first detection model is pre-trained, and the captured first image is analyzed through the first detection model to obtain user information including user age information, user gender information, user behavior information, user expression information, user body shape information, and user clothing information.
  • before use, the first detection model needs to be trained, which includes: acquiring multiple user images containing different image factors, where the image factors include at least one of the following: image scene, illuminance, resolution, and user decoration; filtering the multiple user images to obtain multiple image sample sets corresponding to different user categories, where each user image in each image sample set is associated with the attribute label of the first detection object and the category identifier of the category to which the user belongs; cropping each image in the multiple image sample sets to obtain multiple first standard images; and inputting the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into the initial network model to train the initial network model and obtain the first detection model.
  • the user image collected in the embodiment of the present invention may be a two-dimensional image or a three-dimensional image.
  • the image may be captured from multiple angles.
  • the image capturing device may be, for example, a camera.
  • the image used can be an image containing multiple image factors.
  • the embodiment of the present invention uses the analysis of the user category as an illustrative description of the user information.
  • for the age of the detection object, the apparent attribute of the detection object is used as the analysis result. Since the differences between detection objects of similar ages are small and hard to distinguish, age is divided into categories in order to obtain accurate classification results that meet the application of the present invention. Age can therefore be divided into multiple categories according to apparent age; for example, people can be divided into 3 categories (infants 0-5 years old, children 6-15 years old, and others 16 years old and above).
  • ensure that the training sample set contains the above three categories, that the material of the three categories is covered and evenly distributed, and that each individual category has enough user images to be trained. Each user image sample to be trained is labeled, with category labels (0-5 years old: 0; 6-15 years old: 1; 16 years old and above: 2), as sketched below.
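A minimal sketch of this labeling rule in Python (the patent names no implementation language; the function name is illustrative):

```python
def age_to_label(age: int) -> int:
    """Map an apparent age to the category labels above:
    0-5 years -> 0, 6-15 years -> 1, 16 years and above -> 2."""
    if age <= 5:
        return 0
    if age <= 15:
        return 1
    return 2
```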
  • the image of the first detection object can be cropped into a standard multi-channel image (for example, cropped to a 60×60 RGB three-channel color image).
  • a series of operations are performed on the user image samples to be trained, such as horizontal or vertical translation and stretching at different scales; a possible pipeline is sketched below.
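One way such translation and stretching operations could be expressed, assuming Python with torchvision (the patent does not name a framework, and the parameter values are illustrative):

```python
from torchvision import transforms

# Illustrative augmentation for the 60x60 RGB training crops: random
# horizontal/vertical translation and stretching at different scales.
train_transform = transforms.Compose([
    transforms.Resize((72, 72)),                  # over-size slightly before cropping
    transforms.RandomAffine(degrees=0,
                            translate=(0.1, 0.1), # horizontal or vertical translation
                            scale=(0.9, 1.1)),    # stretching at different scales
    transforms.RandomCrop((60, 60)),              # standard 60x60 three-channel crop
    transforms.ToTensor(),
])
```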
  • the layers of the initial network model include the data layer, convolutional layer, pooling layer, activation layer, fully connected layer, and output layer. Except for the data layer and the output layer, the input of each intermediate layer is the output of the previous layer, and its output is the input of the next layer.
  • the training method in the embodiment of the present invention is gradient descent with the backpropagation algorithm.
  • the step of training the initial network model to obtain the first detection model includes: inputting the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into the training network through the data layer of the initial network model; training the convolutional layer of the initial network model, extracting the data features of the first standard images of the multiple image sample sets through preset convolution parameters to obtain the first data network feature map, where the convolution parameters include at least: the first extraction step size, the convolution kernel size, and the number of convolution kernels; training the pooling layer of the initial network model, down-sampling the first data network feature map through preset pooling parameters to obtain a second data network feature map, where the pooling parameters include at least: the second extraction step size and the pooling size; and training the activation layer of the initial network model, performing non-linear change processing on the second data network feature map, where the non-linear change processing uses at least one of the following activation functions: relu activation function, prelu activation function, relu6 activation function. A network of this shape is sketched below.
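A minimal sketch of a network with this layer order, assuming PyTorch (the patent does not name a framework). One "group layer" stacks a convolutional layer, a pooling layer, and an activation layer; all kernel counts, sizes, and strides below are illustrative values for the convolution and pooling parameters:

```python
import torch
import torch.nn as nn

class AgeNet(nn.Module):
    """Sketch: data -> N group layers (conv/pool/activation) -> K fully connected -> output."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            # group layer 1: convolution parameters = step size (stride),
            # kernel size, kernel count; pooling parameters = step size, size
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),                 # relu / prelu / relu6 per the text
            # group layer 2
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU6(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 15 * 15, 128),  # a 60x60 input becomes 15x15 after two 2x pools
            nn.ReLU(),
            nn.Linear(128, num_classes),   # output layer; softmax is applied at inference
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```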
  • the convolutional layer extracts data features through the set step size, convolution kernel size, and the number of convolution kernels.
  • the pooling layer down-samples the feature map of the previous layer through the set step size and pooling size, and the activation layer applies a non-linear change to the feature map of the previous layer.
  • activation functions such as the relu activation function can be used; after that, the fully connected layer connects all the feature maps and maps the feature space to the identification space through a linear transformation with weights, and the fully connected layer is followed by the relu activation function. The final output layer classifies and regresses the feature map.
  • the softmax function is used for category classification in the embodiment of the present invention.
  • in the network-model training stage, all training samples are input into the initial network model (such as a convolutional neural network), and the difference between the output result and the actual label is calculated through the loss function; this process is called "forward". Then, according to the difference between the output result and the actual label, the error contribution of the initial network model parameters is determined and the model parameters are updated so that the neural network learns; this process is called "backward". By adjusting the weight values of each layer in the initial network model, the gap between the output value of the model and the actual sample label value becomes smaller and smaller, until the output value of the network model is consistent with the actual label value or the smallest gap no longer changes. Finally, the required first detection model is obtained.
  • backpropagation is performed through the loss, and the parameters of the network model are adjusted until convergence; a minimal training loop is sketched below.
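A minimal forward/backward loop matching this description, continuing the AgeNet sketch and imports above; `train_loader` (the labeled sample sets) is an assumed name, and CrossEntropyLoss folds the softmax classification into the loss computation:

```python
model = AgeNet(num_classes=3)
criterion = nn.CrossEntropyLoss()                         # gap between output and actual label
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # gradient descent

num_epochs = 10                                           # illustrative
for epoch in range(num_epochs):
    for images, labels in train_loader:                   # assumed DataLoader of labeled crops
        logits = model(images)                            # "forward"
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()                                   # "backward": backpropagate the error
        optimizer.step()                                  # update parameters until convergence
```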
  • the above converged network is then fine-tuned with material from the in-vehicle scene, with the labeling and cropping of the material performed as in the operations above; that is, the material of the actual application scenario is used to fine-tune the above general network model. During fine-tuning, the network keeps the parameters of the previously shared feature extraction layers (the network layers before the fully connected layer) unchanged, while the learning rate of the fully connected layer is non-zero so that it can continue to learn; the fully connected layer thus obtains new parameters after fine-tuning training, and iterative training yields higher accuracy.
  • the way to keep parameters unchanged is to set the learning rate of the corresponding layer to 0, as sketched below.
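Continuing the AgeNet sketch, freezing the shared feature-extraction layers while the fully connected layers keep learning can be written as follows; disabling gradients has the same effect as a per-layer learning rate of 0:

```python
# Fine-tune on in-vehicle material: feature-extraction parameters stay fixed,
# only the fully connected layers receive updates.
for param in model.features.parameters():
    param.requires_grad = False   # equivalent to setting this layer's learning rate to 0

optimizer = torch.optim.SGD(model.classifier.parameters(), lr=0.001)  # lr is illustrative
```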
  • at this point the first detection model is trained, and the first detection model can be applied in various actual operating environments.
  • the second detection model can be understood as a model trained on the reference object of the first detection object.
  • the second detection model can assist in determining the information of the detection object; for example, detection of vehicle seats, seat backs, clothing placed in the vehicle, and so on can assist in determining user information.
  • deep learning can be used to detect the human body and the seat in order to determine whether the occupant is a child or an adult.
  • the judgment may include: inputting the first image into a second detection network (for example, a seat detection network, a seat-back detection network, or a clothing detection network) and outputting the position detection result of the seat and/or seat back; inputting the first image into the first detection network (a human detection network) to determine whether the image contains a human body and outputting the human body detection result; and judging whether the occupant is a child or an adult according to the human body detection result combined with the seat and/or seat-back position detection result.
  • the detection of the seat and/or seat back can be used to assist in determining the body shape of the human body, the specific area where the human head is located, and so on, eliminating the interference of randomly placed clothes and improving detection accuracy; one possible heuristic is sketched below.
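As an illustration only (not the patent's stated method), one hypothetical heuristic for using the seat back as a reference object is to compare the detected body height with the detected seat-back height; the box format, function name, and threshold are assumptions:

```python
def looks_like_child(body_box, seat_back_box, ratio_threshold: float = 0.6) -> bool:
    """Boxes are (x1, y1, x2, y2) in image coordinates; a body that is short
    relative to the seat back suggests a child."""
    body_height = body_box[3] - body_box[1]
    seat_height = seat_back_box[3] - seat_back_box[1]
    return seat_height > 0 and body_height / seat_height < ratio_threshold
```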
  • the newly acquired image can be input into the model for real-time detection, analysis and judgment, and the detection result can be obtained.
  • Step S102: Acquire a first image.
  • the detection method further includes: extracting image information in the first image and using a detection object detector to determine whether the first detection object exists in the first image, where the first detection object includes at least one of the following: a face, a human head, a torso, limbs, and a human body; if the first detection object does not exist in the first image, deleting the first image; or, if the first detection object exists in the first image, intercepting the region of interest in the first image where the first detection object is located; and performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
  • the above region of interest refers to the area that remains after images containing no person are filtered out; detection is mainly performed on the area of the image where a person is present.
  • an image may be captured by an RGB camera, a photographing module, an infrared camera, etc., to obtain the first image
  • the extracted image information may include, but is not limited to, RGB color information and depth information. Recognizing the color information of the color channels of the detection object image can effectively improve the recognition rate, that is, improve the accuracy of detection object attribute analysis.
  • this application can detect adults, children, and the elderly, and the detection methods are diverse: a preliminary detection of human body shape can determine whether the person is a child, an adult, and so on; age or height detection can determine whether the person is a child; and behavioral information such as limb movement and motion frequency can assist in determining whether the person is a child.
  • gender can also be judged by clothing size, color, etc., as can whether the person is a child; the type of person is judged comprehensively from the various kinds of information.
  • use a detection object detector or a detection object judgment model to detect the area where the first detection object is located within the region of interest of the first image, exclude first images without the detection object, and obtain a rectangular (or other regular) detection object image containing the detection object.
  • performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object includes: performing an initial quality evaluation on the region of interest to obtain an image quality evaluation result, where the evaluation content of the initial quality evaluation includes at least one of the following: image blur evaluation, angle evaluation, position evaluation, and light intensity evaluation; if the image quality evaluation result indicates that the quality of the first detection object is unqualified, stopping detection of the first image; if the image quality evaluation result indicates that the quality of the first detection object is qualified, taking the subregion where the first detection object is located in the region of interest as the first detection image.
  • the quality of the detection object is evaluated on the rectangular area detected for it, and a detection object that is blurred, at a large angle, small in size, severely offset from the detection object frame, or insufficiently lit is judged unqualified. For a first detection object whose evaluation result is unqualified, the unqualified state is returned and detection stops; for a first detection object whose evaluation result is qualified, the next calculation step is performed. Illustrative checks are sketched below.
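The patent lists the evaluation criteria but not the metrics. A sketch of such checks with OpenCV, assuming the common variance-of-Laplacian measure for blur and mean gray level for light intensity; all thresholds are illustrative:

```python
import cv2
import numpy as np

def roi_is_qualified(roi_bgr: np.ndarray,
                     min_size: int = 40,
                     blur_threshold: float = 100.0,
                     min_brightness: float = 40.0) -> bool:
    """Return False for a small, blurred, or insufficiently lit detection region."""
    h, w = roi_bgr.shape[:2]
    if min(h, w) < min_size:                                     # too small
        return False
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_threshold:   # too blurred
        return False
    if gray.mean() < min_brightness:                             # insufficient light
        return False
    return True
```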
  • the detection method further includes: extracting the coordinates of multiple feature points in the first detection image; determining, among the feature point coordinates, the center point coordinates of the first detection object, where the first detection object may include: eyes, mouth, torso, and limbs; and, based on the center point coordinates of the first detection object, mapping the center point of the first detection object in the first detection image to the center point indicating the first detection object in the preset first standard image, so as to align the first detection image with the first standard image.
  • the contour and key point coordinates of the first detection object can be obtained through feature point positioning.
  • the center point of the first detection object can include, but is not limited to: the eye center point, mouth corner points, eye corner points, nose center point, torso center, and limb centers.
  • the feature point positioning scheme used can be obtained based on the landmark library.
  • auxiliary positioning can be performed using the information around the center point; for example, to improve the calculation accuracy of the eye center point coordinates, the eye center point coordinates can be obtained by weighting the landmarks above, below, to the left, and to the right of the eye together with the eye center point.
  • the located eye center point can also be used alone; a weighted variant is sketched below.
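The weighting described above might look like the following NumPy sketch; the 0.5 weight and the four-landmark layout are assumptions:

```python
import numpy as np

def weighted_eye_center(center: np.ndarray, around: np.ndarray,
                        center_weight: float = 0.5) -> np.ndarray:
    """center: (2,) eye-centre landmark; around: (4, 2) landmarks above, below,
    left, and right of the eye. Returns the weighted eye-centre coordinates."""
    return center_weight * center + (1.0 - center_weight) * around.mean(axis=0)
```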
  • affine transformation can be used for alignment and adjustment (mainly by moving, scaling, flipping, rotating, etc.): the center point of the first detection object in the detection object image to be detected is adjusted to the position corresponding to the standard detection object image, and the affine transformation yields a three-channel color image of the same size as the training samples.
  • mapping the center point of the first detection object in the first detection image to the center point indicating the first detection object in the preset first standard image includes: moving the first detection image over the first standard image with the center point of the first detection object in the first standard image as a reference; with the center point of the first detection object as a reference, reducing or enlarging the first detection image according to the scale of the first standard image so that the size of the first detection image is the same as the size of the first standard image; if the detection object of the first detection image and the detection object of the first standard image differ in orientation, rotating the first detection image so that the two orientations match; and, after the orientations match, determining to map the center point in the first detection image to the center point indicating the first detection object in the preset first standard image. The alignment is sketched below.
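The move/scale/rotate alignment amounts to estimating a similarity transform between the detected center points and the corresponding points of the standard image. A sketch with OpenCV; the point coordinates and the `detection_image` input are illustrative:

```python
import cv2
import numpy as np

detection_image = cv2.imread("roi.png")  # the cropped first detection image (assumed file)

src = np.array([[22.0, 25.0], [40.0, 26.0], [31.0, 45.0]], dtype=np.float32)  # detected centres
dst = np.array([[20.0, 24.0], [40.0, 24.0], [30.0, 44.0]], dtype=np.float32)  # standard positions

# Similarity transform: rotation + uniform scaling + translation.
matrix, _ = cv2.estimateAffinePartial2D(src, dst)
aligned = cv2.warpAffine(detection_image, matrix, (60, 60))  # same size as the standard image
```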
  • Step S104: Input the first image into the first detection model to determine the user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, and user clothing information.
  • the step of inputting the first image into the first detection model to determine the user information of users active in the target area includes: inputting the first image into the data network of the first detection model through the data layer of the first detection model; performing image feature extraction on the first image using the convolutional layer, pooling layer, and activation layer of the first detection model to obtain a multi-dimensional image output vector; inputting the multi-dimensional image output vector into different fully connected layers to obtain user information evaluation results, and inputting the user information evaluation results into the output layer; and using the output layer of the first detection model to output the user information, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, and user clothing information.
  • the obtained multi-channel color first image is input into the pre-trained first detection model for calculation; the network structure order can be data layer -> N group layers -> K fully connected layers + output layer, where N is greater than or equal to 1.
  • the group layer includes a convolutional layer, a pooling layer, and an activation layer.
  • the color images of the multiple channels are input into the feature extraction layers of the first detection model to obtain a multi-dimensional output vector, and the multi-dimensional output vector is input into the different fully connected layers of the neural network to obtain the category output results, as sketched below.
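A sketch of this inference path, assuming PyTorch: a shared backbone produces the multi-dimensional output vector, and separate fully connected heads score different user attributes. The head names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class MultiHeadDetector(nn.Module):
    """Shared feature extraction followed by different fully connected heads."""
    def __init__(self, backbone: nn.Module, feat_dim: int = 128):
        super().__init__()
        self.backbone = backbone                 # maps an image to a feat_dim vector
        self.age_head = nn.Linear(feat_dim, 3)   # infant / child / 16+
        self.gender_head = nn.Linear(feat_dim, 2)

    def forward(self, x: torch.Tensor) -> dict:
        feat = self.backbone(x)                  # the multi-dimensional image output vector
        return {"age": torch.softmax(self.age_head(feat), dim=1),
                "gender": torch.softmax(self.gender_head(feat), dim=1)}
```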
  • the embodiments of the present invention can also identify the user's gender, behavior, facial expressions, and so on; for example, they can analyze whether a user is drowsy while driving and, if so, issue a danger prompt in time on the highway, avoiding the danger caused by a user driving in a fatigued state for a long time.
  • Step S106: Output the detection result based on the user information.
  • the application scenario of the user information detection method in the embodiment of the present invention includes at least one of the following: vehicle internal personnel monitoring and elevator personnel monitoring.
  • in the vehicle-interior monitoring scenario, image data of the detection objects (passengers entering the car) can be acquired through a photographing device (such as a high-definition camera installed in the car); the detection object image data is input into the pre-trained first detection model, the detection object category information is output, and the analysis results are fed back to the on-board system, so that the on-board system can intelligently adjust system parameters and the driver can make reasonable decisions based on the system prompts when driving and parking.
  • the detection method further includes: inputting the first image into the second detection model to determine detection information of the second detection object in the first image, where the second detection object is a reference object of the first detection object.
  • the detection result is output based on the detection information of the second detection object and user information.
  • after the detection result is obtained, it may be determined whether to issue an alarm prompt according to the detection result.
  • in the elevator monitoring scenario, the terminal camera installed in the elevator can capture images of detection objects entering the elevator, analyze their age from the detection object images, and feed the obtained age information back to the elevator control system. If there are only children and no adults in the elevator, an alarm message is sent through the alarm device, which can effectively avoid the danger caused by a child riding the elevator alone; the rule is sketched below.
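The elevator rule reduces to: alarm when at least one person is detected and every detected person falls into a child category. A sketch using the category labels introduced earlier:

```python
def should_alarm(age_labels: list[int]) -> bool:
    """age_labels: predicted category per detected person (0: 0-5, 1: 6-15, 2: 16+)."""
    return len(age_labels) > 0 and all(label in (0, 1) for label in age_labels)
```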
  • the applicable scenarios of the embodiment of the present invention include not only the above-mentioned vehicle internal personnel monitoring and elevator personnel monitoring, but also other scenarios, such as elementary school buses and conference rooms.
  • a deep learning network model can be used to detect whether there is a child in the car and, according to the detection result when the door is opened, to determine whether the child is in danger, which can effectively prevent a child from being left in the car and avoid such unintended tragedies.
  • when a child is detected in the car, the system can actively send a signal so that the on-board system plays music suitable for the child and gives a safety speed-limit reminder or window status reminder, helping the driver make reasonable decisions.
  • Fig. 2 is a schematic diagram of an optional user information detection system according to an embodiment of the present invention.
  • the detection system may include: an image capture device 21, an analysis device 23, and a result output device 25, wherein,
  • the image capturing device 21 is configured to obtain a first image
  • the analysis device 23 is configured to input the first image into the first detection model to determine user information of the active user in the target area, where the user information includes at least one of the following: user age information, user gender information, and user behavior Information, user facial expression information, user body shape information, user clothing information;
  • the result output device 25 is configured to output the detection result based on user information.
  • the above user information detection system can obtain the first image through the image capturing device 21 and then input the first image into the first detection model through the analysis device 23 to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user facial expression information, user body shape information, user clothing information.
  • the result output device 25 then outputs the detection result based on the user information.
  • if the detection method is applied to monitoring the interior area of a vehicle, the first image of the vehicle interior can be used to analyze the user information of the occupants, such as their age, gender, behavior, expression, body shape, and clothing.
  • for example, when a child has been left in the car, the detection result can be used to trigger an alarm prompt in time, preventing the child from being forgotten in the car after the owner gets out.
  • This method of obtaining user information by analyzing image content is not affected by the environment; it only needs the image capture device to work normally. The probability of equipment failure is significantly reduced and the detection results are highly stable, which solves the technical problem in the related art that detecting the state of a child in the interior area of a vehicle with an infrared sensor is easily affected by the environment and causes the equipment to malfunction.
  • the image capturing device is an independent camera device or a camera device integrated with the result output device in one device.
  • the detection system further includes: a location judging device configured to extract image information from the first image after it is acquired and to use a detection object detector to determine whether a first detection object exists in the first image,
  • where the first detection object includes at least one of the following: human face, human head, torso, limbs, and human body;
  • a deletion unit configured to delete the first image when the first detection object does not exist in the first image;
  • an interception unit configured to intercept, when the first detection object exists in the first image, the region of interest in which the first detection object is located; and
  • an image processing device configured to perform image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
  • the image processing device includes: a quality evaluation unit configured to perform an initial quality evaluation of the region of interest to obtain an image quality evaluation result, where the evaluation content of the initial quality evaluation includes at least one of the following: image blur evaluation, angle evaluation, position evaluation, and light intensity evaluation; a stop unit configured to stop detecting the first image when the image quality evaluation result indicates that the quality of the first detection object is unqualified; and a first determination unit configured to take the sub-region in which the first detection object is located in the region of interest as the first detection image when the evaluation result indicates that the quality of the first detection object is qualified. A simple pre-check of this kind is sketched below.
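  • as a hedged illustration of such an initial quality evaluation, a small OpenCV-based pre-check; the document names the checks (blur, angle, position, light intensity) but fixes no concrete values, so all thresholds here are assumptions:

```python
import cv2

# Illustrative thresholds only; none of these values come from the patent.
BLUR_THRESHOLD = 100.0     # variance of the Laplacian below this => blurred
MIN_BRIGHTNESS = 40.0      # mean gray-level bounds for acceptable lighting
MAX_BRIGHTNESS = 220.0
MIN_SIDE = 60              # smallest acceptable ROI side (cf. the 60x60 samples)

def roi_quality_ok(roi_bgr) -> bool:
    """Return True when the region of interest passes the initial checks."""
    h, w = roi_bgr.shape[:2]
    if min(h, w) < MIN_SIDE:                       # size / position check
        return False
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    if cv2.Laplacian(gray, cv2.CV_64F).var() < BLUR_THRESHOLD:  # blur check
        return False
    mean_brightness = float(gray.mean())           # light-intensity check
    return MIN_BRIGHTNESS <= mean_brightness <= MAX_BRIGHTNESS
```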
  • the detection system further includes: a coordinate extraction device configured to extract the coordinates of multiple feature points in the first detection image after the sub-region in which the first detection object is located has been taken as the first detection image; and a second determining unit configured to determine, among the feature point coordinates, the center point coordinates of the first detection object.
  • the first detection object may include, for example: eyes, mouth, torso, and limbs; a first mapping unit is configured to map, based on the center point coordinates of the first detection object, the center point of the first detection object in the first detection image to the center point indicating the first detection object in a preset first standard image, so as to align the first detection image with the first standard image.
  • the first mapping unit includes: a first moving module configured to move the first detection image above the first standard image, taking the center point of the first detection object of the first standard image as a reference;
  • an alignment module configured to shrink or enlarge the first detection image in accordance with the scale of the first standard image, taking the center point of the first detection object as a reference, so that the first detection image and the first standard image have the same size;
  • a rotation module configured to rotate the first detection image when the detection object of the first detection image and that of the first standard image have different orientations, so that the two have the same orientation; and
  • a first determination module configured to determine, after the detection object of the first detection image and that of the first standard image have the same orientation, that the center point in the first detection image is mapped to the center point indicating the first detection object in the preset first standard image. The alignment sketch below illustrates this move/scale/rotate sequence.
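  • the move/scale/rotate sequence above amounts to a similarity transform; a minimal OpenCV sketch, taking eye center points as the feature points and assuming a hypothetical 60x60 standard image whose canonical eye positions are invented for illustration:

```python
import cv2
import numpy as np

# Assumed canonical layout of a 60x60 first standard image; the real
# standard image and its center points are defined by the training data.
STD_SIZE = 60
STD_LEFT_EYE = (20.0, 24.0)
STD_RIGHT_EYE = (40.0, 24.0)

def align_to_standard(image, left_eye, right_eye):
    """Map the detected eye center points onto the standard positions.
    estimateAffinePartial2D solves for translation + uniform scale +
    rotation, i.e. the move / shrink-or-enlarge / rotate sequence
    described above."""
    src = np.float32([left_eye, right_eye])
    dst = np.float32([STD_LEFT_EYE, STD_RIGHT_EYE])
    matrix, _ = cv2.estimateAffinePartial2D(src, dst)
    return cv2.warpAffine(image, matrix, (STD_SIZE, STD_SIZE))
```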
  • the detection system further includes: an image acquisition device configured to acquire, before the first image is obtained, multiple user images containing different image factors, where the image factors include at least one of the following: image scene, illuminance, resolution, user decoration; an image filtering device configured to filter the multiple user images to obtain multiple image sample sets corresponding to different user categories, where each user image in each image sample set corresponds to an attribute label of the first detection object and a category identifier of the category to which the user belongs; a cropping unit configured to crop each image in the multiple image sample sets to obtain multiple first standard images; and a training device configured to input the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into the initial network model to train it and obtain the first detection model.
  • the initial network model includes at least: a data layer, a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and an output layer; a minimal illustrative sketch of such a stack follows.
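  • purely as a sketch, one way such a layer sequence could look in PyTorch; the channel counts, kernel sizes, and the 60x60 RGB input are illustrative assumptions drawn from the sample-preparation description, not a specification given by this document:

```python
import torch
import torch.nn as nn

class FirstDetectionModelSketch(nn.Module):
    """Illustrative layer sequence: data -> conv -> pool -> activation
    (twice) -> fully connected -> output, sized for 60x60 RGB inputs
    and three age classes. All hyperparameters are assumptions."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer
            nn.MaxPool2d(kernel_size=2, stride=2),        # pooling layer
            nn.ReLU(),                                    # activation layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 15 * 15, 128),                 # fully connected layer
            nn.ReLU(),
            nn.Linear(128, num_classes),                  # output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```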
  • the training device includes: a first input unit configured to input the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into the training network of the initial network model through the data layer of the initial network model;
  • a first training unit configured to train the convolutional layer of the initial network model, extracting data features of the first standard images of the multiple image sample sets through preset convolution parameters to obtain a first data network feature map, where the convolution parameters include at least: a first extraction step size, the convolution kernel size, and the number of convolution kernels;
  • a second training unit configured to train the pooling layer of the initial network model, down-sampling the first data network feature map through preset pooling parameters to obtain a second data network feature map, where the pooling parameters include at least: a second extraction step size and a pooling size;
  • a third training unit configured to train the activation layer of the initial network model, performing non-linear change processing on the second data network feature map, where the non-linear change processing uses at least one of the following activation functions: relu, prelu, relu6;
  • a fourth training unit configured to train the fully connected layer of the initial network model, connecting the first and second data network feature maps and mapping the feature space of the feature maps to the label space through a linear transformation with preset feature weights, where the label space records the attribute label and category identifier of the first detection object; and
  • a fifth training unit configured to train the output layer of the initial network model, outputting for each first standard image a classification result indicating the category of the detection object in that image. A sketch of such a training loop follows.
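  • a minimal training-loop sketch under the same assumptions; CrossEntropyLoss applies softmax internally, standing in for the Loss = L_Age objective and the gradient-descent/backpropagation training mentioned in the description, while the batch size, learning rate, and epoch count are assumptions:

```python
import torch
from torch.utils.data import DataLoader

def train_sketch(model, dataset, epochs: int = 10):
    """Gradient-descent / backpropagation loop over (image, label) pairs,
    where labels follow the category split 0-5 -> 0, 6-15 -> 1, 16+ -> 2."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(epochs):
        for images, labels in loader:
            logits = model(images)              # forward pass
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()                     # backward pass
            optimizer.step()                    # parameter update
    return model
```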
  • the analysis device includes: an image processing module that inputs the first image into the data network of the first detection model through the data layer of the first detection model; a feature extraction unit configured to extract image features from the first image using the convolutional, pooling, and activation layers of the first detection model to obtain a multi-dimensional image output vector; a third input unit configured to input the multi-dimensional image output vector into different fully connected layers to obtain a user information evaluation result and pass that result to the output layer; and an output unit configured to output the user information through the output layer of the first detection model, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information. A minimal inference sketch follows.
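  • and a matching inference sketch that maps the output layer's result to an age class; again illustrative only, reusing the assumed model above:

```python
import torch

AGE_CLASS_NAMES = {0: "infant (0-5)", 1: "child (6-15)", 2: "adult (16+)"}

@torch.no_grad()
def analyze_first_image(model, image_tensor: torch.Tensor) -> dict:
    """Run one aligned, preprocessed 3x60x60 image through the model and
    map the output layer's result to an age class with a confidence."""
    model.eval()
    logits = model(image_tensor.unsqueeze(0))   # add the batch dimension
    probs = torch.softmax(logits, dim=1)
    class_id = int(probs.argmax(dim=1).item())
    return {"age_class": AGE_CLASS_NAMES[class_id],
            "confidence": float(probs.max().item())}
```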
  • the detection system further includes: an image input unit configured to input the first image into the second detection model to determine detection information of a second detection object in the first image, where the second detection object is a reference object for the first detection object.
  • the output detection result is obtained based on the detection information of the second detection object and user information.
  • the user information also includes: user gender, user activity posture, facial expression, and degree of fatigue of the detected object.
  • the application scenario of the detection system includes at least one of the following: vehicle internal personnel monitoring and elevator personnel monitoring.
  • an electronic device is provided, including: a processor; and a memory configured to store executable instructions of the processor, where the processor is configured to execute any one of the above user information detection methods by executing the executable instructions.
  • a storage medium includes a stored program, wherein the device where the storage medium is located is controlled to execute any one of the above-mentioned user information detection methods when the program is running.
  • This application also provides a computer program product which, when executed on a data processing device, is suitable for executing a program initialized with the following method steps: acquiring a first image; inputting the first image into the first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information; and outputting a detection result based on the user information.
  • in the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units may be a division by logical function, and there may be other ways of division in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on this understanding, the essence of the technical solution of the present invention, the part that contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention.
  • the aforementioned storage media include media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
  • the solution provided by the embodiment of the application can realize user information detection.
  • the technical solution provided by the embodiments of the application can be applied to an image detection device inside a spatial area: a first image of the area to be detected (for example, a vehicle interior, a shopping-mall elevator, a meeting room, etc.) is acquired by the image capture device; the first image is processed to identify detection object information; user information such as the user's age is obtained by analysis, and a judgment is made on the user's age, so that when a person in the area is in a dangerous state, an alarm prompt is issued in time.
  • this application's solution of analyzing image content to obtain user information is not affected by the environment; it only needs the image capture device to work normally, the probability of equipment failure is significantly reduced, and the detection result is more stable.
  • This application can automatically analyze the images taken by each image capture device in a given spatial area, determine whether persons in the area are in a dangerous state, accurately identify the information of the persons inside the area, improve the degree of danger perception, and reduce the probability of persons being in a dangerous state; it can effectively prevent a child from being forgotten in the car, thereby avoiding such unintentional tragedies.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A user information detection method and system, and an electronic device. The method includes: acquiring a first image (S102); inputting the first image into a first detection model to determine user information of users active in a target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information (S104); and outputting a detection result based on the user information (S106).

Description

User information detection method and system, and electronic device
This application claims priority to the Chinese patent application No. 201911158474.8, filed with the Chinese Patent Office on November 22, 2019 and entitled "User information detection method and system, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of information processing, and in particular to a user information detection method and system, and an electronic device.
Background
In the related art, with the progress of society, people's living standards have improved greatly, and private cars have become one of the most convenient means of transportation for ordinary families, for example for commuting on workdays, taking children to and from school, and weekend family outings. However, news of children being forgotten in cars appears again and again. If this is not discovered in time, the child may faint in the car from lack of oxygen, and the child's life may even be endangered; in hot summer, the temperature inside the car is very high and the danger of suffocation arises easily. All of this can cause irreparable harm to a family.
The current ways of detecting a child in a car mostly use an infrared sensor and then transmit the signal through digital-to-analog conversion to the controller of a single-chip microcomputer for an early-warning operation. However, the sensor itself is easily disturbed by various heat and light sources, its transmission capability is poor, and the infrared radiation of the human body is easily blocked and hard for the probe to receive. In summer, when the ambient temperature is close to body temperature, detection capability and sensitivity drop markedly, and the sensor sometimes fails, so that a child in danger cannot be rescued in time, leading to serious consequences. At the same time, this infrared-sensor detection approach places high demands on the mounting position and sensitivity of the sensor, installation can only be completed at great cost, and the early-warning effect is poor.
No effective solution to the above problems has been proposed so far.
Summary
The embodiments of the present disclosure provide a user information detection method and system, and an electronic device, so as to at least solve the technical problem in the related art that detecting the state of a child in the interior area of a vehicle with an infrared sensor is easily affected by the environment, causing the equipment to malfunction.
According to one aspect of the embodiments of the present invention, a user information detection method is provided, including: acquiring a first image; inputting the first image into a first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information; and outputting a detection result based on the user information.
Optionally, after the first image is acquired, the detection method further includes: extracting image information from the first image and determining whether a first detection object exists in the first image, where the first detection object includes at least one of the following: a human face, a human head, the torso, the limbs, a human body; if the first detection object does not exist in the first image, deleting the first image; or, if the first detection object exists in the first image, intercepting the region of interest in which the first detection object is located; and performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
Optionally, performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object includes: performing an initial quality evaluation of the region of interest to obtain an image quality evaluation result, where the evaluation content of the initial quality evaluation includes at least one of the following: image blur evaluation, angle evaluation, position evaluation, light intensity evaluation; if the image quality evaluation result indicates that the quality of the first detection object is unqualified, stopping the detection of the first image; if the image quality evaluation result indicates that the quality of the first detection object is qualified, taking the sub-region in which the first detection object is located in the region of interest as the first detection image.
Optionally, after the sub-region in which the first detection object is located is taken as the first detection image, the detection method further includes: extracting the coordinates of multiple feature points in the first detection image; determining, among the feature point coordinates, the center point coordinates indicating the first detection object; and mapping, based on the center point coordinates of the first detection object, the center point of the first detection object in the first detection image to the center point indicating the first detection object in a preset first standard image, so as to align the first detection image with the first standard image.
Optionally, mapping the center point of the first detection object in the first detection image to the center point indicating the first detection object in the preset first standard image includes: moving the first detection image above the first standard image, taking the center point of the first detection object of the first standard image as a reference; shrinking or enlarging the first detection image in accordance with the scale of the first standard image, taking the center point of the first detection object as a reference, so that the first detection image and the first standard image have the same size; if the detection object of the first detection image and the detection object of the first standard image have different orientations, rotating the first detection image so that they have the same orientation; and after the detection object of the first detection image and the detection object of the first standard image have the same orientation, determining that the center point in the first detection image is mapped to the center point indicating the first detection object in the first standard image.
Optionally, before the first image is acquired, the detection method further includes: collecting multiple user images containing different image factors, where the image factors include at least one of the following: image scene, illuminance, resolution, user decoration; filtering the multiple user images to obtain multiple image sample sets corresponding to different user categories, where each user image in each image sample set corresponds to an attribute label of the first detection object and a category identifier of the category to which the user belongs; cropping each image in the multiple image sample sets to obtain multiple first standard images; and inputting the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into an initial network model to train the initial network model and obtain the first detection model.
Optionally, the initial network model includes at least: a data layer, a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and an output layer.
Optionally, the step of training the initial network model to obtain the first detection model includes: inputting the first standard images of the multiple image sample sets, and the attribute label and category identifier of the first detection object on each first standard image, into the training network of the initial network model through the data layer of the initial network model; training the convolutional layer of the initial network model, extracting data features of the first standard images of the multiple image sample sets through preset convolution parameters to obtain a first data network feature map, where the convolution parameters include at least: a first extraction step size, the convolution kernel size, and the number of convolution kernels; training the pooling layer of the initial network model, down-sampling the first data network feature map through preset pooling parameters to obtain a second data network feature map, where the pooling parameters include at least: a second extraction step size and a pooling size; training the activation layer of the initial network model, performing non-linear change processing on the second data network feature map, where the non-linear change processing uses at least one of the following activation functions: the relu activation function, the prelu activation function, the relu6 activation function; training the fully connected layer of the initial network model, connecting the first data network feature map and the second data network feature map, and mapping the feature space of the feature maps to the label space through a linear transformation with preset feature weights, where the label space is set to record the attribute label and category identifier of the first detection object; and training the output layer of the initial network model, outputting a classification result corresponding to each first standard image, where the classification result indicates the category corresponding to the detection object in the first standard image.
Optionally, the step of inputting the first image into the first detection model to determine the user information of users active in the target area includes: inputting the first image into the data network of the first detection model through the data layer of the first detection model; extracting image features from the first image using the convolutional layer, pooling layer, and activation layer of the first detection model to obtain a multi-dimensional image output vector; inputting the multi-dimensional image output vector into different fully connected layers to obtain a user information evaluation result, and passing the user information evaluation result to the output layer; and outputting the user information through the output layer of the first detection model.
Optionally, the detection method further includes: inputting the first image into a second detection model to determine detection information of a second detection object in the first image, where the second detection object is a reference object for the first detection object.
Optionally, the output detection result is obtained based on the detection information of the second detection object and the user information.
Optionally, the user information further includes: the user's gender, the user's activity posture, expression, and the degree of fatigue of the detection object.
Optionally, the application scenarios of the detection method include at least one of the following: monitoring of occupants inside a vehicle, monitoring of passengers in an elevator.
Optionally, whether to issue an alarm prompt is determined according to the detection result.
According to another aspect of the embodiments of the present invention, a user information detection system is further provided, including: an image capture device configured to acquire a first image; an analysis device configured to input the first image into a first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information; and a result output device configured to output a detection result based on the user information.
Optionally, the image capture device is an independent camera device or a camera device integrated with the result output device in a single device.
Optionally, the detection system further includes: a location judging device configured to extract image information from the first image after it is acquired and to use a detection object detector to determine whether a first detection object exists in the first image, where the first detection object includes at least one of the following: a human face, a human head, the torso, the limbs, a human body; a deletion unit configured to delete the first image when the first detection object does not exist in the first image; or an interception unit configured to intercept, when the first detection object exists in the first image, the region of interest in which the first detection object is located; and an image processing device configured to perform image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
Optionally, the image processing device includes: a quality evaluation unit configured to perform an initial quality evaluation of the region of interest to obtain an image quality evaluation result, where the evaluation content of the initial quality evaluation includes at least one of the following: image blur evaluation, angle evaluation, position evaluation, light intensity evaluation; a stop unit configured to stop detecting the first image when the image quality evaluation result indicates that the quality of the first detection object is unqualified; and a first determination unit configured to take the sub-region in which the first detection object is located in the region of interest as the first detection image when the image quality evaluation result indicates that the quality of the first detection object is qualified.
Optionally, the detection system further includes: a coordinate extraction device configured to extract the coordinates of multiple feature points in the first detection image after the sub-region in which the first detection object is located has been taken as the first detection image; a second determining unit configured to determine, among the feature point coordinates, the center point coordinates indicating the first detection object; and a first mapping unit configured to map, based on the center point coordinates of the first detection object, the center point of the first detection object in the first detection image to the center point indicating the first detection object in a preset first standard image, so as to align the first detection image with the first standard image.
Optionally, the first mapping unit includes: a first moving module configured to move the first detection image above the first standard image, taking the center point of the first detection object of the first standard image as a reference; an alignment module configured to shrink or enlarge the first detection image in accordance with the scale of the first standard image, taking the center point of the first detection object as a reference, so that the first detection image and the first standard image have the same size; a rotation module configured to rotate the first detection image when the detection object of the first detection image and the detection object of the first standard image have different orientations, so that they have the same orientation; and a first determination module configured to determine, after the detection object of the first detection image and the detection object of the first standard image have the same orientation, that the center point in the first detection image is mapped to the center point indicating the first detection object in the first standard image.
Optionally, the detection system further includes: an image acquisition device configured to collect, before the first image is acquired, multiple user images containing different image factors, where the image factors include at least one of the following: image scene, illuminance, resolution, user decoration; an image filtering device configured to filter the multiple user images to obtain multiple image sample sets corresponding to different user categories, where each user image in each image sample set corresponds to an attribute label of the first detection object and a category identifier of the category to which the user belongs; a cropping unit configured to crop each image in the multiple image sample sets to obtain multiple first standard images; and a training device configured to input the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into an initial network model to train the initial network model and obtain the first detection model.
Optionally, the initial network model includes at least: a data layer, a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and an output layer.
Optionally, the training device includes: a first input unit configured to input the first standard images of the multiple image sample sets, and the attribute label and category identifier of the first detection object on each first standard image, into the training network of the initial network model through the data layer of the initial network model; a first training unit configured to train the convolutional layer of the initial network model, extracting data features of the first standard images of the multiple image sample sets through preset convolution parameters to obtain a first data network feature map, where the convolution parameters include at least: a first extraction step size, the convolution kernel size, and the number of convolution kernels; a second training unit configured to train the pooling layer of the initial network model, down-sampling the first data network feature map through preset pooling parameters to obtain a second data network feature map, where the pooling parameters include at least: a second extraction step size and a pooling size; a third training unit configured to train the activation layer of the initial network model and perform non-linear change processing on the second data network feature map, where the non-linear change processing uses at least one of the following activation functions: the relu activation function, the prelu activation function, the relu6 activation function; a fourth training unit configured to train the fully connected layer of the initial network model, connecting the first data network feature map and the second data network feature map, and mapping the feature space of the feature maps to the label space through a linear transformation with preset feature weights, where the label space is set to record the attribute label and category identifier of the first detection object; and a fifth training unit configured to train the output layer of the initial network model, outputting a classification result corresponding to each first standard image, where the classification result is set to indicate the category corresponding to the detection object in the first standard image.
Optionally, the analysis device includes: an image processing module configured to input the first image into the data network of the first detection model through the data layer of the first detection model; a feature extraction unit configured to extract image features from the first image using the convolutional layer, pooling layer, and activation layer of the first detection model to obtain a multi-dimensional image output vector; a third input unit configured to input the multi-dimensional image output vector into different fully connected layers to obtain a user information evaluation result and pass the user information evaluation result to the output layer; and a user information output unit configured to output the user information through the output layer of the first detection model.
Optionally, the detection system further includes: an image input unit configured to input the first image into a second detection model to determine detection information of a second detection object in the first image, where the second detection object is a reference object for the first detection object.
Optionally, the output detection result is obtained based on the detection information of the second detection object and the user information.
Optionally, the user information further includes: the user's gender, the user's activity posture, expression, and the degree of fatigue of the detection object.
Optionally, the application scenarios of the detection system include at least one of the following: monitoring of occupants inside a vehicle, monitoring of passengers in an elevator.
Optionally, whether to issue an alarm prompt is determined according to the detection result.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, including: a processor; and a memory configured to store executable instructions of the processor, where the processor is configured to execute any one of the above user information detection methods by executing the executable instructions.
According to another aspect of the embodiments of the present invention, a storage medium is further provided, where the storage medium includes a stored program, and when the program runs, the device where the storage medium is located is controlled to execute any one of the above user information detection methods.
In the embodiments of the present invention, a first image is acquired first, and the first image is then input into a first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information; finally, a detection result is output based on the user information. In this embodiment, if the detection method is applied to monitoring the interior area of a vehicle, the first image of the vehicle interior can be used to analyze the user information of the occupants, such as their age, gender, behavior, expression, body shape, and clothing, and to derive a detection result concerning them. For example, when a child has been left in the car, an alarm prompt can be issued in time through the detection result, so as to prevent the child from being forgotten in the car after the owner gets out and to reduce the probability of danger to the occupants. This solution of obtaining user information by analyzing image content is not affected by the environment; it only needs the image capture device to work normally, the probability of equipment failure is significantly reduced, and the detection result is highly stable, which solves the technical problem in the related art that detecting the state of a child in the interior area of a vehicle with an infrared sensor is easily affected by the environment, causing the equipment to malfunction.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and constitute a part of this application; the exemplary embodiments of the present invention and their description are intended to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a flowchart of an optional user information detection method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an optional user information detection system according to an embodiment of the present invention.
Detailed Description
In order to help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in an order other than those illustrated or described here. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
According to an embodiment of the present invention, an embodiment of a user information detection method is provided. It should be noted that the steps shown in the flowchart of the drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in an order different from the one here.
In the embodiments of the present invention, a first image of the area to be detected (for example, a vehicle interior, a shopping-mall elevator, a meeting room, etc.) is acquired through an image capture device, the first image is processed, detection object information is identified, user information such as the user's age is obtained by analysis, and a judgment is made on the user's age, so that when a person in the area is in a dangerous state, an alarm prompt is issued in time.
Fig. 1 is a flowchart of an optional user information detection method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: Acquire a first image.
Step S104: Input the first image into a first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information.
The types of the first detection model used in the embodiments of the present invention and of the second detection model described below include, but are not limited to, convolutional neural networks (CNN, Convolutional Neural Networks), which recognize two-dimensional patterns that are invariant to displacement, scaling, and other forms of distortion. Since the feature detection layers of a CNN learn from training data, explicit feature extraction is avoided when a CNN is used: the network learns implicitly from the training data. Moreover, since the neurons on the same feature map share the same weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which neurons are connected to one another. With its special structure of locally shared weights, the convolutional neural network has unique advantages in image processing; its layout is closer to an actual biological neural network, weight sharing reduces the complexity of the network, and in particular the fact that an image with a multi-dimensional input vector can be fed directly into the network avoids the complexity of data reconstruction during feature extraction and classification.
Step S106: Output a detection result based on the user information.
Through the above steps, a first image can be acquired first, the first image can then be input into the first detection model to determine user information of users active in the target area, and finally a detection result can be output based on the user information. In this embodiment, if the detection method is applied to monitoring the interior area of a vehicle, the first image of the vehicle interior can be used to analyze the user information of the occupants, such as their age, gender, behavior, expression, body shape, and clothing, and to derive a detection result concerning them. For example, when a child has been left in the car, an alarm prompt can be issued in time through the detection result, so as to prevent the child from being forgotten in the car after the owner gets out and to reduce the probability of danger to the occupants. This solution of obtaining user information by analyzing image content is not affected by the environment; it only needs the image capture device to work normally, the probability of equipment failure is significantly reduced, and the detection result is highly stable, which solves the technical problem in the related art that detecting the state of a child in the interior area of a vehicle with an infrared sensor is easily affected by the environment, causing the equipment to malfunction.
The embodiments of the present invention are described in detail below with reference to the individual steps. For the first detection model and the second detection model involved in the embodiments of the present invention, the first detection model can be used with priority to detect the user, while the second detection model can be used for auxiliary detection of the user.
First, the first detection model is explained.
In the embodiments of the present invention, a first detection model is trained in advance, and the captured first image is analyzed by the first detection model to obtain user information including user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information, and so on.
As an optional embodiment of the present invention, before the first image is acquired, the first detection model needs to be trained, which includes: collecting multiple user images containing different image factors, where the image factors include at least one of the following: image scene, illuminance, resolution, user decoration; filtering the multiple user images to obtain multiple image sample sets corresponding to different user categories, where each user image in each image sample set corresponds to an attribute label of the first detection object and a category identifier of the category to which the user belongs; cropping each image in the multiple image sample sets to obtain multiple first standard images; and inputting the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into an initial network model to train the initial network model and obtain the first detection model.
The user images collected in the embodiments of the present invention may be two-dimensional or three-dimensional images, the images may be taken from multiple angles, and the image capture device (for example, a camera) may be installed at any position in the target area. When the network model is trained, the images used may contain multiple image factors.
The embodiments of the present invention are illustrated with the user category as the user information to be analyzed; for the analysis of the detection object's age, the apparent attributes of the detection object are assumed to be the analysis result. Since the differences between detection objects of similar ages are small and the ability to distinguish them is poor, age is divided by category in order to obtain accurate category distinction results that satisfy the application of the present invention. Age can therefore be divided into multiple categories according to apparent age; for example, persons can be divided into 3 different categories (infants aged 0-5, children aged 6-15, others aged 16+).
After the category division is determined, a large-scale image training sample library needs to be built: multiple user images with different image factors are collected to train a multi-label network model (for example, a convolutional neural network classification model). The image material collected through various channels covers different scenes, different illumination, different resolutions, different decorations, and so on. Before training, samples with large angles, blur, poor lighting conditions, or low resolution can be filtered out manually by the terminal or the user. The training sample set contains the above 3 categories; the material of all 3 categories is covered and evenly distributed, so that each individual class has a relatively large number of user images to be trained. Each user image sample to be trained is labeled, with the category labels being (0-5 years: 0; 6-15 years: 1; 16+ years: 2). Taking face detection as an example (positioning can also be done via the seat, the limbs, the torso, etc.), the image of the first detection object can be cropped into a standard multi-channel image through a series of transformation operations such as eye-point positioning, translation, rotation, and scaling (for example, cropped into a 60x60 three-channel RGB color image). To expand the training samples and enhance the robustness of the trained model, a series of operations are performed on the user image samples to be trained, such as horizontal or vertical translation and stretching at different scales.
The processed training sample set is input into the initial network model for multi-task classification model training. The layers of the initial network model include a data layer, a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and an output layer; except for the data layer and the output layer, the input of each intermediate layer is the output of the previous layer, and its output is the input of the next layer.
Optionally, the training methods in the embodiments of the present invention are gradient descent and the backpropagation algorithm.
As an optional embodiment of the present invention, the step of training the initial network model to obtain the first detection model includes: inputting the first standard images of the multiple image sample sets, and the attribute label and category identifier of the first detection object on each first standard image, into the training network of the initial network model through the data layer of the initial network model; training the convolutional layer of the initial network model, extracting data features of the first standard images of the multiple image sample sets through preset convolution parameters to obtain a first data network feature map, where the convolution parameters include at least: a first extraction step size, the convolution kernel size, and the number of convolution kernels; training the pooling layer of the initial network model, down-sampling the first data network feature map through preset pooling parameters to obtain a second data network feature map, where the pooling parameters include at least: a second extraction step size and a pooling size; training the activation layer of the initial network model, performing non-linear change processing on the second data network feature map, where the non-linear change processing uses at least one of the following activation functions: the relu activation function, the prelu activation function, the relu6 activation function; training the fully connected layer of the initial network model, connecting the first data network feature map and the second data network feature map, and mapping the feature space of the feature maps to the label space through a linear transformation with preset feature weights, where the label space is set to record the attribute label and category identifier of the first detection object; and training the output layer of the initial network model, outputting a classification result corresponding to each first standard image, where the classification result is set to indicate the category corresponding to the detection object in the first standard image.
That is, when the initial network model is trained, the prepared first standard images (for example, 60x60 detection object sample images) and the attribute labels of the first detection object can first be input into the training network through the data layer; the convolutional layer then extracts data features using the configured step size, kernel size, and number of kernels; the pooling layer down-samples the feature map of the previous layer using the configured step size and pooling size; the activation layer applies a non-linear change to the feature map of the previous layer, for which activation functions such as the relu activation function can be used in this solution; the fully connected layer then connects all the feature maps and maps the feature space to the label space through a linear transformation with weights, with a relu activation following the fully connected layer; finally, the output layer classifies and regresses the feature maps. Optionally, the embodiments of the present invention use the softmax function for category classification.
The initial network model is trained continuously with the already-classified user images and the attribute labels of the first detection object, constantly improving the analysis accuracy of the network model and extending the image types and image contents it can handle, in preparation for subsequently capturing and analyzing the first image.
In the network model training phase, all training samples are input into the initial network model (such as a convolutional neural network), and the gap between the output result and the actual label is computed by a loss function. This process is called the forward pass. Then, according to the difference between the output result and the actual label, the error of the initial network model parameters is determined and the model parameters are updated, so that the neural network learns; this process is called the backward pass. By adjusting the weight values of each layer in the initial network model, the gap between the model's output value and the actual sample label value becomes smaller and smaller, until the output value of the network model is consistent with the actual label value or maintains a minimal gap that no longer changes, at which point the required first detection model is finally obtained.
Optionally, when the user's age in the user information is computed in the embodiments of the present invention, the cost function of the age error is Loss = L_Age; backpropagation is performed through this Loss, and the parameters of the network model are adjusted until convergence.
For each application scenario, for example the interior area of a vehicle, the converged network above is fine-tuned with material from the in-car scene, with the material labeled and cropped as above. When the general network model is fine-tuned with material from the actual application scenario, the network keeps the parameters of the shared feature-extraction layers (the network layers before the fully connected layer) unchanged; as long as the learning rate of the fully connected layer is not 0, training can continue to learn new parameters for the fine-tuned fully connected layer, so that higher accuracy is obtained through iterative training. The way to keep parameters unchanged is to set the learning rate of the corresponding layers to 0.
Through fine-tuning, when little material is available, material from other scenes can be used as pre-training to train the feature extractor, and then a small amount of material can be used for fine-tuning to achieve faster convergence and higher accuracy.
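As an illustrative, non-authoritative sketch of this freezing strategy (implemented here by disabling gradients on the shared feature-extraction layers, which has the same effect as a zero learning rate; the layer names follow the illustrative model sketch given earlier in this document, and the learning rate is an assumption):

```python
import torch

def finetune_for_scene(model, lr: float = 0.001):
    """Keep the shared feature-extraction layers fixed (the effect of a
    zero learning rate, implemented by disabling their gradients) and
    continue learning only the fully connected layers."""
    for param in model.features.parameters():
        param.requires_grad = False         # shared extractor stays fixed
    optimizer = torch.optim.SGD(model.classifier.parameters(), lr=lr)
    return optimizer  # reuse the ordinary training loop with scene material
```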
Through the above implementation, the first detection model is trained and can be applied in various actual operating environments.
Second, the second detection model.
The second detection model can be understood as a model trained on reference objects of the first detection object. The second detection model can assist in determining information about the detection object, for example by detecting the vehicle seats, seat backs, clothing placed inside the vehicle, and so on, and thereby assist in determining the user information.
In the embodiments of the present invention, deep learning can be used to detect the human body and the seat in order to judge whether the person is a child or an adult. The judgment may include: inputting the first image into the second detection network (for example, a seat detection network, a seat-back detection network, a clothing detection network) and outputting the position detection result of the seat and/or the seat back; inputting the image into the first detection network (the human body detection network), judging whether the image contains a human body, and outputting the human body detection result; and judging whether the person is a child or an adult according to the human body detection result combined with the position detection result of the seat and/or the seat back.
The detection of the seat and/or seat back can be used to assist in judging the body shape of the person and the specific region where the head is located, to exclude the interference of casually placed clothing, and to improve the detection accuracy.
After the first detection model and the second detection model are obtained, the most recently captured image can be input into the models for real-time detection, analysis, and judgment to obtain the detection result.
Step S102: Acquire a first image.
As an optional embodiment of the present invention, after the first image is acquired, the detection method further includes: extracting image information from the first image and using a detection object detector to determine whether a first detection object exists in the first image, where the first detection object includes at least one of the following: a human face, a human head, the torso, the limbs, a human body; if the first detection object does not exist in the first image, deleting the first image; or, if the first detection object exists in the first image, intercepting the region of interest in which the first detection object is located; and performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
The above region of interest may refer to the region after images not containing a person have been filtered out; detection is mainly performed on the region of the image where the person is located.
In the embodiments of the present invention, the first image can be obtained by shooting with an RGB camera, a shooting module, an infrared camera, or the like, and the extracted image information may include, but is not limited to: RGB color information and depth information. Recognizing the color information of the color channels of the detection object image can effectively improve the recognition rate, that is, the accuracy of the attribute analysis of the detection object.
In this application, adults, children, and the elderly can all be detected, and the detection methods are diverse: a preliminary detection of the body shape to judge whether the person is a child or an adult; a judgment of whether the person is a child through the age bracket or height; or an auxiliary judgment of whether the person is a child through behavior information such as the limbs and the frequency of movements. In addition, gender can be judged through the size and color of clothing, which also helps judge whether the person is a child. The type of person is determined comprehensively through various kinds of information.
Optionally, a detection object detector or a detection object judgment model is used to detect the region in which the first detection object is located within the region of interest of the first image; first images without a detection object are excluded, and a rectangular or otherwise regular detection object image containing the detection object can be obtained.
In the embodiments of the present invention, performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object includes: performing an initial quality evaluation of the region of interest to obtain an image quality evaluation result, where the evaluation content of the initial quality evaluation includes at least one of the following: image blur evaluation, angle evaluation, position evaluation, light intensity evaluation; if the image quality evaluation result indicates that the quality of the first detection object is unqualified, stopping the detection of the first image; if the image quality evaluation result indicates that the quality of the first detection object is qualified, taking the sub-region in which the first detection object is located in the region of interest as the first detection image.
In the above embodiment, a detection object quality evaluation is performed on the rectangular region obtained by detection object detection, and detection objects that are blurred, at a large angle, of small size, with a severely deviated detection frame, or with insufficient illumination are judged to be of unqualified quality. For a first detection object whose evaluation result is unqualified, an unqualified status is returned and detection stops. For a first detection object whose evaluation result is qualified, the next computation step is executed.
As an optional embodiment of the present invention, after the sub-region in which the first detection object is located is taken as the first detection image, the detection method further includes: extracting the coordinates of multiple feature points in the first detection image; determining, among the feature point coordinates, the center point coordinates indicating the first detection object, where the first detection object may include, for example: the eyes, the mouth, the torso, the limbs; and mapping, based on the center point coordinates of the first detection object, the center point of the first detection object in the first detection image to the center point indicating the first detection object in a preset first standard image, so as to align the first detection image with the first standard image.
That is, the contour and key point coordinates of the first detection object can be obtained through feature point positioning; the center points of the first detection object may include, but are not limited to: the eye center point, the mouth corner points, the eye corner points, the nose tip center point, the center of the torso, and the centers of the limbs. The feature point positioning scheme used may be based on a landmark library.
In this application, to improve the computation accuracy of each center point coordinate (such as the eye center point, the mouth corner points, the eye corner points, the nose tip center point, the torso center, and the limb centers), auxiliary positioning can be performed with the information around the center point. For example, to improve the accuracy of the eye center point coordinates, the points above, below, to the left of, and to the right of the eye area can be weighted together with the eye center point to obtain the eye center point coordinates. Of course, the located eye center point can also be used alone.
Through the above embodiment, feature point positioning is performed on the filtered detection object image to obtain the center point coordinates of the first detection object; then affine-transformation alignment and adjustment (mainly a series of transformations such as moving, scaling, flipping, and rotating) can be used to adjust the center point of the first detection object in the image to be detected to the position corresponding to the standard detection object image, and a three-channel color image of the same size as the training samples is obtained through the affine transformation.
In the embodiments of the present invention, mapping the center point of the first detection object in the first detection image to the center point indicating the first detection object in the preset first standard image includes: moving the first detection image above the first standard image, taking the center point of the first detection object of the first standard image as a reference; shrinking or enlarging the first detection image in accordance with the scale of the first standard image, taking the center point of the first detection object as a reference, so that the first detection image and the first standard image have the same size; if the detection object of the first detection image and the detection object of the first standard image have different orientations, rotating the first detection image so that they have the same orientation; and after the detection object of the first detection image and the detection object of the first standard image have the same orientation, determining that the center point in the first detection image is mapped to the center point indicating the first detection object in the preset first standard image.
Step S104: Input the first image into the first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information.
Optionally, the step of inputting the first image into the first detection model to determine the user information of users active in the target area includes: inputting the first image into the data network of the first detection model through the data layer of the first detection model; extracting image features from the first image using the convolutional layer, pooling layer, and activation layer of the first detection model to obtain a multi-dimensional image output vector; inputting the multi-dimensional image output vector into different fully connected layers to obtain a user information evaluation result, and passing the user information evaluation result to the output layer; and outputting the user information through the output layer of the first detection model, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information.
In the embodiments of the present invention, the resulting multi-channel color first image is input into the pre-trained first detection model for computation. The order of the network structure may be: data layer -> N group layers -> K fully connected layers + output layer, where N is greater than or equal to 1. A group layer includes a convolutional layer, a pooling layer, and an activation layer. The color image of the multiple channels is input into the feature extraction layers of the first detection model to obtain a multi-dimensional output vector, and the multi-dimensional output vector is input into the different fully connected layers of the neural network to obtain the category output results.
In addition to recognizing the category of users in the area, the embodiments of the present invention can also recognize the user's gender, behavior, expression, and so on; for example, they can analyze whether the user is drowsy while driving, and if a danger prompt can be issued promptly on the highway, the danger caused by fatigue from driving for a long time can be avoided.
Step S106: Output a detection result based on the user information.
Optionally, the application scenarios of the user information detection method in the embodiments of the present invention include at least one of the following: monitoring of occupants inside a vehicle, monitoring of passengers in an elevator.
Taking the monitoring of occupants inside a vehicle as an example, detection object image data of the passengers entering the car can be acquired through a photographing device (such as a high-definition camera installed in the car), the detection object image data is input into the pre-trained first detection model, the detection object category information is output, and the analysis result is fed back to the on-board system, so that the on-board system can intelligently adjust the system parameters and the driver can make reasonable decisions based on the system prompts when driving and parking.
In the embodiments of the present invention, the detection method further includes: inputting the first image into the second detection model to determine detection information of a second detection object in the first image, where the second detection object is a reference object for the first detection object.
Optionally, the detection result is output based on the detection information of the second detection object and the user information.
Optionally, after the detection result is obtained, whether to issue an alarm prompt can be determined according to the detection result.
Taking elevator child monitoring as an example, the terminal camera installed in the elevator can capture an image of the detection object entering the elevator, analyze the person's age from the detection object image, and feed the obtained age information back to the elevator control system; if there are only children and no adults in the elevator, an alarm message is issued through the alarm device, which can effectively avoid the danger caused by a child riding the elevator alone.
The scenarios to which the embodiments of the present invention can be applied include not only the above-mentioned monitoring of vehicle occupants and elevator passengers; they can also be applied in other scenarios, for example school buses for pupils and meeting rooms.
In the embodiments of the present invention, a deep learning network model can be used to detect whether there is a child in the car, and when the door is opened, whether the child is in danger is determined according to the detection result, which can effectively prevent a child from being forgotten in the car, thereby avoiding this kind of unintentional tragedy. At the same time, when a child is detected in the car, a signal can be actively sent so that the on-board system plays music suitable for children and gives a safe speed-limit prompt or a window-status prompt, helping the driver make reasonable decisions.
Fig. 2 is a schematic diagram of an optional user information detection system according to an embodiment of the present invention. As shown in Fig. 2, the detection system may include: an image capture device 21, an analysis device 23, and a result output device 25, where:
the image capture device 21 is configured to acquire a first image;
the analysis device 23 is configured to input the first image into a first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information;
the result output device 25 is configured to output a detection result based on the user information.
The above user information detection system can acquire the first image through the image capture device 21, then input the first image into the first detection model through the analysis device 23 to determine the user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information, and finally output the detection result based on the user information through the result output device 25. In this embodiment, if the detection method is applied to monitoring the interior area of a vehicle, the first image of the vehicle interior can be used to analyze the user information of the occupants, such as their age, gender, behavior, expression, body shape, and clothing, and to derive a detection result concerning them; for example, when a child has been left in the car, an alarm prompt can be issued in time through the detection result, so as to prevent the child from being forgotten in the car after the owner gets out and to reduce the probability of danger to the occupants. This solution of obtaining user information by analyzing image content is not affected by the environment; it only needs the image capture device to work normally, the probability of equipment failure is significantly reduced, and the detection result is highly stable, which solves the technical problem in the related art that detecting the state of a child in the interior area of a vehicle with an infrared sensor is easily affected by the environment, causing the equipment to malfunction.
Optionally, the image capture device is an independent camera device or a camera device integrated with the result output device in a single device.
As another option, the detection system further includes: a location judging device configured to extract image information from the first image after it is acquired and to use a detection object detector to determine whether a first detection object exists in the first image, where the first detection object includes at least one of the following: a human face, a human head, the torso, the limbs, a human body; a deletion unit configured to delete the first image when the first detection object does not exist in the first image; or an interception unit configured to intercept, when the first detection object exists in the first image, the region of interest in which the first detection object is located; and an image processing device configured to perform image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
As an optional embodiment of the present invention, the image processing device includes: a quality evaluation unit configured to perform an initial quality evaluation of the region of interest to obtain an image quality evaluation result, where the evaluation content of the initial quality evaluation includes at least one of the following: image blur evaluation, angle evaluation, position evaluation, light intensity evaluation; a stop unit configured to stop detecting the first image when the image quality evaluation result indicates that the quality of the first detection object is unqualified; and a first determination unit configured to take the sub-region in which the first detection object is located in the region of interest as the first detection image when the image quality evaluation result indicates that the quality of the first detection object is qualified.
Optionally, the detection system further includes: a coordinate extraction device configured to extract the coordinates of multiple feature points in the first detection image after the sub-region in which the first detection object is located has been taken as the first detection image; a second determining unit configured to determine, among the feature point coordinates, the center point coordinates indicating the first detection object, where the first detection object may include, for example: the eyes, the mouth, the torso, the limbs; and a first mapping unit configured to map, based on the center point coordinates of the first detection object, the center point of the first detection object in the first detection image to the center point indicating the first detection object in a preset first standard image, so as to align the first detection image with the first standard image.
In the embodiments of the present invention, the first mapping unit includes: a first moving module configured to move the first detection image above the first standard image, taking the center point of the first detection object of the first standard image as a reference; an alignment module configured to shrink or enlarge the first detection image in accordance with the scale of the first standard image, taking the center point of the first detection object as a reference, so that the first detection image and the first standard image have the same size; a rotation module configured to rotate the first detection image when the detection object of the first detection image and the detection object of the first standard image have different orientations, so that they have the same orientation; and a first determination module configured to determine, after the detection object of the first detection image and the detection object of the first standard image have the same orientation, that the center point in the first detection image is mapped to the center point indicating the first detection object in the preset first standard image.
Optionally, the detection system further includes: an image acquisition device configured to collect, before the first image is acquired, multiple user images containing different image factors, where the image factors include at least one of the following: image scene, illuminance, resolution, user decoration; an image filtering device configured to filter the multiple user images to obtain multiple image sample sets corresponding to different user categories, where each user image in each image sample set corresponds to an attribute label of the first detection object and a category identifier of the category to which the user belongs; a cropping unit configured to crop each image in the multiple image sample sets to obtain multiple first standard images; and a training device configured to input the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into an initial network model to train the initial network model and obtain the first detection model.
Optionally, the initial network model includes at least: a data layer, a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and an output layer.
As another option, the training device includes: a first input unit configured to input the first standard images of the multiple image sample sets, and the attribute label and category identifier of the first detection object on each first standard image, into the training network of the initial network model through the data layer of the initial network model; a first training unit configured to train the convolutional layer of the initial network model, extracting data features of the first standard images of the multiple image sample sets through preset convolution parameters to obtain a first data network feature map, where the convolution parameters include at least: a first extraction step size, the convolution kernel size, and the number of convolution kernels; a second training unit configured to train the pooling layer of the initial network model, down-sampling the first data network feature map through preset pooling parameters to obtain a second data network feature map, where the pooling parameters include at least: a second extraction step size and a pooling size; a third training unit configured to train the activation layer of the initial network model and perform non-linear change processing on the second data network feature map, where the non-linear change processing uses at least one of the following activation functions: the relu activation function, the prelu activation function, the relu6 activation function; a fourth training unit configured to train the fully connected layer of the initial network model, connecting the first data network feature map and the second data network feature map, and mapping the feature space of the feature maps to the label space through a linear transformation with preset feature weights, where the label space is set to record the attribute label and category identifier of the first detection object; and a fifth training unit configured to train the output layer of the initial network model, outputting a classification result corresponding to each first standard image, where the classification result is set to indicate the category corresponding to the detection object in the first standard image.
Optionally, the analysis device includes: an image processing module that inputs the first image into the data network of the first detection model through the data layer of the first detection model; a feature extraction unit configured to extract image features from the first image using the convolutional layer, pooling layer, and activation layer of the first detection model to obtain a multi-dimensional image output vector; a third input unit configured to input the multi-dimensional image output vector into different fully connected layers to obtain a user information evaluation result and pass the user information evaluation result to the output layer; and an output unit configured to output the user information through the output layer of the first detection model, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information.
Optionally, the detection system further includes: an image input unit configured to input the first image into a second detection model to determine detection information of a second detection object in the first image, where the second detection object is a reference object for the first detection object.
Optionally, the output detection result is obtained based on the detection information of the second detection object and the user information.
Optionally, the user information further includes: the user's gender, the user's activity posture, expression, and the degree of fatigue of the detection object.
Optionally, the application scenarios of the detection system include at least one of the following: monitoring of occupants inside a vehicle, monitoring of passengers in an elevator.
Optionally, whether to issue an alarm prompt is determined according to the detection result.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, including: a processor; and a memory configured to store executable instructions of the processor, where the processor is configured to execute any one of the above user information detection methods by executing the executable instructions.
According to another aspect of the embodiments of the present invention, a storage medium is further provided, where the storage medium includes a stored program, and when the program runs, the device where the storage medium is located is controlled to execute any one of the above user information detection methods.
This application also provides a computer program product which, when executed on a data processing device, is suitable for executing a program initialized with the following method steps: acquiring a first image; inputting the first image into a first detection model to determine user information of users active in the target area, where the user information includes at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information; and outputting a detection result based on the user information.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the superiority or inferiority of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for a part that is not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units may be a division by logical function, and there may be other ways of division in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be implemented either in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, the part that contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred implementations of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.
Industrial Applicability
The solution provided by the embodiments of this application can realize user information detection. The technical solution provided by the embodiments of this application can be applied to an image detection device inside a spatial area: a first image of the area to be detected (for example, a vehicle interior, a shopping-mall elevator, a meeting room, etc.) is acquired through the image capture device, the first image is processed, detection object information is identified, user information such as the user's age is obtained by analysis, and a judgment is made on the user's age, so that when a person in the area is in a dangerous state, an alarm prompt is issued in time. This application's solution of obtaining user information by analyzing image content is not affected by the environment; it only needs the image capture device to work normally, the probability of equipment failure is significantly reduced, and the detection result is highly stable, which solves the technical problem in the related art that detecting the state of a child in the interior area of a vehicle with an infrared sensor is easily affected by the environment, causing the equipment to malfunction. This application can automatically analyze the images taken by each image capture device in a given spatial area, determine whether persons in the area are in a dangerous state, accurately identify the information of the persons inside the area, improve the degree of danger perception, and reduce the probability of persons being in a dangerous state; it can effectively prevent a child from being forgotten in the car, thereby avoiding this kind of unintentional tragedy.

Claims (20)

  1. A user information detection method, comprising:
    acquiring a first image;
    inputting the first image into a first detection model to determine user information of users active in a target area, wherein the user information comprises at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information;
    outputting a detection result based on the user information.
  2. The detection method according to claim 1, wherein after the first image is acquired, the detection method further comprises:
    extracting image information from the first image and determining whether a first detection object exists in the first image, wherein the first detection object comprises at least one of the following: a human face, a human head, the torso, the limbs, a human body;
    if the first detection object does not exist in the first image, deleting the first image; or,
    if the first detection object exists in the first image, intercepting the region of interest in which the first detection object is located in the first image;
    performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
  3. The detection method according to claim 2, wherein performing image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object comprises:
    performing an initial quality evaluation of the region of interest to obtain an image quality evaluation result, wherein the evaluation content of the initial quality evaluation comprises at least one of the following: image blur evaluation, angle evaluation, position evaluation, light intensity evaluation;
    if the image quality evaluation result indicates that the quality of the first detection object is unqualified, stopping the detection of the first image;
    if the image quality evaluation result indicates that the quality of the first detection object is qualified, taking the sub-region in which the first detection object is located in the region of interest as the first detection image.
  4. The detection method according to claim 3, wherein after the sub-region in which the first detection object is located is taken as the first detection image, the detection method further comprises:
    extracting the coordinates of multiple feature points in the first detection image;
    determining, among the feature point coordinates, the center point coordinates indicating the first detection object;
    mapping, based on the center point coordinates of the first detection object, the center point of the first detection object in the first detection image to the center point indicating the first detection object in a preset first standard image, so as to align the first detection image with the first standard image.
  5. The detection method according to claim 4, wherein mapping the center point of the first detection object in the first detection image to the center point indicating the first detection object in the preset first standard image comprises:
    moving the first detection image above the first standard image, taking the center point of the first detection object of the first standard image as a reference;
    shrinking or enlarging the first detection image in accordance with the scale of the first standard image, taking the center point of the first detection object as a reference, so that the first detection image and the first standard image have the same size;
    if the detection object of the first detection image and the detection object of the first standard image have different orientations, rotating the first detection image so that the detection object of the first detection image and the detection object of the first standard image have the same orientation;
    after the detection object of the first detection image and the detection object of the first standard image have the same orientation, determining that the center point in the first detection image is mapped to the center point indicating the first detection object in the first standard image.
  6. The detection method according to claim 1, wherein before the first image is acquired, the detection method further comprises:
    collecting multiple user images containing different image factors, wherein the image factors comprise at least one of the following: image scene, illuminance, resolution, user decoration;
    filtering the multiple user images to obtain multiple image sample sets corresponding to different user categories, wherein each user image in each image sample set corresponds to an attribute label of the first detection object and a category identifier of the category to which the user belongs;
    cropping each image in the multiple image sample sets to obtain multiple first standard images;
    inputting the first standard images of the multiple image sample sets, together with the attribute label and category identifier of the first detection object on each first standard image, into an initial network model to train the initial network model and obtain the first detection model.
  7. The detection method according to claim 6, wherein the initial network model comprises at least: a data layer, a convolutional layer, a pooling layer, an activation layer, a fully connected layer, and an output layer.
  8. The detection method according to claim 7, wherein the step of training the initial network model to obtain the first detection model comprises:
    inputting the first standard images of the multiple image sample sets, and the attribute label and category identifier of the first detection object on each first standard image, into the training network of the initial network model through the data layer of the initial network model;
    training the convolutional layer of the initial network model, extracting data features of the first standard images of the multiple image sample sets through preset convolution parameters to obtain a first data network feature map, wherein the convolution parameters comprise at least: a first extraction step size, the convolution kernel size, and the number of convolution kernels;
    training the pooling layer of the initial network model, down-sampling the first data network feature map through preset pooling parameters to obtain a second data network feature map, wherein the pooling parameters comprise at least: a second extraction step size and a pooling size;
    training the activation layer of the initial network model, performing non-linear change processing on the second data network feature map, wherein the non-linear change processing uses at least one of the following activation functions: the relu activation function, the prelu activation function, the relu6 activation function;
    training the fully connected layer of the initial network model, connecting the first data network feature map and the second data network feature map, and mapping the feature space of the feature maps to the label space through a linear transformation with preset feature weights, wherein the label space is set to record the attribute label and category identifier of the first detection object;
    training the output layer of the initial network model, outputting a classification result corresponding to each first standard image, wherein the classification result indicates the category corresponding to the detection object in the first standard image.
  9. The detection method according to claim 8, wherein the step of inputting the first image into the first detection model to determine the user information of users active in the target area comprises:
    inputting the first image into the data network of the first detection model through the data layer of the first detection model;
    extracting image features from the first image using the convolutional layer, pooling layer, and activation layer of the first detection model to obtain a multi-dimensional image output vector;
    inputting the multi-dimensional image output vector into different fully connected layers to obtain a user information evaluation result, and passing the user information evaluation result to the output layer;
    outputting the user information through the output layer of the first detection model.
  10. The detection method according to claim 1, wherein the detection method further comprises: inputting the first image into a second detection model to determine detection information of a second detection object in the first image, wherein the second detection object is a reference object for the first detection object.
  11. The detection method according to claim 10, wherein the detection result is output based on the detection information of the second detection object and the user information.
  12. The detection method according to any one of claims 1 to 11, wherein the application scenarios of the detection method comprise at least one of the following: monitoring of occupants inside a vehicle, monitoring of passengers in an elevator.
  13. The detection method according to any one of claims 1 to 11, wherein whether to issue an alarm prompt is determined according to the detection result.
  14. A user information detection system, comprising:
    an image capture device configured to acquire a first image;
    an analysis device configured to input the first image into a first detection model to determine user information of users active in a target area, wherein the user information comprises at least one of the following: user age information, user gender information, user behavior information, user expression information, user body shape information, user clothing information;
    a result output device configured to output a detection result based on the user information.
  15. The detection system according to claim 14, wherein the detection system comprises:
    a location judging device configured to extract image information from the first image after the first image is acquired and to use a detection object detector to determine whether a first detection object exists in the first image, wherein the first detection object comprises at least one of the following: a human face, a human head, the torso, the limbs, a human body;
    a deletion unit configured to delete the first image when the first detection object does not exist in the first image; or,
    an interception unit configured to intercept, when the first detection object exists in the first image, the region of interest in which the first detection object is located in the first image;
    an image processing device configured to perform image processing on the region of interest in the first image to obtain a first detection image containing a qualified first detection object.
  16. The detection system according to claim 15, wherein the image processing device comprises:
    a quality evaluation unit configured to perform an initial quality evaluation of the region of interest to obtain an image quality evaluation result, wherein the evaluation content of the initial quality evaluation comprises at least one of the following: image blur evaluation, angle evaluation, position evaluation, light intensity evaluation;
    a stop unit configured to stop detecting the first image when the image quality evaluation result indicates that the quality of the first detection object is unqualified;
    a first determination unit configured to take the sub-region in which the first detection object is located in the region of interest as the first detection image when the image quality evaluation result indicates that the quality of the first detection object is qualified.
  17. The detection system according to claim 14, wherein the detection system further comprises: an image input unit configured to input the first image into a second detection model to determine detection information of a second detection object in the first image, wherein the second detection object is a reference object for the first detection object.
  18. The detection system according to claim 17, wherein the detection result is output based on the detection information of the second detection object and the user information.
  19. An electronic device, comprising:
    a processor; and
    a memory configured to store executable instructions of the processor;
    wherein the processor is configured to execute the user information detection method according to any one of claims 1 to 13 by executing the executable instructions.
  20. A storage medium, comprising a stored program, wherein when the program runs, the device where the storage medium is located is controlled to execute the user information detection method according to any one of claims 1 to 13.
PCT/CN2020/130631 2019-11-22 2020-11-20 User information detection method and system, and electronic device WO2021098855A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20888865.1A EP4064113A4 (en) 2019-11-22 2020-11-20 USER INFORMATION DETECTION METHOD AND SYSTEM, AND ELECTRONIC DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911158474.8A 2019-11-22 2019-11-22 User information detection method and system, and electronic device
CN201911158474.8 2019-11-22

Publications (1)

Publication Number Publication Date
WO2021098855A1 true WO2021098855A1 (zh) 2021-05-27

Family

ID=75922153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/130631 WO2021098855A1 (zh) 2019-11-22 2020-11-20 User information detection method and system, and electronic device

Country Status (3)

Country Link
EP (1) EP4064113A4 (zh)
CN (1) CN112836549A (zh)
WO (1) WO2021098855A1 (zh)


Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426850B (zh) * 2015-11-23 2021-08-31 深圳市商汤科技有限公司 Associated-information pushing device and method based on face recognition
US10192125B2 (en) * 2016-10-20 2019-01-29 Ford Global Technologies, Llc Vehicle-window-transmittance-control apparatus and method
CN107169454B (zh) * 2017-05-16 2021-01-01 中国科学院深圳先进技术研究院 Face image age estimation method and device, and terminal device thereof
CN108985133B (zh) * 2017-06-01 2022-04-12 北京中科奥森数据科技有限公司 Age prediction method and device for face images
US10838425B2 (en) * 2018-02-21 2020-11-17 Waymo Llc Determining and responding to an internal status of a vehicle
CN108549720A (zh) * 2018-04-24 2018-09-18 京东方科技集团股份有限公司 Soothing method, device, and equipment based on emotion recognition, and storage medium
CN108960087A (zh) * 2018-06-20 2018-12-07 中国科学院重庆绿色智能技术研究院 Face image quality assessment method and system based on multi-dimensional evaluation criteria
CN110119714B (zh) * 2019-05-14 2022-02-25 山东浪潮科学研究院有限公司 Driver fatigue detection method and device based on a convolutional neural network
CN110210382A (zh) * 2019-05-30 2019-09-06 上海工程技术大学 Face-based fatigued-driving detection method and device based on spatiotemporal feature recognition
CN110287942B (zh) * 2019-07-03 2021-09-17 成都旷视金智科技有限公司 Training method for an age estimation model, age estimation method, and corresponding devices
CN110472611A (zh) * 2019-08-21 2019-11-19 图谱未来(南京)人工智能研究院有限公司 Person attribute recognition method and device, electronic device, and readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3144851A1 (en) * 2015-09-18 2017-03-22 Panasonic Intellectual Property Corporation of America Image recognition method
CN105938551A (zh) * 2016-06-28 2016-09-14 深圳市唯特视科技有限公司 Method for extracting specific facial regions based on video data
CN106295579A (zh) * 2016-08-12 2017-01-04 北京小米移动软件有限公司 Face alignment method and device
CN106503669A (zh) * 2016-11-02 2017-03-15 重庆中科云丛科技有限公司 Training and recognition method and system based on a multi-task deep learning network
CN108229267A (zh) * 2016-12-29 2018-06-29 北京市商汤科技开发有限公司 Object attribute detection, neural network training, and region detection methods and devices
CN109522790A (zh) * 2018-10-08 2019-03-26 百度在线网络技术(北京)有限公司 Human body attribute recognition method and device, storage medium, and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4064113A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024082912A1 (zh) * 2022-10-20 2024-04-25 华为技术有限公司 Body shape measurement method and electronic device
CN116135614A (zh) * 2023-03-28 2023-05-19 重庆长安汽车股份有限公司 Method and device for protecting occupants left in a vehicle, and vehicle

Also Published As

Publication number Publication date
CN112836549A (zh) 2021-05-25
EP4064113A1 (en) 2022-09-28
EP4064113A4 (en) 2023-05-10

Similar Documents

Publication Publication Date Title
Ramzan et al. A survey on state-of-the-art drowsiness detection techniques
CN108921100B (zh) Face recognition method and system based on fusion of visible-light and infrared images
JP7229174B2 (ja) Person identification system and method
US9378421B2 (en) System and method for seat occupancy detection from ceiling mounted camera using robust adaptive threshold criteria
US20220129687A1 (en) Systems and methods for detecting symptoms of occupant illness
García et al. Driver monitoring based on low-cost 3-D sensors
WO2021098855A1 (zh) User information detection method and system, and electronic device
US11062126B1 (en) Human face detection method
Anishchenko Machine learning in video surveillance for fall detection
CN109325408A (zh) Gesture determination method and storage medium
CN110148092B (zh) Machine-vision-based analysis method for the sitting posture and emotional state of adolescents
Małecki et al. Multispectral data acquisition in the assessment of driver’s fatigue
CN115375991A (zh) Adaptive object detection method for strong/weak illumination and foggy environments
Anitta Human head pose estimation based on HF method
JP2019106149A (ja) Information processing device, information processing program, and information processing method
JP6773825B2 (ja) Learning device, learning method, learning program, and object recognition device
Forczmański et al. Driver drowsiness estimation by means of face depth map analysis
Iamudomchai et al. Deep learning technology for drunks detection with infrared camera
Zhou Eye-Blink Detection under Low-Light Conditions Based on Zero-DCE
Campomanes-Álvarez et al. Automatic facial expression recognition for the interaction of individuals with multiple disabilities
Hadi et al. Fusion of thermal and depth images for occlusion handling for human detection from mobile robot
CN114120370A (zh) CNN-LSTM-based human fall detection method and system
Afroz et al. IoT based two way safety enabled intelligent stove with age verification using machine learning
Forczmański et al. Supporting driver physical state estimation by means of thermal image processing
Li et al. Calibration error prediction: ensuring high-quality mobile eye-tracking

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20888865

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020888865

Country of ref document: EP

Effective date: 20220622