WO2024001365A1 - Parameter detection method and device - Google Patents

Parameter detection method and device

Info

Publication number
WO2024001365A1
WO2024001365A1 · PCT/CN2023/085382 · CN2023085382W
Authority
WO
WIPO (PCT)
Prior art keywords
vector
facial feature
prediction
feature parameter
value
Prior art date
Application number
PCT/CN2023/085382
Other languages
English (en)
French (fr)
Inventor
李源
周欣文
黄俊
沈鹏程
Original Assignee
魔门塔(苏州)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 魔门塔(苏州)科技有限公司 filed Critical 魔门塔(苏州)科技有限公司
Publication of WO2024001365A1 publication Critical patent/WO2024001365A1/zh


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V20/50 — Context or environment of the image
    • G06V20/59 — Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 — Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 — Feature extraction; Face representation
    • G06V40/172 — Classification, e.g. identification

Definitions

  • The present application relates to the field of image detection technology, and in particular to a parameter detection method and device.
  • This application provides a parameter detection method and device that can make the detection results of a driver's facial feature parameters more accurate.
  • Embodiments of the present application provide a parameter detection method, including: acquiring a first video frame image, where the first video frame image includes a driver's facial image; determining a first prediction vector of a first facial feature parameter, where the first prediction vector is obtained by inputting the first video frame image into a preset first model, and the first model is used to detect the first facial feature parameter of the driver in an image; and calculating, according to the first prediction vector of the first facial feature parameter, a detection value of the first facial feature parameter and a confidence of the detection value.
  • Because the credibility of the detection value can be characterized by the confidence, the detection result of the first facial feature parameter of the first video frame image is more accurate.
  • The above-mentioned first model may be the parameter detection model in the embodiments below; the first prediction vector is the prediction vector output by that parameter detection model.
  • In one possible implementation, calculating the detection value of the first facial feature parameter and the confidence of the detection value based on the first prediction vector includes: normalizing the first prediction vector to obtain a first prediction probability vector of the first facial feature parameter; calculating the detection value of the first facial feature parameter according to the first prediction probability vector; and calculating the confidence of the detection value according to the first prediction probability vector.
  • In one possible implementation, when the first prediction probability vector is denoted P = [S_0, S_1, …, S_N], where N is a natural number and S_0, S_1, …, S_N are its vector elements, the detection value v̂ of the first facial feature parameter is calculated from the first prediction probability vector as
    v̂ = Σ_{i=0}^{N} i · S_i
    where i is an integer and i ∈ [0, N].
  • In one possible implementation, calculating the confidence of the detection value based on the first prediction probability vector includes computing conf = max(P).
  • In one possible implementation, calculating the confidence of the detection value based on the detection value of the first facial feature parameter includes computing conf = S_round(v̂), the vector element of the first prediction probability vector whose index is round(v̂), where v̂ is the detection value of the first facial feature parameter and round denotes the rounding function.
  • In one possible implementation, normalizing the first prediction vector to obtain the first prediction probability vector includes: normalizing the first prediction vector using a normalized exponential function.
  • In one possible implementation, the training method of the first model includes: obtaining a first sample, where the first sample includes a first image and a first parameter value of a first facial feature parameter of the first image; inputting the first sample into a preset initial model to obtain a second prediction vector of the first facial feature parameter; normalizing the second prediction vector to obtain a second prediction probability vector of the first facial feature parameter; encoding the first parameter value to obtain a true value vector of the first facial feature parameter; calculating a prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector; adjusting the parameters of the initial model according to the prediction deviation value; and, when the initial model has not converged, obtaining a new sample as the first sample and returning to the step of inputting the first sample into the preset initial model, until the initial model converges and the first model is obtained.
  • In one possible implementation, the preset initial model includes a convolutional neural network and a fully connected network; the convolutional neural network serves as the input of the preset initial model, and the fully connected network serves as the output of the preset initial model. The output end of the fully connected network is provided with N+1 neurons for the first facial feature parameter, and the outputs of the N+1 neurons serve as the vector elements of the second prediction vector of the first facial feature parameter.
  • In a second aspect, embodiments of the present application provide a method for establishing a parameter detection model, including: obtaining a first sample, where the first sample includes a first image and a first parameter value of a first facial feature parameter of the first image; inputting the first sample into a preset initial model to obtain a second prediction vector of the first facial feature parameter; normalizing the second prediction vector to obtain a second prediction probability vector of the first facial feature parameter; encoding the first parameter value to obtain a true value vector of the first facial feature parameter; calculating a prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector; adjusting the parameters of the initial model according to the prediction deviation value; and, when the initial model has not converged, obtaining a new sample as the first sample and returning to the step of inputting the first sample into the preset initial model, until the initial model converges and the parameter detection model is obtained.
  • In one possible implementation, the preset initial model includes a convolutional neural network and a fully connected network; the convolutional neural network, serving as the input of the preset initial model, can effectively extract the features of the first sample and feeds them to the fully connected network, which serves as the output of the preset initial model and outputs the prediction vector of the first facial feature parameter according to those features. The output end of the fully connected network is provided with N+1 neurons for the first facial feature parameter, whose outputs serve as the vector elements of the second prediction vector of the first facial feature parameter.
  • In one possible implementation, normalizing the second prediction vector includes: normalizing the second prediction vector using a normalized exponential function.
  • In one possible implementation, encoding the first parameter value to obtain the true value vector of the first facial feature parameter includes: encoding the first parameter value using a Gaussian distribution function to obtain the true value vector of the first facial feature parameter; or encoding the first parameter value using an arbitrary distribution function to obtain the true value vector of the first facial feature parameter.
  • In one possible implementation, when the first parameter value is encoded using a Gaussian distribution function, the prediction deviation value is calculated using a preset loss function based on the second prediction probability vector and the true value vector, and the i-th vector element L_i of the prediction deviation value is calculated with the following loss function:
    L_i = T_i · log(T_i / S_i)
    where S_i is the i-th vector element of the second prediction probability vector, T_i is the i-th vector element of the true value vector, i is an integer, and i ∈ [0, N].
  • In one possible implementation, when the first parameter value is encoded using an arbitrary distribution function, the prediction deviation value is calculated using a preset loss function based on the second prediction probability vector and the true value vector, with the following loss function:
    L(S_m, S_{m+1}) = -((m+1-v)·log(S_m) + (v-m)·log(S_{m+1}))
    where m = int(v), v is the first parameter value, int is the truncation operation, S_m is the m-th vector element of the second prediction probability vector, and S_{m+1} is the (m+1)-th vector element of the second prediction probability vector.
  • In a third aspect, embodiments of the present application provide an electronic device, including one or more processors coupled to a memory in which one or more computer programs are stored; when the one or more computer programs are executed by the processors, the electronic device is caused to execute the method described in any one of the first aspect.
  • In a fourth aspect, embodiments of the present application provide an electronic device, including one or more processors coupled to a memory in which one or more computer programs are stored; when the one or more computer programs are executed by the processors, the electronic device is caused to execute the method described in any one of the second aspect.
  • In a fifth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the method described in any one of the first aspect.
  • In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the method described in any one of the second aspect.
  • In a seventh aspect, embodiments of the present application provide a computer program product including a computer program which, when run on a computer, causes the computer to execute the method of any one of the first aspect.
  • In an eighth aspect, embodiments of the present application provide a computer program product including a computer program which, when run on a computer, causes the computer to execute the method of any one of the second aspect.
  • In a ninth aspect, the present application provides a computer program which, when executed by a computer, is used to perform the method described in the first aspect or the second aspect.
  • In one possible design, the program of the ninth aspect may be stored in whole or in part on a storage medium packaged together with the processor, or in part or in whole on a memory not packaged together with the processor.
  • Figure 1 is a schematic structural diagram of a system applicable to the parameter detection method provided by the embodiment of the present application
  • Figure 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of a method for establishing a parameter detection model provided by an embodiment of the present application
  • Figure 4 is a schematic flow chart of a parameter detection method provided by an embodiment of the present application.
  • Figure 5 is another schematic flowchart of a parameter detection method provided by an embodiment of the present application.
  • In a DMS (Driver Monitor System), the driver's facial feature parameters, such as gaze parameters and facial key points, need to be detected based on the images captured by a camera, and the driver's driving behavior, such as whether the driver is distracted while driving, is then detected using those facial feature parameters.
  • However, in the prior art the detection results of the driver's facial feature parameters are often not accurate enough, which in turn makes the DMS system's detection of the driver's driving behavior not accurate enough.
  • Taking the driver's gaze parameters as an example, a gaze detection model can be preset in the DMS system; the input of the gaze detection model is the image captured by the camera, and the output is the pitch angle and yaw angle of the driver's line of sight in the image.
  • However, if the camera captures an image at a moment when the driver blinks, closes his eyes, or an object blocks the driver's eyes, the driver's pupils are not visible in that image, that is, the image does not include an image of the driver's pupils; the gaze detection model will nevertheless output pitch-angle and yaw-angle values for this image, even though those values are not accurate. If the angle values from such images are used as the data basis for subsequent driver distraction detection, the distraction detection results will be biased.
  • To this end, the embodiments of the present application propose a parameter detection method and a method for establishing a parameter detection model, which can make the detection results of the driver's facial feature parameters more accurate and can thereby make the DMS system's detection of the driver's driving behavior more accurate.
  • FIG. 1 is a schematic structural diagram of a DMS system to which the parameter detection method of the embodiments of the present application is applicable, including a camera 110 and an electronic device 120.
  • The camera 110 is used to capture still images or video. It can be installed in the vehicle, positioned in front of the driver and aimed at the driver's face, so that the captured still images or video include the driver's facial image.
  • An object generates an optical image through the lens, which is projected onto the photosensitive element of the camera 110. The photosensitive element can be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor; it converts the optical signal into an electrical signal, which is then passed to the electronic device 120.
  • The electronic device 120 is used to convert the electrical signal into a digital image signal, detect the driver's facial feature parameters based on the digital image signal, detect the driver's driving behavior based on those detection results, and so on. The electronic device 120 may be disposed in the same vehicle as the camera 110.
  • In another embodiment, the camera 110 may be provided with an image signal processor (ISP); the electrical signal is then converted into a digital image signal by the ISP, and the camera 110 transmits the digital image signal directly to the electronic device 120, in which case the electronic device 120 does not need to perform the conversion itself.
  • The electronic device referred to in the embodiments of the present application may be a vehicle-mounted device, an Internet-of-Vehicles terminal, a computer, a laptop, a handheld communication device, a handheld computing device, and/or another device used for communication on a wireless system, as well as a device of a next-generation communication system, for example a mobile terminal in a 5G network or in a future evolved Public Land Mobile Network (PLMN).
  • As shown in FIG. 2, the electronic device 200 includes a processor 210, a memory 220, a display screen 230, and so on. The above electronic device 120 can be implemented by the electronic device 200 shown in FIG. 2.
  • Optionally, for more complete functionality, the electronic device may also include one or more of: an antenna, a mobile communication module, a wireless communication module, an audio module, a speaker, a receiver, a microphone, a headphone jack, a charging management module, a power management module, and a battery; this is not limited by the embodiments of this application.
  • The processor 210 may include one or more processing units; for example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an ISP, a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units can be independent devices or can be integrated in one or more processors.
  • The electrical signals transmitted by the above-mentioned camera can be converted into digital image signals by the ISP, which can output the digital image signals to the DSP for processing; the DSP converts the digital image signals into image signals in standard formats such as RGB and YUV.
  • The controller can generate operation control signals based on the instruction operation code and timing signals, completing the control of instruction fetching and instruction execution.
  • The processor 210 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache; it may hold instructions or data that the processor 210 has just used or uses cyclically. If the processor 210 needs to use the instructions or data again, they can be called directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 210, and thus improves the efficiency of the system.
  • The memory 220 may be used to store computer-executable program code, which includes instructions. The memory 220 may include a program storage area and a data storage area. The program storage area can store an operating system and at least one application program required for a function (such as a sound playback function or an image playback function); the data storage area may store data created during use of the electronic device 200 (such as audio data). In addition, the memory 220 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS). The processor 210 executes the various functional applications and data processing of the electronic device 200 by running instructions stored in the memory 220 and/or instructions stored in the memory provided in the processor.
  • It should be noted that the embodiment shown in FIG. 2 takes the case where the memory 220 is disposed in the electronic device 200 as an example; in other embodiments provided by this application, the memory 220 may not be disposed in the electronic device 200, in which case it can be connected to the electronic device 200 through an interface provided by the electronic device 200 and thereby connected to the processor 210 in the electronic device 200.
  • The display screen 230 is used to display images, videos, and the like, and includes a display panel. The display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum-dot light-emitting diode (QLED), and so on. In some embodiments, the electronic device 200 may include one or more display screens 230.
  • A training sample set is preset in the embodiments of this application; it includes a number of samples, and the number of samples is not limited here. Each sample includes an image and the parameter value of at least one facial feature parameter in the image.
  • Optionally, the facial feature parameters may include but are not limited to: the driver's gaze parameters, the driver's head pose parameters, position parameters of eye feature points, position parameters of facial feature areas, position parameters of the driver's body areas, and so on.
  • Optionally, the driver's gaze parameters may include but are not limited to: pitch angle, yaw angle, and so on; the driver's head pose parameters may include but are not limited to: pitch angle, yaw angle, roll angle, and so on.
  • Optionally, the eye feature points may include but are not limited to: the eye center point, eye corner points, eyelid points, pupil points, and so on; facial feature areas may include but are not limited to: the eye area, mouth area, nose area, and so on; body feature areas may include but are not limited to: the hand area, the torso area, and so on.
  • An initial model is preset in the embodiments of this application. The initial model may include a convolutional neural network and a fully connected network; the convolutional neural network serves as the input end of the initial model and receives the training samples, can effectively extract the features of a sample, and feeds those features to the fully connected network; the fully connected network serves as the output end of the initial model and outputs the prediction vectors of the facial feature parameters.
  • In the embodiments of this application, the prediction vector of each facial feature parameter is an (N+1)-dimensional array, where N is a natural number. That is, the output end of the fully connected network is provided with N+1 neurons for each facial feature parameter, which output the vector elements of that parameter's prediction vector.
  • For example, for the pitch angle of the line of sight, the output end of the fully connected network includes N+1 neurons for the pitch angle; for the yaw angle of the line of sight, it includes another N+1 neurons for the yaw angle (a hedged code sketch of such a model follows below).
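By way of illustration, the following is a minimal sketch of such an initial model in PyTorch; the framework choice, the layer sizes, and the use of 91 bins (N = 90) are assumptions for the example, not taken from the patent.

```python
import torch
import torch.nn as nn

class ParamDetector(nn.Module):
    """Sketch of the preset initial model: a small CNN backbone feeding
    fully connected heads with N+1 output neurons per facial feature
    parameter. All layer sizes are illustrative."""

    def __init__(self, n_bins=91, n_params=2):  # e.g. gaze pitch and yaw
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One group of N+1 neurons per parameter; each group's outputs are
        # the vector elements of that parameter's prediction vector y.
        self.heads = nn.ModuleList(
            [nn.Linear(32, n_bins) for _ in range(n_params)]
        )

    def forward(self, x):
        feat = self.backbone(x)
        return [head(feat) for head in self.heads]  # one vector per parameter
```

Each head plays the role of the N+1 output neurons for one facial feature parameter, here one head for the gaze pitch angle and one for the gaze yaw angle.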
  • The implementation flow of the method for establishing the parameter detection model is described below by way of example. The electronic device that executes the method may be the above-mentioned electronic device 120 or an electronic device disposed outside the vehicle; this is not limited by the embodiments of the present application. As shown in FIG. 3, the method for establishing the parameter detection model may include:
  • Step 301: Obtain a first sample; the first sample includes a first image and a first parameter value of a first facial feature parameter.
  • Step 302: Input the first sample into the preset initial model to obtain the prediction vector of the first facial feature parameter.
  • The prediction vector of the first facial feature parameter is an (N+1)-dimensional array, denoted y = [y_0, y_1, …, y_N]; the vector elements y_0, y_1, and so on are the output values of the N+1 neurons corresponding to the first facial feature parameter.
  • It should be noted that the first facial feature parameter in this step and in the subsequent step 303 is the first facial feature parameter of the first sample; for brevity, "of the first sample" is omitted.
  • Step 303: Normalize the prediction vector of the first facial feature parameter to obtain the prediction probability vector of the first facial feature parameter.
  • Normalization in the embodiments of this application means mapping each vector element of the above prediction vector into the interval [0, 1] through calculation. The prediction probability vector of the first facial feature parameter is denoted P.
  • In one possible implementation, the softmax function, also called the normalized exponential function, can be used to normalize the prediction vector y, giving P = softmax(y) = [S_0, S_1, …, S_N].
  • The prediction probability vector is also an (N+1)-dimensional array, and the sum of its vector elements is 1, that is, Σ_{i=0}^{N} S_i = 1.
  • Step 304: Encode the first parameter value to obtain the true value vector of the first facial feature parameter.
  • Let the first parameter value be v and the true value vector be T; T is an (N+1)-dimensional array, say T = [t_0, t_1, …, t_N].
  • In one possible implementation, a Gaussian distribution function can be used to encode the first parameter value v to obtain the true value vector T, for example:
    t_i = (1 / (√(2π)·σ)) · exp(-((i − v)²) / (2σ²))
    where i is an integer, i ∈ [0, N], and σ is the standard deviation of the Gaussian distribution. The specific value of σ can be set according to the actual situation of the first parameter value: if the first parameter value is difficult to predict or its accuracy is low, σ can be set relatively large, so that the true value vector T is more dispersed; otherwise σ can be set relatively small, so that the true value vector T is more concentrated.
  • In another possible implementation, an arbitrary distribution function can be used to encode the first parameter value v to obtain the true value vector T.
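A minimal sketch of this encoding step, assuming the Gaussian density form above; the final renormalization so that T sums to 1 is an added assumption, and the function name is illustrative.

```python
import numpy as np

def encode_gaussian(v, n_bins, sigma=2.0):
    """Encode a ground-truth value v into an (N+1)-element true value
    vector T using a Gaussian distribution function; a larger sigma
    spreads T out more, a smaller sigma concentrates it."""
    i = np.arange(n_bins)
    T = np.exp(-((i - v) ** 2) / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return T / T.sum()  # renormalize to a distribution (assumption)

T = encode_gaussian(46.8, n_bins=91)  # peaked between bins 46 and 47
```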
  • Step 305: Calculate the prediction deviation value using a preset loss function according to the true value vector T and the prediction probability vector P of the first facial feature parameter. Below, the prediction deviation value is denoted L.
  • If the true value vector T was obtained by Gaussian encoding, in one possible implementation the Kullback-Leibler divergence can be used as the loss function; the prediction deviation value L is then an (N+1)-dimensional array, say L = [l_0, l_1, …, l_N], whose i-th vector element is calculated as
    L_i = T_i · log(T_i / S_i)
    where i is an integer and i ∈ [0, N].
  • If the true value vector T was obtained using an arbitrary distribution function, in one possible implementation the prediction deviation value can be calculated with the loss function
    L(S_m, S_{m+1}) = -((m+1-v)·log(S_m) + (v-m)·log(S_{m+1}))
    where m = int(v), int is the truncation operation, S_m is the m-th vector element of the prediction probability vector P calculated in step 303, and S_{m+1} is its (m+1)-th vector element; both variants are sketched in code below.
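Both loss variants can be rendered in a few lines; this is a non-authoritative sketch of the two formulas above, and the eps guard against log(0) is an added assumption.

```python
import numpy as np

def kl_deviation(S, T, eps=1e-12):
    """Gaussian-encoded targets: per-element Kullback-Leibler divergence,
    L_i = T_i * log(T_i / S_i), returned as an (N+1)-element array."""
    return T * np.log((T + eps) / (S + eps))

def two_bin_deviation(S, v):
    """Arbitrary-distribution targets: weighs the two bins bracketing v,
    L = -((m+1-v)*log(S_m) + (v-m)*log(S_{m+1})) with m = int(v)."""
    m = int(v)
    return -((m + 1 - v) * np.log(S[m]) + (v - m) * np.log(S[m + 1]))
```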
  • Step 306: Use the prediction deviation value to adjust the parameters of the preset model; how to do so can be implemented with related techniques and is not described in detail here.
  • Step 307: Judge whether the preset initial model has converged; if yes, the parameter detection model is obtained; if no, execute step 308. Determining convergence can be implemented with related techniques, for example by judging whether the prediction deviation value satisfies a preset convergence condition, or whether the adjustment values of the model parameters satisfy a preset convergence condition; this is not limited by the embodiments of the present application.
  • Step 308: Acquire a new sample as the first sample and return to step 302 and the subsequent steps until the preset initial model converges, obtaining the parameter detection model of the embodiments of the present application (the overall loop is sketched below).
  • It should be noted that, after step 308, steps such as testing the trained parameter detection model with test samples and adjusting its parameters according to the test results may also be included, to optimize the parameter detection model.
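Putting steps 301 through 308 together, the following hedged sketch shows the shape of the training loop, reusing encode_gaussian and kl_deviation from the sketches above; model.predict and model.step are illustrative stand-ins for a forward pass and a parameter update, and the convergence test shown is only one of the options the text allows.

```python
import numpy as np

def train(model, samples, n_bins=91, tol=1e-4):
    """Illustrative training loop for the parameter detection model."""
    for image, v in samples:                 # steps 301 / 308: next sample
        y = model.predict(image)             # step 302: prediction vector
        e = np.exp(y - y.max())
        S = e / e.sum()                      # step 303: softmax
        T = encode_gaussian(v, n_bins)       # step 304: true value vector
        L = kl_deviation(S, T)               # step 305: deviation value
        model.step(L)                        # step 306: adjust parameters
        if float(L.sum()) < tol:             # step 307: convergence check
            break
    return model
```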
  • The input of the parameter detection model established by the method shown in Figure 3 is an image, and its output is the prediction vector of the first facial feature parameter of the image; the prediction vector is an (N+1)-dimensional array. Compared with the prior art, in which a detection model outputs only a single parameter value, the output of this parameter detection model carries data information of more dimensions, which improves the accuracy of the model's detection result for the first facial feature parameter.
  • The parameter detection model can be provided in the above-mentioned electronic device 120 to support the electronic device in detecting the first facial feature parameter of an image.
  • Figure 4 is a schematic flowchart of a parameter detection method provided by an embodiment of the present application. As shown in Figure 4, the method can be applied to the above-mentioned electronic device and may include:
  • Step 401: Obtain the first video frame image. The first video frame image obtained in this step may be a video frame image, captured by the camera in real time, that includes the driver's facial image.
  • Step 402: Determine the first prediction vector of the first facial feature parameter of the first video frame image; the first prediction vector can be obtained by inputting the first video frame image into the preset first model.
  • For ease of description, the first facial feature parameter of the first video frame image is referred to below simply as the first facial feature parameter.
  • The model used in this step is the parameter detection model obtained by the above method for establishing a parameter detection model; continuing the notation of Figure 3, the prediction vector of the first facial feature parameter is denoted y.
  • Step 403: Calculate the detection value of the first facial feature parameter and the confidence of the detection value according to the prediction vector of the first facial feature parameter.
  • For ease of description, the confidence of the detection value of the first facial feature parameter is also referred to below as the confidence of the first facial feature parameter.
  • Optionally, this step may include: normalizing the prediction vector of the first facial feature parameter to obtain the prediction probability vector, and calculating the detection value of the first facial feature parameter and the confidence of that detection value according to the prediction probability vector.
  • It should be noted that the specific normalization method applied to the prediction vector y in this step generally needs to be the same as the normalization method used when establishing the parameter detection model, to ensure the accuracy of the detection value and confidence obtained by the subsequent calculation. Continuing the example of normalizing with the softmax function, the resulting prediction probability vector is P = softmax(y) = [S_0, S_1, …, S_N].
  • Denoting the detection value of the first facial feature parameter as v̂, calculating the detection value from the prediction probability vector P may include computing
    v̂ = Σ_{i=0}^{N} i · S_i.
  • In one possible implementation, if the parameter detection model was established with the first parameter value encoded by a Gaussian distribution function, calculating the confidence of the detection value from P may include computing conf = max(P); that is, the maximum value among S_0 to S_N can be selected as the confidence conf of the detection value.
  • In another possible implementation, if the parameter detection model was established with the first parameter value encoded by an arbitrary distribution function, the confidence can be calculated from the detection value obtained from P as conf = S_round(v̂), where round denotes the rounding function (a decoding sketch follows).
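A compact sketch of this decoding step, assuming the expectation-style detection value and the two confidence variants described above; a sketch under those assumptions, not a definitive implementation.

```python
import numpy as np

def decode(y, gaussian_encoded=True):
    """Map a raw prediction vector y to (detection value, confidence)."""
    e = np.exp(y - y.max())                # softmax, shifted for stability;
    P = e / e.sum()                        # must match the training softmax
    v_hat = float(np.arange(len(P)) @ P)   # v_hat = sum_i i * S_i
    if gaussian_encoded:
        conf = float(P.max())              # conf = max(P)
    else:
        conf = float(P[round(v_hat)])      # conf = S_round(v_hat)
    return v_hat, conf
```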
  • Through the above steps, not only can the detection value of the first facial feature parameter of the first video frame image be calculated, but also the confidence of that detection value; the credibility of the detection value can thus be characterized by the confidence, making the detection result of the first facial feature parameter of the first video frame image more accurate.
  • Optionally, the detection value and confidence of the first facial feature parameter of the first video frame image can be used as the data basis for detecting the driver's driving behavior. In that case, the parameter detection method of the embodiments of the present application may further include the following step 404 after step 403.
  • Step 404: Detect the driver's driving behavior based on the detection value and confidence of the first facial feature parameter.
  • Here, whether a detection value is suitable as a data basis for driving behavior detection can be evaluated from the confidence of the detection value; for example, detection values whose confidence is below a certain threshold are filtered out and not used as the data basis for detecting the driver's driving behavior, which makes that data basis more accurate and effective and thereby improves the accuracy of driving behavior detection.
  • For example, the driving behavior may be the driver's distraction behavior and the first facial feature parameters may be the pitch angle and yaw angle of the line of sight; this step may then include: detecting the driver's distraction behavior based on the detection value of the gaze pitch angle and its confidence and the detection value of the gaze yaw angle and its confidence.
  • Specifically, this may include: obtaining the detection values and confidences of the gaze pitch angle and the gaze yaw angle of every video frame image in a first time period; filtering the video frame images in the first time period according to the confidences of the gaze pitch angle and the gaze yaw angle of each frame to obtain valid video frame images; and detecting the driver's distraction behavior based on the detection values of the gaze pitch angle and the gaze yaw angle of the valid video frame images.
  • It should be noted that the first time period may be a period of preset duration whose end point is the shooting time of the first video frame image; the specific value of the preset duration is not limited by the embodiments of this application, and the first video frame image is the last video frame image in the first time period.
  • In one possible implementation, when filtering the video frame images, a first threshold can be set for the confidence of the gaze pitch angle and a second threshold for the confidence of the gaze yaw angle. Video frame images whose gaze pitch angle confidence is below the first threshold and/or whose gaze yaw angle confidence is below the second threshold are filtered out; the remaining video frame images, that is, those whose gaze pitch angle confidence is not below the first threshold and whose gaze yaw angle confidence is not below the second threshold, serve as the valid video frame images (a filtering sketch follows).
  • Detecting the driver's distraction behavior based on the detection values of the gaze pitch angle and the gaze yaw angle of the valid video frame images can be implemented with a related distraction behavior detection method, which is not limited by the embodiments of the present application.
  • Through the above filtering, video frame images with low gaze pitch angle confidence and/or low gaze yaw angle confidence can be filtered out. For example, video frame images that contain no image of the driver's pupils tend to have low gaze pitch angle and/or gaze yaw angle confidences and can be filtered out, which makes the basic data for driver distraction detection more accurate and effective and thereby improves the accuracy of distraction detection.
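A minimal sketch of this filtering step; the frame representation and key names are illustrative assumptions.

```python
def valid_frames(frames, thr_pitch, thr_yaw):
    """Keep only frames whose gaze-angle confidences reach both thresholds."""
    return [
        f for f in frames
        if f["pitch_conf"] >= thr_pitch and f["yaw_conf"] >= thr_yaw
    ]
```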
  • An embodiment of the present application also provides an electronic device, including a processor coupled to a memory in which a computer program is stored; when the computer program is executed by the processor, the processor is used to implement the methods provided by the embodiments shown in Figures 3 to 5 of this application.
  • An embodiment of the present application also provides an electronic device including a storage medium and a central processing unit; the storage medium may be a non-volatile storage medium in which a computer-executable program is stored, and the central processing unit is connected to the non-volatile storage medium and executes the computer-executable program to implement the methods provided by the embodiments shown in Figures 3 to 5 of this application.
  • Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the methods provided by the embodiments shown in Figures 3 to 5 of the present application.
  • An embodiment of the present application also provides a computer program product including a computer program which, when run on a computer, causes the computer to execute the methods provided by the embodiments shown in Figures 3 to 5 of the present application.
  • In the embodiments of this application, "at least one" refers to one or more, and "multiple" refers to two or more. "And/or" describes the relationship between associated objects and indicates that three relationships can exist; for example, A and/or B can represent the existence of A alone, of both A and B, or of B alone, where A and B can be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single or plural items; for example, at least one of a, b, and c can mean a, b, c, a and b, a and c, b and c, or a and b and c, where each of a, b, and c can be single or multiple.
  • If any function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or in essence the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which can be a personal computer, a server, a network device, and the like) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Traffic Control Systems (AREA)

Abstract

A parameter detection method and device. The method includes: obtaining a first video frame image, where the first video frame image includes a driver's facial image; determining a first prediction vector of a first facial feature parameter, where the first prediction vector is obtained by inputting the first video frame image into a preset first model, and the first model is used to detect the first facial feature parameter of the driver in an image; and calculating, according to the prediction vector of the first facial feature parameter, a detection value of the first facial feature parameter and a confidence of the detection value. The present application can make the detection results of a driver's facial feature parameters more accurate.

Description

Parameter detection method and device

Technical Field

The present application relates to the field of image detection technology, and in particular to a parameter detection method and device.

Background

In a Driver Monitor System (DMS), the driver's facial feature parameters, such as line of sight and facial key points, need to be detected based on the images captured by a camera, and the driver's driving behavior, such as whether the driver is distracted while driving, is then detected based on those facial feature parameters. However, in the prior art the detection results of the driver's facial feature parameters are not accurate enough, which makes the DMS system's detection of the driver's driving behavior not accurate enough and degrades the user experience.

Summary

The present application provides a parameter detection method and device that can make the detection results of a driver's facial feature parameters more accurate.

In a first aspect, embodiments of the present application provide a parameter detection method, including: obtaining a first video frame image, where the first video frame image includes a driver's facial image; determining a first prediction vector of a first facial feature parameter, where the first prediction vector is obtained by inputting the first video frame image into a preset first model, and the first model is used to detect the first facial feature parameter of the driver in an image; and calculating, according to the first prediction vector of the first facial feature parameter, a detection value of the first facial feature parameter and a confidence of the detection value. With this method, not only can the detection value of the first facial feature parameter of the first video frame image be calculated, but also the confidence of that detection value; the credibility of the detection value can therefore be characterized by the confidence, making the detection result of the first facial feature parameter of the first video frame image more accurate.

The above first model may be the parameter detection model in the embodiments below, and the first prediction vector is the prediction vector output by that parameter detection model.

In one possible implementation, calculating the detection value of the first facial feature parameter and the confidence of the detection value according to the first prediction vector of the first facial feature parameter includes: normalizing the first prediction vector to obtain a first prediction probability vector of the first facial feature parameter; calculating the detection value of the first facial feature parameter according to the first prediction probability vector; and calculating the confidence of the detection value according to the first prediction probability vector.
In one possible implementation, when the first prediction probability vector is denoted P = [S_0, S_1, …, S_N], where N is a natural number and S_0, S_1, …, S_N are the vector elements of the first prediction probability vector, calculating the detection value of the first facial feature parameter according to the first prediction probability vector includes calculating the detection value v̂ of the first facial feature parameter according to the following formula:

v̂ = Σ_{i=0}^{N} i · S_i

where i is an integer and i ∈ [0, N].

In one possible implementation, when the first prediction probability vector is denoted P = [S_0, S_1, …, S_N], where N is a natural number and S_0, S_1, …, S_N are the vector elements of the first prediction probability vector, calculating the confidence of the detection value according to the first prediction probability vector includes calculating the confidence conf of the detection value using the following formula:

conf = max(P).

In one possible implementation, when the first prediction probability vector is denoted P = [S_0, S_1, …, S_N], where N is a natural number and S_0, S_1, …, S_N are the vector elements of the first prediction probability vector, calculating the confidence of the detection value according to the detection value of the first facial feature parameter includes calculating the confidence conf of the first facial feature parameter using the following formula:

conf = S_round(v̂)

where v̂ is the detection value of the first facial feature parameter and round denotes the rounding function.
In one possible implementation, normalizing the first prediction vector to obtain the first prediction probability vector of the first facial feature parameter includes: normalizing the first prediction vector using a normalized exponential function to obtain the first prediction probability vector.

In one possible implementation, the training method of the first model includes: obtaining a first sample, where the first sample includes a first image and a first parameter value of a first facial feature parameter of the first image; inputting the first sample into a preset initial model to obtain a second prediction vector of the first facial feature parameter; normalizing the second prediction vector to obtain a second prediction probability vector of the first facial feature parameter; encoding the first parameter value to obtain a true value vector of the first facial feature parameter; calculating a prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector; adjusting the parameters of the initial model according to the prediction deviation value; and, when the initial model has not converged, obtaining a new sample as the first sample and returning to the step of inputting the first sample into the preset initial model, until the initial model converges and the first model is obtained.

In one possible implementation, the preset initial model includes a convolutional neural network and a fully connected network; the convolutional neural network serves as the input of the preset initial model, and the fully connected network serves as the output of the preset initial model. The output end of the fully connected network is provided with N+1 neurons for the first facial feature parameter, and the outputs of the N+1 neurons serve as the vector elements of the second prediction vector of the first facial feature parameter.

In a second aspect, embodiments of the present application provide a method for establishing a parameter detection model, including: obtaining a first sample, where the first sample includes a first image and a first parameter value of a first facial feature parameter of the first image; inputting the first sample into a preset initial model to obtain a second prediction vector of the first facial feature parameter; normalizing the second prediction vector to obtain a second prediction probability vector of the first facial feature parameter; encoding the first parameter value to obtain a true value vector of the first facial feature parameter; calculating a prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector; adjusting the parameters of the initial model according to the prediction deviation value; and, when the initial model has not converged, obtaining a new sample as the first sample and returning to the step of inputting the first sample into the preset initial model, until the initial model converges and the parameter detection model is obtained.

In one possible implementation, the preset initial model includes a convolutional neural network and a fully connected network; the convolutional neural network serves as the input of the preset initial model and the fully connected network as its output. The convolutional neural network can effectively extract the features of the first sample and feeds those features to the fully connected network, which outputs the prediction vector of the first facial feature parameter according to the features. The output end of the fully connected network is provided with N+1 neurons for the first facial feature parameter, and the outputs of the N+1 neurons serve as the vector elements of the second prediction vector of the first facial feature parameter.

In one possible implementation, normalizing the second prediction vector includes: normalizing the second prediction vector using a normalized exponential function.

In one possible implementation, encoding the first parameter value to obtain the true value vector of the first facial feature parameter includes: encoding the first parameter value using a Gaussian distribution function to obtain the true value vector of the first facial feature parameter; or encoding the first parameter value using an arbitrary distribution function to obtain the true value vector of the first facial feature parameter.
In one possible implementation, when the first parameter value is encoded using a Gaussian distribution function, calculating the prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector includes calculating the i-th vector element L_i of the prediction deviation value using the following loss function:

L_i = T_i · log(T_i / S_i)

where S_i is the i-th vector element of the second prediction probability vector, T_i is the i-th vector element of the true value vector, i is an integer, and i ∈ [0, N].

In one possible implementation, when the first parameter value is encoded using an arbitrary distribution function, calculating the prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector includes calculating the prediction deviation value using the following loss function:

L(S_m, S_{m+1}) = -((m+1-v)·log(S_m) + (v-m)·log(S_{m+1}))

where m = int(v), v is the first parameter value, int is the truncation operation, S_m is the m-th vector element of the second prediction probability vector, and S_{m+1} is the (m+1)-th vector element of the second prediction probability vector.
In a third aspect, embodiments of the present application provide an electronic device, including one or more processors coupled to a memory in which one or more computer programs are stored; when the one or more computer programs are executed by the processors, the electronic device is caused to execute the method of any one of the first aspect.

In a fourth aspect, embodiments of the present application provide an electronic device, including one or more processors coupled to a memory in which one or more computer programs are stored; when the one or more computer programs are executed by the processors, the electronic device is caused to execute the method of any one of the second aspect.

In a fifth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the method of any one of the first aspect.

In a sixth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the method of any one of the second aspect.

In a seventh aspect, embodiments of the present application provide a computer program product including a computer program which, when run on a computer, causes the computer to execute the method of any one of the first aspect.

In an eighth aspect, embodiments of the present application provide a computer program product including a computer program which, when run on a computer, causes the computer to execute the method of any one of the second aspect.

In a ninth aspect, the present application provides a computer program which, when executed by a computer, is used to perform the method described in the first aspect or the second aspect.

In one possible design, the program of the ninth aspect may be stored in whole or in part on a storage medium packaged together with the processor, or in part or in whole on a memory not packaged together with the processor.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic structural diagram of a system to which the parameter detection method provided by an embodiment of the present application is applicable;

Figure 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;

Figure 3 is a schematic flowchart of a method for establishing a parameter detection model provided by an embodiment of the present application;

Figure 4 is a schematic flowchart of a parameter detection method provided by an embodiment of the present application;

Figure 5 is another schematic flowchart of a parameter detection method provided by an embodiment of the present application.
Detailed Description

The terms used in this implementation section are only used to explain specific embodiments of the present application and are not intended to limit the present application.

In a DMS system, the driver's facial feature parameters, such as gaze parameters and facial key points, need to be detected based on the images captured by the camera, and the driver's driving behavior, such as whether the driver is distracted while driving, is then detected based on those facial feature parameters. However, in the prior art the detection results of the driver's facial feature parameters are often not accurate enough, which in turn makes the DMS system's detection of the driver's driving behavior not accurate enough.

Taking the case where the driver's facial feature parameters are the driver's gaze parameters and the gaze parameters are to be detected as an example, a gaze detection model can be preset in the DMS system; the input of the gaze detection model is the image captured by the camera, and the output is the pitch angle and yaw angle of the driver's line of sight in the image. However, if the camera captures an image at a moment when the driver blinks, closes his eyes, or an object blocks the driver's eyes, the driver's pupils are not visible in that image, that is, the image does not include an image of the driver's pupils; the gaze detection model will nevertheless output pitch-angle and yaw-angle values of the line of sight for this image, and in fact those angle values are not accurate. If the angle values of such images, in which the driver's pupils are not visible, are used as the data basis for subsequent driver distraction detection, the driver distraction detection results will be biased.

To this end, the embodiments of the present application propose a parameter detection method and a method for establishing a parameter detection model, which can make the detection results of the driver's facial feature parameters more accurate and can thereby make the DMS system's detection of the driver's driving behavior more accurate.

The architecture of a system to which the parameter detection method of the embodiments of the present application is applicable is described below by way of example. Figure 1 is a schematic structural diagram of a DMS system to which the parameter detection method of the embodiments of the present application is applicable, including a camera 110 and an electronic device 120.

The camera 110 is used to capture still images or video. The camera 110 can be installed in the vehicle, positioned in front of the driver and aimed at the driver's face. When the driver is seated in the driver's seat, the still images or video captured by the camera 110 include the driver's facial image. An object generates an optical image through the lens, which is projected onto the photosensitive element of the camera 110. The photosensitive element can be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then passes the electrical signal to the electronic device 120.

The electronic device 120 is used to convert the electrical signal into a digital image signal, detect the driver's facial feature parameters based on the digital image signal, detect the driver's driving behavior based on the detection results of the facial feature parameters, and so on. The electronic device 120 may be disposed in the same vehicle as the camera 110.

In another embodiment, the camera 110 may be provided with an image signal processor (ISP); the electrical signal can then be converted into a digital image signal by the ISP, and the camera 110 transmits the digital image signal converted by the ISP directly to the electronic device 120. In this case, the electronic device 120 does not need to perform the above conversion of the electrical signal into a digital image signal.

The electronic device referred to in the embodiments of the present application may be a vehicle-mounted device, an Internet-of-Vehicles terminal, a computer, a laptop, a handheld communication device, a handheld computing device, and/or another device used for communication on a wireless system, as well as a device of a next-generation communication system, for example a mobile terminal in a 5G network or a mobile terminal in a future evolved Public Land Mobile Network (PLMN).
Figure 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device 200 includes a processor 210, a memory 220, a display screen 230, and so on. The above electronic device 120 can be implemented by the electronic device 200 shown in Figure 2.

Optionally, for more complete functionality, the electronic device may also include one or more of: an antenna, a mobile communication module, a wireless communication module, an audio module, a speaker, a receiver, a microphone, a headphone jack, a charging management module, a power management module, and a battery; this is not limited by the embodiments of the present application.

The processor 210 may include one or more processing units; for example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an ISP, a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units can be independent devices or can be integrated in one or more processors.

The electrical signal transmitted by the above camera can be converted into a digital image signal by the ISP. The ISP can output the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.

The controller can generate operation control signals based on the instruction operation code and timing signals, completing the control of instruction fetching and instruction execution.

A memory may also be provided in the processor 210 for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache; it can hold instructions or data that the processor 210 has just used or uses cyclically. If the processor 210 needs to use the instructions or data again, they can be called directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 210, and thus improves the efficiency of the system.

The memory 220 may be used to store computer-executable program code, which includes instructions. The memory 220 may include a program storage area and a data storage area. The program storage area can store an operating system and at least one application program required for a function (such as a sound playback function or an image playback function); the data storage area can store data created during use of the electronic device 200 (such as audio data). In addition, the memory 220 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS). The processor 210 executes the various functional applications and data processing of the electronic device 200 by running instructions stored in the memory 220 and/or instructions stored in the memory provided in the processor.

It should be noted that the embodiment shown in Figure 2 takes the case where the memory 220 is disposed in the electronic device 200 as an example. In other embodiments provided by the present application, the memory 220 may not be disposed in the electronic device 200; in that case, the memory 220 can be connected to the electronic device 200 through an interface provided by the electronic device 200 and thereby connected to the processor 210 in the electronic device 200.

The display screen 230 is used to display images, videos, and the like. The display screen 230 includes a display panel. The display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum-dot light-emitting diode (QLED), and so on. In some embodiments, the electronic device 200 may include one or more display screens 230.
The parameter detection method of the embodiments of the present application is described in detail below in combination with the above system architecture and the structure of the electronic device.

First, the method for establishing the parameter detection model of the embodiments of the present application is described by way of example.

A training sample set is preset in the embodiments of the present application; the training sample set includes a number of samples, and the number of samples is not limited by the embodiments of the present application.

Each sample includes an image and the parameter value of at least one facial feature parameter in the image.

Optionally, the facial feature parameters may include but are not limited to: the driver's gaze parameters, the driver's head pose parameters, position parameters of eye feature points, position parameters of facial feature areas, position parameters of the driver's body areas, and so on.

Optionally, the driver's gaze parameters may include but are not limited to: pitch angle, yaw angle, and so on.

Optionally, the driver's head pose parameters may include but are not limited to: pitch angle, yaw angle, roll angle, and so on.

Optionally, the eye feature points may include but are not limited to: the eye center point, eye corner points, eyelid points, pupil points, and so on.

Optionally, the facial feature areas may include but are not limited to: the eye area, mouth area, nose area, and so on.

Optionally, the body feature areas may include but are not limited to: the hand area, the torso area, and so on.

An initial model is preset in the embodiments of the present application. The initial model may include a convolutional neural network and a fully connected network. The convolutional neural network serves as the input end of the initial model and receives the training samples; it can effectively extract the features of a sample, and those features are fed to the fully connected network. The fully connected network serves as the output end of the initial model and outputs the prediction vectors of the facial feature parameters.

In the embodiments of the present application, the prediction vector of each facial feature parameter is an (N+1)-dimensional array. That is, the output end of the fully connected network is provided with N+1 neurons for each facial feature parameter, which are the neurons that output the vector elements of that facial feature parameter; thus, for each facial feature parameter, the vector elements output by its corresponding N+1 neurons form the prediction vector of that facial feature parameter. N is a natural number.

For example, for the pitch angle of the line of sight, the output end of the fully connected network includes N+1 neurons for the pitch angle; for the yaw angle of the line of sight, the output end of the fully connected network includes N+1 neurons for the yaw angle.
The implementation flow of the method for establishing the parameter detection model of the embodiments of the present application is described below by way of example. The electronic device that executes the method may be the above electronic device 120 or an electronic device disposed outside the vehicle; this is not limited by the embodiments of the present application. As shown in Figure 3, the method for establishing the parameter detection model may include:

Step 301: Obtain a first sample; the first sample includes a first image and a first parameter value of a first facial feature parameter.

Step 302: Input the first sample into the preset initial model to obtain the prediction vector of the first facial feature parameter.

The prediction vector of the first facial feature parameter is an (N+1)-dimensional array, denoted in the embodiments of the present application as y = [y_0, y_1, …, y_N]; the vector elements y_0, y_1, and so on of the prediction vector y are the output values of the N+1 neurons corresponding to the first facial feature parameter.

It should be noted that the first facial feature parameter in this step and in the subsequent step 303 is the first facial feature parameter of the first sample; for ease of description, "of the first sample" is omitted.

Step 303: Normalize the prediction vector of the first facial feature parameter to obtain the prediction probability vector of the first facial feature parameter.

Normalization in the embodiments of the present application means mapping each vector element of the above prediction vector into the interval [0, 1] through calculation. In the embodiments of the present application, the prediction probability vector of the first facial feature parameter is denoted P.

In one possible implementation, the softmax function, also called the normalized exponential function, can be used to normalize the prediction vector y, giving the prediction probability vector P = softmax(y) = [S_0, S_1, …, S_N].

The prediction probability vector is also an (N+1)-dimensional array, and the sum of its vector elements is 1, that is, Σ_{i=0}^{N} S_i = 1.
Step 304: Encode the first parameter value to obtain the true value vector of the first facial feature parameter.

Let the first parameter value be v and the true value vector be T; the true value vector is an (N+1)-dimensional array, say T = [t_0, t_1, …, t_N].

In one possible implementation, a Gaussian distribution function can be used to encode the first parameter value v to obtain the true value vector T, for example according to the following formula:

t_i = (1 / (√(2π)·σ)) · exp(-((i − v)²) / (2σ²))

where i is an integer, i ∈ [0, N], and σ is the standard deviation of the Gaussian distribution. The specific value of σ can be set according to the actual situation of the first parameter value: if the first parameter value is difficult to predict or its accuracy is low, σ can be set relatively large, so that the resulting true value vector T is more dispersed; otherwise, σ can be set relatively small, so that the resulting true value vector T is more concentrated.

In another possible implementation, an arbitrary distribution function can be used to encode the first parameter value v to obtain the true value vector T.

Step 305: Calculate the prediction deviation value using a preset loss function according to the true value vector T and the prediction probability vector P of the first facial feature parameter.

Below, the prediction deviation value is denoted L.

If the true value vector T is obtained by encoding with a Gaussian distribution function, in one possible implementation the Kullback-Leibler divergence can be used as the loss function. In this case, the calculated prediction deviation value L can be an (N+1)-dimensional array, say L = [l_0, l_1, …, l_N], and the i-th vector element L_i of the prediction deviation value can be calculated with the following loss function:

L_i = T_i · log(T_i / S_i)

where i is an integer and i ∈ [0, N].
If the true value vector T is obtained by encoding with an arbitrary distribution function, in one possible implementation the prediction deviation value can be calculated with the following loss function:

L(S_m, S_{m+1}) = -((m+1-v)·log(S_m) + (v-m)·log(S_{m+1}))

where m = int(v), int is the truncation operation, S_m is the m-th vector element of the prediction probability vector P calculated in step 303, and S_{m+1} is the (m+1)-th vector element of the prediction probability vector P calculated in step 303.

For example, suppose the first facial feature parameter is the pitch angle of the line of sight and the first parameter value of the pitch angle is 46.8; then m = int(46.8) = 46, and using the above loss function the prediction deviation value is L = -((46+1-46.8)·log(S_46) + (46.8-46)·log(S_47)) = -(0.2·log(S_46) + 0.8·log(S_47)).
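This arithmetic can be checked in a few lines of Python; the two probability values are illustrative, not from the patent.

```python
import math

v = 46.8
m = int(v)                     # m = 46
w_lo, w_hi = m + 1 - v, v - m  # weights ≈ 0.2 and 0.8 on S_46 and S_47
S46, S47 = 0.15, 0.60          # illustrative softmax probabilities
L = -(w_lo * math.log(S46) + w_hi * math.log(S47))  # ≈ 0.788
```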
Step 306: Use the prediction deviation value to adjust the parameters of the preset model.

How to use the prediction deviation value to adjust the parameters of the preset model can be implemented with related techniques and is not described in detail here.

Step 307: Judge whether the preset initial model has converged; if yes, the parameter detection model is obtained; if no, execute step 308.

Judging whether the preset model has converged can be implemented with related techniques, for example by judging whether the prediction deviation value satisfies a preset convergence condition, or whether the adjustment values of the parameters of the preset model satisfy a preset convergence condition; this is not limited by the embodiments of the present application.

Step 308: Obtain a new sample as the first sample and return to the above step 302 and subsequent steps, until the preset initial model converges, obtaining the parameter detection model of the embodiments of the present application.

It should be noted that after step 308, steps such as testing the trained parameter detection model with test samples and adjusting the parameters of the parameter detection model according to the test results may also be included, to optimize the parameter detection model; this is not limited by the embodiments of the present application.

The input of the parameter detection model established by the method shown in Figure 3 is an image, and its output is the prediction vector of the first facial feature parameter of the image; this prediction vector is an (N+1)-dimensional array. Compared with the prior art, in which the parameter detection model outputs only a single parameter value, the output of this parameter detection model carries data information of more dimensions, which improves the accuracy of the parameter detection model's detection result for the first facial feature parameter. The parameter detection model can be provided in the above electronic device 120 to support the electronic device in detecting the first facial feature parameter of an image.

The parameter detection method provided by the embodiments of the present application is described below by way of example.

Figure 4 is a schematic flowchart of a parameter detection method provided by an embodiment of the present application. As shown in Figure 4, the method can be applied to the above electronic device and may include:

Step 401: Obtain a first video frame image.

The first video frame image obtained in this step may be a video frame image, captured by the camera in real time, that includes the driver's facial image.

Step 402: Determine the first prediction vector of the first facial feature parameter of the first video frame image; the first prediction vector can be obtained by inputting the first video frame image into the preset first model.

For ease of description, the first facial feature parameter of the first video frame image is referred to below simply as the first facial feature parameter.

The parameter detection model used in this step is the parameter detection model obtained by the above method for establishing a parameter detection model; therefore, continuing the notation of Figure 3, the prediction vector of the first facial feature parameter is denoted y.
Step 403: Calculate the detection value of the first facial feature parameter and the confidence of the detection value according to the prediction vector of the first facial feature parameter.

For ease of description, the confidence of the detection value of the first facial feature parameter is also referred to below as the confidence of the first facial feature parameter.

Optionally, this step may include:

normalizing the prediction vector of the first facial feature parameter to obtain the prediction probability vector; and

calculating the detection value of the first facial feature parameter and the confidence of that detection value according to the prediction probability vector.

It should be noted that the specific normalization method applied to the prediction vector y in this step generally needs to be the same as the normalization method used when establishing the parameter detection model, to ensure the accuracy of the detection value and confidence obtained by subsequent calculation. Continuing the example of normalizing the prediction vector y with the softmax function, and continuing the notation of Figure 3, the resulting prediction probability vector is P = softmax(y) = [S_0, S_1, …, S_N].

Denoting the detection value of the first facial feature parameter as v̂, calculating the detection value of the first facial feature parameter according to the prediction probability vector P may include calculating it with the following formula:

v̂ = Σ_{i=0}^{N} i · S_i.

In one possible implementation, if the parameter detection model was established with the first parameter value encoded by a Gaussian distribution function, calculating the confidence of the detection value according to the prediction probability vector P may include calculating the confidence conf of the detection value with the following formula:

conf = max(P)

That is, the maximum value among S_0 to S_N can be selected as the confidence conf of the detection value.

In another possible implementation, if the parameter detection model was established with the first parameter value encoded by an arbitrary distribution function, calculating the confidence may include calculating the confidence conf from the detection value v̂ obtained from the prediction probability vector P:

conf = S_round(v̂)

where round denotes the rounding function.
Through the above steps, not only can the detection value of the first facial feature parameter of the first video frame image be calculated, but also the confidence of that detection value; the credibility of the detection value can therefore be characterized by the confidence, making the detection result of the first facial feature parameter of the first video frame image more accurate.

Optionally, the detection value and confidence of the first facial feature parameter of the first video frame image can serve as the data basis for detecting the driver's driving behavior. In this case, as shown in Figure 5, the parameter detection method of the embodiments of the present application may further include the following step 404 after step 403.

Step 404: Detect the driver's driving behavior according to the detection value and confidence of the first facial feature parameter.

When detecting the driver's driving behavior according to the detection value and confidence of the first facial feature parameter, whether a detection value is suitable as a data basis for driving behavior detection can be evaluated according to the confidence of the detection value; for example, detection values whose confidence is below a certain threshold are filtered out and not used as the data basis for detecting the driver's driving behavior, which makes the data basis for driving behavior detection more accurate and effective and can thereby improve the accuracy of driving behavior detection.

For example, the above driving behavior may be the driver's distraction behavior, and the first facial feature parameters may then be the pitch angle and yaw angle of the line of sight; in this case, this step may include:

detecting the driver's distraction behavior according to the detection value of the gaze pitch angle and its confidence and the detection value of the gaze yaw angle and its confidence.

Detecting the driver's distraction behavior according to the detection value of the gaze pitch angle and its confidence and the detection value of the gaze yaw angle and its confidence may specifically include:

obtaining the detection value and confidence of the gaze pitch angle of every video frame image in a first time period, and obtaining the detection value and confidence of the gaze yaw angle of every video frame image in the first time period;

filtering the video frame images in the first time period according to the confidence of the gaze pitch angle and the confidence of the gaze yaw angle of each video frame image, to obtain valid video frame images; and

detecting the driver's distraction behavior according to the detection values of the gaze pitch angle and the gaze yaw angle of the valid video frame images.

It should be noted that the first time period may be a period of preset duration whose end point is the shooting time of the first video frame image; the specific value of the preset duration is not limited by the embodiments of the present application, and the first video frame image is the last video frame image in the first time period.

In one possible implementation, when filtering the video frame images according to the confidence of the gaze pitch angle and the confidence of the gaze yaw angle of each frame, a first threshold can be set for the confidence of the gaze pitch angle and a second threshold for the confidence of the gaze yaw angle. Video frame images whose gaze pitch angle confidence is below the first threshold and/or whose gaze yaw angle confidence is below the second threshold are filtered out; the remaining video frame images, that is, those whose gaze pitch angle confidence is not below the first threshold and whose gaze yaw angle confidence is not below the second threshold, serve as the valid video frame images.

Detecting the driver's distraction behavior according to the detection values of the gaze pitch angle and the gaze yaw angle of the valid video frame images can be implemented with a related distraction behavior detection method, which is not limited by the embodiments of the present application.

Through the above filtering of video frame images, video frame images with low gaze pitch angle confidence and/or low gaze yaw angle confidence can be filtered out. For example, video frame images that contain no image of the driver's pupils tend to have low gaze pitch angle confidence and/or low gaze yaw angle confidence and can be filtered out, which makes the basic data for driver distraction detection more accurate and effective and can thereby improve the accuracy of driver distraction detection.
An embodiment of the present application also provides an electronic device, including a processor coupled to a memory in which a computer program is stored; when the computer program is executed by the processor, the processor is used to implement the methods provided by the embodiments shown in Figures 3 to 5 of the present application.

An embodiment of the present application also provides an electronic device including a storage medium and a central processing unit; the storage medium may be a non-volatile storage medium in which a computer-executable program is stored, and the central processing unit is connected to the non-volatile storage medium and executes the computer-executable program to implement the methods provided by the embodiments shown in Figures 3 to 5 of the present application.

An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to execute the methods provided by the embodiments shown in Figures 3 to 5 of the present application.

An embodiment of the present application also provides a computer program product including a computer program which, when run on a computer, causes the computer to execute the methods provided by the embodiments shown in Figures 3 to 5 of the present application.

In the embodiments of the present application, "at least one" refers to one or more, and "multiple" refers to two or more. "And/or" describes the relationship between associated objects and indicates that three relationships can exist; for example, A and/or B can mean that A exists alone, that A and B exist at the same time, or that B exists alone, where A and B can be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single or plural items; for example, at least one of a, b, and c can mean a, b, c, a and b, a and c, b and c, or a and b and c, where each of a, b, and c can be single or multiple.

Those of ordinary skill in the art will recognize that the units and algorithm steps described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the several embodiments provided in the present application, if any function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or in essence the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which can be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are only specific embodiments of the present application; any changes or substitutions that can readily be conceived by those familiar with the technical field within the technical scope disclosed by the present application shall be covered by the protection scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

  1. A parameter detection method, characterized by comprising:
    obtaining a first video frame image, wherein the first video frame image comprises a driver's facial image;
    determining a first prediction vector of a first facial feature parameter, wherein the first prediction vector is obtained by inputting the first video frame image into a preset first model, and the first model is used to detect the first facial feature parameter of the driver in an image; and
    calculating, according to the first prediction vector of the first facial feature parameter, a detection value of the first facial feature parameter and a confidence of the detection value.
  2. The method according to claim 1, characterized in that calculating, according to the first prediction vector of the first facial feature parameter, the detection value of the first facial feature parameter and the confidence of the detection value comprises:
    normalizing the first prediction vector to obtain a first prediction probability vector of the first facial feature parameter; and
    calculating, according to the first prediction probability vector, the detection value of the first facial feature parameter and the confidence of the detection value.
  3. The method according to claim 2, characterized in that, when the first prediction probability vector is denoted P = [S_0, S_1, …, S_N], N is a natural number, and S_0, S_1, …, S_N are the vector elements of the first prediction probability vector, calculating the detection value of the first facial feature parameter according to the first prediction probability vector comprises:
    calculating the detection value v̂ of the first facial feature parameter according to the following formula:
    v̂ = Σ_{i=0}^{N} i · S_i
    wherein i is an integer and i ∈ [0, N].
  4. The method according to claim 2 or 3, characterized in that, when the first prediction probability vector is denoted P = [S_0, S_1, …, S_N], N is a natural number, and S_0, S_1, …, S_N are the vector elements of the first prediction probability vector, calculating the confidence of the detection value according to the first prediction probability vector comprises:
    calculating the confidence conf of the detection value using the following formula:
    conf = max(P).
  5. The method according to claim 2 or 3, characterized in that, when the first prediction probability vector is denoted P = [S_0, S_1, …, S_N], N is a natural number, and S_0, S_1, …, S_N are the vector elements of the first prediction probability vector, calculating the confidence of the detection value according to the detection value of the first facial feature parameter comprises:
    calculating the confidence conf of the first facial feature parameter using the following formula:
    conf = S_round(v̂)
    wherein v̂ is the detection value of the first facial feature parameter and round denotes the rounding function.
  6. The method according to claim 2 or 3, characterized in that normalizing the first prediction vector to obtain the first prediction probability vector of the first facial feature parameter comprises:
    normalizing the first prediction vector using a normalized exponential function to obtain the first prediction probability vector.
  7. The method according to any one of claims 1 to 3, characterized in that the training method of the first model comprises:
    obtaining a first sample, wherein the first sample comprises a first image and a first parameter value of a first facial feature parameter of the first image;
    inputting the first sample into a preset initial model to obtain a second prediction vector of the first facial feature parameter;
    normalizing the second prediction vector to obtain a second prediction probability vector of the first facial feature parameter;
    encoding the first parameter value to obtain a true value vector of the first facial feature parameter;
    calculating a prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector;
    adjusting the parameters of the initial model according to the prediction deviation value; and
    when the initial model has not converged, obtaining a new sample as the first sample and returning to the step of inputting the first sample into the preset initial model, until the initial model converges, obtaining the first model.
  8. The method according to claim 7, characterized in that the preset initial model comprises a convolutional neural network and a fully connected network, the convolutional neural network serving as the input of the preset initial model and the fully connected network serving as the output of the preset initial model; and
    the output end of the fully connected network is provided with N+1 neurons for the first facial feature parameter, the outputs of the N+1 neurons serving as the vector elements of the second prediction vector of the first facial feature parameter.
  9. A method for establishing a parameter detection model, characterized by comprising:
    obtaining a first sample, wherein the first sample comprises a first image and a first parameter value of a first facial feature parameter of the first image;
    inputting the first sample into a preset initial model to obtain a second prediction vector of the first facial feature parameter;
    normalizing the second prediction vector to obtain a second prediction probability vector of the first facial feature parameter;
    encoding the first parameter value to obtain a true value vector of the first facial feature parameter;
    calculating a prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector;
    adjusting the parameters of the initial model according to the prediction deviation value; and
    when the initial model has not converged, obtaining a new sample as the first sample and returning to the step of inputting the first sample into the preset initial model, until the initial model converges, obtaining the parameter detection model.
  10. The method according to claim 9, characterized in that the preset initial model comprises a convolutional neural network and a fully connected network, the convolutional neural network serving as the input of the preset initial model and the fully connected network serving as the output of the preset initial model; and
    the output end of the fully connected network is provided with N+1 neurons for the first facial feature parameter, the outputs of the N+1 neurons serving as the vector elements of the second prediction vector of the first facial feature parameter.
  11. The method according to claim 9 or 10, characterized in that normalizing the second prediction vector comprises:
    normalizing the second prediction vector using a normalized exponential function.
  12. The method according to claim 9 or 10, characterized in that encoding the first parameter value to obtain the true value vector of the first facial feature parameter comprises:
    encoding the first parameter value using a Gaussian distribution function to obtain the true value vector of the first facial feature parameter; or
    encoding the first parameter value using an arbitrary distribution function to obtain the true value vector of the first facial feature parameter.
  13. The method according to claim 12, characterized in that, when the first parameter value is encoded using a Gaussian distribution function, calculating the prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector comprises:
    calculating the i-th vector element L_i of the prediction deviation value using the following loss function:
    L_i = T_i · log(T_i / S_i)
    wherein S_i is the i-th vector element of the second prediction probability vector, T_i is the i-th vector element of the true value vector, i is an integer, and i ∈ [0, N].
  14. The method according to claim 12, characterized in that, when the first parameter value is encoded using an arbitrary distribution function, calculating the prediction deviation value using a preset loss function according to the second prediction probability vector and the true value vector comprises:
    calculating the prediction deviation value using the following loss function:
    L(S_m, S_{m+1}) = -((m+1-v)·log(S_m) + (v-m)·log(S_{m+1}))
    wherein m = int(v), v is the first parameter value, int is the truncation operation, S_m is the m-th vector element of the second prediction probability vector, and S_{m+1} is the (m+1)-th vector element of the second prediction probability vector.
  15. An electronic device, characterized by comprising:
    one or more processors coupled to a memory in which one or more computer programs are stored, wherein, when the one or more computer programs are executed by the processors, the electronic device is caused to execute the method according to any one of claims 1 to 8.
  16. An electronic device, characterized by comprising:
    one or more processors coupled to a memory in which one or more computer programs are stored, wherein, when the one or more computer programs are executed by the processors, the electronic device is caused to execute the method according to any one of claims 9 to 14.
  17. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium and, when run on a computer, causes the computer to execute the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium and, when run on a computer, causes the computer to execute the method according to any one of claims 9 to 14.
PCT/CN2023/085382 2022-06-28 2023-03-31 Parameter detection method and device WO2024001365A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210748481.9 2022-06-28
CN202210748481.9A CN117351463A (zh) 2022-06-28 2022-06-28 Parameter detection method and device

Publications (1)

Publication Number Publication Date
WO2024001365A1 (zh)

Family

ID=89356236

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/085382 WO2024001365A1 (zh) 2022-06-28 2023-03-31 Parameter detection method and device

Country Status (2)

Country Link
CN (1) CN117351463A (zh)
WO (1) WO2024001365A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070067A (zh) * 2019-04-29 2019-07-30 北京金山云网络技术有限公司 视频分类方法及其模型的训练方法、装置和电子设备
WO2020124390A1 (zh) * 2018-12-18 2020-06-25 华为技术有限公司 一种面部属性的识别方法及电子设备
CN112257503A (zh) * 2020-09-16 2021-01-22 深圳微步信息股份有限公司 一种性别年龄识别方法、装置及存储介质
US20210026446A1 (en) * 2019-07-26 2021-01-28 Samsung Electronics Co., Ltd. Method and apparatus with gaze tracking
CN112307900A (zh) * 2020-09-27 2021-02-02 北京迈格威科技有限公司 面部图像质量的评估方法、装置和电子设备
CN113780249A (zh) * 2021-11-10 2021-12-10 腾讯科技(深圳)有限公司 表情识别模型的处理方法、装置、设备、介质和程序产品


Also Published As

Publication number Publication date
CN117351463A (zh) 2024-01-05

Similar Documents

Publication Publication Date Title
US20220076000A1 (en) Image Processing Method And Apparatus
CN110909630B (zh) 一种异常游戏视频检测方法和装置
US20210382542A1 (en) Screen wakeup method and apparatus
CN110781899B (zh) 图像处理方法及电子设备
KR20210111833A (ko) 타겟의 위치들을 취득하기 위한 방법 및 장치와, 컴퓨터 디바이스 및 저장 매체
CN110826521A (zh) 驾驶员疲劳状态识别方法、系统、电子设备和存储介质
WO2023138560A1 (zh) 风格化图像生成方法、装置、电子设备及存储介质
JP2023549070A (ja) 意味特徴の学習を介したUnseenドメインからの顔認識
WO2020124993A1 (zh) 活体检测方法、装置、电子设备及存储介质
WO2020124994A1 (zh) 活体检测方法、装置、电子设备及存储介质
US20220309780A1 (en) Method For Determining Validity Of Facial Feature, And Electronic Device
US20230306792A1 (en) Spoof Detection Based on Challenge Response Analysis
CN111612723A (zh) 图像修复方法及装置
WO2024001365A1 (zh) 参数检测方法和设备
CN113194281A (zh) 视频解析方法、装置、计算机设备和存储介质
CN110443752B (zh) 一种图像处理方法和移动终端
WO2024021504A1 (zh) 人脸识别模型训练方法、识别方法、装置、设备及介质
WO2023137923A1 (zh) 基于姿态指导的行人重识别方法、装置、设备及存储介质
CN114973347B (zh) 一种活体检测方法、装置及设备
WO2021218695A1 (zh) 一种基于单目摄像头的活体检测方法、设备和可读存储介质
EP3757878A1 (en) Head pose estimation
WO2021189321A1 (zh) 一种图像处理方法和装置
CN108804996B (zh) 人脸验证方法、装置、计算机设备及存储介质
CN107872619B (zh) 一种拍照处理方法、装置及设备
CN114743024A (zh) 一种图像识别方法、装置、系统及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23829555

Country of ref document: EP

Kind code of ref document: A1