WO2019126908A1 - Image data processing method, device and equipment - Google Patents

Image data processing method, device and equipment

Info

Publication number
WO2019126908A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
training
recognition model
feature
Application number
PCT/CN2017/118174
Other languages
French (fr)
Chinese (zh)
Inventor
张李亮
李思晋
封旭阳
赵丛
Original Assignee
深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2017/118174 (WO2019126908A1)
Priority to CN201780005969.XA (CN108701214A)
Publication of WO2019126908A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V 20/597 - Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/50 - Depth or shape recovery
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/90 - Determination of colour characteristics


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Provided are an image data processing method, device, and equipment. The method comprises: receiving a first image of a target object acquired by a first image sensor and a second image of the target object acquired by a second image sensor; inputting the first image and the second image of the target object into a preset recognition model to obtain description information describing action features within a designated region of the target object; and determining state information of the target object according to the description information of the action features. In this way, the accuracy of fatigue detection can be improved.

Description

Image data processing method, device and equipment
Technical Field
The present invention relates to the field of electronic technologies, and in particular, to an image processing method, apparatus, and device.
Background
With the development of transportation technology and the improvement of living standards, traveling by car has become the preferred choice for most people because of its particular advantages, bringing convenience and comfort to people's travel. However, traffic accidents caused by fatigued driving have a huge impact on people's lives and property.
In practical applications, whether a driver is in a fatigued driving state is detected by checking whether the vehicle crosses the traffic marking lines on the road. However, a driver with limited driving skill may also cross the traffic marking lines and thus be misjudged as driving while fatigued, so the accuracy of this detection method is low.
Summary of the Invention
Embodiments of the present invention disclose an image data processing method, apparatus, and device, which can improve the accuracy of detecting fatigued driving by processing image data of a target object such as a driver.
In a first aspect, an embodiment of the present invention provides an image data processing method, the method including:
receiving a first image of a target object collected by a first image sensor and a second image of the target object collected by a second image sensor, where the first image includes at least one of a grayscale image or an RGB image, and the second image includes a depth image;
inputting the first image of the target object and the second image of the target object into a preset recognition model to obtain description information describing an action feature of a designated region of the target object; and
determining state information of the target object according to the action feature description information;
where the preset recognition model is used to recognize the designated regions of the first image of the target object and the second image of the target object.
In a second aspect, an embodiment of the present invention provides an image processing apparatus, the apparatus including:
a receiving module, configured to receive a first image of a target object collected by a first image sensor and a second image of the target object collected by a second image sensor, where the first image includes at least one of a grayscale image or an RGB image, and the second image includes a depth image;
a recognition module, configured to input the first image of the target object and the second image of the target object into a preset recognition model to obtain description information describing an action feature of a designated region of the target object; and
a determining module, configured to determine state information of the target object according to the action feature description information;
where the preset recognition model is used to recognize the designated regions of the first image of the target object and the second image of the target object.
In a third aspect, an embodiment of the present invention provides an image processing device, the device including a processor and a memory connected through a bus, where the memory stores executable program code, and the processor is configured to invoke the executable program code to perform the image data processing method described in the first aspect of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by at least one processor, implements the image data processing method described in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer program product, the computer program product including a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to implement the image data processing method described in the first aspect.
According to the embodiments of the present invention, the first image (an RGB image or a grayscale image) collected by the first image sensor and the depth image collected by the second image sensor can be used as signal inputs to the preset recognition model, so that the first image data and the depth image data complement each other. The recognition model is optimized by combining the depth map with the RGB image, the grayscale image, or the like, which improves the accuracy of detecting driver fatigue in a designated area such as a cab and thereby improves safety.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an image data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an image data processing system according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of another image data processing method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an image processing device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention are applied to an image processing apparatus that includes a first image sensor and a second image sensor. The first image sensor may be a monocular vision sensor, and the second image sensor may be a multi-view vision sensor. The first image sensor and the second image sensor may be disposed in cameras of the image processing apparatus; for example, the monocular vision sensor is disposed in a monocular camera, and the multi-view vision sensor is disposed in a multi-view camera.
The image processing apparatus in the embodiments of the present invention may be connected to a vehicle and may be disposed in the vehicle. The first image sensor and the second image sensor of the image processing apparatus may dynamically adjust their image-capturing angles as the posture of the target object in the driver's seat changes, so that images of the target object in the driver's seat can be clearly captured.
The embodiments of the present invention can be applied to detecting whether a target object (the target object may be a user) is in a fatigued state, and more specifically, to detecting whether a driver is driving while fatigued.
The first image in the embodiments of the present invention includes at least one of a grayscale image or an RGB image, and the second image includes a depth image.
To address the low accuracy of current fatigued-driving detection methods, the present invention provides an image data processing method, apparatus, and device. The image processing apparatus may receive a first image of a target object collected by a first image sensor (an RGB (Red Green Blue) image is a color image with red, green, and blue channels) and a second image of the target object collected by a second image sensor, input the first image and the second image of the target object into a preset recognition model to obtain description information describing an action feature of a designated region of the target object, and determine state information of the target object according to the action feature description information, where the state information of the target object is used to indicate whether the target object is in a fatigued state. The present invention uses image data of the target object collected by multiple sensors as signal inputs, so that the different signals complement each other, thereby providing a sufficient amount of information to the input of the preset recognition model and improving the accuracy of fatigue detection.
The embodiments of the present invention disclose an image data processing method, apparatus, and device for detecting, based on image data processing, whether a target object is in a fatigued state, so as to improve the accuracy of fatigue detection. Detailed descriptions are given below.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an image data processing method according to an embodiment of the present invention. The method is applicable to an image processing apparatus that includes a first image sensor and a second image sensor. The image data processing method described in this embodiment includes:
S101: Receive a first image of a target object collected by the first image sensor and a second image of the target object collected by the second image sensor.
The first image includes at least one of a grayscale image or an RGB image, and the second image includes a depth image.
In the embodiments of the present invention, if image data collected by a monocular vision sensor alone is used as the signal input, the quality of the collected image data drops greatly under insufficient ambient light, making it difficult for the image processing apparatus to obtain the required information from the image data. If image data collected by an infrared sensor is used as the signal input, the infrared sensor has difficulty accurately capturing the facial features of the target object, again making it difficult for the image processing apparatus to obtain the required information from the image data. In other words, if image data collected by a single sensor is used as the signal input, it is difficult to guarantee that a sufficient amount of information is provided to the input of the preset recognition model. Therefore, the image processing apparatus can use image data collected by multiple image sensors as signal inputs, so that the different signals complement each other and provide a sufficient amount of information to the input of the preset recognition model.
Specifically, the image processing apparatus may receive the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor, so that the first image and the second image can be used as the signal input.
As an optional implementation, the image processing apparatus may detect the light in the current scene; if the light in the current scene does not satisfy a preset light intensity, the image processing apparatus may turn on a fill light and call the monocular vision sensor (that is, the first image sensor) to collect image data of the target object, and use the collected image data of the target object as input to the preset recognition model.
In the embodiments of the present invention, to address the problem that the monocular vision sensor produces low-quality images in weakly lit scenes, the image processing apparatus can improve image quality by turning on the fill light. That is, the image processing apparatus can detect the light in the current scene; if the light in the current scene does not satisfy the preset light intensity, the image processing apparatus can determine that the light in the current scene is weak, turn on the fill light, and call the monocular vision sensor (that is, the first image sensor) to collect image data of the target object, using the collected image data as input to the preset recognition model, thereby improving the quality of the captured images.
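The patent does not give an implementation of this fill-light logic; the following is a minimal Python sketch of the control flow described above, in which the sensor objects, their read_ambient_light, turn_on, and capture methods, and the threshold value are hypothetical placeholders rather than part of the disclosure.
```python
# Illustrative sketch only; the device interfaces below are hypothetical.
PRESET_LIGHT_INTENSITY = 50.0  # assumed lux threshold, not specified in the patent

def collect_first_image(light_sensor, fill_light, monocular_sensor):
    """If the scene light does not satisfy the preset intensity, turn on the
    fill light, then capture the first image with the monocular vision sensor
    (the first image sensor) for input to the preset recognition model."""
    if light_sensor.read_ambient_light() < PRESET_LIGHT_INTENSITY:
        fill_light.turn_on()           # weak light: improve captured image quality
    return monocular_sensor.capture()  # grayscale or RGB frame
```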
S102: Input the first image of the target object and the second image of the target object into a preset recognition model to obtain description information describing an action feature of a designated region of the target object.
The preset recognition model is used to recognize the designated regions of the first image of the target object and the second image of the target object, and may be a neural network recognition model.
In the embodiments of the present invention, the image processing apparatus may input the first image of the target object and the second image of the target object into the preset recognition model. The preset recognition model is used to perform initial recognition on the first image and identify the target object in the first image, and is further used to perform depth recognition on the second image based on the identified target object, that is, to identify the designated region of the target object in the second image, thereby obtaining the description information describing the action feature of the designated region of the target object. Using the first image and the second image as input signals of the recognition model can improve the accuracy of recognizing the action feature of the designated region; at the same time, recognizing only the designated region can improve the efficiency of obtaining the description information of the action feature of the designated region of the target object and save resources of the image processing device.
The designated region of the target object may be an eye region, a mouth region, a nose region, or the like of the target object. The description information of the action feature may include description information of a closed-eye feature of the eye region of the target object, description information of a mouth-open feature of the mouth region of the target object, description information of distance features among the eye region, the mouth region, and the nose region, and the like.
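The patent describes the preset recognition model only as a neural network that takes the first image (RGB or grayscale) and the second image (depth) as inputs and recognizes action features of the designated region; it does not disclose an architecture. The sketch below is one possible two-branch fusion network, written in PyTorch purely for illustration; the layer sizes, the 64x64 crop size, and the two output scores (eyes closed, mouth open) are assumptions, not the patented model.
```python
import torch
import torch.nn as nn

class FusionRecognitionModel(nn.Module):
    """Illustrative two-branch network: one branch for the RGB/grayscale crop of the
    designated region, one for the aligned depth crop; features are fused before the head."""

    def __init__(self):
        super().__init__()
        def branch(in_channels):
            return nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
        self.rgb_branch = branch(3)    # first image: RGB (use 1 channel for grayscale)
        self.depth_branch = branch(1)  # second image: depth map
        self.head = nn.Linear(64, 2)   # scores for [eyes closed, mouth open]

    def forward(self, rgb_crop, depth_crop):
        f_rgb = self.rgb_branch(rgb_crop).flatten(1)
        f_depth = self.depth_branch(depth_crop).flatten(1)
        return torch.sigmoid(self.head(torch.cat([f_rgb, f_depth], dim=1)))

# Usage sketch with random 64x64 crops of the designated region (batch of 1).
model = FusionRecognitionModel()
scores = model(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```
The only point of the sketch is the fusion of the two complementary inputs; the actual model, its training signal, and its outputs are left open by the patent.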
S103: Determine state information of the target object according to the action feature description information.
In the embodiments of the present invention, the image processing apparatus may determine the state information of the target object according to the action feature description information. The state information may be used to indicate whether the target object is in a fatigued state. Detecting whether the target object is in a fatigued state through image data processing can improve the efficiency of fatigue detection.
As an optional implementation, if the designated region of the target object includes the mouth region of the target object, and the action feature description information includes description information of the mouth region of the target object being in an open state, the specific manner of determining the state information of the target object according to the action feature description information includes: counting, according to the description information of the mouth region of the target object being in the open state obtained within a preset time interval, the number of times the mouth region of the target object is in the open state; and if the number of times the mouth region of the target object is in the open state is greater than a first preset threshold, determining state information indicating that the target object is in a specified state.
For example, the preset time interval is 1 minute and the first preset threshold is 4. The image processing apparatus obtains, from multiple frames of the first image and the second image within the preset time interval, the description information of the mouth region of the target object being in the open state, and counts the number of times the mouth region of the target object is in the open state. If the mouth region of the target object is in the open state 5 times, the image processing apparatus can determine that the number of times the mouth region of the target object is in the open state is greater than the first preset threshold, and determine state information indicating that the target object is in a fatigued state.
In the embodiments of the present invention, when the target object is in a fatigued state, the face of the target object exhibits different action features, so the image processing apparatus can determine whether the target object is in a fatigued state according to the facial action features of the target object. That is, the image processing apparatus can count, according to the description information of the mouth region of the target object being in the open state obtained within the preset time interval, the number of times the mouth region of the target object is in the open state; if that number is greater than the first preset threshold, the apparatus determines state information indicating that the target object is in the specified state (the specified state may be the fatigued state). By counting the number of times the mouth region of the target object is in the open state, the image processing apparatus judges whether the target object is in a fatigued state, which can improve the accuracy of detecting the fatigued state.
It should be noted that, to prevent the target object from being misjudged as being in the specified state (that is, the fatigued state) when it is merely speaking, the mouth region being in the open state may mean that the distance between the upper lip and the lower lip of the target object is greater than a preset distance threshold, which improves the accuracy of the image processing apparatus in detecting the fatigued state.
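As a concrete illustration of the lip-distance criterion, the sketch below decides whether the mouth region is in the open state from two lip landmark points; the landmark source and the threshold value are assumptions added for illustration and are not specified in the patent.
```python
import math

PRESET_LIP_DISTANCE = 15.0  # assumed value, in pixels

def mouth_is_open(upper_lip_xy, lower_lip_xy, threshold=PRESET_LIP_DISTANCE):
    """Count the mouth as open only when the distance between the upper lip and
    the lower lip exceeds the preset distance threshold, so that ordinary
    speaking is not mistaken for the fatigue-related mouth-open feature."""
    dx = upper_lip_xy[0] - lower_lip_xy[0]
    dy = upper_lip_xy[1] - lower_lip_xy[1]
    return math.hypot(dx, dy) > threshold
```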
As an optional implementation, the designated region of the target object includes the eye region of the target object; the action feature description information includes description information of the eye region of the target object being in a closed-eye state; and the specific manner of determining the state information of the target object according to the action feature description information includes: counting, according to the description information of the eye region of the target object being in the closed-eye state obtained within a preset time interval, the number of times the eye region of the target object is in the closed-eye state; and if the number of times the eye region of the target object is in the closed-eye state is greater than a second preset threshold, determining state information indicating that the target object is in the specified state.
For example, the preset time interval is 1 minute and the second preset threshold is 5. The image processing apparatus obtains, from multiple frames of the first image and the second image within the preset time interval, the description information of the eye region of the target object being in the closed-eye state, and counts the number of times the eye region of the target object is in the closed-eye state. If the eye region of the target object is in the closed-eye state 6 times, the image processing apparatus can determine that the number of times the eye region of the target object is in the closed-eye state is greater than the second preset threshold, and determine state information indicating that the target object is in a fatigued state.
In the embodiments of the present invention, the image processing apparatus can count, according to the description information of the eye region of the target object being in the closed-eye state obtained within the preset time interval, the number of times the eye region of the target object is in the closed-eye state; if that number is greater than the second preset threshold, the apparatus determines state information indicating that the target object is in the specified state (the specified state may be the fatigued state). By counting the number of times the eye region of the target object is in the closed-eye state, the image processing apparatus judges whether the target object is in a fatigued state, which can improve the accuracy of detecting the fatigued state.
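Both the mouth-open and closed-eye checks reduce to counting events within a preset time interval and comparing the count against a threshold. The sketch below illustrates that counting logic, reusing the example values above (a 1-minute interval, thresholds of 4 and 5); the frame timestamps and per-frame event flags are assumed to come from the recognition step.
```python
from collections import deque

class FatigueEventCounter:
    """Count per-frame events (mouth open or eyes closed) within a sliding preset
    time interval and report whether the count exceeds the fatigue threshold."""

    def __init__(self, interval_s, threshold):
        self.interval_s = interval_s
        self.threshold = threshold
        self.event_times = deque()

    def update(self, timestamp_s, event_detected):
        if event_detected:
            self.event_times.append(timestamp_s)
        # Drop events that fall outside the preset time interval.
        while self.event_times and timestamp_s - self.event_times[0] > self.interval_s:
            self.event_times.popleft()
        return len(self.event_times) > self.threshold  # True => fatigued state

# One counter per action feature, using the thresholds from the examples above.
mouth_counter = FatigueEventCounter(interval_s=60.0, threshold=4)
eye_counter = FatigueEventCounter(interval_s=60.0, threshold=5)
```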
In the embodiments of the present invention, the image processing apparatus can receive the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor, input the first image and the second image of the target object into the preset recognition model to obtain the description information describing the action feature of the designated region of the target object, and determine the state information of the target object according to the description information of the action feature. By using image data collected by multiple image sensors as signal inputs to the recognition model, the different signals complement each other, providing a sufficient amount of information to the input of the preset recognition model; the recognition model is optimized by combining the depth map with the grayscale or RGB image, thereby improving the accuracy of fatigue detection.
Based on the above description of the image data processing method, an embodiment of the present invention provides an image data processing system. As shown in FIG. 2, the image data processing system includes an image processing apparatus 201, a vehicle 202, and a target object 203 in the driver's seat of the vehicle 202 (that is, the target object is the driver). The image processing apparatus 201 may include multiple sensors (a first image sensor 2011 and a second image sensor 2012 are taken as an example). The image processing apparatus 201 is connected to the vehicle 202, and may be disposed on the roof of the vehicle 202 near the driver's seat or on the console of the vehicle 202, so that image data of the target object can be clearly collected. The image data processing system can be used to implement an image data processing method. Specifically, referring to FIG. 3, FIG. 3 shows an image data processing method according to an embodiment of the present invention; the image data processing method includes:
S301: If a target object is detected, obtain an object identifier of the target object.
In the embodiments of the present invention, the image processing apparatus 201 may use the first image sensor or the second image sensor to collect an image of the driver's seat of the vehicle 202 to determine whether there is a target object in the driver's seat of the vehicle; if a target object exists, the object identifier of the target object is obtained.
The object identifier of the target object may be an identifier of a specific person, such as a name; it may also be an identifier of the region where the target object is located, such as China; it may also be a gender identifier of the target object, such as male or female.
S302: Search for a recognition model associated with the object identifier of the target object, and use the associated recognition model as the preset recognition model.
In the embodiments of the present invention, the image processing apparatus 201 may search for a recognition model associated with the object identifier of the target object and use the associated recognition model as the preset recognition model, so that the image of the target object is recognized using the recognition model associated with the object identifier, which can improve the accuracy of recognition.
For example, if the object identifier is a person's name, the image processing apparatus 201 calls the recognition model associated with that name; if the object identifier is the gender identifier of the target object (for example, male), the image processing apparatus 201 can call the recognition model associated with the gender of the target object.
It should be noted that the image processing apparatus 201 may store a large number of recognition models and call the required recognition model from the stored models; the image processing apparatus 201 may also call the required recognition model from a network server through a network connection, so as to save the memory space of the image processing apparatus 201.
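A minimal sketch of the model selection in S302, assuming a local store of recognition models keyed by object identifier with a fallback to a network server; the fetch_model_from_server callable, the cache, and the default model are hypothetical details added for illustration.
```python
# Hypothetical local store: object identifier -> recognition model.
local_models = {}

def select_recognition_model(object_id, fetch_model_from_server, default_model):
    """Use the recognition model associated with the object identifier if one is
    stored locally; otherwise try the network server; otherwise fall back to a
    generic model."""
    if object_id in local_models:
        return local_models[object_id]
    model = fetch_model_from_server(object_id)  # may return None
    if model is not None:
        local_models[object_id] = model         # cache to save future lookups
        return model
    return default_model
```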
S303: Receive the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor.
S304: Input the first image of the target object and the second image of the target object into the preset recognition model to obtain description information describing an action feature of a designated region of the target object.
As an optional implementation, a first image and a second image of a training object are collected, and an initial recognition model is trained on the designated regions of the first image and the second image of the training object to obtain the trained recognition model.
In the embodiments of the present invention, the image processing apparatus 201 can use the first image and the second image to optimize the recognition model, which can improve the accuracy of recognizing the action features of the target object. That is, the image processing apparatus 201 can collect the first image and the second image of the training object and train the initial recognition model on the designated regions of the first image and the second image of the training object to obtain the trained recognition model; after a large amount of training, the preset recognition model can be obtained, so as to improve the accuracy of recognizing image data.
As an optional implementation, the specific manner of training the initial recognition model on the designated regions of the first image and the second image of the training object to obtain the trained recognition model includes: obtaining the current training corpus of the training object; recognizing the designated regions of the first image and the second image of the training object using the initial recognition model to obtain training description information; determining the similarity between the current training corpus of the training object and the training description information; and if the similarity is less than a preset similarity value, adjusting the recognition parameters in the initial recognition model to obtain the trained recognition model.
In the embodiments of the present invention, the image processing apparatus 201 may receive the input current training corpus of the training object, recognize the designated regions of the first image and the second image of the training object using the initial recognition model to obtain training description information, and determine the similarity between the current training corpus of the training object and the training description information. If the similarity is less than the preset similarity value, it is determined that the recognition accuracy of the initial model is low, and the image processing apparatus 201 may adjust the recognition parameters in the initial recognition model, input the first image and the second image of the next training object into the adjusted recognition model, and repeat the above steps. After a large amount of training, if the similarity between the training corpora of multiple training objects and the corresponding training description information is greater than the preset similarity value, the recognition model with high stability and recognition accuracy is used as the trained recognition model.
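The patent characterizes training only as comparing the training description information produced by the model with the training corpus and adjusting the recognition parameters when the similarity falls below a preset value; how the similarity and the parameter update are computed is not disclosed. The sketch below captures that loop with both operations left as hypothetical callables.
```python
PRESET_SIMILARITY = 0.9  # assumed value, not specified in the patent

def train_recognition_model(model, training_samples, similarity_fn, adjust_fn,
                            preset_similarity=PRESET_SIMILARITY):
    """For each (first_image, second_image, corpus) training sample, recognize the
    designated region, compare the resulting description with the training corpus,
    and adjust the recognition parameters whenever the similarity is too low."""
    for first_image, second_image, corpus in training_samples:
        description = model.recognize(first_image, second_image)
        if similarity_fn(corpus, description) < preset_similarity:
            adjust_fn(model, corpus, description)  # update recognition parameters
    return model
```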
图像处理装置201可以采集不同地域、不同环境或不同场景下的用户的第一图像及第二图像作为上述训练对象的第一图像及第二图像,以便可以提高识 别模型的鲁棒性;图像处理装置201也可以仅采集使用该车辆202的频率较高的用户的第一图像及第二图像作为上述训练对象的第一图像及第二图像,可以降低训练的复杂度,并可以提高识别模型的利用率。The image processing device 201 can collect the first image and the second image of the user in different regions, different environments or different scenarios as the first image and the second image of the training object, so that the robustness of the recognition model can be improved; image processing The device 201 may also collect only the first image and the second image of the user with a higher frequency of the vehicle 202 as the first image and the second image of the training object, thereby reducing training complexity and improving the recognition model. Utilization rate.
作为一种可选的实施方式,如果检测到对该预设的识别模型的训练指令,调用该第一图像传感器采集所述目标对象的训练第一图像,及调用该第二图像传感器采集该目标对象的训练第二图像,根据该训练第一图像及训练第二图像对该预设的识别模型进行训练。As an optional implementation manner, if a training instruction for the preset recognition model is detected, the first image sensor is called to acquire a training first image of the target object, and the second image sensor is called to collect the target. The second image of the object is trained, and the preset recognition model is trained according to the training first image and the training second image.
本发明实施例中,图像处理装置201检测到该预设识别模型的识别精度较低,或接收到对该预设的识别模型进行训练指令时,图像处理装置201可以调用该第一图像传感器2011采集该目标对象的训练第一图像,及调用该第二图像传感器2012采集该目标对象的训练第二图像,根据该训练第一图像及训练第二图像对该预设的识别模型进行训练,以提高该预设识别模型的识别精度,进而提高识别疲劳驾驶的准确度。In the embodiment of the present invention, when the image processing device 201 detects that the recognition accuracy of the preset recognition model is low, or receives a training instruction for the preset recognition model, the image processing device 201 may invoke the first image sensor 2011. Collecting a training first image of the target object, and calling the second image sensor 2012 to collect a training second image of the target object, and training the preset recognition model according to the training first image and the training second image, The recognition accuracy of the preset recognition model is improved, thereby improving the accuracy of identifying fatigue driving.
S305、根据该动作特征描述信息确定该目标对象的状态信息。S305. Determine status information of the target object according to the action feature description information.
其中，状态信息至少用于指示该目标对象是否处于疲劳驾驶状态。The state information is used at least to indicate whether the target object is in a fatigued driving state.
作为一种可选的实施方式,若根据该目标对象的状态信息确定需要暂停驾驶该车辆,则输出提示信息,该提示信息用于提示该目标对象暂停驾驶该车辆。As an optional implementation manner, if it is determined according to the state information of the target object that the vehicle needs to be suspended, the prompt information is output, and the prompt information is used to prompt the target object to pause driving the vehicle.
本发明实施例中，如果图像处理装置201根据该目标对象的状态信息确定该目标对象处于疲劳驾驶状态，可以确定需要暂停驾驶该车辆202，图像处理装置201可以输出提示信息，以提示该目标对象暂停驾驶该车辆，可以提高车辆驾驶的安全性。In the embodiment of the present invention, if the image processing apparatus 201 determines according to the state information of the target object that the target object is in a fatigued driving state, it may determine that driving of the vehicle 202 needs to be suspended, and the image processing apparatus 201 may output prompt information to prompt the target object to suspend driving the vehicle, which can improve the safety of driving.
其中，该提示信息可以采用语音的方式提示，还可以采用在显示屏上显示的方式提示，或者多种结合的方式提示。The prompt information may be given by voice, displayed on a display screen, or delivered in a combination of these manners.
作为一种可选的实施方式,若根据该目标对象的状态信息确定需要启动该车辆的自动驾驶模式,则控制该车辆启动自动驾驶模式。As an optional implementation manner, if it is determined according to the state information of the target object that the automatic driving mode of the vehicle needs to be activated, the vehicle is controlled to start the automatic driving mode.
本发明实施例中，如果图像处理装置201根据该目标对象的状态信息确定该目标对象处于疲劳驾驶状态，可以确定需要启动该车辆202的自动驾驶模式，则控制该车辆启动自动驾驶模式，可以防止疲劳驾驶引起的交通事故发生，可以提高车辆驾驶的安全性。In the embodiment of the present invention, if the image processing apparatus 201 determines according to the state information of the target object that the target object is in a fatigued driving state, it may determine that the automatic driving mode of the vehicle 202 needs to be activated and control the vehicle to start the automatic driving mode, which can prevent traffic accidents caused by fatigued driving and improve the safety of driving.
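For illustration only, a minimal sketch of the decision logic described in the last few paragraphs might look as follows; the vehicle interface names (`show_message`, `play_voice_alert`, `start_autopilot`) are hypothetical and not part of the described embodiment.

```python
def handle_driver_state(is_fatigued: bool, vehicle, autopilot_available: bool) -> None:
    # Hypothetical handling of the determined state information: prompt the
    # driver to suspend driving, or control the vehicle to start the
    # automatic driving mode if it supports one.
    if not is_fatigued:
        return
    if autopilot_available:
        vehicle.start_autopilot()      # control the vehicle to start the automatic driving mode
    else:
        vehicle.show_message("Fatigue detected - please stop and rest")  # prompt on the display screen
        vehicle.play_voice_alert()     # voice prompt to suspend driving
```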
本发明实施例，图像处理装置可以与车辆建立连接，该图像处理装置检测到位于该车辆驾驶位的目标对象，可以获取该目标对象的对象标识，获取与该对象标识关联的识别模型，将关联的识别模型作为预设的识别模型，可以提高识别的准确性。另外，该图像处理装置采用多种图像传感器采集的图像数据作为预设识别模型的信号输入，可以实现多种信号的互补，从而为预设的识别模型的输入端提供足够多的信息量，并且在RGB图像或灰度图像的基础上结合深度图对识别模型进行了优化处理，进而提高疲劳驾驶检测的准确度，并可以提高车辆驾驶的安全性。In the embodiment of the present invention, the image processing apparatus may establish a connection with the vehicle; when the image processing apparatus detects a target object located in the driving position of the vehicle, it may acquire the object identifier of the target object, acquire a recognition model associated with the object identifier, and use the associated recognition model as the preset recognition model, which can improve the accuracy of recognition. In addition, the image processing apparatus uses image data collected by multiple image sensors as the signal input of the preset recognition model, so that the signals can complement one another and a sufficient amount of information is provided to the input end of the preset recognition model; furthermore, the recognition model is optimized by combining the depth map with the RGB image or grayscale image, thereby improving the accuracy of fatigued-driving detection and the safety of driving the vehicle.
基于上述对图像数据处理方法及图像数据处理系统的描述,本发明实施例提供一种图像处理装置,请参见图4,如图4所示的图像处理装置可以包括:Based on the above description of the image data processing method and the image data processing system, an embodiment of the present invention provides an image processing apparatus. Referring to FIG. 4, the image processing apparatus shown in FIG. 4 may include:
第一图像传感器401,用于采集目标对象的第一图像。The first image sensor 401 is configured to collect a first image of the target object.
第二图像传感器402,用于采集目标对象的第二图像。The second image sensor 402 is configured to collect a second image of the target object.
其中,所述第一图像包括灰度图像或RGB图像中的至少一种,所述第二图像包括深度图像。Wherein the first image comprises at least one of a grayscale image or an RGB image, the second image comprising a depth image.
接收模块403,用于接收所述第一图像传感器采集到的目标对象的第一图像,及所述第二图像传感器采集到的所述目标对象的第二图像。The receiving module 403 is configured to receive a first image of the target object collected by the first image sensor and a second image of the target object collected by the second image sensor.
识别模块404,用于将所述目标对象的第一图像和所述目标对象的第二图像输入到预设的识别模型中,得到用于描述所述目标对象的指定区域的动作特征的描述信息。An identification module 404, configured to input a first image of the target object and a second image of the target object into a preset recognition model, to obtain description information for describing an action feature of a specified area of the target object .
确定模块405,用于根据所述动作特征描述信息确定所述目标对象的状态信息。The determining module 405 is configured to determine state information of the target object according to the action feature description information.
其中,所述预设的识别模型用于对所述目标对象的第一图像和所述目标对象的第二图像的指定区域进行识别。The preset recognition model is configured to identify a first image of the target object and a designated area of the second image of the target object.
其中,所述目标对象的指定区域包括:所述目标对象的嘴部区域;所述动作特征描述信息包括:所述目标对象的嘴部区域处于张开特征的描述信息。The designated area of the target object includes: a mouth area of the target object; and the action feature description information includes: description information of the mouth area of the target object being in an open feature.
可选的，所述确定模块405，具体用于根据预设时间间隔内得到的所述目标对象的嘴部区域处于张开特征的描述信息，统计所述目标对象的嘴部区域处于张开特征的次数；若所述目标对象的嘴部区域处于张开特征的次数大于第一预设阈值，则确定指示所述目标对象处于指定状态的状态信息。Optionally, the determining module 405 is specifically configured to count, according to the description information, obtained within a preset time interval, indicating that the mouth region of the target object is in the open feature, the number of times the mouth region of the target object is in the open feature; and if the number of times the mouth region of the target object is in the open feature is greater than a first preset threshold, determine state information indicating that the target object is in a specified state.
其中,所述目标对象的指定区域包括:所述目标对象的眼部区域;所述动作特征描述信息包括:所述目标对象的眼部区域处于闭眼特征的描述信息。The specified area of the target object includes: an eye area of the target object; and the action feature description information includes: description information of the eye area of the target object in a closed eye feature.
可选的，所述确定模块405，用于根据预设时间间隔内得到的所述目标对象的眼部区域处于闭眼特征的描述信息，统计所述目标对象的眼部区域处于闭眼特征的次数；若所述目标对象的眼部区域处于闭眼特征的次数大于第二预设阈值，则确定指示所述目标对象处于指定状态的状态信息。Optionally, the determining module 405 is configured to count, according to the description information, obtained within a preset time interval, indicating that the eye region of the target object is in the closed-eye feature, the number of times the eye region of the target object is in the closed-eye feature; and if the number of times the eye region of the target object is in the closed-eye feature is greater than a second preset threshold, determine state information indicating that the target object is in a specified state.
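As a rough illustration of the counting-and-threshold logic used by the determining module 405 for both the mouth-open and closed-eye features, the sketch below keeps a sliding window of detections. The specific interval and threshold values, and the class interface, are assumptions made only for this example.

```python
import time
from collections import deque
from typing import Deque, Optional

class FeatureCounter:
    """Counts how often an action feature (mouth open, eyes closed) is
    reported within a sliding preset time interval and flags the specified
    state once a preset threshold is exceeded (hypothetical sketch)."""

    def __init__(self, interval_s: float, threshold: int) -> None:
        self.interval_s = interval_s
        self.threshold = threshold
        self._events: Deque[float] = deque()

    def update(self, feature_present: bool, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if feature_present:
            self._events.append(now)
        # discard detections that fall outside the preset time interval
        while self._events and now - self._events[0] > self.interval_s:
            self._events.popleft()
        return len(self._events) > self.threshold   # True -> target object is in the specified state

# Illustrative values only: flag the state if the mouth-open feature is seen
# more than 3 times, or the closed-eye feature more than 10 times, within 60 s.
yawn_counter = FeatureCounter(interval_s=60.0, threshold=3)
eye_closure_counter = FeatureCounter(interval_s=60.0, threshold=10)
```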
可选的,所述图像处理装置与车辆连接,所述图像处理装置用于采集位于驾驶位的对象的图像信息。Optionally, the image processing device is connected to a vehicle, and the image processing device is configured to collect image information of an object located in a driving position.
可选的,输出模块406,用于若根据所述目标对象的状态信息确定需要暂停驾驶所述车辆,则输出提示信息,所述提示信息用于提示所述目标对象暂停驾驶所述车辆。Optionally, the output module 406 is configured to: if it is determined that the vehicle needs to be paused according to the state information of the target object, output prompt information, where the prompt information is used to prompt the target object to pause driving the vehicle.
可选的,控制模块407,用于若根据所述目标对象的状态信息确定需要启动所述车辆的自动驾驶模式,则控制所述车辆启动自动驾驶模式。Optionally, the control module 407 is configured to control the vehicle to start the automatic driving mode if it is determined according to the state information of the target object that the automatic driving mode of the vehicle needs to be started.
可选的，调用模块408，用于如果检测到对所述预设的识别模型的训练指令，调用所述第一图像传感器采集所述目标对象的训练第一图像，及调用所述第二图像传感器采集所述目标对象的训练第二图像。Optionally, the calling module 408 is configured to: if a training instruction for the preset recognition model is detected, call the first image sensor to acquire a training first image of the target object, and call the second image sensor to acquire a training second image of the target object.
可选的,第一训练模块409,用于根据所述训练第一图像及训练第二图像对所述预设的识别模型进行训练。Optionally, the first training module 409 is configured to train the preset recognition model according to the training first image and the training second image.
可选的,第一图像传感器401,还用于采集训练对象的第一图像。Optionally, the first image sensor 401 is further configured to collect a first image of the training object.
可选的,第二图像传感器402,还用于采集训练对象的第二图像。Optionally, the second image sensor 402 is further configured to collect a second image of the training object.
可选的,第二训练模块410,用于采用初始识别模型对所述训练对象的第一图像及第二图像的指定区域进行训练,得到训练后的所述识别模型。Optionally, the second training module 410 is configured to train the designated areas of the first image and the second image of the training object by using an initial recognition model to obtain the trained recognition model.
可选的，第二训练模块410，具体用于获取所述训练对象当前的训练语料；采用所述初始识别模型对训练对象的第一图像及第二图像的指定区域进行识别，得到训练描述信息；确定所述训练对象当前的训练语料与所述训练描述信息的相似度；若所述相似度小于预设相似度值，则调整所述初始识别模型中的识别参数，得到训练后的所述识别模型。Optionally, the second training module 410 is specifically configured to acquire the current training corpus of the training object; use the initial recognition model to recognize the designated areas of the first image and the second image of the training object to obtain training description information; determine the similarity between the current training corpus of the training object and the training description information; and if the similarity is less than a preset similarity value, adjust the recognition parameters in the initial recognition model to obtain the trained recognition model.
可选的，获取模块411，用于若检测到目标对象，则获取所述目标对象的对象标识。Optionally, the obtaining module 411 is configured to acquire an object identifier of the target object if the target object is detected.
可选的,查找模块412,用于查找与所述目标对象的对象标识关联的识别模型。Optionally, the searching module 412 is configured to search for a recognition model associated with the object identifier of the target object.
可选的，所述接收模块403，具体用于将关联的识别模型作为所述预设的识别模型，并执行所述接收所述第一图像传感器采集到的目标对象的第一图像，及所述第二图像传感器采集到的所述目标对象的第二图像的步骤。Optionally, the receiving module 403 is specifically configured to use the associated recognition model as the preset recognition model, and perform the step of receiving the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor.
本发明实施例中，图像处理装置可以接收第一图像传感器采集到目标对象的第一图像，及第二图像传感器采集到的目标对象的第二图像，将该目标对象的第一图像及第二图像输入到预设的识别模型中，得到用于描述该目标对象的指定区域的动作特征的描述信息，根据该动作特征的描述信息确定该目标对象的状态信息，通过以多种图像传感器采集到的图像数据作为该识别模型的信号输入，可以实现多种信号的互补，从而为预设的识别模型的输入端提供足够多的信息量，进而提高疲劳检测的准确度。In the embodiment of the present invention, the image processing apparatus may receive the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor, input the first image and the second image of the target object into the preset recognition model to obtain description information describing the action feature of the designated area of the target object, and determine the state information of the target object according to the description information of the action feature. By using the image data collected by multiple image sensors as the signal input of the recognition model, the signals can complement one another, which provides a sufficient amount of information to the input end of the preset recognition model and thereby improves the accuracy of fatigue detection.
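To illustrate how the two sensor signals could complement each other at the model input, the following sketch stacks the first image and the depth image into a single multi-channel array. This is only one plausible fusion scheme, assumed for the example; the embodiment does not prescribe how the two images are combined before recognition.

```python
import numpy as np

def build_model_input(first_image: np.ndarray, depth_image: np.ndarray) -> np.ndarray:
    """Stack the first image (grayscale or RGB) and the depth image into one
    multi-channel array, assuming both were captured at the same resolution
    (hypothetical sketch of one possible fusion scheme)."""
    if first_image.ndim == 2:                      # grayscale -> add a channel axis
        first_image = first_image[..., np.newaxis]
    if depth_image.ndim == 2:
        depth_image = depth_image[..., np.newaxis]
    rgb = first_image.astype(np.float32) / 255.0   # normalise intensities to [0, 1]
    depth = depth_image.astype(np.float32)
    depth = depth / (depth.max() + 1e-6)           # normalise depth so neither signal dominates
    return np.concatenate([rgb, depth], axis=-1)   # e.g. H x W x 4 for RGB + depth
```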
请参见图5，图5是本发明实施例提供的一种图像处理设备的示意性框图。如图所示的本实施例中的一种图像处理设备可以包括：至少一个处理器501，例如CPU；至少一个存储器502，通信装置503，传感器504、控制器505，上述处理器501、存储器502、通信装置503，传感器504、控制器505通过总线506连接。Referring to FIG. 5, FIG. 5 is a schematic block diagram of an image processing device according to an embodiment of the present invention. The image processing device in this embodiment as shown may include: at least one processor 501, such as a CPU, at least one memory 502, a communication apparatus 503, a sensor 504 and a controller 505; the processor 501, the memory 502, the communication apparatus 503, the sensor 504 and the controller 505 are connected via a bus 506.
其中,通信装置503,可以用于输出提示信息,还可以用于建立与车辆的通信连接,并向车辆发送指令。The communication device 503 can be used to output prompt information, and can also be used to establish a communication connection with the vehicle and send an instruction to the vehicle.
传感器504，包括第一图像传感器及第二图像传感器，第一图像传感器可以是指单目视觉传感器，第二图像传感器可以是指多目视觉传感器，第一图像传感器，用于采集目标对象的第一图像，第二图像传感器，用于采集目标对象的第二图像。The sensor 504 includes a first image sensor and a second image sensor. The first image sensor may be a monocular vision sensor, and the second image sensor may be a multi-camera vision sensor; the first image sensor is configured to collect a first image of the target object, and the second image sensor is configured to collect a second image of the target object.
控制器505，用于在需要控制车辆启动自动驾驶时，控制车辆启动自动驾驶模式。The controller 505 is configured to control the vehicle to start the automatic driving mode when the vehicle needs to be controlled to start automatic driving.
存储器502用于存储指令,处理器501调用存储器502中存储的程序代码。 Memory 502 is used to store instructions, and processor 501 calls program code stored in memory 502.
具体的,处理器501调用存储器502中存储的程序代码,执行以下操作:Specifically, the processor 501 calls the program code stored in the memory 502 to perform the following operations:
接收所述第一图像传感器采集到的目标对象的第一图像,及所述第二图像传感器采集到的所述目标对象的第二图像;Receiving a first image of the target object collected by the first image sensor, and a second image of the target object collected by the second image sensor;
将所述目标对象的第一图像和所述目标对象的第二图像输入到预设的识别模型中,得到用于描述所述目标对象的指定区域的动作特征的描述信息;Inputting a first image of the target object and a second image of the target object into a preset recognition model to obtain description information for describing an action feature of a specified region of the target object;
根据所述动作特征描述信息确定所述目标对象的状态信息;Determining state information of the target object according to the action feature description information;
所述预设的识别模型用于对所述目标对象的第一图像和所述目标对象的第二图像的指定区域进行识别。The preset recognition model is configured to identify a first image of the target object and a designated area of the second image of the target object.
其中,所述第一图像包括灰度图像或RGB图像中的至少一种,所述第二图像包括深度图像。Wherein the first image comprises at least one of a grayscale image or an RGB image, the second image comprising a depth image.
可选的，所述目标对象的指定区域包括：所述目标对象的嘴部区域；所述动作特征描述信息包括：所述目标对象的嘴部区域处于张开特征的描述信息；处理器501调用存储器502中存储的程序代码，还可以执行以下操作：Optionally, the designated area of the target object includes: the mouth area of the target object; the action feature description information includes: description information indicating that the mouth area of the target object is in an open feature; the processor 501 calls the program code stored in the memory 502 and may further perform the following operations:
根据预设时间间隔内得到的所述目标对象的嘴部区域处于张开特征的描述信息，统计所述目标对象的嘴部区域处于张开特征的次数；counting, according to the description information, obtained within a preset time interval, indicating that the mouth region of the target object is in the open feature, the number of times the mouth region of the target object is in the open feature;
若所述目标对象的嘴部区域处于张开特征的次数大于第一预设阈值,则确定指示所述目标对象处于指定状态的状态信息。And determining, if the number of times the mouth region of the target object is in the open feature is greater than the first preset threshold, status information indicating that the target object is in the specified state.
可选的，所述目标对象的指定区域包括：所述目标对象的眼部区域；所述动作特征描述信息包括：所述目标对象的眼部区域处于闭眼特征的描述信息；处理器501调用存储器502中存储的程序代码，还可以执行以下操作：Optionally, the designated area of the target object includes: the eye area of the target object; the action feature description information includes: description information indicating that the eye area of the target object is in a closed-eye feature; the processor 501 calls the program code stored in the memory 502 and may further perform the following operations:
根据预设时间间隔内得到的所述目标对象的眼部区域处于闭眼特征的描述信息，统计所述目标对象的眼部区域处于闭眼特征的次数；counting, according to the description information, obtained within a preset time interval, indicating that the eye region of the target object is in the closed-eye feature, the number of times the eye region of the target object is in the closed-eye feature;
若所述目标对象的眼部区域处于闭眼特征的次数大于第二预设阈值，则确定指示所述目标对象处于指定状态的状态信息。If the number of times the eye region of the target object is in the closed-eye feature is greater than a second preset threshold, determining state information indicating that the target object is in the designated state.
可选的,所述图像处理装置与车辆连接,所述图像处理装置用于采集位于驾驶位的对象的图像信息。Optionally, the image processing device is connected to a vehicle, and the image processing device is configured to collect image information of an object located in a driving position.
可选的,处理器501调用存储器502中存储的程序代码,还可以执行以下操作:Optionally, the processor 501 calls the program code stored in the memory 502, and may also perform the following operations:
若根据所述目标对象的状态信息确定需要暂停驾驶所述车辆,则输出提示信息,所述提示信息用于提示所述目标对象暂停驾驶所述车辆。If it is determined according to the state information of the target object that the vehicle needs to be suspended, the prompt information is output, and the prompt information is used to prompt the target object to pause driving the vehicle.
可选的,处理器501调用存储器502中存储的程序代码,还可以执行以下操作:Optionally, the processor 501 calls the program code stored in the memory 502, and may also perform the following operations:
若根据所述目标对象的状态信息确定需要启动所述车辆的自动驾驶模式,则控制所述车辆启动自动驾驶模式。If it is determined according to the state information of the target object that the automatic driving mode of the vehicle needs to be activated, the vehicle is controlled to start the automatic driving mode.
可选的,处理器501调用存储器502中存储的程序代码,还可以执行以下操作:Optionally, the processor 501 calls the program code stored in the memory 502, and may also perform the following operations:
如果检测到对所述预设的识别模型的训练指令，调用所述第一图像传感器采集所述目标对象的训练第一图像，及调用所述第二图像传感器采集所述目标对象的训练第二图像；If a training instruction for the preset recognition model is detected, calling the first image sensor to acquire a training first image of the target object, and calling the second image sensor to acquire a training second image of the target object;
根据所述训练第一图像及训练第二图像对所述预设的识别模型进行训练。The preset recognition model is trained according to the training first image and the training second image.
可选的,处理器501调用存储器502中存储的程序代码,还可以执行以下操作:Optionally, the processor 501 calls the program code stored in the memory 502, and may also perform the following operations:
采集训练对象的第一图像及第二图像;Collecting a first image and a second image of the training object;
采用初始识别模型对所述训练对象的第一图像及第二图像的指定区域进行训练,得到训练后的所述识别模型。The first image of the training object and the designated area of the second image are trained by using an initial recognition model to obtain the trained recognition model.
可选的,处理器501调用存储器502中存储的程序代码,还可以执行以下操作:Optionally, the processor 501 calls the program code stored in the memory 502, and may also perform the following operations:
获取所述训练对象当前的训练语料;Obtaining a current training corpus of the training object;
采用所述初始识别模型对训练对象的第一图像及第二图像的指定区域进行识别,得到训练描述信息;Using the initial recognition model to identify the first image of the training object and the designated area of the second image to obtain training description information;
确定所述训练对象当前的训练语料与所述训练描述信息的相似度;Determining a similarity between the current training corpus of the training object and the training description information;
若所述相似度小于预设相似度值,则调整所述初始识别模型中的识别参数,得到训练后的所述识别模型。If the similarity is less than the preset similarity value, the identification parameter in the initial recognition model is adjusted to obtain the trained recognition model.
可选的,处理器501调用存储器502中存储的程序代码,还可以执行以下操作:Optionally, the processor 501 calls the program code stored in the memory 502, and may also perform the following operations:
若检测到目标对象,则获取所述目标对象的对象标识;Obtaining an object identifier of the target object if the target object is detected;
查找与所述目标对象的对象标识关联的识别模型;Finding a recognition model associated with the object identifier of the target object;
将关联的识别模型作为所述预设的识别模型，并执行所述接收所述第一图像传感器采集到的目标对象的第一图像，及所述第二图像传感器采集到的所述目标对象的第二图像的步骤。Using the associated recognition model as the preset recognition model, and performing the step of receiving the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor.
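A minimal sketch of the model-selection step just described might look as follows; the dictionary-based lookup and the fallback to a default model are assumptions made only for illustration.

```python
from typing import Any, Dict, Optional

def select_recognition_model(object_id: Optional[str],
                             models_by_id: Dict[str, Any],
                             default_model: Any) -> Any:
    # Hypothetical lookup: if a recognition model has been associated with the
    # detected target object's identifier, use it as the preset recognition
    # model; otherwise fall back to a general/default model.
    if object_id is not None and object_id in models_by_id:
        return models_by_id[object_id]
    return default_model
```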
本发明实施例中，图像处理装置可以接收第一图像传感器采集到目标对象的第一图像，及第二图像传感器采集到的目标对象的第二图像，将该目标对象的第一图像及第二图像输入到预设的识别模型中，得到用于描述该目标对象的指定区域的动作特征的描述信息，根据该动作特征的描述信息确定该目标对象的状态信息，通过以多种图像传感器采集到的图像数据作为该识别模型的信号输入，可以实现多种信号的互补，从而为预设的识别模型的输入端提供足够多的信息量，并且在灰度图像、RGB图像的基础上结合深度图对识别模型进行了优化处理，进而提高疲劳检测的准确度。In the embodiment of the present invention, the image processing apparatus may receive the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor, input the first image and the second image of the target object into the preset recognition model to obtain description information describing the action feature of the designated area of the target object, and determine the state information of the target object according to the description information of the action feature. By using the image data collected by multiple image sensors as the signal input of the recognition model, the signals can complement one another, providing a sufficient amount of information to the input end of the preset recognition model; moreover, the recognition model is optimized by combining the depth map with the grayscale image or RGB image, thereby improving the accuracy of fatigue detection.
本申请还提供了一种计算机程序产品，该计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质，该计算机程序可操作来使计算机执行上述图1和图3对应实施例中的图像数据方法的步骤，该计算机程序产品解决问题的实施方式以及有益效果可以参见上述图1和图3的图像数据方法的实施方式以及有益效果，重复之处不再赘述。The present application further provides a computer program product, the computer program product including a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the steps of the image data method in the embodiments corresponding to FIG. 1 and FIG. 3 above. For the implementations and beneficial effects of the computer program product in solving the problem, reference may be made to the implementations and beneficial effects of the image data methods of FIG. 1 and FIG. 3 above, and repeated details are not described again.
需要说明的是，对于前述的各个方法实施例，为了简单描述，故将其都表述为一系列的动作组合，但是本领域技术人员应该知悉，本发明并不受所描述的动作顺序的限制，因为依据本发明，某一些步骤可以采用其他顺序或者同时进行。其次，本领域技术人员也应该知悉，说明书中所描述的实施例均属于优选实施例，所涉及的动作和模块并不一定是本发明所必须的。It should be noted that, for the foregoing method embodiments, for the sake of brevity they are all described as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in other orders or concurrently. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取器(Random Access Memory,RAM)、磁盘或光盘等。A person skilled in the art may understand that all or part of the various steps of the foregoing embodiments may be performed by a program to instruct related hardware. The program may be stored in a computer readable storage medium, and the storage medium may include: Flash disk, Read-Only Memory (ROM), Random Access Memory (RAM), disk or optical disk.
以上所揭露的仅为本发明一种部分实施例而已，当然不能以此来限定本发明之权利范围，本领域普通技术人员可以理解实现上述实施例的全部或部分流程，并依本发明权利要求所作的等同变化，仍属于发明所涵盖的范围。The above disclosure is only a part of the embodiments of the present invention and certainly is not intended to limit the scope of the rights of the present invention. A person of ordinary skill in the art can understand all or part of the processes for implementing the above embodiments, and equivalent changes made according to the claims of the present invention still fall within the scope covered by the invention.

Claims (27)

  1. 一种图像数据处理方法,其特征在于,应用于图像处理装置,所述图像处理装置包括第一图像传感器及第二图像传感器,所述方法包括:An image data processing method is applied to an image processing apparatus, the image processing apparatus comprising a first image sensor and a second image sensor, the method comprising:
    接收所述第一图像传感器采集到的目标对象的第一图像，及所述第二图像传感器采集到的所述目标对象的第二图像，所述第一图像包括灰度图像或RGB图像中的至少一种，所述第二图像包括深度图像；Receiving a first image of the target object collected by the first image sensor and a second image of the target object collected by the second image sensor, the first image comprising at least one of a grayscale image or an RGB image, and the second image comprising a depth image;
    将所述目标对象的第一图像和所述目标对象的第二图像输入到预设的识别模型中,得到用于描述所述目标对象的指定区域的动作特征的描述信息;Inputting a first image of the target object and a second image of the target object into a preset recognition model to obtain description information for describing an action feature of a specified region of the target object;
    根据所述动作特征描述信息确定所述目标对象的状态信息。Determining state information of the target object according to the action feature description information.
  2. 根据权利要求1所述的方法，其特征在于，所述目标对象的指定区域包括：所述目标对象的嘴部区域；所述动作特征描述信息包括：所述目标对象的嘴部区域处于张开特征的描述信息。The method according to claim 1, wherein the designated area of the target object comprises: a mouth area of the target object; and the action feature description information comprises: description information indicating that the mouth area of the target object is in an open feature.
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述动作特征描述信息确定所述目标对象的状态信息,包括:The method according to claim 2, wherein the determining the state information of the target object according to the action feature description information comprises:
    根据预设时间间隔内得到的所述目标对象的嘴部区域处于张开特征的描述信息，统计所述目标对象的嘴部区域处于张开特征的次数；counting, according to the description information, obtained within a preset time interval, indicating that the mouth region of the target object is in the open feature, the number of times the mouth region of the target object is in the open feature;
    若所述目标对象的嘴部区域处于张开特征的次数大于第一预设阈值,则确定指示所述目标对象处于指定状态的状态信息。And determining, if the number of times the mouth region of the target object is in the open feature is greater than the first preset threshold, status information indicating that the target object is in the specified state.
  4. 根据权利要求1所述的方法，其特征在于，所述目标对象的指定区域包括：所述目标对象的眼部区域；所述动作特征描述信息包括：所述目标对象的眼部区域处于闭眼特征的描述信息。The method according to claim 1, wherein the designated area of the target object comprises: an eye area of the target object; and the action feature description information comprises: description information indicating that the eye area of the target object is in a closed-eye feature.
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述动作特征描述信息确定所述目标对象的状态信息,包括:The method according to claim 4, wherein the determining the state information of the target object according to the action feature description information comprises:
    根据预设时间间隔内得到的所述目标对象的眼部区域处于闭眼特征的描述信息，统计所述目标对象的眼部区域处于闭眼特征的次数；counting, according to the description information, obtained within a preset time interval, indicating that the eye region of the target object is in the closed-eye feature, the number of times the eye region of the target object is in the closed-eye feature;
    若所述目标对象的眼部区域处于闭眼特征的次数大于第二预设阈值，则确定指示所述目标对象处于指定状态的状态信息。If the number of times the eye region of the target object is in the closed-eye feature is greater than a second preset threshold, determining state information indicating that the target object is in the designated state.
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述图像处理装置与车辆连接,所述图像处理装置用于采集位于驾驶位的对象的图像信息。The method according to any one of claims 1 to 5, wherein the image processing device is connected to a vehicle, and the image processing device is configured to collect image information of an object located in a driving position.
  7. 根据权利要求6所述的方法,其特征在于,还包括:The method of claim 6 further comprising:
    若根据所述目标对象的状态信息确定需要暂停驾驶所述车辆,则输出提示信息,所述提示信息用于提示所述目标对象暂停驾驶所述车辆。If it is determined according to the state information of the target object that the vehicle needs to be suspended, the prompt information is output, and the prompt information is used to prompt the target object to pause driving the vehicle.
  8. 根据权利要求6所述的方法,其特征在于,还包括:The method of claim 6 further comprising:
    若根据所述目标对象的状态信息确定需要启动所述车辆的自动驾驶模式,则控制所述车辆启动自动驾驶模式。If it is determined according to the state information of the target object that the automatic driving mode of the vehicle needs to be activated, the vehicle is controlled to start the automatic driving mode.
  9. 根据权利要求7或8所述的方法,其特征在于,还包括:The method according to claim 7 or 8, further comprising:
    如果检测到对所述预设的识别模型的训练指令，调用所述第一图像传感器采集所述目标对象的训练第一图像，及调用所述第二图像传感器采集所述目标对象的训练第二图像；If a training instruction for the preset recognition model is detected, calling the first image sensor to acquire a training first image of the target object, and calling the second image sensor to acquire a training second image of the target object;
    根据所述训练第一图像及训练第二图像对所述预设的识别模型进行训练。The preset recognition model is trained according to the training first image and the training second image.
  10. 根据权利要求7或8所述的方法,其特征在于,还包括:The method according to claim 7 or 8, further comprising:
    采集训练对象的第一图像及第二图像;Collecting a first image and a second image of the training object;
    采用初始识别模型对所述训练对象的第一图像及第二图像的指定区域进行训练,得到训练后的所述识别模型。The first image of the training object and the designated area of the second image are trained by using an initial recognition model to obtain the trained recognition model.
  11. 根据权利要求10所述的方法,其特征在于,所述采用初始识别模型对所述训练对象的第一图像及第二图像的指定区域进行训练,得到训练后的所述识别模型,包括:The method according to claim 10, wherein the initial recognition model is used to train the designated areas of the first image and the second image of the training object to obtain the trained recognition model, including:
    获取所述训练对象当前的训练语料;Obtaining a current training corpus of the training object;
    采用所述初始识别模型对训练对象的第一图像及第二图像的指定区域进行识别,得到训练描述信息;Using the initial recognition model to identify the first image of the training object and the designated area of the second image to obtain training description information;
    确定所述训练对象当前的训练语料与所述训练描述信息的相似度;Determining a similarity between the current training corpus of the training object and the training description information;
    若所述相似度小于预设相似度值,则调整所述初始识别模型中的识别参数,得到训练后的所述识别模型。If the similarity is less than the preset similarity value, the identification parameter in the initial recognition model is adjusted to obtain the trained recognition model.
  12. 根据权利要求1或11所述的方法,其特征在于,还包括:The method according to claim 1 or 11, further comprising:
    若检测到目标对象,则获取所述目标对象的对象标识;Obtaining an object identifier of the target object if the target object is detected;
    查找与所述目标对象的对象标识关联的识别模型;Finding a recognition model associated with the object identifier of the target object;
    将关联的识别模型作为所述预设的识别模型，并执行所述接收所述第一图像传感器采集到的目标对象的第一图像，及所述第二图像传感器采集到的所述目标对象的第二图像的步骤。Using the associated recognition model as the preset recognition model, and performing the step of receiving the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor.
  13. 一种图像处理装置,其特征在于,所述图像处理装置包括第一图像传感器及第二图像传感器,所述装置包括:An image processing apparatus, comprising: a first image sensor and a second image sensor, the apparatus comprising:
    接收模块，用于接收所述第一图像传感器采集到的目标对象的第一图像，及所述第二图像传感器采集到的所述目标对象的第二图像，所述第一图像包括灰度图像或RGB图像中的至少一种，所述第二图像包括深度图像；a receiving module, configured to receive a first image of the target object collected by the first image sensor and a second image of the target object collected by the second image sensor, the first image comprising at least one of a grayscale image or an RGB image, and the second image comprising a depth image;
    识别模块,用于将所述目标对象的第一图像和所述目标对象的第二图像输入到预设的识别模型中,得到用于描述所述目标对象的指定区域的动作特征的描述信息;An identification module, configured to input a first image of the target object and a second image of the target object into a preset recognition model, to obtain description information for describing an action feature of a specified area of the target object;
    确定模块,用于根据所述动作特征描述信息确定所述目标对象的状态信息;a determining module, configured to determine state information of the target object according to the action feature description information;
    所述预设的识别模型用于对所述目标对象的第一图像和所述目标对象的第二图像的指定区域进行识别。The preset recognition model is configured to identify a first image of the target object and a designated area of the second image of the target object.
  14. 根据权利要求13所述的装置，其特征在于，所述目标对象的指定区域包括：所述目标对象的嘴部区域；所述动作特征描述信息包括：所述目标对象的嘴部区域处于张开特征的描述信息。The apparatus according to claim 13, wherein the designated area of the target object comprises: a mouth area of the target object; and the action feature description information comprises: description information indicating that the mouth area of the target object is in an open feature.
  15. 根据权利要求14所述的装置,其特征在于,The device of claim 14 wherein:
    所述确定模块，具体用于根据预设时间间隔内得到的所述目标对象的嘴部区域处于张开特征的描述信息，统计所述目标对象的嘴部区域处于张开特征的次数；若所述目标对象的嘴部区域处于张开特征的次数大于第一预设阈值，则确定指示所述目标对象处于指定状态的状态信息。The determining module is specifically configured to count, according to the description information, obtained within a preset time interval, indicating that the mouth region of the target object is in the open feature, the number of times the mouth region of the target object is in the open feature; and if the number of times the mouth region of the target object is in the open feature is greater than a first preset threshold, determine state information indicating that the target object is in a specified state.
  16. 根据权利要求13所述的装置，其特征在于，所述目标对象的指定区域包括：所述目标对象的眼部区域；所述动作特征描述信息包括：所述目标对象的眼部区域处于闭眼特征的描述信息。The apparatus according to claim 13, wherein the designated area of the target object comprises: an eye area of the target object; and the action feature description information comprises: description information indicating that the eye area of the target object is in a closed-eye feature.
  17. 根据权利要求16所述的装置,其特征在于,The device of claim 16 wherein:
    所述确定模块，用于根据预设时间间隔内得到的所述目标对象的眼部区域处于闭眼特征的描述信息，统计所述目标对象的眼部区域处于闭眼特征的次数；若所述目标对象的眼部区域处于闭眼特征的次数大于第二预设阈值，则确定指示所述目标对象处于指定状态的状态信息。The determining module is configured to count, according to the description information, obtained within a preset time interval, indicating that the eye region of the target object is in the closed-eye feature, the number of times the eye region of the target object is in the closed-eye feature; and if the number of times the eye region of the target object is in the closed-eye feature is greater than a second preset threshold, determine state information indicating that the target object is in a specified state.
  18. 根据权利要求13-17任一项所述的装置,其特征在于,所述图像处理装置与车辆连接,所述图像处理装置用于采集位于驾驶位的对象的图像信息。Apparatus according to any one of claims 13-17, wherein said image processing means is coupled to a vehicle, said image processing means for acquiring image information of an object located in the driver's seat.
  19. 根据权利要求18所述的装置,其特征在于,还包括:The device according to claim 18, further comprising:
    输出模块,用于若根据所述目标对象的状态信息确定需要暂停驾驶所述车辆,则输出提示信息,所述提示信息用于提示所述目标对象暂停驾驶所述车辆。And an output module, configured to output prompt information for prompting the target object to pause driving the vehicle if it is determined that the vehicle needs to be paused according to the state information of the target object.
  20. 根据权利要求18所述的装置,其特征在于,还包括:The device according to claim 18, further comprising:
    控制模块,用于若根据所述目标对象的状态信息确定需要启动所述车辆的自动驾驶模式,则控制所述车辆启动自动驾驶模式。And a control module, configured to control the vehicle to start an automatic driving mode if it is determined according to state information of the target object that an automatic driving mode of the vehicle needs to be activated.
  21. 根据权利要求19或20所述的装置,其特征在于,还包括:The device according to claim 19 or 20, further comprising:
    调用模块，用于如果检测到对所述预设的识别模型的训练指令，调用所述第一图像传感器采集所述目标对象的训练第一图像，及调用所述第二图像传感器采集所述目标对象的训练第二图像；a calling module, configured to: if a training instruction for the preset recognition model is detected, call the first image sensor to acquire a training first image of the target object, and call the second image sensor to acquire a training second image of the target object;
    第一训练模块,用于根据所述训练第一图像及训练第二图像对所述预设的识别模型进行训练。The first training module is configured to train the preset recognition model according to the training the first image and the training the second image.
  22. 根据权利要求19或20所述的装置,其特征在于,还包括:The device according to claim 19 or 20, further comprising:
    所述第一图像传感器,还用于采集训练对象的第一图像;The first image sensor is further configured to collect a first image of the training object;
    所述第二图像传感器,还用于采集所述训练对象的第二图像;The second image sensor is further configured to collect a second image of the training object;
    所述图像处理装置还包括:The image processing apparatus further includes:
    第二训练模块,用于采用初始识别模型对所述训练对象的第一图像及第二图像的指定区域进行训练,得到训练后的所述识别模型。And a second training module, configured to use the initial recognition model to train the first image of the training object and the designated area of the second image to obtain the trained recognition model.
  23. 根据权利要求22所述的装置,其特征在于,The device according to claim 22, wherein
    第二训练模块，具体用于获取所述训练对象当前的训练语料；采用所述初始识别模型对训练对象的第一图像及第二图像的指定区域进行识别，得到训练描述信息；确定所述训练对象当前的训练语料与所述训练描述信息的相似度；若所述相似度小于预设相似度值，则调整所述初始识别模型中的识别参数，得到训练后的所述识别模型。the second training module is specifically configured to acquire the current training corpus of the training object; use the initial recognition model to recognize the designated areas of the first image and the second image of the training object to obtain training description information; determine the similarity between the current training corpus of the training object and the training description information; and if the similarity is less than a preset similarity value, adjust the recognition parameters in the initial recognition model to obtain the trained recognition model.
  24. 根据权利要求13或23所述的装置,其特征在于,还包括:The device according to claim 13 or 23, further comprising:
    获取模块,用于若检测到目标对象,则获取所述目标对象的对象标识;An acquiring module, configured to acquire an object identifier of the target object if the target object is detected;
    查找模块,用于查找与所述目标对象的对象标识关联的识别模型;a finding module, configured to find a recognition model associated with the object identifier of the target object;
    所述接收模块，具体用于将关联的识别模型作为所述预设的识别模型，并执行所述接收所述第一图像传感器采集到的目标对象的第一图像，及所述第二图像传感器采集到的所述目标对象的第二图像的步骤。The receiving module is specifically configured to use the associated recognition model as the preset recognition model, and perform the step of receiving the first image of the target object collected by the first image sensor and the second image of the target object collected by the second image sensor.
  25. 一种图像处理设备，其特征在于，包括：处理器和存储器，所述处理器和所述存储器通过总线连接，所述存储器存储有可执行程序代码，所述处理器用于调用所述可执行程序代码，执行如权利要求1至12中任一项所述的图像数据处理方法。An image processing device, comprising: a processor and a memory, the processor and the memory being connected via a bus, the memory storing executable program code, and the processor being configured to call the executable program code to perform the image data processing method according to any one of claims 1 to 12.
  26. 一种计算机可读存储介质，其特征在于，所述计算机存储介质存储有计算机程序，所述计算机程序包括程序指令，所述程序指令当被处理器执行时使所述处理器执行如权利要求1至12中任一项所述的图像数据方法的步骤。A computer-readable storage medium, wherein the computer storage medium stores a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the steps of the image data method according to any one of claims 1 to 12.
  27. 一种计算机程序产品，其特征在于，所述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质，所述计算机程序可操作来使计算机实现权利要求1至12中任一项所述的图像数据方法的步骤。A computer program product, wherein the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to implement the steps of the image data method according to any one of claims 1 to 12.
PCT/CN2017/118174 2017-12-25 2017-12-25 Image data processing method, device and equipment WO2019126908A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/118174 WO2019126908A1 (en) 2017-12-25 2017-12-25 Image data processing method, device and equipment
CN201780005969.XA CN108701214A (en) 2017-12-25 2017-12-25 Image processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/118174 WO2019126908A1 (en) 2017-12-25 2017-12-25 Image data processing method, device and equipment

Publications (1)

Publication Number Publication Date
WO2019126908A1 true WO2019126908A1 (en) 2019-07-04

Family

ID=63843765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/118174 WO2019126908A1 (en) 2017-12-25 2017-12-25 Image data processing method, device and equipment

Country Status (2)

Country Link
CN (1) CN108701214A (en)
WO (1) WO2019126908A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147713A (en) * 2019-03-28 2019-08-20 石化盈科信息技术有限责任公司 A kind of method for detecting fatigue driving and system
CN112204566A (en) * 2019-08-15 2021-01-08 深圳市大疆创新科技有限公司 Image processing method and device based on machine vision
CN110570400B (en) * 2019-08-19 2022-11-11 河北极目楚天微电子科技有限公司 Information processing method and device for chip 3D packaging detection
CN114022871A (en) * 2021-11-10 2022-02-08 中国民用航空飞行学院 Unmanned aerial vehicle driver fatigue detection method and system based on depth perception technology
WO2023108364A1 (en) * 2021-12-13 2023-06-22 华为技术有限公司 Method and apparatus for detecting driver state, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130124763A (en) * 2012-05-07 2013-11-15 현대모비스 주식회사 System of determining tiredness driving based on the driver's face image and methed thereof
CN103714321A (en) * 2013-12-26 2014-04-09 苏州清研微视电子科技有限公司 Driver face locating system based on distance image and strength image
CN104504856A (en) * 2014-12-30 2015-04-08 天津大学 Fatigue driving detection method based on Kinect and face recognition
CN105740767A (en) * 2016-01-22 2016-07-06 江苏大学 Driver road rage real-time identification and warning method based on facial features
CN107126224A (en) * 2017-06-20 2017-09-05 中南大学 A kind of real-time monitoring of track train driver status based on Kinect and method for early warning and system

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080091628A1 (en) * 2006-08-16 2008-04-17 Narayan Srinivasa Cognitive architecture for learning, action, and perception
US20090086269A1 (en) * 2007-09-28 2009-04-02 Kyocera Mita Corporation Image Forming Apparatus and Image Forming System
KR20150085035A (en) * 2012-11-15 2015-07-22 가부시키가이샤 한도오따이 에네루기 켄큐쇼 Liquid crystal display device
US20140184484A1 (en) * 2012-12-28 2014-07-03 Semiconductor Energy Laboratory Co., Ltd. Display device
CN103473530B (en) * 2013-08-30 2016-06-15 天津理工大学 Self adaptation action identification method based on multi views and multi-modal feature
CN103714660B (en) * 2013-12-26 2017-02-08 苏州清研微视电子科技有限公司 System for achieving fatigue driving judgment on basis of image processing and fusion between heart rate characteristic and expression characteristic
CN103810491B (en) * 2014-02-19 2017-02-22 北京工业大学 Head posture estimation interest point detection method fusing depth and gray scale image characteristic points
CN104268520A (en) * 2014-09-22 2015-01-07 天津理工大学 Human motion recognition method based on depth movement trail
CN104616437A (en) * 2015-02-27 2015-05-13 浪潮集团有限公司 Vehicle-mounted fatigue identification system and method
CN104809445B (en) * 2015-05-07 2017-12-19 吉林大学 method for detecting fatigue driving based on eye and mouth state
CN106897659B (en) * 2015-12-18 2019-05-24 腾讯科技(深圳)有限公司 The recognition methods of blink movement and device
CN106203394B (en) * 2016-07-26 2019-04-26 浙江捷尚视觉科技股份有限公司 Fatigue driving safety monitoring method based on human eye state detection
CN106446811A (en) * 2016-09-12 2017-02-22 北京智芯原动科技有限公司 Deep-learning-based driver's fatigue detection method and apparatus
CN106485214A (en) * 2016-09-28 2017-03-08 天津工业大学 A kind of eyes based on convolutional neural networks and mouth state identification method
CN106599806A (en) * 2016-12-01 2017-04-26 西安理工大学 Local curved-surface geometric feature-based human body action recognition method
CN107203753B (en) * 2017-05-25 2020-09-08 西安工业大学 Action recognition method based on fuzzy neural network and graph model reasoning
CN107229922A (en) * 2017-06-12 2017-10-03 西南科技大学 A kind of fatigue driving monitoring method and device
CN107358794A (en) * 2017-06-13 2017-11-17 深圳前海慧泊中安运营管理有限公司 Data processing method and device
CN107492074A (en) * 2017-07-21 2017-12-19 触景无限科技(北京)有限公司 Image acquisition and processing method, device and terminal device


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111770317A (en) * 2020-07-22 2020-10-13 平安国际智慧城市科技股份有限公司 Video monitoring method, device, equipment and medium for intelligent community
CN111770317B (en) * 2020-07-22 2023-02-03 平安国际智慧城市科技股份有限公司 Video monitoring method, device, equipment and medium for intelligent community
CN112068218A (en) * 2020-09-14 2020-12-11 北京数衍科技有限公司 Self-adaptive detection method and device for pedestrian steps
CN112378916A (en) * 2020-11-10 2021-02-19 厦门长江电子科技有限公司 Image grading automatic detection system and method based on machine vision
CN112378916B (en) * 2020-11-10 2024-03-29 厦门长江电子科技有限公司 Automatic image grading detection system and method based on machine vision
CN112926498A (en) * 2021-03-20 2021-06-08 杭州知存智能科技有限公司 In-vivo detection method based on multi-channel fusion and local dynamic generation of depth information
CN112926498B (en) * 2021-03-20 2024-05-24 杭州知存智能科技有限公司 Living body detection method and device based on multichannel fusion and depth information local dynamic generation
CN113610004A (en) * 2021-08-09 2021-11-05 上海擎朗智能科技有限公司 Image processing method, robot and medium
CN113610004B (en) * 2021-08-09 2024-04-05 上海擎朗智能科技有限公司 Image processing method, robot and medium

Also Published As

Publication number Publication date
CN108701214A (en) 2018-10-23

Similar Documents

Publication Publication Date Title
WO2019126908A1 (en) Image data processing method, device and equipment
WO2019232972A1 (en) Driving management method and system, vehicle-mounted intelligent system, electronic device and medium
US10509961B2 (en) Blindman navigation method and blindman navigation apparatus
WO2020207423A1 (en) Skin type detection method, skin type grade classification method and skin type detection apparatus
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
EP3163543B1 (en) Alarming method and device
EP3647129A1 (en) Vehicle, vehicle door unlocking control method and apparatus, and vehicle door unlocking system
CN110032966B (en) Human body proximity detection method for intelligent service, intelligent service method and device
CN106485191B (en) A kind of method for detecting fatigue state of driver and system
JP5737400B2 (en) Red eye detector
JP5482737B2 (en) Visual load amount estimation device, driving support device, and visual load amount estimation program
WO2019196558A1 (en) Screen light method, device, mobile terminal, and storage medium
TW201941104A (en) Control method for smart device, apparatus, device, and storage medium
US11423673B2 (en) Method and device for detecting state of holding steering wheel
US11227156B2 (en) Personalized eye openness estimation
CN111985373A (en) Safety early warning method and device based on traffic intersection identification and electronic equipment
CN112183356A (en) Driving behavior detection method and device and readable storage medium
CN104063041B (en) A kind of information processing method and electronic equipment
WO2024001617A1 (en) Method and apparatus for identifying behavior of playing with mobile phone
KR101236266B1 (en) Manage system of parking space information
CN110660187B (en) Forest fire alarm monitoring system based on edge calculation
TWI550440B (en) Method and system for detecting person to use handheld apparatus
KR101669447B1 (en) System and the method for recognizing drowsiness
US9996740B2 (en) Information processing device, information processing method, program, and information storage medium
JP6742837B2 (en) Image processing apparatus, image processing method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17936146

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17936146

Country of ref document: EP

Kind code of ref document: A1