CN112541553A - Target object state detection method, apparatus, medium, and electronic device


Info

Publication number
CN112541553A
CN112541553A
Authority
CN
China
Prior art keywords
video frame
image block
camera device
target object
optical axis
Prior art date
Legal status
Granted
Application number
CN202011506057.0A
Other languages
Chinese (zh)
Other versions
CN112541553B (en)
Inventor
孙杰
Current Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Original Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority to CN202011506057.0A
Publication of CN112541553A
Application granted
Publication of CN112541553B
Legal status: Active (granted)

Classifications

    • G06F18/251 Fusion techniques of input or preprocessed data (Pattern recognition; Analysing; Fusion techniques)
    • G06V10/147 Details of sensors, e.g. sensor lenses (Image acquisition; Details of acquisition arrangements; Optical characteristics of the device performing the acquisition or of the illumination arrangements)
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness (Scenes; Context or environment of the image inside of a vehicle)
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands (Recognition of biometric, human-related or animal-related patterns)
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements (Human faces, e.g. facial parts, sketches or expressions)
    • G06V2201/07 Target detection (Indexing scheme relating to image or video recognition or understanding)

Abstract

A method, an apparatus, a medium, and an electronic device for detecting the state of a target object are disclosed. The detection method includes: acquiring an image block containing a predetermined portion of a target object from a first video frame acquired by a first camera device to obtain a first image block; acquiring an image block containing the predetermined portion of the target object from a second video frame acquired by a second camera device to obtain a second image block, where the acquisition time of the second video frame falls within a preset time range determined by the acquisition time point of the first video frame and the acquisition time point at which the first camera device acquires a third video frame, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame acquired by the first camera device and separated from the first video frame by a predetermined number of frames; and detecting the state of the target object based on the first image block and the second image block. The technical solution provided by the present disclosure helps ensure the accuracy of the state detection of the target object while consuming relatively few computing resources.

Description

Target object state detection method, apparatus, medium, and electronic device
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a method for detecting a state of a target object, a device for detecting a state of a target object, a storage medium, and an electronic apparatus.
Background
In some computer vision applications, it is often necessary to cut out an image block containing a predetermined portion of a target object from a video frame in order to detect a state of the target object in the image block. For example, in a driver fatigue monitoring application, it is generally required to acquire an eye image block sequence of a driver from a plurality of video frames acquired by a camera device, and obtain an eye key point sequence of the driver based on the eye image block sequence, so that a current fatigue state of the driver can be determined based on the eye key point sequence.
In the process of capturing a video by the camera device, factors such as changes in external light (e.g., changes in intensity or changes in illumination direction) and changes in the posture of the target object may affect the sharpness of a predetermined portion (e.g., an eye) of the target object in a video frame captured by the camera device, and thus may affect the accuracy of a state detection result of the target object.
How to obtain an image block with better definition and containing a predetermined portion of a target object to ensure the accuracy of a state detection result of the target object based on the image block is a technical problem of great concern.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides a method and a device for detecting the state of a target object, a storage medium and an electronic device.
According to an aspect of an embodiment of the present disclosure, there is provided a method for detecting a state of a target object, the method including: acquiring an image block containing a preset part of a target object from a first video frame acquired by a first camera device to obtain a first image block; acquiring an image block containing a preset part of the target object from a second video frame acquired by a second camera device to acquire a second image block; the acquisition time of the second video frame belongs to a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame acquired by the first camera device and separated from the first video frame by a preset number of frames; and detecting the state of the target object based on the first image block and the second image block.
According to still another aspect of the embodiments of the present disclosure, there is provided a state detection apparatus of a target object, the apparatus including: the first acquisition module is used for acquiring an image block containing a preset part of a target object from a first video frame acquired by the first camera device to obtain a first image block; the second acquisition module is used for acquiring an image block containing a preset part of the target object from a second video frame acquired by a second camera device to acquire a second image block; the acquisition time of the second video frame belongs to a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame acquired by the first camera device and separated from the first video frame by a preset number of frames; and the state detection module is used for detecting the state of the target object based on the first image block obtained by the first acquisition module and the second image block obtained by the second acquisition module.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for implementing the above method.
According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instructions from the memory and executing the instructions to realize the method.
Based on the method and the apparatus for detecting the state of a target object provided by the above embodiments of the present disclosure, the frame rate of the second camera device is higher than that of the first camera device, which helps the second camera device avoid blurring of the image content of the second video frame caused by factors such as light changes and changes in the posture of the target object during capture. The first video frame captured by the first camera device, by contrast, may suffer from poor definition of its image content due to those same factors. The present disclosure can therefore compensate for the deficiency of the first image block in image-content definition by using the second image block, which contains the predetermined portion of the target object, from the second video frame acquired by the second camera device, thereby avoiding the influence of noise in the first image block on the state detection operation of the target object. Moreover, in performing state detection using the first image block and the second image block, because the first camera device may be a camera device with a lower frame rate, fewer images need to undergo the state detection operation per unit time, so the state detection operation of the target object can be completed without consuming excessive computing resources. The technical solution provided by the present disclosure is therefore conducive to compensating for image-content definition by using the second video frames acquired by the second camera device while consuming relatively few computing resources, so that the accuracy of the state detection result of the target object can be ensured.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a schematic diagram of a scenario in which the present disclosure is applicable;
FIG. 2 is a flow chart of one embodiment of a method for detecting a state of a target object of the present disclosure;
FIG. 3 is a flow chart of one embodiment of the present disclosure for determining an optical axis target orientation of a second imaging device;
FIG. 4 is a flowchart of an embodiment of determining the optical axis target direction of the second camera device by using pixel points in the third image block according to the present disclosure;
FIG. 5 is a flow chart of one embodiment of the present disclosure for setting the amount of unit pixel shift using a checkerboard calibration plate;
FIG. 6 is a flowchart illustrating an embodiment of obtaining an image block including a predetermined portion of a target object from a first video frame according to the present disclosure;
FIG. 7 is a schematic diagram of one embodiment of a first video frame of the present disclosure;
FIG. 8 is a flow diagram of one embodiment of detecting the state of a target object of the present disclosure;
FIG. 9 is a flowchart illustrating an embodiment of obtaining a fourth image block including a predetermined portion of a target object by fusion according to the present disclosure;
FIG. 10 is a schematic diagram illustrating an embodiment of a device for detecting a status of a target object according to the present disclosure;
FIG. 11 is a block diagram of an electronic device provided in an exemplary embodiment of the present application.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the present disclosure may be implemented in electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with an electronic device, such as a terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the disclosure
In the process of implementing the present disclosure, the inventor found that two approaches are generally adopted at present to avoid poor imaging of the video frames acquired by the camera device. One approach is to increase the supplementary-light (fill-light) power to improve the definition of the video frames acquired by the camera device; the other is to capture video with a camera device having high image resolution and a high frame rate. Increasing the fill-light power increases power consumption and aggravates heating of the equipment; the fill light may also cause a certain degree of harm to the target object, for example, strengthened infrared light may harm the driver's health, especially the driver's eyes; in addition, the fill light can cause reflections, such as glare from the glasses worn by the driver, which results in missing detail in the driver's eye region. As for capturing video with a camera device of high image resolution and high frame rate, processing such video frames usually consumes a large amount of computing resources and therefore usually requires a high-performance data processing unit, which increases the cost of detecting the state of the target object.
Exemplary application scenario
The state detection technique of the target object of the present disclosure can be applied to applications such as fatigue state detection. Next, an application of the state detection technique of the target object of the present disclosure will be described with reference to fig. 1, taking fatigue state detection for a driver as an example.
In fig. 1, two cameras are provided at a position 101 in a vehicle 100. The types of the two cameras may be set according to actual requirements; for example, both may be RGB (Red Green Blue) based cameras, or both may be infrared-based cameras. The frame rates of the two camera devices are different, and their image resolutions (which may also be referred to as spatial resolutions) are also different: the image resolution of the camera device with the higher frame rate is lower than that of the camera device with the lower frame rate. In one example, the two camera devices are a first camera device and a second camera device, the frame rate of the first camera device is 10 FPS (Frames Per Second), the frame rate of the second camera device is 60 FPS, the image resolution of the first camera device is 720p (progressive), and the image resolution of the second camera device is 240p.
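The following Python sketch merely restates the example configuration above in code form; the class and field names are illustrative, and the frame widths (1280 and 320 pixels) are assumptions, since only the vertical resolutions 720p and 240p are given.

```python
from dataclasses import dataclass

@dataclass
class CameraConfig:
    name: str
    frame_rate_fps: int   # frames captured per second
    resolution_hw: tuple  # (height, width) in pixels; widths are assumed here

# Example values from the scenario above.
first_camera = CameraConfig("first", frame_rate_fps=10, resolution_hw=(720, 1280))
second_camera = CameraConfig("second", frame_rate_fps=60, resolution_hw=(240, 320))

# The second camera's frame rate is an integer multiple of the first camera's,
# so each first-camera frame corresponds to a fixed number of second-camera frames.
ratio = second_camera.frame_rate_fps // first_camera.frame_rate_fps
print(ratio)  # 6
```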
When the driver is in the driving position of the vehicle 100, the driver's face should lie within the fields of view of the two cameras in the vehicle 100, i.e., the videos captured by the two cameras usually include the driver's face (such as the frontal face), and the driver's face is preferably located in the central area of the fields of view of the two cameras.
The two cameras in the vehicle 100 are time-synchronized and controlled to start video capture at the same time point, and the video frames they respectively capture can be provided in real time to a DMS (Driver Monitor System) installed in the vehicle 100. The DMS may acquire image blocks containing the driver's eyes from the video frames acquired by the first camera and from the video frames acquired by the second camera, and may perform one fatigue state detection process of the target object using one image block from the first camera together with six image blocks from the second camera.
When the fatigue state of the driver over a period of time is determined to belong to the preset mild or moderate fatigue state, the DMS may remind the driver to take a short rest by means of text, voice, light, video, and the like. When the fatigue state of the driver within a certain time range is determined to belong to the preset severe fatigue state, the DMS may issue an emergency warning to the driver by means of text, voice, light, video, and the like, prompting the driver that the current driving behavior is dangerous and that the driver must rest before continuing to drive, so as to ensure the driving safety of the vehicle 100.
Exemplary method
Fig. 2 is a flowchart of an embodiment of a method for detecting a state of a target object according to the present disclosure. The method shown in fig. 2 comprises: s200, S201, and S202. The following describes each step.
S200, acquiring an image block containing a preset part of a target object from a first video frame acquired by a first camera device, and acquiring a first image block.
The first image pickup device in the present disclosure may include, but is not limited to: an RGB-based image pickup device, an infrared-based image pickup device, or the like. The target object in the present disclosure may be regarded as an object that needs to be subjected to state detection. For example, the target object may be a driver or an on-duty person, or the like. The predetermined part in the present disclosure may refer to a partial region on the target object, for example, the predetermined part may be a face or eyes or a mouth or hands, etc.
The first image block can be obtained from the first video frame by adopting a neural network or the like, for example, the first video frame is subjected to face detection processing through the neural network for face detection, and the first video frame is cut according to the result of the face detection processing to obtain the first image block.
S201, acquiring an image block containing a preset part of the target object from a second video frame acquired by a second camera device, and acquiring a second image block.
The second image pickup device in the present disclosure may include, but is not limited to: an RGB-based image pickup device, an infrared-based image pickup device, or the like. The first and second camera devices may be the same type of camera device, for example, both the first and second camera devices are RGB-based camera devices or both infrared-based camera devices. Of course, the present disclosure does not exclude the case where the first and second image pickup devices are different types of image pickup devices.
The second image block can be obtained from the second video frame by adopting a neural network or the like, for example, the second video frame is subjected to face detection processing through the neural network for face detection, and the second video frame is cut according to the result of the face detection processing to obtain the second image block.
The present disclosure may obtain a plurality of second image blocks from a plurality of video frames captured by the second camera device, where each second image block includes the predetermined portion of the target object. In other words, one first video frame may correspond to a plurality of second video frames: while one first image block is obtained from one first video frame, a plurality of second image blocks may be obtained from the corresponding plurality of second video frames, so one first image block may correspond to a plurality of second image blocks.
In the present disclosure, the capture times of the plurality of second video frames corresponding to a first video frame all fall within a predetermined time range, which is determined by the capture time point of the first video frame and the capture time point of a third video frame. In one example, the capture time point of the first video frame is the minimum value of the predetermined time range and the capture time point of the third video frame is the maximum value of the predetermined time range; in this case, the capture time points of all the second video frames corresponding to the first video frame should be greater than or equal to the capture time point of the first video frame and smaller than the capture time point of the third video frame. The third video frame in the present disclosure is a video frame captured by the first camera device. The first video frame and the third video frame may be separated by a predetermined number of frames, where the predetermined number may be an integer greater than or equal to 0. In one example, the first video frame and the third video frame may be two adjacent video frames, that is, the predetermined number of frames between them is 0.
The frame rate of the first image pickup apparatus in the present disclosure is lower than that of the second image pickup apparatus, and the frame rate of the second image pickup apparatus is generally an integer multiple of the frame rate of the first image pickup apparatus. In one example, the frame rate of the first camera is 10FPS and the frame rate of the second camera is 60FPS, and in this case, one first video frame in the present disclosure may correspond to six second video frames.
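A minimal sketch of how the second video frames corresponding to one first video frame might be selected, assuming each frame carries a capture timestamp; the function and variable names are illustrative assumptions, and the half-open interval [t_first, t_third) follows the example above in which the second frames' capture times are at least the first frame's and less than the third frame's.

```python
def select_second_frames(second_frames, t_first, t_third):
    """Return the second-camera frames whose capture time falls in [t_first, t_third).

    second_frames: iterable of (timestamp, frame) pairs from the second camera.
    t_first: capture time point of the first video frame (lower bound).
    t_third: capture time point of the third video frame (upper bound).
    """
    return [frame for (ts, frame) in second_frames if t_first <= ts < t_third]

# Example: first camera at 10 FPS (one frame every 100 ms), second camera at 60 FPS.
second_stream = [(i * 100 / 6.0, f"s{i}") for i in range(12)]  # timestamps in ms
selected = select_second_frames(second_stream, t_first=0.0, t_third=100.0)
print(len(selected))  # 6 second frames correspond to this first frame
```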
The image resolution of the first camera device in the present disclosure is typically not the same as the image resolution of the second camera device. For example, the image resolution of the first camera device is higher than the image resolution of the second camera device, and may be an integer multiple of it. In one example, the image resolution of the first camera device is 720p and the image resolution of the second camera device is 240p.
S202, detecting the state of the target object based on the first image block and the second image block.
The state of the detection target object in the present disclosure may include, but is not limited to: detecting a fatigue state of the target object, and the like. The present disclosure may perform the state detection process of the target object based on the frame rate of the first imaging device. That is, the present disclosure may perform the state detection processing of the target object once every time one first video frame is generated, that is, the number of times the present disclosure performs the state detection operation of the target object per unit time is determined by the frame rate of the first image pickup device, not by the frame rate of the second image pickup device.
Because the frame rate of the second camera device is higher than that of the first camera device, the second camera device helps avoid blurring of the image content of the second video frame caused by factors such as light changes and changes in the posture of the target object during capture, whereas the first video frame acquired by the first camera device may suffer from poor image-content definition due to those same factors. The deficiency of the first image block in image-content definition can therefore be compensated for by the second image block containing the predetermined portion of the target object in the second video frame acquired by the second camera device, which helps avoid the influence of noise in the first image block on the state detection of the target object. Furthermore, when the present disclosure performs state detection using the first image block and the second image block, the first camera device may be a camera device with a lower frame rate, so fewer images need to undergo the state detection operation per unit time; for example, if the frame rate of the first camera device is 10 FPS and the frame rate of the second camera device is 60 FPS, the number of images undergoing the state detection operation within one second may be 10 rather than 60. The state detection operation of the target object can thus be completed without consuming excessive computing resources. The technical solution provided by the present disclosure is therefore conducive to compensating for image-content definition by using the second video frames acquired by the second camera device while consuming relatively few computing resources, so that the accuracy of the state detection result of the target object can be ensured.
In an optional example, in application scenarios such as power-up or restart, the second camera device in the present disclosure may initially use a preset default optical axis direction as the current direction of its optical axis. The present disclosure may perform optical-axis-direction updating processing on the current direction of the optical axis of the second camera device, and after this updating processing, use the video frames acquired by the second camera device as the second video frames in the present disclosure. The optical-axis-direction updating processing performed on the current direction of the optical axis of the second camera device can be regarded as initialization of the current direction of the optical axis, and the current direction of the optical axis after this initialization does not change until the next initialization.
Optionally, the process of initializing the current pointing direction of the optical axis of the second camera device according to the present disclosure may include: determining the optical axis target direction of the second camera device according to the position of the predetermined portion of the target object in a fourth video frame acquired by the first camera device, and changing the current direction of the optical axis of the second camera device to the optical axis target direction. That is, the present disclosure initializes the current orientation of the optical axis of the second camera device with a video frame captured by the first camera device (referred to herein as the fourth video frame).
Optionally, the acquisition time point of the fourth video frame in the present disclosure is earlier than the acquisition time point of the first video frame. In one example, after the first camera device and the second camera device are powered on simultaneously, the present disclosure may use the 1st video frame captured by the first camera device that includes the predetermined portion of the target object (e.g., the driver's eyes) as the fourth video frame. In another example, after the two camera devices are powered on simultaneously, the present disclosure may use the 1st video frame captured by the first camera device that contains the predetermined portion of the target object (such as the driver's eyes) and whose image sharpness meets a preset requirement as the fourth video frame. After the optical axis target direction of the second camera device is determined by using the fourth video frame, the movement control equipment of the camera device, such as a motor, can be controlled, by forming and outputting corresponding control commands, to execute corresponding actions, so that the second camera device undergoes mechanical movement and its optical axis actually points in the optical axis target direction.
In one example, the present disclosure may first obtain the image area in which the predetermined portion of the target object is located in the fourth video frame captured by the first camera device, and determine the position in the fourth video frame to which the optical axis of the second camera device currently points; the optical axis target direction of the second camera device can then be determined based on these two pieces of information.
By determining the optical axis target direction of the second camera device according to the position of the predetermined portion of the target object in the fourth video frame acquired by the first camera device, the present disclosure associates the optical axis target direction of the second camera device with the predetermined portion of the target object.
In an alternative example, a specific example of the present disclosure determining the optical axis target orientation of the second imaging device is shown in fig. 3.
In fig. 3, in step S300, an image block including a predetermined portion of the target object is obtained from the fourth video frame acquired by the first camera device, and a third image block is obtained.
Optionally, the present disclosure may adopt a neural network or the like to obtain the third image block from the fourth video frame acquired by the first camera device, for example, the fourth video frame is subjected to face detection processing by the neural network for face detection, a position of the predetermined portion of the target object in the fourth video frame is obtained according to a result of the face detection processing, and the third image block is obtained by cutting the fourth video frame according to the position. In one example, the third image block in the present disclosure may be an eye-based image block, i.e. the predetermined part is an eye.
And S301, determining the optical axis target direction of the second image pickup device according to the pixel points in the third image block.
Optionally, the present disclosure may determine the optical axis target direction of the second camera device according to a pixel point in the third image block. That is, when the optical axis target direction of the second camera device is mapped into the third image block, it should have a certain association with that pixel point; for example, the position at which the mapped optical axis target direction falls may be the position of that pixel point in the third image block.
By obtaining the third image block from the fourth video frame and determining the optical axis target direction of the second camera device based on a pixel point in the third image block, the present disclosure associates the optical axis target direction of the second camera device with that pixel point, so that the optical axis target direction can be brought close to a specified position within the predetermined portion of the target object. This helps present that specified position at a prominent position in the second video frames collected by the second camera device, and reduces, as far as possible, situations in which the predetermined portion of the target object is not fully presented in the second video frame or lies at its edge, which would affect the state detection result of the target object. It is thus conducive to improving the accuracy of the state detection result of the target object.
In an alternative example, an example of determining the optical axis target direction of the second image capturing device by using the pixel points in the third image block is shown in fig. 4.
In fig. 4, S400, a central pixel point in the third image block is determined.
Optionally, under the condition that the number of pixels in the width direction and the number of pixels in the height direction of the third image block are both odd numbers larger than zero, the disclosure may use the pixel point where the intersection point of two diagonal lines of the third image block is located as the central position pixel point in the third image block. Under the condition that the number of pixels in the width direction or the height direction of the third image block is an even number greater than zero, any pixel point adjacent to the intersection point of two diagonal lines of the third image block in the third image block can be used as a central position pixel point in the third image block.
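A small sketch of the center-pixel selection described above, assuming 0-indexed pixel coordinates; when the width or height is even there is no single diagonal-intersection pixel, so a neighbouring pixel is chosen, as the text allows. The function name is illustrative.

```python
def center_pixel(width, height):
    """Pick the central pixel of a width x height image block (0-indexed).

    For odd dimensions this is the pixel at the intersection of the two
    diagonals; for even dimensions any pixel adjacent to that intersection is
    acceptable, and integer division below simply picks one of them.
    """
    assert width > 0 and height > 0
    return (width // 2, height // 2)

print(center_pixel(5, 5))  # (2, 2): exact diagonal intersection
print(center_pixel(6, 5))  # (3, 2): a pixel adjacent to the intersection
```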
S401, based on the central position pixel point, determining the target pointing pixel point of the optical axis target direction of the second camera device in the image coordinate system of the first camera device.
Alternatively, the image coordinate system of the first camera device may refer to the plane coordinate system of an image (such as a photo or a video frame) captured by the first camera device. For convenience of description, the present disclosure simply refers to the image coordinate system of the first camera device as the first image coordinate system. The origin of the first image coordinate system may be located in the upper left corner of the video frame captured by the first camera device. In one example, the width direction of the video frame captured by the first camera device is the X coordinate axis direction of the first image coordinate system, and the height direction of that video frame is the Y coordinate axis direction of the first image coordinate system. In one example, the present disclosure may directly use the central position pixel point as the target pointing pixel point of the optical axis target direction of the second camera device in the first image coordinate system. In another example, the present disclosure may use a pixel point around the central position pixel point as the target pointing pixel point of the optical axis target direction of the second camera device in the first image coordinate system.
S402, determining the optical axis translation amount of the second camera device according to the target pointing pixel point and the current pointing pixel point of the optical axis of the second camera device in the image coordinate system of the first camera device.
Optionally, in this disclosure, the current pointing pixel point of the optical axis of the second camera device in the image coordinate system of the first camera device may be the pixel point to which the current direction of the optical axis of the second camera device points after that direction is mapped into the image coordinate system of the first camera device. For example, the current pointing direction of the optical axis of the second camera device may correspond to a pixel point in the image coordinate system of the second camera device, and that pixel point may be converted into the image coordinate system of the first camera device based on the relative positional relationship between the first camera device and the second camera device, so as to obtain the current pointing pixel point of the optical axis of the second camera device in the image coordinate system of the first camera device.
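The disclosure only states that this conversion is based on the relative positional relationship between the two camera devices; purely as an illustrative assumption, the sketch below models the conversion as a pre-calibrated scale-plus-offset mapping between the two image planes. All names and numeric values are hypothetical.

```python
def map_to_first_image_coords(point_2nd, scale_xy, offset_xy):
    """Map a pixel from the second camera's image coordinate system into the
    first camera's image coordinate system.

    Assumes a pre-calibrated scale-plus-offset relation between the two image
    planes; a fuller calibration could use a homography instead.
    """
    x, y = point_2nd
    sx, sy = scale_xy
    ox, oy = offset_xy
    return (sx * x + ox, sy * y + oy)

# Hypothetical calibration values: the first camera's image is 4x larger in
# each direction and shifted by (120, 80) pixels relative to the second's.
current_pointing_1st = map_to_first_image_coords((160, 120), (4.0, 4.0), (120.0, 80.0))
print(current_pointing_1st)  # (760.0, 560.0)
```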
Alternatively, the optical axis translation amount in the present disclosure may refer to a translation amount of the optical axis in the horizontal direction and a translation amount of the optical axis in the vertical direction. The horizontal direction and the vertical direction to which the translation amount in the present disclosure refers may refer to a horizontal direction and a vertical direction based on a world coordinate system. The method and the device can determine the optical axis translation amount of the second camera device according to the pixel distance between the target pointing pixel point and the current pointing pixel point of the optical axis of the second camera device in the image coordinate system of the first camera device.
Alternatively, the optical axis translation amount in the present disclosure may be signed, with the sign indicating the direction of the translation. In one example, when the translation amount of the optical axis in the horizontal direction is negative, the translation is to the left of the current pointing direction of the optical axis; when it is positive, the translation is to the right of the current pointing direction. When the translation amount of the optical axis in the vertical direction is negative, the translation is above the current pointing direction of the optical axis; when it is positive, the translation is below the current pointing direction.
Alternatively, an example of a specific process of determining the optical axis translation amount of the second imaging device according to the present disclosure is as follows:
firstly, a first pixel distance in the horizontal direction and a second pixel distance in the vertical direction between the coordinate of the target pointing pixel point and the current pointing pixel point are determined. The horizontal direction and the vertical direction herein refer to the horizontal direction and the vertical direction of the image coordinate system of the first imaging device. More specifically, the horizontal direction here may be a width direction of the first video frame captured by the first camera, and the vertical direction here may be a height direction of the first video frame captured by the first camera. The method and the device can take the difference value of the x coordinate of the target pointing pixel point and the x coordinate of the current pointing pixel point as the first pixel distance, and take the difference value of the y coordinate of the target pointing pixel point and the y coordinate of the current pointing pixel point as the second pixel distance.
Then, the optical axis horizontal translation amount and the optical axis vertical translation amount of the second imaging device are determined according to the first pixel distance, the second pixel distance, and the unit pixel movement amount of the optical axis stored in advance. That is, the present disclosure is previously provided with a unit pixel moving amount of an optical axis, which may include: a horizontal direction unit pixel movement amount of the optical axis and a vertical direction unit pixel movement amount of the optical axis. The horizontal direction unit pixel movement amount of the optical axis may refer to a movement amount of the optical axis currently directed in the horizontal direction of the world coordinate system when the optical axis of the second imaging device is currently directed to move by one unit pixel in the horizontal direction of the image coordinate system of the first imaging device. The vertical direction unit pixel movement amount of the optical axis may refer to a movement amount of the optical axis currently directed in the vertical direction of the world coordinate system when the optical axis of the second image pickup device is currently directed to move by one unit pixel in the vertical direction of the image coordinate system of the first image pickup device. One unit pixel herein may refer to one basic unit formed of one pixel or a plurality of pixels. In the case where one unit pixel is a basic unit formed of one pixel, the present disclosure may use a product of the first pixel distance and the horizontal direction unit pixel movement amount of the optical axis as the optical axis horizontal translation amount of the second image pickup device, and use a product of the second pixel distance and the vertical direction unit pixel movement amount of the optical axis as the optical axis vertical translation amount of the second image pickup device. In the case where one unit pixel is a basic unit formed of a plurality of pixels, the present disclosure may first convert the first pixel distance and the second pixel distance into magnitudes based on the unit pixel, respectively, and obtain the optical axis horizontal translation amount and the optical axis vertical translation amount of the second imaging device using the product of the magnitudes obtained by the conversion and the corresponding unit pixel movement amount.
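A minimal sketch of the translation computation described above, for the case where one unit pixel is a single pixel; the signed distances follow the sign convention given earlier (negative horizontal = left, negative vertical = up). Function and variable names, and the calibration values in the example, are illustrative assumptions.

```python
def optical_axis_translation(target_px, current_px, unit_move_h, unit_move_v):
    """Compute the horizontal and vertical optical-axis translation amounts.

    target_px, current_px: (x, y) pixel coordinates in the first camera's
        image coordinate system (target pointing and current pointing).
    unit_move_h, unit_move_v: optical-axis movement per unit pixel in the
        horizontal and vertical directions, obtained beforehand by calibration.
    """
    first_pixel_distance = target_px[0] - current_px[0]   # signed, horizontal
    second_pixel_distance = target_px[1] - current_px[1]  # signed, vertical
    return (first_pixel_distance * unit_move_h,
            second_pixel_distance * unit_move_v)

# Example: the target pointing lies 30 px to the left of and 12 px below the
# current pointing; the unit movement amounts are hypothetical values.
print(optical_axis_translation((300, 262), (330, 250), 0.05, 0.05))  # (-1.5, 0.6)
```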
Alternatively, the present disclosure may set a unit pixel shift amount using a calibration plate. Specifically, the unit pixel movement amount in the present disclosure is set according to the number of pixels between two corner points and the optical axis movement amount of the second imaging device when the optical axis of the second imaging device moves from one corner point to another corner point on a calibration board (e.g., a checkerboard calibration board). The present disclosure may store the unit pixel movement amount in the storage unit of the second imaging device, or may store the unit pixel movement amount in the DMS or the like. One example of setting the unit pixel shift amount using the checkerboard calibration plate in the present disclosure can be seen in the following description with respect to fig. 5.
And S403, determining the optical axis target direction of the second camera device according to the optical axis translation amount.
Optionally, the current direction of the optical axis of the second camera device and the optical axis translation amount may uniquely determine an optical axis target direction. The present disclosure may then form a corresponding control command for the movement control equipment of the camera device, such as a motor, so that the movement control equipment performs a corresponding action based on the control command, the second camera device undergoes mechanical movement, and the current direction of the optical axis of the second camera device is finally changed to the optical axis target direction.
Determining the optical axis target direction of the second camera device based on the central position pixel point in the third image block associates the optical axis target direction with the center of the predetermined portion of the target object; for example, the optical axis target direction can be aimed as closely as possible at the central position of the predetermined portion of the target object, so that the predetermined portion is presented in the central area of the second video frames acquired by the second camera device as far as possible. For instance, regardless of how tall the driver is or how close the driver sits to the second camera device, the driver's eyes can be kept in the central area of the second video frame as far as possible. This reduces, as much as possible, situations in which the predetermined portion of the target object is not fully presented in the second video frame or lies at its edge and thereby affects the state detection result, which in turn helps improve the accuracy of the state detection result of the target object. Determining the optical axis translation amount of the second camera device from the difference between the target pointing pixel point and the current pointing pixel point together with the unit pixel movement amount of the optical axis, and determining the optical axis target direction based on that translation amount, provides an easy-to-implement way of adjusting the optical axis target direction of the second camera device and improves the usability of the technical solution.
Assume that the image resolution of the first camera device is 720p, the image resolution of the second camera device is 240p, the relative positions of the first and second camera devices are fixed, and the optical axis orientation of the second camera device can be changed. A checkerboard calibration board is fixedly placed within the fields of view of both camera devices, and the two camera devices each capture an image of the calibration board, yielding a first image and a second image; for example, image 501 in fig. 5 is a part of the first image, and image 502 in fig. 5 is a part of the second image. The optical axis of the second camera device is first aimed at a first corner point of the checkerboard calibration board and an image is captured; the second camera device is then controlled to move mechanically so that its optical axis points at a second corner point (the first and second corner points being two different corner points), and another image capture is performed. From the mechanical movement of the second camera device, the optical axis translation amounts for moving the optical axis from the first corner point to the second corner point, namely a first optical-axis horizontal translation amount and a first optical-axis vertical translation amount, can be obtained. The present disclosure may obtain, from the image 501, the pixel distance in the horizontal direction (i.e., a third pixel distance) and the pixel distance in the vertical direction (i.e., a fourth pixel distance) between the first corner point and the second corner point; for example, if the coordinates of the first corner point in the first image coordinate system are (x1, y1) and the coordinates of the second corner point are (x2, y2), the third pixel distance is x2-x1 and the fourth pixel distance is y2-y1. The absolute value of the ratio of the first optical-axis horizontal translation amount to the third pixel distance and the absolute value of the ratio of the first optical-axis vertical translation amount to the fourth pixel distance may then be used as the unit pixel movement amounts of the second camera device.
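The sketch below restates that calibration computation: given the two corner coordinates in the first image and the mechanically measured optical-axis translation between them, the per-pixel movement amounts are the absolute ratios. All numeric values and names are illustrative assumptions.

```python
def unit_pixel_movement(corner1, corner2, axis_shift_h, axis_shift_v):
    """Unit pixel movement amounts of the second camera's optical axis.

    corner1, corner2: (x, y) coordinates of the two checkerboard corner points
        in the first camera's image coordinate system.
    axis_shift_h, axis_shift_v: measured horizontal/vertical optical-axis
        translation when the axis moves from corner1 to corner2.
    """
    third_pixel_distance = corner2[0] - corner1[0]
    fourth_pixel_distance = corner2[1] - corner1[1]
    return (abs(axis_shift_h / third_pixel_distance),
            abs(axis_shift_v / fourth_pixel_distance))

# Hypothetical values: the corners are 40 px apart horizontally and 20 px
# vertically, and the measured axis translation is (2.0, 1.0) in motor units.
print(unit_pixel_movement((100, 200), (140, 220), 2.0, 1.0))  # (0.05, 0.05)
```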
In an alternative example, the present disclosure illustrates an example of acquiring an image block including a predetermined portion of a target object from a first video frame captured by a first camera as shown in fig. 6.
In fig. 6, S600, keypoint detection of the target object is performed on the first video frame to obtain at least one keypoint.
Optionally, the key point detection of the target object in the present disclosure may be face key point detection, that is, the present disclosure may perform face key point detection on the first video frame, so as to obtain a plurality of face key points. For example, the present disclosure may perform a face keypoint detection process on the first video frame 700 shown in fig. 7 through a neural network for face keypoint detection, and obtain a plurality of face keypoints of the target object based on the result of the detection process.
S601, determining the area of the target object in the first video frame according to at least one key point.
Alternatively, the present disclosure may determine the minimum x coordinate, the maximum x coordinate, the minimum y coordinate, and the maximum y coordinate of all currently obtained keypoint coordinates (i.e., two-dimensional coordinates based on the planar coordinate system of the first video frame 700), and form the coordinates of four vertices from them, for example, the coordinates of the first vertex 701, the second vertex 702, the third vertex 703, and the fourth vertex 704 in fig. 7. The present disclosure may directly use the area formed by the coordinates of these four vertices as the area where the target object in the first video frame is located.
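A small sketch of the vertex computation above, assuming the key points are given as (x, y) pairs in the first video frame's coordinate system; the function name is an assumption.

```python
def target_region_from_keypoints(keypoints):
    """Axis-aligned region spanned by the detected key points.

    keypoints: iterable of (x, y) coordinates; returns the four vertices
    formed from the minimum/maximum x and y coordinates.
    """
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # first, second, third and fourth vertices of the region
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]

print(target_region_from_keypoints([(120, 80), (200, 95), (160, 230)]))
# [(120, 80), (200, 80), (200, 230), (120, 230)]
```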
S602, determining the area where the preset part of the target object is located according to the area where the target object is located, and obtaining a first image block.
Optionally, the present disclosure may determine, according to a relative positional relationship between the predetermined portion and the target object, a region where the predetermined portion of the target object is located in the region where the target object is located, for example, the present disclosure may determine, according to a relative positional relationship between the eyes and the face (for example, the height of the eyes is one seventh of the height of the face, the lowest ends of the eyes are located at one fifth of the face, and the like), a region where the eyes are located in the region where the face is located. The method and the device can cut the first video frame based on the area where the preset part of the target object is located, so as to obtain the image block containing the preset part of the target object, namely obtain the first image block.
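As an illustration of the relative-position rule above (the one-seventh and one-fifth figures are the example values given in the text, not fixed constants), the eye region could be derived and cropped as follows; the array layout, the function name, and the choice of measuring the one-fifth position from the top of the face region are assumptions.

```python
import numpy as np

def crop_eye_region(frame, face_box):
    """Crop an eye image block from a video frame given the face region.

    face_box: (x_min, y_min, x_max, y_max) of the face region in the frame.
    Uses the example relative positions from the text: the eye band is one
    seventh of the face height, and its lowest end sits at one fifth of the
    face (measured from the top here, which is an assumption).
    """
    x_min, y_min, x_max, y_max = face_box
    face_h = y_max - y_min
    eye_bottom = y_min + int(round(face_h / 5.0))
    eye_top = eye_bottom - int(round(face_h / 7.0))
    return frame[max(eye_top, 0):eye_bottom, x_min:x_max]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # placeholder first video frame
eye_block = crop_eye_region(frame, (400, 200, 700, 550))
print(eye_block.shape)  # (50, 300, 3)
```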
According to the method and the device, the area where the target object is located is obtained through key point detection, the image block containing the preset part of the target object is determined through the area where the target object is located, and an implementation mode which is easy to implement and accurate in result is provided for obtaining the first image block, so that the usability of the technical scheme of the method and the device is improved.
Optionally, the specific process of obtaining the second image block in the present disclosure may refer to the description in fig. 6 and fig. 7, for example, the present disclosure may first perform keypoint detection on the target object in the second video frame to obtain at least one keypoint, then determine the area where the target object in the second video frame is located according to the at least one keypoint, and finally determine the area where the predetermined portion of the target object is located according to the area where the target object is located to obtain the second image block. The specific procedures are not described in detail herein.
In an alternative example, fig. 8 shows an example in which the present disclosure detects the state of the target object based on the first image block and the second image block.
In fig. 8, S800, a fusion process is performed on the first image block and the second image block to obtain a fourth image block including a predetermined portion of the target object.
Optionally, the fusion processing in the present disclosure may refer to a process of forming one image block from a plurality of image blocks. The present disclosure may perform fusion processing on a first image block and a plurality of second image blocks corresponding to the first image block, so as to obtain an image block containing the predetermined portion of the target object, i.e., a fourth image block. The present disclosure does not limit the specific implementation of the fusion processing; one way of performing it is described below with respect to fig. 9.
S801, determining the state of the target object according to the image sequence formed by the fourth image block.
Optionally, each first video frame in the disclosure corresponds to a fourth image block, and all the fourth image blocks may be arranged according to the acquisition time point of the corresponding first video frame, so as to form an image sequence, for example, an eye image sequence may be formed. For example, the present disclosure performs fatigue state detection processing on an eye image sequence, so that a fatigue state of a target object, such as a non-fatigue state, a light fatigue state, a moderate fatigue state, or a heavy fatigue state, can be obtained.
Optionally, the fatigue state detection processing of the present disclosure may include: detecting human eye keypoints, calculating corresponding distances and areas based on the human eye keypoints, and the like. The specific implementation of the fatigue state detection processing is not limited by the present disclosure.
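Since the disclosure does not fix a concrete fatigue metric, the sketch below uses an eye-aspect-ratio measure with a closed-eye-fraction (PERCLOS-style) mapping as one common stand-in; the keypoint ordering, thresholds, and level boundaries are illustrative assumptions rather than the disclosure's method:

```python
import numpy as np

def eye_aspect_ratio(eye_pts):
    """eye_pts: six eye keypoints ordered as in common 68-point face layouts.
    The vertical-to-horizontal distance ratio drops when the eye closes."""
    p = np.asarray(eye_pts, dtype=np.float32)
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horizontal = np.linalg.norm(p[0] - p[3])
    return vertical / (2.0 * horizontal)

def fatigue_level(ear_sequence, closed_thresh=0.2):
    """Maps the fraction of closed-eye frames in the image sequence to a coarse
    fatigue level (thresholds are illustrative only)."""
    closed_fraction = float(np.mean(np.asarray(ear_sequence) < closed_thresh))
    if closed_fraction < 0.15:
        return "non-fatigue"
    if closed_fraction < 0.30:
        return "light fatigue"
    if closed_fraction < 0.50:
        return "moderate fatigue"
    return "heavy fatigue"
```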
The frame rate of the second camera device is higher than that of the first camera device, which helps avoid image blurring of the second video frames caused by factors such as light changes and changes in the posture of the target object during acquisition of the second video frames. Therefore, fusing a plurality of second image blocks containing the predetermined portion of the target object in a plurality of second video frames acquired by the second camera device with the first image block containing the predetermined portion of the target object in the first video frame acquired by the first camera device helps eliminate noise in the first image block, so that the fused fourth image block containing the predetermined portion of the target object has better definition. Further, when the fused image block is used for state detection processing such as fatigue state detection, the first camera device can be a camera device with a low frame rate, so the number of images to be processed per unit time is small and the subsequent processing of the images can be completed without consuming a large amount of computing resources. Therefore, the technical solution provided by the present disclosure helps improve the definition of the images acquired by the low-frame-rate camera device, thereby avoiding the contradiction between image definition and computing resource consumption during subsequent processing such as fatigue detection.
In an alternative example, fig. 9 shows an example in which the present disclosure performs fusion processing on the first image block and the second image block to obtain a fourth image block containing the predetermined portion of the target object.
In fig. 9, S900, for any pixel point to be processed in the first image block, a pixel point corresponding to the pixel point to be processed is obtained from the second image block, and a pixel point to be fused is obtained.
Optionally, if the pixel point to be processed represents the image information at a given position in the first image block, then the pixel point corresponding to the pixel point to be processed represents the image information at the corresponding position in the second image block.
Optionally, since the image resolutions of the first and second camera devices are different, the size of the first image block is usually different from the size of the second image block, and the size of the second image block is usually smaller. The present disclosure may convert the size of the second image block to the size of the first image block, so that pixel points at the same position in the first image block and the converted second image block form a pair consisting of a pixel point to be processed and its corresponding pixel point (i.e., a pixel point to be fused). In one example, the present disclosure may separately apply a transformation based on a Gaussian kernel function to each second image block, so as to transform it into an image block having the same size as the first image block.
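A minimal sketch of the size conversion is given below; it uses OpenCV resizing followed by light Gaussian smoothing as a stand-in for the Gaussian-kernel transform mentioned above, so it is an assumption rather than the disclosure's exact operation:

```python
import cv2

def match_block_size(second_block, first_block_shape):
    """Upsamples the (smaller) second image block to the size of the first image
    block so that pixel points at the same position can be paired for fusion."""
    height, width = first_block_shape[:2]
    resized = cv2.resize(second_block, (width, height), interpolation=cv2.INTER_LINEAR)
    return cv2.GaussianBlur(resized, (3, 3), 0)
```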
Optionally, under the condition that one first image block corresponds to a plurality of second image blocks, the present disclosure may obtain, from each second image block corresponding to the first image block, a pixel point corresponding to the pixel point to be processed, so as to obtain a plurality of pixel points to be fused.
S901, carrying out pixel value weighted calculation on the pixel point to be processed and the pixel point to be fused to obtain the pixel value of the pixel point after fusion processing.
Optionally, when the first image block and the second image block are both single-channel image blocks, the present disclosure may directly perform weighted calculation on the pixel values of the pixel point to be processed and the pixel point to be fused to obtain the pixel value of the fused pixel point. When the first image block and the second image block are both multi-channel image blocks, the present disclosure performs the weighted calculation channel by channel on the pixel values of the pixel point to be processed and the pixel point to be fused, obtaining a weighted result for each channel and thus the pixel value of the fused pixel point. In addition, the weights of the pixel point to be processed and the pixel point to be fused may be equal or unequal.
S902, obtaining a fourth image block containing the preset part of the target object based on the pixel values of the plurality of pixel points after fusion processing.
Optionally, after each pixel in the first image block is subjected to weighted calculation, all the pixels subjected to fusion processing form a fourth image block.
In the present disclosure, weighted calculation is performed on the pixel values of the pixel points to be processed in the first image block and the pixel points to be fused in the second image block, so that noise in the fourth image block can be reduced by means of the second image block. The fourth image block therefore has better definition, which helps improve the accuracy of the state detection.
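Putting S900-S902 together, a minimal weighted-fusion sketch (NumPy assumed; equal weights are used when none are supplied, matching the note above that the weights may be equal or unequal) could look like this:

```python
import numpy as np

def fuse_blocks(first_block, second_blocks, weights=None):
    """first_block: the image block from the first video frame; second_blocks:
    a list of second image blocks already converted to the same size.
    Returns the fourth image block."""
    blocks = [np.asarray(first_block, dtype=np.float32)]
    blocks += [np.asarray(b, dtype=np.float32) for b in second_blocks]
    if weights is None:
        weights = [1.0 / len(blocks)] * len(blocks)
    fused = sum(w * b for w, b in zip(weights, blocks))
    return np.clip(fused, 0, 255).astype(np.uint8)
```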
Exemplary devices
Fig. 10 is a schematic structural diagram of an embodiment of a device for detecting a state of a target object according to the present disclosure.
The apparatus of this embodiment can be used to implement the corresponding method embodiments of the present disclosure. The apparatus shown in fig. 10 includes: a first obtaining module 1000, a second obtaining module 1001, and a state detection module 1002. The apparatus of this embodiment may optionally further include: a target orientation determination module 1003.
The first obtaining module 1000 is configured to obtain an image block including a predetermined portion of a target object from a first video frame acquired by a first camera device, so as to obtain a first image block.
The second obtaining module 1001 is configured to obtain an image block including a predetermined portion of the target object from a second video frame acquired by a second camera device, and obtain a second image block; the acquisition time of the second video frame belongs to a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame acquired by the first camera device and separated from the first video frame by a preset number of frames.
The state detection module 1002 is configured to detect a state of the target object based on the first image block obtained by the first obtaining module 1000 and the second image block obtained by the second obtaining module 1001.
The target orientation determination module 1003 is configured to determine the optical axis target orientation of the second camera device according to the position of the predetermined portion of the target object in a fourth video frame acquired by the first camera device, wherein the capture time point of the fourth video frame is earlier than the capture time point of the first video frame.
The target orientation determination module 1003 may include: a first sub-module 10031 and a second sub-module 10032. The first sub-module 10031 is configured to obtain an image block containing the predetermined portion of the target object from the fourth video frame acquired by the first camera device, obtaining a third image block. The second sub-module 10032 is configured to determine the optical axis target orientation of the second camera device according to the pixel points in the third image block obtained by the first sub-module 10031.
Optionally, the second sub-module 10032 may include: a first unit, a second unit, a third unit, and a fourth unit (not shown in fig. 10). The first unit is used for determining a central position pixel point in the third image block. The second unit is used for determining, based on the central position pixel point, the target pointing pixel point at which the optical axis target of the second camera device points in the image coordinate system of the first camera device. The third unit is used for determining the optical axis translation amount of the second camera device according to the target pointing pixel point and the current pointing pixel point at which the optical axis of the second camera device currently points in the image coordinate system. The fourth unit is used for determining the optical axis target orientation of the second camera device according to the optical axis translation amount.
Optionally, the third unit is further configured to determine a first pixel distance in the horizontal direction and a second pixel distance in the vertical direction between the target pointing pixel point and the current pointing pixel point, and to determine the optical axis horizontal translation amount and the optical axis vertical translation amount of the second camera device according to the first pixel distance, the second pixel distance, and the prestored unit pixel movement amount of the optical axis. The unit pixel movement amount is set according to the number of pixels between two corner points on the calibration plate and the movement amount of the second camera device when its optical axis moves from one of the corner points to the other.
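As a rough sketch of what the third unit computes (the function name and inputs are hypothetical), the optical axis translation amounts follow directly from the pixel distances and the prestored unit pixel movement amounts:

```python
def optical_axis_translation(target_px, current_px, unit_move_h, unit_move_v):
    """target_px/current_px: (x, y) coordinates of the target pointing pixel point
    and the current pointing pixel point in the first camera device's image
    coordinate system; unit_move_h/unit_move_v: prestored unit pixel movement."""
    first_pixel_distance = target_px[0] - current_px[0]    # horizontal
    second_pixel_distance = target_px[1] - current_px[1]   # vertical
    shift_h = first_pixel_distance * unit_move_h
    shift_v = second_pixel_distance * unit_move_v
    return shift_h, shift_v  # used by the fourth unit to set the target orientation
```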
Optionally, the first obtaining module 1000 may include: a third submodule 10001, a fourth submodule 10002, and a fifth submodule 10003. The third sub-module 10001 is configured to perform keypoint detection on the target object on the first video frame, and obtain at least one keypoint. The fourth sub-module 10002 is configured to determine, according to the at least one key point, a region where a target object in the first video frame is located. The fifth sub-module 10003 is configured to determine, according to the area where the target object is located, an area where a predetermined portion of the target object is located, and obtain a first image block.
Optionally, the state detection module 1002 may include: a sixth sub-module 10021 and a seventh sub-module 10022. The sixth sub-module 10021 is configured to perform fusion processing on the first image block and the second image block to obtain a fourth image block containing the predetermined portion of the target object. The seventh sub-module 10022 is configured to determine the state of the target object according to the image sequence formed by the fourth image blocks.
Optionally, the sixth sub-module 10021 may include: a fifth unit, a sixth unit, and a seventh unit (not shown in fig. 10). The fifth unit may be configured to, for any pixel point to be processed in the first image block, obtain a pixel point corresponding to the pixel point to be processed from the second image block, and obtain a pixel point to be fused. The sixth unit is used for performing pixel value weighted calculation on the pixel point to be processed and the pixel point to be fused to obtain the pixel value of the pixel point after fusion processing. The seventh unit is configured to obtain a fourth image block including the predetermined portion of the target object based on the pixel values of the plurality of pixel points after the fusion processing.
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to fig. 11. FIG. 11 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure. As shown in fig. 11, the electronic device 111 includes one or more processors 1111 and memory 1112.
The processor 1111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 111 to perform desired functions.
Memory 1112 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory, for example, may include: Random Access Memory (RAM) and/or cache memory (cache), etc. The nonvolatile memory, for example, may include: Read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 1111 to implement the above-described state detection method of the target object of the various embodiments of the present disclosure and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 111 may further include an input device 1113 and an output device 1114, among other components, interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 1113 may include, for example, a keyboard, a mouse, or the like. The output device 1114 can output various information to the outside and may include, for example, a display, speakers, a printer, a communication network and the remote output devices connected thereto, and the like.
Of course, for simplicity, only some of the components of the electronic device 111 relevant to the present disclosure are shown in fig. 11, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 111 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method of detecting a state of a target object according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.
The computer program product may include program code for carrying out operations of embodiments of the present disclosure, written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a method of detecting a state of a target object according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium may include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments; however, it is noted that the advantages, effects, and the like mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not limited to the specific details described above.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including", "comprising", "having", and the like are open-ended words that mean "including, but not limited to", and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the term "and/or", unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects, and the like, will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A method of detecting a state of a target object, comprising:
acquiring an image block containing a preset part of a target object from a first video frame acquired by a first camera device to obtain a first image block;
acquiring an image block containing a preset part of the target object from a second video frame acquired by a second camera device to acquire a second image block; the acquisition time of the second video frame belongs to a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame acquired by the first camera device and separated from the first video frame by a preset number of frames;
and detecting the state of the target object based on the first image block and the second image block.
2. The method of claim 1, wherein the method further comprises, prior to the step of obtaining the first image block:
determining the optical axis target direction of the second camera device according to the position of the target object preset part in a fourth video frame acquired by the first camera device;
wherein a capture time point of the fourth video frame is earlier than a capture time point of the first video frame.
3. The method of claim 2, wherein the determining an optical axis target orientation of the second camera device from the position of the target object predetermined location in the fourth video frame captured by the first camera device comprises:
acquiring an image block containing a preset part of a target object from a fourth video frame acquired by the first camera device to obtain a third image block;
and determining the optical axis target direction of the second camera device according to the pixel points in the third image block.
4. The method according to claim 3, wherein the determining, according to the pixel points in the third image block, the optical axis target direction of the second image capturing device includes:
determining a central position pixel point in the third image block;
determining a target pointing pixel point of an optical axis target of the second camera device pointing to an image coordinate system of the first camera device based on the central position pixel point;
determining the optical axis translation amount of the second camera device according to the target pointing pixel point and the current pointing pixel point of the optical axis of the second camera device pointing to the image coordinate system;
and determining the optical axis target direction of the second camera device according to the optical axis translation amount.
5. The method of claim 4, wherein the determining an optical axis translation amount of the second camera according to the target pointing pixel and a currently pointing pixel at which the optical axis of the second camera is currently pointing in the image coordinate system comprises:
determining a first pixel distance in the horizontal direction and a second pixel distance in the vertical direction between the coordinates of the target pointing pixel point and the current pointing pixel point;
determining the optical axis horizontal translation amount and the optical axis vertical translation amount of the second camera device according to the first pixel distance, the second pixel distance and the prestored unit pixel movement amount of the optical axis;
the unit pixel moving amount is set according to the number of pixels between two corner points and the moving amount of the second camera when the optical axis of the second camera moves from one corner point to the other corner point on the calibration plate.
6. The method according to any one of claims 1 to 5, wherein the acquiring an image block containing a predetermined portion of a target object from a first video frame acquired by a first camera device to obtain a first image block comprises:
performing key point detection of a target object on the first video frame to obtain at least one key point;
determining the area of a target object in the first video frame according to the at least one key point;
and determining the area of the preset part of the target object according to the area of the target object to obtain a first image block.
7. The method of any of claims 1-6, wherein the detecting the state of the target object based on the first patch and the second patch comprises:
performing fusion processing on the first image block and the second image block to obtain a fourth image block containing a preset part of the target object;
determining a state of the target object from a sequence of images formed by the fourth image block.
8. A state detection apparatus of a target object, comprising:
the first acquisition module is used for acquiring an image block containing a preset part of a target object from a first video frame acquired by the first camera device to obtain a first image block;
the second acquisition module is used for acquiring an image block containing a preset part of the target object from a second video frame acquired by a second camera device to acquire a second image block; the acquisition time of the second video frame belongs to a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame acquired by the first camera device and separated from the first video frame by a preset number of frames;
and the state detection module is used for detecting the state of the target object based on the first image block obtained by the first acquisition module and the second image block obtained by the second acquisition module.
9. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-7.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-7.
CN202011506057.0A 2020-12-18 2020-12-18 Method, device, medium and electronic equipment for detecting state of target object Active CN112541553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011506057.0A CN112541553B (en) 2020-12-18 2020-12-18 Method, device, medium and electronic equipment for detecting state of target object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011506057.0A CN112541553B (en) 2020-12-18 2020-12-18 Method, device, medium and electronic equipment for detecting state of target object

Publications (2)

Publication Number Publication Date
CN112541553A true CN112541553A (en) 2021-03-23
CN112541553B CN112541553B (en) 2024-04-30

Family

ID=75019166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011506057.0A Active CN112541553B (en) 2020-12-18 2020-12-18 Method, device, medium and electronic equipment for detecting state of target object

Country Status (1)

Country Link
CN (1) CN112541553B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022213729A1 (en) * 2021-04-07 2022-10-13 地平线征程(杭州)人工智能科技有限公司 Method and apparatus for detecting motion information of target, and device and medium
CN115761411A (en) * 2022-11-24 2023-03-07 北京的卢铭视科技有限公司 Model training method, living body detection method, electronic device, and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300759A1 (en) * 2016-03-03 2017-10-19 Brigham Young University Automated multiple target detection and tracking system
CN107590453A (en) * 2017-09-04 2018-01-16 腾讯科技(深圳)有限公司 Processing method, device and the equipment of augmented reality scene, computer-readable storage medium
CN108596128A (en) * 2018-04-28 2018-09-28 京东方科技集团股份有限公司 Object identifying method, device and storage medium
US20180307937A1 (en) * 2015-11-17 2018-10-25 Sony Semiconductor Solutions Corporation Image processing apparatus, image processing method and program
CN108713318A (en) * 2016-10-31 2018-10-26 华为技术有限公司 A kind of processing method and equipment of video frame
CN108769509A (en) * 2018-04-28 2018-11-06 Oppo广东移动通信有限公司 Control method, apparatus, electronic equipment and the storage medium of camera
CN110287828A (en) * 2019-06-11 2019-09-27 北京三快在线科技有限公司 Detection method, device and the electronic equipment of signal lamp
CN111063079A (en) * 2019-11-27 2020-04-24 深圳云天励飞技术有限公司 Binocular living body face detection method and device based on access control system
CN111723716A (en) * 2020-06-11 2020-09-29 深圳地平线机器人科技有限公司 Method, device, system, medium and electronic equipment for determining orientation of target object
CN112052770A (en) * 2020-08-31 2020-12-08 北京地平线信息技术有限公司 Method, apparatus, medium, and electronic device for fatigue detection

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180307937A1 (en) * 2015-11-17 2018-10-25 Sony Semiconductor Solutions Corporation Image processing apparatus, image processing method and program
US20170300759A1 (en) * 2016-03-03 2017-10-19 Brigham Young University Automated multiple target detection and tracking system
CN108713318A (en) * 2016-10-31 2018-10-26 华为技术有限公司 A kind of processing method and equipment of video frame
CN107590453A (en) * 2017-09-04 2018-01-16 腾讯科技(深圳)有限公司 Processing method, device and the equipment of augmented reality scene, computer-readable storage medium
CN108596128A (en) * 2018-04-28 2018-09-28 京东方科技集团股份有限公司 Object identifying method, device and storage medium
CN108769509A (en) * 2018-04-28 2018-11-06 Oppo广东移动通信有限公司 Control method, apparatus, electronic equipment and the storage medium of camera
CN110287828A (en) * 2019-06-11 2019-09-27 北京三快在线科技有限公司 Detection method, device and the electronic equipment of signal lamp
CN111063079A (en) * 2019-11-27 2020-04-24 深圳云天励飞技术有限公司 Binocular living body face detection method and device based on access control system
CN111723716A (en) * 2020-06-11 2020-09-29 深圳地平线机器人科技有限公司 Method, device, system, medium and electronic equipment for determining orientation of target object
CN112052770A (en) * 2020-08-31 2020-12-08 北京地平线信息技术有限公司 Method, apparatus, medium, and electronic device for fatigue detection

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022213729A1 (en) * 2021-04-07 2022-10-13 地平线征程(杭州)人工智能科技有限公司 Method and apparatus for detecting motion information of target, and device and medium
JP7306766B2 (en) 2021-04-07 2023-07-11 地平▲線▼征程(杭州)人工智能科技有限公司 Target motion information detection method, apparatus, equipment and medium
CN115761411A (en) * 2022-11-24 2023-03-07 北京的卢铭视科技有限公司 Model training method, living body detection method, electronic device, and storage medium
CN115761411B (en) * 2022-11-24 2023-09-01 北京的卢铭视科技有限公司 Model training method, living body detection method, electronic device, and storage medium

Also Published As

Publication number Publication date
CN112541553B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
US9325968B2 (en) Stereo imaging using disparate imaging devices
US9030524B2 (en) Image generating apparatus, synthesis table generating apparatus, and computer readable storage medium
JP2005339313A (en) Method and apparatus for presenting image
CN109741289B (en) Image fusion method and VR equipment
CN112672139A (en) Projection display method, device and computer readable storage medium
US10277891B2 (en) Image processing apparatus, image processing method, and storage medium
US20140204083A1 (en) Systems and methods for real-time distortion processing
US20120050269A1 (en) Information display device
CN112541553A (en) Target object state detection method, apparatus, medium, and electronic device
WO2020238008A1 (en) Moving object detection method and device, intelligent driving control method and device, medium, and apparatus
CN110060230B (en) Three-dimensional scene analysis method, device, medium and equipment
CN111402404B (en) Panorama complementing method and device, computer readable storage medium and electronic equipment
US20180218714A1 (en) Image display apparatus, image processing apparatus, image display method, image processing method, and storage medium
US20210078597A1 (en) Method and apparatus for determining an orientation of a target object, method and apparatus for controlling intelligent driving control, and device
US11275248B2 (en) Head mounted display apparatus, virtual reality display system and driving method thereof
US10373293B2 (en) Image processing apparatus, image processing method, and storage medium
CN111723716A (en) Method, device, system, medium and electronic equipment for determining orientation of target object
CN113689508A (en) Point cloud marking method and device, storage medium and electronic equipment
US10089726B2 (en) Image processing apparatus, image processing method, and storage medium, relating to generating an image corresponding to a predetermined three-dimensional shape by transforming a captured image
US20100302403A1 (en) Generating Images With Different Fields Of View
CN108702465B (en) Method and apparatus for processing images in virtual reality system
CN113438463A (en) Method and device for simulating orthogonal camera image, storage medium and electronic equipment
CN111985384A (en) Method and device for acquiring 3D coordinates of face key points and 3D face model
US11350065B2 (en) Video generation device, video generation method, and program
US20220413295A1 (en) Electronic device and method for controlling electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant