CN112541553B - Method, device, medium and electronic equipment for detecting state of target object

Info

Publication number
CN112541553B
Authority
CN
China
Prior art keywords
image
video frame
image block
target object
optical axis
Prior art date
Legal status: Active
Application number
CN202011506057.0A
Other languages
Chinese (zh)
Other versions
CN112541553A (en)
Inventor
孙杰
Current Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Original Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority to CN202011506057.0A
Publication of CN112541553A
Application granted
Publication of CN112541553B
Anticipated expiration

Classifications

    • G06F18/251 — Pattern recognition; Analysing; Fusion techniques of input or preprocessed data
    • G06V10/147 — Image or video recognition; Image acquisition; Optical characteristics of the acquisition device; Details of sensors, e.g. sensor lenses
    • G06V20/597 — Scenes; Context or environment of the image inside of a vehicle; Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V40/10 — Recognition of biometric, human-related or animal-related patterns; Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/166 — Human faces, e.g. facial parts, sketches or expressions; Detection; Localisation; Normalisation using acquisition arrangements
    • G06V2201/07 — Indexing scheme relating to image or video recognition or understanding; Target detection

Abstract

The invention discloses a method, a device, a medium and equipment for detecting the state of a target object. The method comprises: acquiring an image block containing a predetermined part of a target object from a first video frame acquired by a first camera device, to obtain a first image block; acquiring an image block containing the predetermined part of the target object from a second video frame acquired by a second camera device, to obtain a second image block, wherein the acquisition time of the second video frame falls within a preset time range determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame which is acquired by the first camera device and is separated from the first video frame by a preset number of frames; and detecting the state of the target object based on the first image block and the second image block. The technical scheme provided by the disclosure helps to ensure the accuracy of the state detection of the target object while consuming fewer computing resources.

Description

Method, device, medium and electronic equipment for detecting state of target object
Technical Field
The present disclosure relates to computer vision, and more particularly, to a method for detecting a state of a target object, a device for detecting a state of a target object, a storage medium, and an electronic apparatus.
Background
In some computer vision applications, it is often necessary to extract image blocks containing a predetermined portion of a target object from video frames in order to detect the state of the target object from those image blocks. For example, in driver fatigue monitoring, it is often necessary to acquire a sequence of eye image blocks of the driver from a plurality of video frames captured by an imaging device and to derive a sequence of eye keypoints from it, so that the driver's current fatigue state can be determined based on the keypoint sequence.
While video is being captured, factors such as changes in external light (e.g., in intensity or illumination direction) and changes in the posture of the target object can affect the sharpness of the predetermined portion (e.g., the eyes) of the target object in the video frames acquired by the imaging device, and thereby the accuracy of the state detection result.
How to obtain a sharper image block containing the predetermined portion of the target object, so that the accuracy of the state detection result based on that image block can be ensured, is therefore a technical problem worth attention.
Disclosure of Invention
The present disclosure has been made to solve the above technical problems. Embodiments of the present disclosure provide a method and an apparatus for detecting the state of a target object, a storage medium, and an electronic device.
According to an aspect of the embodiments of the present disclosure, there is provided a method for detecting the state of a target object, the method including: acquiring an image block containing a predetermined part of a target object from a first video frame acquired by a first camera device, to obtain a first image block; acquiring an image block containing the predetermined part of the target object from a second video frame acquired by a second camera device, to obtain a second image block, wherein the acquisition time of the second video frame falls within a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame which is acquired by the first camera device and separated from the first video frame by a preset number of frames; and detecting the state of the target object based on the first image block and the second image block.
According to still another aspect of the embodiments of the present disclosure, there is provided an apparatus for detecting the state of a target object, the apparatus including: a first acquisition module configured to acquire an image block containing a predetermined part of a target object from a first video frame acquired by a first camera device, to obtain a first image block; a second acquisition module configured to acquire an image block containing the predetermined part of the target object from a second video frame acquired by a second camera device, to obtain a second image block, wherein the acquisition time of the second video frame falls within a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame which is acquired by the first camera device and separated from the first video frame by a preset number of frames; and a state detection module configured to detect the state of the target object based on the first image block acquired by the first acquisition module and the second image block acquired by the second acquisition module.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for implementing the above method.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method described above.
According to the above method and apparatus for detecting the state of a target object, because the frame rate of the second camera device is higher than that of the first camera device, the second camera device is less likely to produce blurred image content in the second video frame due to factors such as light changes and posture changes of the target object during acquisition. The first video frame acquired by the first camera device, by contrast, may suffer from poor image sharpness caused by such factors, so the second image block containing the predetermined part of the target object in the second video frame can compensate for the sharpness deficiencies of the first image block, which helps keep noise in the first image block from affecting the state detection of the target object. Moreover, because the first camera device can use a lower frame rate, the number of images on which the state detection operation must be performed per unit time can be kept small, so the state detection of the target object can be completed without consuming many computing resources. The technical scheme provided by the disclosure therefore makes it possible to compensate for image sharpness using the second video frames acquired by the second camera device while consuming fewer computing resources, so that the accuracy of the state detection result of the target object can be ensured.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing embodiments thereof in more detail with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, not to limit the disclosure. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a schematic illustration of a scenario in which the present disclosure is applicable;
FIG. 2 is a flow chart of one embodiment of a method of detecting a state of a target object of the present disclosure;
FIG. 3 is a flow chart of one embodiment of the present disclosure for determining the optical axis target pointing direction of a second camera device;
FIG. 4 is a flow chart of one embodiment of the present disclosure for determining an optical axis target pointing direction of a second image capture device using pixels in a third image block;
FIG. 5 is a flow chart of one embodiment of the present disclosure for setting the amount of unit pixel movement using a checkerboard calibration plate;
FIG. 6 is a flow chart of one embodiment of the present disclosure for acquiring an image block containing a predetermined portion of a target object from a first video frame;
FIG. 7 is a schematic diagram of one embodiment of a first video frame of the present disclosure;
FIG. 8 is a flow chart of one embodiment of detecting a state of a target object of the present disclosure;
FIG. 9 is a flow chart of one embodiment of obtaining a fourth image block containing a predetermined portion of a target object by fusion in accordance with the present disclosure;
FIG. 10 is a schematic diagram illustrating a configuration of an embodiment of a state detection apparatus for a target object of the present disclosure;
FIG. 11 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in this disclosure is merely an association relationship describing an association object, and indicates that three relationships may exist, such as a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the front and rear association objects are an or relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the present disclosure are applicable to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Summary of the disclosure
In the process of implementing the present disclosure, the inventor found that, to avoid poor imaging in the video frames acquired by the camera device, two approaches are generally adopted at present. One is to increase the fill-light power to improve the sharpness of the video frames acquired by the camera device. The other is to perform video acquisition with a camera device of high image resolution and high frame rate. Increasing the fill-light power increases power consumption, which aggravates device heating; the fill light may also harm the target object, for example enhanced infrared light may damage a driver's body, especially the eyes; furthermore, the fill light may cause reflections, such as off glasses worn by the driver, so that detail in the driver's eye region is lost. Video acquisition with a high-resolution, high-frame-rate camera device, in turn, typically requires a large amount of computing resources to process the high-resolution, high-frame-rate video frames, and therefore a high-performance data processing unit, which increases the cost of detecting the state of the target object.
Exemplary overview
The state detection technique of the target object of the present disclosure may be applied to applications such as fatigue state detection. Next, with reference to fig. 1, an application of the state detection technique of the object of the present disclosure will be described taking as an example a driver's fatigue state detection.
In fig. 1, two image capturing devices are disposed at a position 101 in the vehicle 100. The two image capturing devices may be of any type that meets actual requirements; for example, both may be RGB (red-green-blue) based or both infrared based. The frame rates of the two image capturing apparatuses differ, as do their image resolutions (which may also be referred to as spatial resolutions): the image resolution of the high-frame-rate device is lower than that of the low-frame-rate device. In one example, assume the two devices are a first image pickup apparatus and a second image pickup apparatus, the frame rate of the first image pickup apparatus is 10 FPS (frames per second), the frame rate of the second image pickup apparatus is 60 FPS, the image resolution of the first image pickup apparatus is 720p (progressive), and the image resolution of the second image pickup apparatus is 240p.
When the driver is in the driving position of the vehicle 100, the driver's face should lie within the fields of view of the two image capturing devices in the vehicle 100; that is, the video captured by the two devices typically includes the driver's face (e.g., the frontal face), which is preferably located in the central region of their fields of view.
The two image capturing devices in the vehicle 100 are time-synchronized and controlled to start video capture at the same time point, and the video frames they capture can be provided in real time to a DMS (Driver Monitoring System) installed in the vehicle 100. The DMS may acquire image blocks containing the driver's eyes from each video frame captured by the first image capturing device and from each video frame captured by the second image capturing device, respectively, and may perform one fatigue state detection of the target object using one image block from the first image capturing device together with six image blocks from the second image capturing device.
When the fatigue state of the driver over a period of time is determined to belong to a preset mild fatigue state, the DMS may remind the driver to consider taking a short rest by means of text, voice, light, video, and the like. When it belongs to a preset moderate fatigue state, the DMS may urge the driver to take a short rest in the same ways. When it is determined that the fatigue state of the driver within a certain time range belongs to a preset severe fatigue state, the DMS may issue an emergency warning to the driver by text, voice, light, video, and the like, indicating that the current driving behaviour is dangerous and that the driver must take a short rest before continuing, so as to ensure the driving safety of the vehicle 100.
Exemplary method
FIG. 2 is a flow chart of one embodiment of a method of detecting a state of a target object of the present disclosure. The method as shown in fig. 2 includes: s200, S201, and S202. The steps are described separately below.
S200, acquiring an image block containing a preset part of a target object from a first video frame acquired by a first image pickup device to obtain a first image block.
The first image pickup apparatus in the present disclosure may include, but is not limited to: an RGB-based imaging device, an infrared-based imaging device, or the like. The target object in the present disclosure may be regarded as an object that needs state detection. For example, the target object may be a driver or a duty person, etc. The predetermined portion in the present disclosure may refer to a partial region on the target object, for example, the predetermined portion may be a face or eyes or a mouth or hands, or the like.
The present disclosure may obtain a first image block from a first video frame by using a neural network or the like, for example, performing face detection processing on the first video frame through the neural network for face detection, and cutting the first video frame according to the result of the face detection processing to obtain the first image block.
S201, acquiring an image block containing a preset part of a target object from a second video frame acquired by a second image pickup device to acquire a second image block.
The second image pickup apparatus in the present disclosure may include, but is not limited to: an RGB-based imaging device, an infrared-based imaging device, or the like. The first image capturing device and the second image capturing device may be the same type of image capturing device, for example, the first image capturing device and the second image capturing device are both RGB-based image capturing devices or both infrared-based image capturing devices. Of course, the present disclosure also does not exclude the case where the first image pickup device and the second image pickup device are different types of image pickup devices.
The present disclosure may obtain the second image block from the second video frame by using a neural network or the like, for example, performing face detection processing on the second video frame through the neural network for face detection, and cutting the second video frame according to the result of the face detection processing to obtain the second image block.
The present disclosure may obtain a plurality of second image blocks from a plurality of video frames acquired by the second image capturing apparatus, where each second image block contains the predetermined portion of the target object. In other words, one first video frame may correspond to a plurality of second video frames: while one first image block is obtained from one first video frame, a plurality of second image blocks may be obtained from the corresponding second video frames, so one first image block may correspond to a plurality of second image blocks.
The acquisition times of the plurality of second video frames corresponding to one first video frame all fall within a predetermined time range, which is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame. In one example, the acquisition time point of the first video frame is the minimum of the predetermined time range and the acquisition time point of the third video frame is the maximum; all the acquisition time points of the second video frames corresponding to the first video frame should then be greater than or equal to the acquisition time point of the first video frame and less than that of the third video frame. The third video frame in the present disclosure is a video frame acquired by the first image pickup device. The first video frame and the third video frame may be separated by a predetermined number of frames, which may be an integer of 0 or more. In one example, the first and third video frames may be two adjacent video frames, i.e., the predetermined number of frames between them is 0.
The frame rate of the first image capturing apparatus in the present disclosure is lower than the frame rate of the second image capturing apparatus, and the frame rate of the second image capturing apparatus is typically an integer multiple of the frame rate of the first image capturing apparatus. In one example, the frame rate of the first camera is 10FPS and the frame rate of the second camera is 60FPS, at which time one first video frame in the present disclosure may correspond to six second video frames.
The image resolution of the first image capturing device in the present disclosure is generally different from the image resolution of the second image capturing device. For example, the image resolution of the first image pickup device is higher than that of the second image pickup device. The image resolution of the first image capturing device may be an integer multiple of the image resolution of the second image capturing device. In one example, the image resolution of the first image capturing device is 720p, and the image resolution of the second image capturing device is 240p.
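To make the timing relationship above concrete, the following sketch (an editorial illustration, not part of the patent text; the function name and the 10/60 FPS figures are assumptions drawn from the example above) selects the second video frames whose acquisition times fall between the first and third video frames:

```python
# Illustrative sketch only: pairing second-camera video frames with one
# first-camera video frame by acquisition time. Names are hypothetical.

def second_frames_for(first_ts, third_ts, second_stream):
    """Return the second video frames whose acquisition times fall in
    [first_ts, third_ts), i.e. at or after the first video frame and
    strictly before the third video frame."""
    return [frame for ts, frame in second_stream if first_ts <= ts < third_ts]

# A 60 FPS second camera paired with adjacent 10 FPS first/third frames
# (0.1 s apart) yields six second video frames per first video frame.
second_stream = [(i / 60.0, f"second_frame_{i}") for i in range(12)]
print(len(second_frames_for(0.0, 0.1, second_stream)))  # -> 6
```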
S202, detecting the state of the target object based on the first image block and the second image block.
Detecting the state of the target object in the present disclosure may include, but is not limited to, detecting the fatigue state of the target object, and the like. The present disclosure may perform the state detection processing of the target object at the frame rate of the first image capturing apparatus; that is, the state detection processing may be performed once each time a first video frame is generated, so the number of state detection operations per unit time is determined by the frame rate of the first image capturing apparatus, not by that of the second image capturing apparatus.
Because the frame rate of the second camera device is higher than that of the first camera device, the second camera device is less likely to produce blurred image content in the second video frames due to factors such as light changes and posture changes of the target object during acquisition, whereas the first video frame acquired by the first camera device may suffer from poor image sharpness caused by such factors. The second image block containing the predetermined part of the target object in the second video frame can therefore compensate for the sharpness deficiencies of the first image block, which helps keep noise in the first image block from affecting the state detection of the target object. Moreover, when performing state detection using the first image block and the second image block, the first image pickup device may be a device with a lower frame rate, so the number of images on which the state detection operation must be performed per unit time can be small; for example, if the frame rate of the first image pickup device is 10 FPS and that of the second is 60 FPS, the number of images subjected to the state detection operation within one second may be 10 rather than 60. The method and the device can thus complete the state detection of the target object without consuming many computing resources. The technical scheme provided by the disclosure therefore makes it possible to compensate for image sharpness using the second video frames acquired by the second camera device while consuming fewer computing resources, so that the accuracy of the state detection result of the target object can be ensured.
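The overall flow of S200-S202 can be sketched as follows (again an editorial illustration under the assumption that detection runs once per first-camera frame; every function here is a hypothetical stand-in for the neural-network steps described above, not the patent's implementation):

```python
# Minimal sketch of the S200-S202 loop: state detection runs once per
# first-camera frame, so the per-second workload follows the lower frame rate.

def crop_predetermined_part(frame):
    # Stand-in for detecting the predetermined part (e.g. eyes) and cropping.
    return ("block", frame)

def detect_state(first_block, second_blocks):
    # Stand-in for the fusion and state detection of S202.
    return {"first": first_block, "second_count": len(second_blocks)}

def run_detection(first_stream, second_stream, frame_interval):
    results = []
    for ts, frame in first_stream:
        first_block = crop_predetermined_part(frame)                   # S200
        paired = [f for t, f in second_stream
                  if ts <= t < ts + frame_interval]
        second_blocks = [crop_predetermined_part(f) for f in paired]   # S201
        results.append(detect_state(first_block, second_blocks))       # S202
    return results

# One second of video: ten 10 FPS first frames and sixty 60 FPS second frames
# give ten detections, each using one first block and six second blocks.
firsts = [(i / 10.0, f"F{i}") for i in range(10)]
seconds = [(i / 60.0, f"S{i}") for i in range(60)]
print(len(run_detection(firsts, seconds, 0.1)))  # -> 10
```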
In an optional example, in application scenarios such as power-up or restart of the second image capturing device, a preset default optical axis pointing direction is generally used as the current optical axis pointing direction. The present disclosure may perform an optical axis pointing update on the current pointing of the second image capturing device, and video frames acquired by the second image capturing device after this update are used as second video frames in the present disclosure. This update may be regarded as an initialization of the current optical axis pointing of the second image capturing device; the pointing after initialization generally does not change until the next initialization.
Optionally, initializing the current optical axis pointing of the second image capturing apparatus may include: determining the optical axis target pointing direction of the second image capturing device according to the position of the predetermined portion of the target object in a fourth video frame acquired by the first image capturing device, and changing the current optical axis pointing of the second image capturing device to that target pointing. That is, the present disclosure initializes the current optical axis pointing of the second image capturing apparatus with a video frame (referred to herein as the fourth video frame) captured by the first image capturing apparatus.
Optionally, the acquisition time point of the fourth video frame in the present disclosure is earlier than the acquisition time point of the first video frame. In one example, after the first and second image capturing devices are powered on simultaneously, the present disclosure may use as the fourth video frame the first video frame acquired by the first image capturing device that includes the predetermined portion of the target object (such as the driver's eyes). In another example, the present disclosure may instead use the first such video frame whose image sharpness also meets a preset requirement. After determining the optical axis target pointing direction of the second image capturing device using the fourth video frame, the present disclosure may form and output a corresponding control command to a movement control device such as a motor, causing the second image capturing device to move mechanically so that its optical axis actually points in the target direction.
In one example, the disclosure may first obtain the image area where the predetermined portion of the target object is located in the fourth video frame acquired by the first image capturing device, determine the position in the fourth video frame at which the optical axis of the second image capturing device currently points, and then determine the optical axis target pointing of the second image capturing device from this area and this position.
By determining the optical axis target pointing of the second camera device from the position of the predetermined portion of the target object in the fourth video frame acquired by the first camera device, the target pointing becomes associated with the predetermined portion: for example, it can be kept within the position range of the predetermined portion as far as possible, so that the predetermined portion appears at a prominent position in the second video frames acquired by the second camera device. This reduces cases in which the predetermined portion is incompletely captured in the second video frame or lies at its edge, reduces the impact on the state detection result of the target object, and thus helps improve the accuracy of that result.
In an alternative example, a specific example of the present disclosure for determining the optical axis target pointing direction of the second image capturing apparatus is shown in fig. 3.
In fig. 3, S300, an image block including a predetermined portion of the target object is acquired from a fourth video frame acquired by the first image capturing device, and a third image block is acquired.
Optionally, the present disclosure may use a neural network or the like to obtain the third image block from the fourth video frame acquired by the first image capturing device. For example, a face detection process is performed on the fourth video frame by a neural network for face detection, the position of the predetermined portion of the target object in the fourth video frame is obtained from the detection result, and the fourth video frame is cut according to that position to obtain the third image block. In one example, the third image block in the present disclosure may be an eye image block, i.e., the predetermined portion is the eyes.
S301, determining the optical axis target pointing direction of the second image pickup device according to the pixel points in the third image block.
Optionally, the present disclosure may determine the optical axis target pointing of the second image capturing device according to a pixel point in the third image block. That is, when the optical axis target pointing of the second image capturing device is mapped into the third image block, it should bear a definite relation to a pixel point in the third image block; for example, the mapped position may coincide with the position of that pixel point in the third image block.
By obtaining the third image block from the fourth video frame and determining the optical axis target pointing of the second image capturing device from a pixel point in it, the target pointing becomes associated with that pixel point, so the optical axis of the second image capturing device can be made to point close to a designated position within the predetermined portion of the target object. That designated position can thus appear at a prominent position in the second video frames acquired by the second image capturing device, which reduces cases in which the predetermined portion is incompletely captured or lies at the edge of the second video frame, reduces the impact on the state detection result of the target object, and helps improve its accuracy.
In an alternative example, an example of determining the optical axis target pointing direction of the second image capturing device using the pixel points in the third image block is shown in fig. 4.
In fig. 4, S400, a center position pixel point in the third image block is determined.
Optionally, in the case that the number of pixels in the width direction and the height direction of the third image block is an odd number greater than zero, the present disclosure may use a pixel point where an intersection point of two diagonal lines of the third image block is located as a center position pixel point in the third image block. In the case where the number of pixels in the width direction or the height direction of the third image block is an even number greater than zero, the present disclosure may use any one of the pixel points in the third image block adjacent to the intersection of the two diagonal lines of the third image block as the center position pixel point in the third image block.
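A minimal sketch of this center-pixel rule follows (an editorial illustration; pixel indices are assumed to run from 0):

```python
# Sketch of S400: the centre-position pixel of a W x H image block. For odd
# dimensions this is the pixel at the intersection of the two diagonals; for
# even dimensions the text allows any pixel adjacent to that intersection,
# and this sketch picks the one with the smaller index.

def center_position_pixel(width, height):
    # Pixel indices run from 0 to width-1 and 0 to height-1.
    return (width - 1) // 2, (height - 1) // 2

print(center_position_pixel(5, 7))  # (2, 3): exact diagonal intersection
print(center_position_pixel(6, 4))  # (2, 1): adjacent to the intersection
```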
S401, determining, based on the center position pixel point, the target pointing pixel point of the optical axis of the second image pickup device in the image coordinate system of the first image pickup device.
Alternatively, the image coordinate system of the first image capturing device may refer to the plane coordinate system of an image (such as a photograph or a video frame) captured by the first image capturing device; for brevity it is referred to below as the first image coordinate system. Its origin may be located at the upper-left corner of a video frame captured by the first camera device. In one example, the width direction of such a video frame is the X-axis direction of the first image coordinate system, and the height direction is the Y-axis direction. In one example, the present disclosure may directly take the center position pixel point as the target pointing pixel point of the optical axis of the second image capturing device in the first image coordinate system. In another example, a pixel point near the center position pixel point may be taken as the target pointing pixel point.
S402, determining the optical axis translation amount of the second image pickup device according to the target pointing pixel point and the current pointing pixel point of the optical axis of the second image pickup device in the image coordinate system of the first image pickup device.
Optionally, the pixel point at which the optical axis of the second image capturing device currently points in the image coordinate system of the first image capturing device may refer to the pixel point obtained by mapping the current optical axis pointing of the second image capturing device into that coordinate system. For example, the current pointing may correspond to a pixel point in the image coordinate system of the second image capturing device, and that pixel point can be converted into the image coordinate system of the first image capturing device based on the relative positional relationship between the two devices, thereby obtaining the current pointing pixel point.
Alternatively, the optical axis translation amount in the present disclosure may refer to the translation of the optical axis in the horizontal direction and in the vertical direction, where horizontal and vertical are with respect to the world coordinate system. The present disclosure may determine the optical axis translation amount of the second image capturing device based on the pixel distance between the target pointing pixel point and the current pointing pixel point in the image coordinate system of the first image capturing device.
Alternatively, the optical axis translation in the present disclosure may be signed, the sign indicating the direction of translation. In one example: a negative horizontal translation means the optical axis moves to the left of its current pointing direction, and a positive horizontal translation means it moves to the right; a negative vertical translation means the optical axis moves upward from its current pointing direction, and a positive vertical translation means it moves downward.
Optionally, an example of a specific procedure of determining the optical axis translation amount of the second image capturing apparatus according to the present disclosure is as follows:
First, a first pixel distance in a horizontal direction and a second pixel distance in a vertical direction between coordinates of a target pointing pixel point and a current pointing pixel point are determined. The horizontal direction and the vertical direction herein refer to the horizontal direction and the vertical direction of the image coordinate system of the first image pickup device. More specifically, the horizontal direction here may be a wide direction of the first video frame captured by the first image capturing apparatus, and the vertical direction here may be a high direction of the first video frame captured by the first image capturing apparatus. The present disclosure may use a difference between an x-coordinate of a target pointing pixel point and an x-coordinate of a current pointing pixel point as a first pixel distance, and a difference between a y-coordinate of the target pointing pixel point and a y-coordinate of the current pointing pixel point as a second pixel distance.
Next, an optical axis horizontal shift amount and an optical axis vertical shift amount of the second image pickup device are determined based on the first pixel distance, the second pixel distance, and a unit pixel shift amount of the optical axis stored in advance. That is, the present disclosure is provided with a unit pixel shift amount of an optical axis in advance, which may include: a horizontal direction unit pixel shift amount of the optical axis and a vertical direction unit pixel shift amount of the optical axis. The unit pixel shift amount in the horizontal direction of the optical axis may refer to a shift amount in the horizontal direction of the world coordinate system in which the optical axis of the second image pickup device is currently directed when the optical axis of the second image pickup device is shifted by one unit pixel in the horizontal direction of the image coordinate system of the first image pickup device. The unit pixel shift amount in the vertical direction of the optical axis may refer to a shift amount in the vertical direction of the world coordinate system in which the optical axis of the second image pickup device is currently directed when the optical axis of the second image pickup device is shifted by one unit pixel in the vertical direction of the image coordinate system of the first image pickup device. Here, one unit pixel may refer to one basic unit formed of one pixel or a plurality of pixels. In the case where one unit pixel is a basic unit formed of one pixel, the present disclosure may take the product of the first pixel distance and the horizontal direction unit pixel movement amount of the optical axis as the optical axis horizontal translation amount of the second image pickup device, and take the product of the second pixel distance and the vertical direction unit pixel movement amount of the optical axis as the optical axis vertical translation amount of the second image pickup device. In the case where one unit pixel is a basic unit formed of a plurality of pixels, the present disclosure may first convert the first pixel distance and the second pixel distance into unit-pixel-based magnitudes, respectively, and obtain the optical axis horizontal shift amount and the optical axis vertical shift amount of the second image pickup device using the products of the magnitudes obtained by the conversion and the respective unit-pixel shift amounts.
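Under the assumption that one unit pixel is one pixel, the computation of S402 can be sketched as follows (an editorial illustration; all names and numeric values are hypothetical):

```python
# Sketch of S402: signed pixel distances between the target pointing pixel
# and the current pointing pixel, scaled by the pre-stored unit pixel
# movement amounts of the optical axis.

def optical_axis_translation(target_px, current_px, unit_move_h, unit_move_v):
    """target_px, current_px: (x, y) coordinates in the first image coordinate
    system. Returns signed (horizontal, vertical) translation amounts; per the
    sign convention above, negative means left/up of the current pointing and
    positive means right/down."""
    first_pixel_distance = target_px[0] - current_px[0]
    second_pixel_distance = target_px[1] - current_px[1]
    return (first_pixel_distance * unit_move_h,
            second_pixel_distance * unit_move_v)

# Target 12 px to the left of and 5 px below the current pointing:
print(optical_axis_translation((100, 205), (112, 200), 0.05, 0.05))
# pan left by 0.6, tilt down by 0.25 (up to float rounding; units are
# whatever the calibration step below fixes them to be)
```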
Alternatively, the present disclosure may set the unit pixel shift amount using a calibration plate. Specifically, the unit pixel shift amount in the present disclosure is set according to the number of pixels between two corner points and the optical axis shift amount of the second image pickup device when the optical axis of the second image pickup device is shifted from one corner point to another corner point on a calibration board (e.g., a checkerboard calibration board). The present disclosure may store the unit pixel movement amount in the storage unit of the second image pickup device, or may store the unit pixel movement amount in the DMS or the like. An example of the present disclosure for setting the amount of unit pixel shift using the checkerboard calibration plate can be seen from the following description with respect to fig. 5.
S403, determining the optical axis target pointing direction of the second image pickup device according to the optical axis translation amount.
Optionally, the current optical axis pointing and the optical axis translation amount of the second image capturing device uniquely determine an optical axis target pointing. The present disclosure may form a corresponding control command according to the optical axis translation amount, directed at a movement control device such as a motor, so that the movement control device performs the corresponding action, the second image capturing device moves mechanically, and the current optical axis pointing of the second image capturing device is finally changed to the target pointing.
By determining the optical axis target pointing of the second camera device based on the center position pixel point in the third image block, the target pointing becomes associated with the center of the predetermined portion of the target object; for example, the optical axis can be directed as nearly as possible at the center of the predetermined portion, so that the predetermined portion appears in the central area of the second video frames acquired by the second camera device. For instance, regardless of the driver's height in the cockpit or distance from the second camera device, the driver's eyes stay in the central area of the second video frames as far as possible, which reduces the impact on the state detection result of the target object and improves its accuracy. Determining the optical axis translation amount from the target pointing pixel point, the current pointing pixel point, and the unit pixel movement amount of the optical axis, and then determining the target pointing from that translation amount, provides an easy-to-implement way of adjusting the optical axis pointing of the second image capturing device, which improves the usability of the technical scheme.
Assume the image resolution of the first image capturing device is 720p, the image resolution of the second image capturing device is 240p, the relative positions of the two devices are fixed, and the optical axis pointing of the second image capturing device can be changed. A checkerboard calibration plate is fixed within the field of view of both devices, and each device captures an image of it; for example, image 501 in fig. 5 is part of the first image and image 502 in fig. 5 is part of the second image. The present disclosure first points the optical axis of the second image capturing device at a first corner point of the checkerboard calibration plate and captures an image, then drives the second image capturing device to move mechanically so that its optical axis points at a second corner point (the first and second corner points being two different corner points) and captures another image. From the mechanical movement of the second image capturing device, the optical axis translation by which its optical axis moves from the first corner point to the second corner point can be obtained, i.e., a first optical axis horizontal translation amount and a first optical axis vertical translation amount. From image 501, the pixel distance in the horizontal direction (the third pixel distance) and the pixel distance in the vertical direction (the fourth pixel distance) between the two corner points can be obtained; for example, if the coordinates of the first corner point in the first image coordinate system are (x1, y1) and those of the second corner point are (x2, y2), the third pixel distance is x2-x1 and the fourth pixel distance is y2-y1. The present disclosure may take the absolute value of the ratio of the first optical axis horizontal translation amount to the third pixel distance, and the absolute value of the ratio of the first optical axis vertical translation amount to the fourth pixel distance, as the unit pixel movement amounts of the second image capturing device.
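The calibration arithmetic above can be sketched as follows (an editorial illustration; the corner coordinates and translation amounts are made-up placeholders):

```python
# Sketch of the checkerboard calibration: the unit pixel movement amount is
# the absolute ratio of the measured optical-axis translation to the pixel
# distance between the two corner points in the first image.

def unit_pixel_movement(corner1, corner2, axis_shift_h, axis_shift_v):
    """corner1 = (x1, y1), corner2 = (x2, y2): corner coordinates in the first
    image coordinate system; axis_shift_h/v: the first optical axis horizontal
    and vertical translation amounts measured while re-pointing the second
    camera from corner1 to corner2."""
    third_pixel_distance = corner2[0] - corner1[0]    # x2 - x1
    fourth_pixel_distance = corner2[1] - corner1[1]   # y2 - y1
    return (abs(axis_shift_h / third_pixel_distance),
            abs(axis_shift_v / fourth_pixel_distance))

print(unit_pixel_movement((120, 80), (360, 260), 12.0, 9.0))
# -> (0.05, 0.05): stored, then used later to compute the optical axis
#    translation for a given pixel offset (see S402 above)
```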
In an alternative example, an example of acquiring an image block containing a predetermined portion of a target object from a first video frame acquired by a first image capturing apparatus is shown in fig. 6.
In fig. 6, S600, a first video frame is subjected to keypoint detection of a target object, and at least one keypoint is obtained.
Optionally, the keypoint detection of the target object in the present disclosure may be face keypoint detection, that is, the present disclosure may perform face keypoint detection on the first video frame, thereby obtaining a plurality of face keypoints. For example, the present disclosure may perform a face key point detection process on the first video frame 700 shown in fig. 7 through a neural network for face key point detection, and obtain a plurality of face key points of a target object based on the result of the detection process.
S601, determining a region where a target object in a first video frame is located according to at least one key point.
Alternatively, the present disclosure may determine the minimum x-coordinate, the maximum x-coordinate, the minimum y-coordinate, and the maximum y-coordinate among the coordinates of all currently obtained key points (i.e., two-dimensional coordinates based on the plane coordinate system of the first video frame 700), and form the coordinates of four vertices from these extrema, for example, the coordinates of the first vertex 701, the second vertex 702, the third vertex 703, and the fourth vertex 704 in fig. 7. The region enclosed by the coordinates of the four vertices can be directly taken as the region where the target object in the first video frame is located.
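A sketch of this vertex construction, assuming the key points are given as an (N, 2) array (an illustrative representation, not one specified by the disclosure):

import numpy as np

def region_from_keypoints(keypoints):
    """Sketch: bound the target object by the extrema of its key points.

    keypoints: (N, 2) array of (x, y) face key point coordinates in the
    plane coordinate system of the first video frame.
    Returns the four vertex coordinates of the enclosing rectangle.
    """
    xs, ys = keypoints[:, 0], keypoints[:, 1]
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    # Four vertices corresponding to 701-704 in fig. 7 (order is illustrative).
    return [(x_min, y_min), (x_max, y_min), (x_min, y_max), (x_max, y_max)]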
S602, determining the area where the predetermined portion of the target object is located according to the area where the target object is located, and obtaining a first image block.
Optionally, the present disclosure may determine the area where the predetermined portion is located within the area where the target object is located according to the relative positional relationship between the predetermined portion and the target object. For example, the area where the eyes are located may be determined within the area where the face is located according to the relative positional relationship between the eyes and the face (for example, the height of the eyes is one seventh of the height of the face, the lowest end of the eyes is located at one fifth of the height of the face, etc.). The first video frame can then be cut based on the area where the predetermined portion is located, so as to obtain an image block containing the predetermined portion of the target object, namely the first image block.
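A sketch of this relative-position cropping, in which the one-seventh and one-fifth ratios from the example above are treated as illustrative assumptions rather than calibrated values:

def crop_eye_block(frame, face_box, eye_height_ratio=1 / 7, eye_bottom_ratio=1 / 5):
    """Sketch: cut the first image block (eye region) out of the face region.

    frame: the first video frame as an H x W (x C) array.
    face_box: (x_min, y_min, x_max, y_max) of the area where the face is located.
    The ratios interpret the example as measuring from the top of the face:
    the lowest end of the eyes sits at 1/5 of the face height, and the eye
    band occupies 1/7 of the face height above that line.
    """
    x_min, y_min, x_max, y_max = face_box
    face_h = y_max - y_min
    eye_bottom = y_min + face_h * eye_bottom_ratio
    eye_top = eye_bottom - face_h * eye_height_ratio
    # Crop the eye band across the full face width.
    return frame[int(eye_top):int(eye_bottom), int(x_min):int(x_max)]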
According to the method and the device, the region where the target object is located is obtained through key point detection, and the image block containing the predetermined portion of the target object is determined from that region, which provides an easy-to-implement and accurate manner of obtaining the first image block and thereby improves the usability of the technical scheme of the present disclosure.
Optionally, for a specific process of obtaining the second image block in the present disclosure, reference may be made to the description of fig. 6 and fig. 7, for example, the present disclosure may first perform keypoint detection of the target object on the second video frame to obtain at least one keypoint, then determine, according to the at least one keypoint, an area where the target object in the second video frame is located, and finally determine, according to the area where the target object is located, an area where a predetermined portion of the target object is located, to obtain the second image block. The specific process is not described in detail here.
In an alternative example, one example of detecting the state of the target object based on the first image block and the second image block is shown in fig. 8.
In fig. 8, S800, fusion processing is performed on the first image block and the second image block, and a fourth image block including a predetermined portion of the target object is obtained.
Alternatively, the fusion process in the present disclosure may refer to a process of forming one image block from a plurality of image blocks. The present disclosure may perform fusion processing on a first image block and the plurality of second image blocks corresponding to the first image block, so as to obtain an image block containing the predetermined portion of the target object, namely a fourth image block. The present disclosure does not limit the specific implementation of the fusion process; one manner of fusion processing provided by the present disclosure is described below with respect to fig. 9.
S801, determining a state of the target object according to the image sequence formed by the fourth image block.
Optionally, each first video frame in the disclosure corresponds to a fourth image block, and all the fourth image blocks may be arranged according to the respective acquisition time points of the corresponding first video frames, so as to form an image sequence, for example, an eye image sequence may be formed, and the disclosure may obtain the state of the target object by performing a state detection process on the image sequence. For example, the present disclosure performs fatigue state detection processing on an eye image sequence, so that a fatigue state of a target object, such as a non-fatigue state, a mild fatigue state, a moderate fatigue state, or a severe fatigue state, can be obtained.
Optionally, the fatigue state detection process of the present disclosure may include operations such as human eye key point detection, and distance and area calculations based on the human eye key points; the specific implementation of the fatigue state detection processing is not limited in the present disclosure.
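Since the disclosure leaves the fatigue-detection implementation open, the following is only one hedged illustration of the distance-based calculations it mentions, using the common eye-aspect-ratio landmark layout and invented thresholds (both are assumptions, not the patent's mandated method):

import numpy as np

def eye_openness(eye_keypoints):
    """Sketch: a distance-based eye-openness measure over six eye landmarks
    ordered corner, top, top, corner, bottom, bottom (the usual EAR layout)."""
    p = np.asarray(eye_keypoints, dtype=float)
    vertical = np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])
    horizontal = 2.0 * np.linalg.norm(p[0] - p[3])
    return vertical / horizontal

def fatigue_level(openness_sequence, closed_thresh=0.2):
    """Sketch: grade fatigue from the fraction of 'closed' frames in the eye
    image sequence; all thresholds here are illustrative assumptions."""
    closed = np.mean(np.asarray(openness_sequence) < closed_thresh)
    if closed < 0.15:
        return "non-fatigue"
    if closed < 0.3:
        return "mild fatigue"
    if closed < 0.5:
        return "moderate fatigue"
    return "severe fatigue"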
Since the frame rate of the second camera device is higher than that of the first camera device, the second camera device helps avoid blurring of the second video frame caused by factors such as light change and target object posture change during acquisition. Fusing the plurality of second image blocks containing the predetermined portion of the target object from the plurality of second video frames acquired by the second camera device with the first image block containing the predetermined portion from the first video frame acquired by the first camera device helps eliminate noise points in the first image block, so that the fused fourth image block containing the predetermined portion has better definition. Further, when performing state detection processing such as fatigue state detection with such image blocks, the first camera device may be a device with a lower frame rate, so fewer images need to be processed per unit time, and the present disclosure can complete the subsequent processing of images without consuming more computing resources. The technical scheme provided by the present disclosure is therefore beneficial to improving the definition of images acquired by a low-frame-rate camera device, thereby avoiding the contradiction between image definition and computing-resource consumption in subsequent processing such as fatigue detection.
In an alternative example, the present disclosure performs fusion processing on the first image block and the second image block, and an example of obtaining a fourth image block containing a predetermined portion of the target object is shown in fig. 9.
In fig. 9, S900, for any pixel to be processed in the first image block, a pixel corresponding to the pixel to be processed is obtained from the second image block, and the pixel to be fused is obtained.
Optionally, if the pixel point to be processed is taken as the image information representing a corresponding position in the first image block, the pixel point corresponding to the pixel point to be processed may refer to the image information representing that same position in the second image block.
Alternatively, since the image resolutions of the first camera device and the second camera device are different, the size of the first image block and the size of the second image block are generally different, the size of the second image block usually being smaller. The second image block can be converted to the size of the first image block, so that the pixel points at the same position in the first image block and the converted second image block are respectively the pixel point to be processed and the pixel point corresponding to it (i.e., the pixel point to be fused). In one example, the present disclosure may perform a conversion process on each second image block using a Gaussian kernel function, so as to convert it into an image block with the same size as the first image block.
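A sketch of this size conversion; the disclosure names a Gaussian kernel function, which is approximated here by bilinear resizing followed by Gaussian smoothing (an assumption, not the exact method of the patent):

import cv2

def upscale_second_block(second_block, first_block_shape, sigma=1.0):
    """Sketch: convert a second image block to the first image block's size.

    second_block: the smaller block from the second camera's video frame.
    first_block_shape: shape of the first image block, e.g. (H, W) or (H, W, C).
    """
    h, w = first_block_shape[:2]
    # Bilinear resize to the first block's size, then smooth with a Gaussian
    # kernel so the upsampled block does not introduce blocky artifacts.
    resized = cv2.resize(second_block, (w, h), interpolation=cv2.INTER_LINEAR)
    return cv2.GaussianBlur(resized, (5, 5), sigma)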
Optionally, in the case that one first image block corresponds to a plurality of second image blocks, the disclosure may obtain, from each second image block corresponding to the first image block, a pixel point corresponding to a pixel point to be processed, so as to obtain a plurality of pixel points to be fused.
S901, performing pixel value weighted calculation on the pixel point to be processed and the pixel points to be fused to obtain the pixel value of the pixel point after fusion processing.
Optionally, when the first image block and the second image block are both single-channel image blocks, the present disclosure may directly perform weighted calculation on the pixel values of the pixel point to be processed and the pixel points to be fused, to obtain the pixel value of the pixel point after fusion processing. When the first image block and the second image block are both multi-channel image blocks, the present disclosure performs weighted calculation on the pixel values of the same channel of the pixel point to be processed and the pixel points to be fused, obtaining a weighted calculation result for each channel and thereby the pixel value of the pixel point after fusion processing. In addition, the weights of the pixel point to be processed and the pixel points to be fused may be equal or unequal.
S902, obtaining a fourth image block containing a preset part of the target object based on the pixel values of the plurality of pixel points after the fusion processing.
Optionally, after weighting calculation is performed on each pixel point in the first image block, all the fused pixel points form a fourth image block.
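A sketch of the whole weighted fusion of fig. 9, assuming the second image blocks have already been converted to the first block's size as above; equal weights are one permissible choice, per the note on S901:

import numpy as np

def fuse_blocks(first_block, second_blocks, weights=None):
    """Sketch: per-pixel weighted fusion producing the fourth image block.

    first_block: H x W (x C) array; second_blocks: list of blocks already
    converted to the same size (see the upscaling sketch above).
    weights: one weight per block; equal weights are used if omitted.
    """
    stack = np.stack([first_block.astype(np.float32)] +
                     [b.astype(np.float32) for b in second_blocks])
    if weights is None:
        weights = np.full(len(stack), 1.0 / len(stack), dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)
    # Weighted sum over the block axis; broadcasting covers single- and
    # multi-channel blocks alike, each channel being fused independently.
    fused = np.tensordot(weights, stack, axes=1)
    return np.clip(fused, 0, 255).astype(np.uint8)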
By performing weighted calculation on the pixel values of the pixel points to be processed in the first image block and the pixel points to be fused in the second image block, noise points can be reduced by means of the second image block, so that the fourth image block has better definition, which is beneficial to improving the accuracy of state detection.
Exemplary apparatus
Fig. 10 is a schematic structural diagram of an embodiment of a state detection device of a target object of the present disclosure.
The apparatus of this embodiment may be used to implement the corresponding method embodiments of the present disclosure. The apparatus shown in fig. 10 includes: a first acquisition module 1000, a second acquisition module 1001 and a state detection module 1002. The apparatus of this embodiment may optionally further include: a target direction determining module 1003.
The first obtaining module 1000 is configured to obtain, from a first video frame acquired by the first image capturing device, an image block including a predetermined portion of the target object, and obtain a first image block.
The second obtaining module 1001 is configured to obtain, from a second video frame collected by a second camera device, an image block including the predetermined portion of the target object, to obtain a second image block. The acquisition time of the second video frame belongs to a preset time range, the preset time range being determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device; the frame rate of the first camera device is lower than that of the second camera device, and the third video frame is a video frame which is acquired by the first camera device and separated from the first video frame by a preset number of frames.
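As a sketch of the pairing logic this module implements, with hypothetical timestamped records assumed for the second camera's frames:

def select_second_frames(t_first, t_third, second_frames):
    """Sketch: keep the second camera's frames whose acquisition times fall
    within the preset time range bounded by the acquisition time points of
    the first and third video frames.

    second_frames: iterable of (timestamp, frame) pairs.
    """
    lo, hi = min(t_first, t_third), max(t_first, t_third)
    return [frame for t, frame in second_frames if lo <= t <= hi]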
The state detection module 1002 is configured to detect a state of the target object based on the first image block obtained by the first acquisition module 1000 and the second image block obtained by the second acquisition module 1001.
The target direction determining module 1003 is configured to determine the optical axis target direction of the second camera device according to the position of the predetermined portion of the target object in a fourth video frame acquired by the first camera device, wherein the acquisition time point of the fourth video frame is earlier than the acquisition time point of the first video frame.
The target direction determining module 1003 may include: a first sub-module 10031 and a second sub-module 10032. The first sub-module 10031 is configured to obtain an image block including the predetermined portion of the target object from the fourth video frame acquired by the first camera device, to obtain a third image block. The second sub-module 10032 is configured to determine the optical axis target direction of the second camera device according to the pixel points in the third image block obtained by the first sub-module 10031.
Optionally, the second submodule 10032 may include: the first unit, the second unit, the third unit, and the fourth unit (not shown in fig. 10). The first unit is used for determining a central position pixel point in the third image block. The second unit is used for determining that an optical axis target of the second image pickup device points to a target pointing pixel point in an image coordinate system of the first image pickup device based on the central position pixel point. And the third unit is used for determining the optical axis translation amount of the second image pickup device according to the target pointing pixel point and the current pointing pixel point of the optical axis of the second image pickup device in the image coordinate system. And the fourth unit is used for determining the optical axis target pointing direction of the second image pickup device according to the optical axis translation amount.
Optionally, the third unit is further configured to determine a first pixel distance in a horizontal direction and a second pixel distance in a vertical direction between the coordinates of the target pointing pixel point and the current pointing pixel point, and determine an optical axis horizontal translation amount and an optical axis vertical translation amount of the second image capturing device according to the first pixel distance, the second pixel distance, and a prestored unit pixel movement amount of the optical axis. The unit pixel moving amount is set according to the number of pixels between two corner points and the moving amount of the second image pickup device when the optical axis of the second image pickup device moves from one corner point to the other corner point on the calibration plate.
Optionally, the first obtaining module 1000 may include: a third submodule 10001, a fourth submodule 10002 and a fifth submodule 10003. The third submodule 10001 is configured to detect a key point of the target object for the first video frame, and obtain at least one key point. The fourth submodule 10002 is configured to determine, according to the at least one key point, an area where the target object in the first video frame is located. The fifth submodule 10003 is configured to determine, according to the area where the target object is located, an area where a predetermined portion of the target object is located, and obtain a first image block.
Alternatively, the state detection module 1002 may include: sixth submodule 10021 and seventh submodule 10022. The sixth submodule 10021 is configured to perform fusion processing on the first image block and the second image block, and obtain a fourth image block including the predetermined portion of the target object. The seventh submodule 10022 is configured to determine a state of the target object according to the image sequence formed by the fourth image block.
Optionally, the sixth submodule 10021 may include: a fifth unit, a sixth unit, and a seventh unit (not shown in fig. 10). The fifth unit may be configured to obtain, for any pixel to be processed in the first image block, a pixel corresponding to the pixel to be processed from the second image block, and obtain a pixel to be fused. And the sixth unit is used for carrying out pixel value weighted calculation on the pixel points to be processed and the pixel points to be fused to obtain the pixel value of the pixel points after the fusion processing. The seventh unit is configured to obtain a fourth image block including the predetermined portion of the target object based on the pixel values of the plurality of pixel points after the fusion processing.
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to fig. 11. Fig. 11 shows a block diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 11, the electronic device 111 includes one or more processors 1111 and memory 1112.
The processor 1111 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities and may control other components in the electronic device 111 to perform the desired functions.
Memory 1112 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 1111 to implement the state detection method of a target object of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 111 may further include an input device 1113 and an output device 1114, these components being interconnected by a bus system and/or other forms of connection mechanisms (not shown). The input device 1113 may include, for example, a keyboard, a mouse, and the like. The output device 1114 may output various information to the outside and may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto.
Of course, only some of the components of the electronic device 111 relevant to the present disclosure are shown in fig. 11 for simplicity, components such as buses, input/output interfaces, and the like being omitted. In addition, the electronic device 111 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in a method of state detection of a target object according to various embodiments of the present disclosure described in the above "exemplary methods" section of the present description.
The computer program product may write program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform the steps in a method of state detection of a target object according to various embodiments of the present disclosure described in the above section of the "exemplary method" of the present disclosure.
The computer readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium may include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, so that the same or similar parts between the embodiments are mutually referred to. For system embodiments, the description is relatively simple as it essentially corresponds to method embodiments, and reference should be made to the description of method embodiments for relevant points.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "including", "comprising", and "having" are open-ended words meaning "including but not limited to" and are used interchangeably therewith. The term "or" as used herein refers to, and is used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to".
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects, and the like, will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (10)

1. A method for detecting a state of a target object, comprising:
acquiring an image block containing a preset part of a target object from a first video frame acquired by a first camera device to acquire a first image block;
acquiring an image block containing a preset part of the target object from a second video frame acquired by a second camera device to acquire a second image block; the acquisition time of the second video frame belongs to a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, the image resolution of the first camera device is higher than that of the second camera device, and the third video frame is a video frame which is acquired by the first camera device and separated from the first video frame by a preset frame number;
detecting a state of the target object based on the first image block and the second image block.
2. The method of claim 1, wherein prior to the step of obtaining the first image block, the method further comprises:
Determining the optical axis target pointing direction of the second camera according to the position of the target object preset part in the fourth video frame acquired by the first camera;
wherein the acquisition time point of the fourth video frame is earlier than the acquisition time point of the first video frame.
3. The method of claim 2, wherein the determining the optical axis target orientation of the second camera based on the position of the target object predetermined location in the fourth video frame acquired by the first camera comprises:
acquiring an image block containing a preset part of a target object from a fourth video frame acquired by the first camera device to acquire a third image block;
and determining the optical axis target pointing direction of the second image pickup device according to the pixel points in the third image block.
4. A method according to claim 3, wherein said determining the optical axis target pointing direction of the second image capturing device from the pixel points in the third image block comprises:
Determining a central position pixel point in the third image block;
determining that an optical axis target of the second image pickup device points to a target-pointing pixel point in an image coordinate system of the first image pickup device based on the central-position pixel point;
determining the optical axis translation amount of the second image pickup device according to the target pointing pixel point and the current pointing pixel point of the optical axis of the second image pickup device in the image coordinate system;
And determining the optical axis target pointing direction of the second image pickup device according to the optical axis translation amount.
5. The method of claim 4, wherein the determining the amount of optical axis translation of the second image capture device from the target pointing pixel point and the current pointing pixel point in the image coordinate system to which the optical axis of the second image capture device is currently pointed comprises:
determining a first pixel distance in a horizontal direction and a second pixel distance in a vertical direction between the coordinates of the target pointing pixel point and the current pointing pixel point;
Determining an optical axis horizontal translation amount and an optical axis vertical translation amount of the second image pickup device according to the first pixel distance, the second pixel distance and a prestored unit pixel movement amount of the optical axis;
the unit pixel moving amount is set according to the number of pixels between two corner points and the moving amount of the second image pickup device when the optical axis of the second image pickup device moves from one corner point to the other corner point on the calibration plate.
6. The method according to any one of claims 1 to 5, wherein the acquiring an image block including a predetermined portion of the target object from the first video frame acquired by the first image capturing device, the acquiring a first image block, includes:
Detecting key points of a target object on the first video frame to obtain at least one key point;
determining a region where a target object in the first video frame is located according to the at least one key point;
and determining the area where the preset part of the target object is located according to the area where the target object is located, and obtaining a first image block.
7. The method of any of claims 1-6, wherein the detecting the state of the target object based on the first image block and the second image block comprises:
Performing fusion processing on the first image block and the second image block to obtain a fourth image block containing the preset part of the target object;
and determining the state of the target object according to the image sequence formed by the fourth image block.
8. A state detection apparatus of a target object, comprising:
The first acquisition module is used for acquiring an image block containing a preset part of a target object from a first video frame acquired by the first camera device to acquire a first image block;
The second acquisition module is used for acquiring an image block containing the preset part of the target object from a second video frame acquired by a second camera device to acquire a second image block; the acquisition time of the second video frame belongs to a preset time range, the preset time range is determined by the acquisition time point of the first video frame and the acquisition time point of a third video frame acquired by the first camera device, the frame rate of the first camera device is lower than that of the second camera device, the image resolution of the first camera device is higher than that of the second camera device, and the third video frame is a video frame which is acquired by the first camera device and separated from the first video frame by a preset frame number;
And the state detection module is used for detecting the state of the target object based on the first image block acquired by the first acquisition module and the second image block acquired by the second acquisition module.
9. A computer readable storage medium storing a computer program for performing the method of any one of the preceding claims 1-7.
10. An electronic device, the electronic device comprising:
A processor;
A memory for storing the processor-executable instructions;
The processor being configured to read the executable instructions from the memory and execute the instructions to implement the method of any of the preceding claims 1-7.
CN202011506057.0A 2020-12-18 2020-12-18 Method, device, medium and electronic equipment for detecting state of target object Active CN112541553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011506057.0A CN112541553B (en) 2020-12-18 2020-12-18 Method, device, medium and electronic equipment for detecting state of target object

Publications (2)

Publication Number Publication Date
CN112541553A CN112541553A (en) 2021-03-23
CN112541553B true CN112541553B (en) 2024-04-30

Family

ID=75019166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011506057.0A Active CN112541553B (en) 2020-12-18 2020-12-18 Method, device, medium and electronic equipment for detecting state of target object

Country Status (1)

Country Link
CN (1) CN112541553B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113096151B (en) * 2021-04-07 2022-08-09 地平线征程(杭州)人工智能科技有限公司 Method and apparatus for detecting motion information of object, device and medium
CN115761411B (en) * 2022-11-24 2023-09-01 北京的卢铭视科技有限公司 Model training method, living body detection method, electronic device, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590453A (en) * 2017-09-04 2018-01-16 腾讯科技(深圳)有限公司 Processing method, device and the equipment of augmented reality scene, computer-readable storage medium
CN108596128A (en) * 2018-04-28 2018-09-28 京东方科技集团股份有限公司 Object identifying method, device and storage medium
CN108713318A (en) * 2016-10-31 2018-10-26 华为技术有限公司 A kind of processing method and equipment of video frame
CN108769509A (en) * 2018-04-28 2018-11-06 Oppo广东移动通信有限公司 Control method, apparatus, electronic equipment and the storage medium of camera
CN110287828A (en) * 2019-06-11 2019-09-27 北京三快在线科技有限公司 Detection method, device and the electronic equipment of signal lamp
CN111063079A (en) * 2019-11-27 2020-04-24 深圳云天励飞技术有限公司 Binocular living body face detection method and device based on access control system
CN111723716A (en) * 2020-06-11 2020-09-29 深圳地平线机器人科技有限公司 Method, device, system, medium and electronic equipment for determining orientation of target object
CN112052770A (en) * 2020-08-31 2020-12-08 北京地平线信息技术有限公司 Method, apparatus, medium, and electronic device for fatigue detection

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017092899A (en) * 2015-11-17 2017-05-25 ソニー株式会社 Image processor, image processing method, and program
US10339387B2 (en) * 2016-03-03 2019-07-02 Brigham Young University Automated multiple target detection and tracking system

Also Published As

Publication number Publication date
CN112541553A (en) 2021-03-23

Similar Documents

Publication Publication Date Title
US10594941B2 (en) Method and device of image processing and camera
US9325968B2 (en) Stereo imaging using disparate imaging devices
US10511823B2 (en) Video display apparatus, video display system, and video display method
US10701334B2 (en) Virtual reality parallax correction
CN114339194B (en) Projection display method, apparatus, projection device, and computer-readable storage medium
US20180184077A1 (en) Image processing apparatus, method, and storage medium
US10013761B2 (en) Automatic orientation estimation of camera system relative to vehicle
CN112541553B (en) Method, device, medium and electronic equipment for detecting state of target object
JP2005339313A (en) Method and apparatus for presenting image
US11275248B2 (en) Head mounted display apparatus, virtual reality display system and driving method thereof
US10277891B2 (en) Image processing apparatus, image processing method, and storage medium
CN111402404B (en) Panorama complementing method and device, computer readable storage medium and electronic equipment
WO2020238008A1 (en) Moving object detection method and device, intelligent driving control method and device, medium, and apparatus
US20140204083A1 (en) Systems and methods for real-time distortion processing
CN111723716B (en) Method, device, system, medium and electronic equipment for determining target object orientation
WO2019069469A1 (en) Object detection device, object detection method, and computer readable recording medium
US10373293B2 (en) Image processing apparatus, image processing method, and storage medium
CN113689508B (en) Point cloud labeling method and device, storage medium and electronic equipment
US20190037194A1 (en) Depth data adjustment based on non-visual pose data
US20210078597A1 (en) Method and apparatus for determining an orientation of a target object, method and apparatus for controlling intelligent driving control, and device
CN111176425A (en) Multi-screen operation method and electronic system using same
US10089726B2 (en) Image processing apparatus, image processing method, and storage medium, relating to generating an image corresponding to a predetermined three-dimensional shape by transforming a captured image
JP2012222664A (en) On-vehicle camera system
CN113438463B (en) Method and device for simulating orthogonal camera image, storage medium and electronic equipment
CN115914603A (en) Image rendering method, head-mounted display device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant