CN115082968A - Behavior identification method based on infrared light and visible light fusion and terminal equipment - Google Patents


Info

Publication number: CN115082968A
Application number: CN202211013357.4A
Granted publication: CN115082968B
Authority: CN (China)
Prior art keywords: image, pixel, visible light, mix, vis
Other languages: Chinese (zh)
Inventor: 李月忠
Applicant/Assignee: Tianjin Ruijin Intelligent Technology Co., Ltd.
Legal status: Granted; Active

Classifications

    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06T5/70 Denoising; Smoothing
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06V10/30 Noise filtering
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06T2207/10048 Infrared image
    • G06T2207/20221 Image fusion; Image merging

Abstract

The application is applicable to the technical field of behavior recognition, and provides a behavior recognition method based on infrared light and visible light fusion and a terminal device. The method comprises: acquiring at least one registered image group to be identified, wherein the registered image group comprises a registered visible light image and infrared light image; for each registered image group, denoising the visible light image in the registered image group to obtain a denoised image of the visible light image; fusing the infrared light image in the registered image group with the denoised image of the visible light image to obtain a fused image of the registered image group under a preset constraint condition, under which the pixel difference between the fused image and the infrared light image is minimized and the gradient difference between the fused image and the denoised image is minimized; and determining the behavior class of the target object in the registered image groups based on the fused images of all the registered image groups under the preset constraint condition. The scheme can improve the accuracy of behavior recognition.

Description

Behavior identification method based on infrared light and visible light fusion and terminal equipment
Technical Field
The application belongs to the technical field of behavior recognition, and particularly relates to a behavior recognition method based on infrared light and visible light fusion and a terminal device.
Background
Animal behaviors are the actions an animal takes to adapt to its environment under external stimuli, and these behaviors may influence the animal's own reproduction or the behaviors of other animals. Studying animal behaviors therefore helps in understanding the behavioral characteristics or needs of animals, and can assist animal caretakers in managing them. The basis for studying animal behavior is accurate identification of the animal's behavior.
In a traditional behavior identification method, an infrared light image containing the animal's thermal radiation information is collected by a non-contact infrared camera, or a visible light image containing the object's appearance information is collected by a visible light camera device, and the animal's behavior is then identified based on the infrared light image alone or on the visible light image alone. The behavior identification accuracy of such methods is low.
Disclosure of Invention
In view of this, the embodiment of the present application provides a behavior identification method and a terminal device based on fusion of infrared light and visible light, so as to solve the technical problem that the behavior identification accuracy of the existing behavior identification method is low.
In a first aspect, an embodiment of the present application provides a behavior identification method based on fusion of infrared light and visible light, including:
acquiring at least one registered image group to be identified; the registered image group comprises a visible light image and an infrared light image which are registered;
for each registered image group, carrying out denoising treatment on a visible light image in the registered image group to obtain a denoising image of the visible light image;
fusing the infrared light image and the de-noised image to obtain a fused image of the registered image group under a preset constraint condition; under the preset constraint condition, the pixel difference between the fused image and the infrared light image is minimum, and the gradient difference between the fused image and the de-noised image is minimum;
and determining the behavior class of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition.
In an optional implementation manner of the first aspect, the denoising the visible light image in the registered image group to obtain a denoised image of the visible light image includes:
determining a horizontal second-order gradient and a vertical second-order gradient of each pixel in the visible light image by adopting a first gradient function based on the gray value of each pixel in the visible light image; the first gradient function is:
VIS_h(i) = [1/2(vis_i - vis_r(i)) + 1/2(vis_i - vis_l(i))]^2
VIS_v(i) = [1/2(vis_i - vis_b(i)) + 1/2(vis_i - vis_o(i))]^2
where VIS_h(i) is the horizontal second-order gradient of the i-th pixel in the visible light image, VIS_v(i) is the vertical second-order gradient of the i-th pixel, vis_i is the gray value of the i-th pixel, vis_r(i) is the gray value of the pixel located to the right of and adjacent to the i-th pixel, vis_l(i) is the gray value of the pixel located to the left of and adjacent to the i-th pixel, vis_b(i) is the gray value of the pixel located below and adjacent to the i-th pixel, and vis_o(i) is the gray value of the pixel located above and adjacent to the i-th pixel;
for each pixel in the visible light image, performing quadratic operation on the sum of the horizontal second-order gradient and the vertical second-order gradient of the pixel to obtain a comprehensive gradient of the pixel;
determining the sum of the comprehensive gradients of all pixels in the visible light image as a denoising adjustment factor;
determining the column vector of the de-noised image by adopting a preset de-noising function based on the column vector of the visible light image, the de-noising adjustment factor and a preset regularization weight; the preset denoising function is as follows:
DeN = Vis + λ·DeN_vis
where DeN is the column vector of the denoised image, Vis is the column vector of the visible light image, λ is the preset regularization weight, and DeN_vis is the denoising adjustment factor.
In an optional implementation manner of the first aspect, the fusing the infrared light image and the denoised image to obtain a fused image of the registered image group under a preset constraint condition includes:
determining a column vector to be adjusted of the fused image by adopting a preset constraint function based on the column vector of the infrared light image and the column vector of the de-noised image; the preset constraint function is as follows:
MIX* = argmin( ||MIX* - InF||_2 + λ·||∇MIX* - ∇DeN*||_1 )
where MIX* is the column vector to be adjusted of the fused image, InF is the column vector of the infrared light image, ∇MIX* is the gradient vector of the fused image, DeN* is the column vector of the denoised image, ∇DeN* is the gradient vector of the denoised image, ||MIX* - InF||_2 denotes the L2 norm of MIX* - InF, ||∇MIX* - ∇DeN*||_1 denotes the L1 norm of ∇MIX* - ∇DeN*, and λ is the preset regularization weight;
the value of each element in the gradient vector of the fused image is determined by the following formula:
MIX*_1 = [1/2(MIX*_i - MIX*_r(i)) + 1/2(MIX*_i - MIX*_l(i))]^2
MIX*_2 = [1/2(MIX*_i - MIX*_b(i)) + 1/2(MIX*_i - MIX*_o(i))]^2
∇MIX*(i) = (MIX*_1 + MIX*_2)^2
where MIX*_i is the gray value of the pixel corresponding to the i-th element in the column vector to be adjusted of the fused image, MIX*_r(i) is the gray value of the pixel located to the right of and adjacent to the pixel corresponding to the i-th element, MIX*_l(i) is the gray value of the pixel located to the left of and adjacent to the pixel corresponding to the i-th element, MIX*_b(i) is the gray value of the pixel located below and adjacent to the pixel corresponding to the i-th element, and MIX*_o(i) is the gray value of the pixel located above and adjacent to the pixel corresponding to the i-th element;
and carrying out standardization processing on the column vector to be adjusted of the fusion image to obtain the column vector of the fusion image.
In an optional implementation manner of the first aspect, the normalizing the to-be-adjusted column vector of the fused image to obtain the column vector of the fused image includes:
standardizing the column vector to be adjusted of the fused image based on a preset standardization formula to obtain the column vector of the fused image; the preset standardized formula is as follows:
MIX_i = 255 × (MIX*_i - min(MIX*)) / (max(MIX*) - min(MIX*))
where MIX*_i is the gray value of the pixel corresponding to the i-th element in the column vector to be adjusted of the fused image, and MIX_i is the gray value of the pixel corresponding to the i-th element in the column vector of the fused image.
In an optional implementation manner of the first aspect, the determining, based on fused images of all the registered image groups under a preset constraint condition, a behavior class of a target object in the registered image groups includes:
importing all the fusion images into a context attention network to obtain dynamic behavior data of the target object in the registered image group; the dynamic behavior data is described by a position change vector between the target object and the environmental object in every two adjacent fusion images;
and importing the dynamic behavior data into a behavior recognition model to obtain the action type of the target object.
In a second aspect, an embodiment of the present application provides a terminal device, including:
a first acquisition unit for acquiring at least one registered image group to be identified; the registered image group comprises a visible light image and an infrared light image which are registered;
the image denoising unit is used for denoising the visible light images in the registered image groups according to each registered image group to obtain denoised images of the visible light images;
the image fusion unit is used for fusing the infrared light image and the de-noising image to obtain a fusion image of the registered image group under a preset constraint condition; under the preset constraint condition, the pixel difference between the fused image and the infrared light image is minimum, and the gradient difference between the fused image and the de-noised image is minimum;
and the behavior identification unit is used for determining the behavior category of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition.
In a third aspect, an embodiment of the present application provides a terminal device, where the terminal device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the behavior recognition method according to the first aspect or any one of the options of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the behavior recognition method according to the first aspect or any one of the alternatives of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal device, causes the terminal device to execute the method for behavior recognition according to the first aspect or any one of the alternatives of the first aspect.
In a sixth aspect, an embodiment of the present application provides a behavior recognition system, which includes an image pickup device and the terminal device according to the second or third aspect, wherein the image pickup device is connected to the terminal device.
The behavior identification method based on infrared light and visible light fusion, the terminal device, the computer readable storage medium and the computer program product provided by the embodiment of the application have the following beneficial effects:
according to the behavior identification method based on the fusion of the infrared light and the visible light, the denoising image of the visible light image in each registered image group is obtained by denoising the visible light image in each registered image group; fusing the infrared light image in each registered image group with the de-noised image to obtain a fused image of each registered image group under a preset constraint condition; and finally, determining the behavior class of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition. Under the preset constraint condition, the pixel difference between the fused image and the infrared light image is minimum, and the gradient difference between the fused image and the de-noised image is minimum, so that the fused image and the infrared light image have similar pixel intensity, and the fused image and the visible light image have similar gradient (namely edge), so that the fused image can simultaneously keep the thermal radiation information of an object in the infrared light image and the appearance information of the object in the visible light image, namely the fused image can be regarded as the infrared light image with detailed scene description, and therefore, the target object is subjected to behavior recognition based on the fused image, and the accuracy of the behavior recognition can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic structural diagram of a behavior recognition system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a behavior recognition method based on fusion of infrared light and visible light according to an embodiment of the present application;
fig. 3 is a flowchart of specific implementation of S22 in a behavior identification method based on fusion of infrared light and visible light according to an embodiment of the present application;
fig. 4 is a flowchart of specific implementation of S23 in a behavior identification method based on fusion of infrared light and visible light according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal device according to another embodiment of the present application.
Detailed Description
It is to be understood that the terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. In the description of the embodiments of the present application, "a plurality" means two or more, and "at least one" and "one or more" mean one, two, or more than two, unless otherwise specified. The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more such features.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In a traditional behavior identification method, an infrared camera collects an infrared light image containing the thermal radiation information of a target object, or a visible light camera collects a visible light image containing the appearance information of the target object, and the behavior of the target object is then identified based on the infrared light image or the visible light image alone. However, the visible light image is easily affected by illumination changes and therefore loses some texture information, while the infrared light image is less affected by illumination changes but lacks detail information, so identifying the behavior of the target object based on either image alone reduces the identification accuracy.
In order to solve the technical problem, an embodiment of the present application first provides a behavior recognition system. Please refer to fig. 1, which is a schematic structural diagram of a behavior recognition system according to an embodiment of the present application. As shown in fig. 1, the behavior recognition system may include an infrared camera 11, a visible light camera 12, and a terminal device 13. Wherein, the infrared camera 11 and the visible light camera 12 are both connected with the terminal equipment 13.
The infrared camera 11 is used for collecting infrared light images, and the visible light camera 12 is used for collecting visible light images. In the embodiment of the present application, the main optical axis of the infrared camera 11 coincides with the main optical axis of the visible light camera 12, and the field of view of the infrared camera 11 is the same as that of the visible light camera 12; that is, an infrared light image acquired by the infrared camera 11 at a certain time corresponds to the same scene as the visible light image acquired by the visible light camera 12 at that time.
As shown in fig. 1, when a target object passes through the field of view of the infrared camera 11 and the visible light camera 12, the infrared camera 11 may collect a plurality of temporally consecutive infrared light images 111 containing the target object, and the visible light camera 12 may collect a plurality of temporally consecutive visible light images 121 containing the target object. The infrared light image includes heat radiation information of the target object, and the visible light image includes appearance information (i.e., texture information) of the target object.
Illustratively, the target object may be a living body, e.g., a human or an animal, etc.
The terminal device 13 is configured to acquire the infrared light image sequence acquired by the infrared camera 11 from the infrared camera 11, acquire the visible light image sequence acquired by the visible light camera 12 from the visible light camera 12, and register the infrared light image sequence and the visible light image sequence. The registering of the infrared light image sequence and the visible light image sequence specifically means that the infrared light image and the visible light image having the same scene in the infrared light image sequence and the visible light image sequence are paired, that is, the infrared light image and the visible light image respectively acquired by the infrared camera 11 and the visible light camera 12 at the same time are paired. Based on the method, after the infrared light image sequence and the visible light image sequence are registered, a plurality of registered image groups can be obtained, each registered image group comprises a visible light image and an infrared light image, and the visible light image and the infrared light image in each registered image group have the same scene.
For example, as shown in fig. 1, if the infrared camera 11 captures an infrared light image 1111 at a first time and the visible light camera 12 captures a visible light image 1211 at the first time, the infrared light image 1111 and the visible light image 1211 may form a registered image group.
It will be appreciated that since the infrared light images in the infrared light image sequence are consecutive in time and the visible light images in the visible light image sequence are consecutive in time, the plurality of registered image sets obtained by the terminal device are also consecutive in time.
It can be understood that, after the terminal device 13 obtains a plurality of registered image groups, both the infrared light image and the visible light image in each registered image group may be preprocessed into grayscale images, and the preprocessed registered image group is used as the registered image group to be identified. Because the visible light image and the infrared light image in the registered image group to be identified are both gray level images, the range of the gray level value of each pixel in the visible light image in the registered image group to be identified is 0-255, and similarly, the range of the gray level value of each pixel in the infrared light image in the registered image group to be identified is 0-255.
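To make this preprocessing concrete, the following is a minimal sketch of pairing frames by timestamp and converting both images to 8-bit grayscale. The function name, the (timestamp, image) input format, and the use of OpenCV are assumptions made for illustration and are not specified by the patent.

```python
import cv2  # assumed dependency, for illustration only
import numpy as np

def build_registered_groups(ir_frames, vis_frames):
    """Pair infrared and visible frames captured at the same timestamp and
    convert both to 8-bit grayscale images with gray values in 0-255.

    ir_frames / vis_frames: lists of (timestamp, image) tuples (assumed format).
    """
    vis_by_time = {t: img for t, img in vis_frames}
    groups = []
    for t, ir_img in ir_frames:
        vis_img = vis_by_time.get(t)
        if vis_img is None:
            continue  # no visible-light frame acquired at the same time
        ir_gray = cv2.cvtColor(ir_img, cv2.COLOR_BGR2GRAY) if ir_img.ndim == 3 else ir_img
        vis_gray = cv2.cvtColor(vis_img, cv2.COLOR_BGR2GRAY) if vis_img.ndim == 3 else vis_img
        # one registered image group: (visible-light image, infrared-light image)
        groups.append((vis_gray.astype(np.uint8), ir_gray.astype(np.uint8)))
    return groups
```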
Optionally, the terminal device 13 may store the registered image group to be recognized in its local memory for behavior recognition of a subsequent target object, that is, the terminal device 13 may also be configured to perform each step in a subsequent method embodiment, please refer to the related description in the method embodiment, which will not be detailed here in detail.
In a specific application, the terminal device 13 may include a smart phone, a tablet computer, a notebook computer, or a desktop computer, and the specific type of the terminal device 13 is not particularly limited in this embodiment.
In a specific application, the connection modes between the infrared camera 11 and the visible light camera 12 and the terminal equipment 13 can be wired connection or wireless connection.
The wired connection may include a wired connection based on a Universal Serial Bus (USB), a High Definition Multimedia Interface (HDMI), or the like. The wireless connection may include a wireless connection based on bluetooth, wireless fidelity (WIFI), or mobile communication technology, among others. By way of example, the mobile communication technology may include, but is not limited to, a fifth generation mobile communication technology (5G for short), a fourth generation mobile communication technology (4G for short), and the like.
Based on the behavior recognition system provided in the foregoing embodiment, an embodiment of the present application further provides a behavior recognition method based on fusion of infrared light and visible light, where an execution subject of the behavior recognition method is the terminal device 13 in the embodiment corresponding to fig. 1. In a specific application, a target script file may be configured to the terminal device 13, and the target script file describes the behavior recognition method based on the fusion of infrared light and visible light provided in the embodiment of the present application, so that the terminal device 13 executes the target script file when the behavior recognition of the target object is required, and further executes each step in the behavior recognition method based on the fusion of infrared light and visible light provided in the embodiment of the present application.
Please refer to fig. 2, which is a schematic flowchart of a behavior recognition method based on infrared light and visible light fusion according to an embodiment of the present application. As shown in FIG. 2, the method may include S21-S24, which are detailed as follows:
s21: at least one registered image set to be identified is acquired.
In an embodiment of the present application, the terminal device may obtain at least one registered image group to be identified from its local memory. Each registered image set includes a visible light image and an infrared light image that have been registered. Illustratively, the resolution of the visible light image and the resolution of the infrared light image may each be m × n, i.e., the visible light image and the infrared light image may each include m rows × n columns of pixels.
S22: and for each registered image group, carrying out denoising treatment on the visible light image in the registered image group to obtain a denoising image of the visible light image.
In a possible implementation manner, when denoising the visible light image in each registered image group, in order to avoid losing gradient information (i.e., texture information) of the visible light image, the terminal device may obtain a denoised image of the visible light image by using S221 to S224 shown in fig. 3.
S221: and determining the horizontal second-order gradient and the vertical second-order gradient of each pixel in the visible light image by adopting a first gradient function based on the gray value of each pixel in the visible light image.
In particular, the first gradient function may be:
VIS_h(i) = [1/2(vis_i - vis_r(i)) + 1/2(vis_i - vis_l(i))]^2
VIS_v(i) = [1/2(vis_i - vis_b(i)) + 1/2(vis_i - vis_o(i))]^2
where VIS_h(i) is the horizontal second-order gradient of the i-th pixel in the visible light image, VIS_v(i) is the vertical second-order gradient of the i-th pixel, vis_i is the gray value of the i-th pixel, vis_r(i) is the gray value of the pixel located to the right of and adjacent to the i-th pixel, vis_l(i) is the gray value of the pixel located to the left of and adjacent to the i-th pixel, vis_b(i) is the gray value of the pixel located below and adjacent to the i-th pixel, and vis_o(i) is the gray value of the pixel located above and adjacent to the i-th pixel.
It should be noted that, for each pixel in the 1st row of the visible light image, vis_o(i) = vis_i; for each pixel in the last row of the visible light image, vis_b(i) = vis_i; for each pixel in the 1st column of the visible light image, vis_l(i) = vis_i; and for each pixel in the last column of the visible light image, vis_r(i) = vis_i.
It is understood that the order of the pixels in the visible light image is obtained by numbering the pixels from left to right and from top to bottom. For example, if the visible light image includes 3 × 3 pixels, the 3 pixels in the 1st row are, from left to right, the 1st, 2nd and 3rd pixels of the visible light image, the 3 pixels in the 2nd row are, from left to right, the 4th, 5th and 6th pixels, and the 3 pixels in the 3rd row are, from left to right, the 7th, 8th and 9th pixels.
For example, taking the 5th pixel (i.e., the pixel in the 2nd row and 2nd column) of the visible light image as an example, the pixel located to the right of and adjacent to the 5th pixel is the 6th pixel (i.e., the pixel in the 2nd row and 3rd column), the pixel located to the left of and adjacent to the 5th pixel is the 4th pixel (i.e., the pixel in the 2nd row and 1st column), the pixel located below and adjacent to the 5th pixel is the 8th pixel (i.e., the pixel in the 3rd row and 2nd column), and the pixel located above and adjacent to the 5th pixel is the 2nd pixel (i.e., the pixel in the 1st row and 2nd column).
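The neighbour definitions and boundary rules above can be implemented with edge-replicated shifts, since replacing an out-of-image neighbour by the centre pixel makes the corresponding difference term zero. The sketch below is one possible NumPy implementation of the first gradient function; it is an illustration, not the patent's reference code.

```python
import numpy as np

def second_order_gradients(vis):
    """Horizontal (VIS_h) and vertical (VIS_v) second-order gradients of a
    grayscale image, following the first gradient function described above."""
    v = vis.astype(np.float64)
    pad = np.pad(v, 1, mode='edge')  # edge replication: missing neighbours equal the centre pixel
    right = pad[1:-1, 2:]    # vis_r(i)
    left = pad[1:-1, :-2]    # vis_l(i)
    below = pad[2:, 1:-1]    # vis_b(i)
    above = pad[:-2, 1:-1]   # vis_o(i)
    vis_h = (0.5 * (v - right) + 0.5 * (v - left)) ** 2   # VIS_h(i)
    vis_v = (0.5 * (v - below) + 0.5 * (v - above)) ** 2  # VIS_v(i)
    return vis_h, vis_v
```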
S222: and performing quadratic operation on the sum of the horizontal second-order gradient and the vertical second-order gradient of each pixel in the visible light image to obtain the comprehensive gradient of the pixel.
In this implementation manner, for each pixel in the visible light image, the terminal device may calculate a sum of a horizontal second-order gradient and a vertical second-order gradient of the pixel, and perform quadratic operation on the sum of the horizontal second-order gradient and the vertical second-order gradient of the pixel to obtain a comprehensive gradient of the pixel.
S223: and determining the sum of the comprehensive gradients of all pixels in the visible light image as a denoising adjustment factor.
After the terminal device obtains the comprehensive gradient of each pixel in the visible light image, the sum of the comprehensive gradients of all the pixels in the visible light image can be calculated, and the sum of the comprehensive gradients of all the pixels in the visible light image is determined as a denoising adjustment factor.
The denoising adjustment factor is used for adjusting the gray value of each pixel in the visible light image.
In this implementation, since VIS_h(i) reflects the second-order gradient of the i-th pixel in the horizontal direction and VIS_v(i) reflects the second-order gradient of the i-th pixel in the vertical direction, the visible light image retains more gradient information than it would with a first-order gradient. Therefore, adjusting the gray value of each pixel in the visible light image with the denoising adjustment factor obtained in S221-S223 denoises the visible light image while preserving its gradient information well.
S224: and determining the column vector of the de-noised image by adopting a preset de-noising function based on the column vector of the visible light image, the de-noising adjustment factor and a preset regularization weight.
It is understood that the column vector of the visible light image may be obtained by arranging the respective pixels in the visible light image into a column in order from small to large.
The value of the preset regularization weight may be set according to actual requirements, and is not particularly limited herein.
Specifically, the preset denoising function may be:
DeN = Vis + λ·DeN_vis
where DeN is the column vector of the denoised image, Vis is the column vector of the visible light image, λ is the preset regularization weight, and DeN_vis is the denoising adjustment factor.
It should be noted that DeN ∈ R^(mn×1) and Vis ∈ R^(mn×1).
it can be understood that, after the terminal device obtains the column vector of the denoised image, the column vector of the denoised image can be restored to the denoised image with the resolution of m × n.
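Putting S221-S224 together, a minimal sketch of the denoising step could look as follows; it reuses second_order_gradients from the sketch above, and the value of λ is illustrative only. Note that, as written in the preset denoising function, the denoising adjustment factor is a single scalar added, scaled by λ, to every element of the column vector.

```python
def denoise_visible(vis, lam=0.01):
    """Denoised image via the preset denoising function DeN = Vis + λ·DeN_vis."""
    m, n = vis.shape
    vis_h, vis_v = second_order_gradients(vis)
    composite = (vis_h + vis_v) ** 2           # comprehensive gradient of each pixel (S222)
    den_vis = composite.sum()                  # denoising adjustment factor (S223)
    col = vis.astype(float).reshape(m * n, 1)  # column vector, pixels ordered row by row (S224)
    den = col + lam * den_vis                  # preset denoising function
    return den.reshape(m, n)                   # restore to an m×n denoised image
```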
In another possible implementation manner, the terminal device may further perform denoising processing on the visible light images in each registered image group by using a median filter or a mean filter, and the like, so as to obtain a denoised image of the visible light images in each registered image group.
S23: and fusing the infrared light image and the de-noised image to obtain a fused image of the registered image group under a preset constraint condition.
It should be noted that, under the preset constraint condition, the pixel difference between the fused image and the infrared light image is the minimum, and the gradient difference between the fused image and the denoised image is the minimum. That is, by means of the constraint of the preset constraint condition, the gradient information of the visible light image can be transferred to the corresponding position of the infrared light image, so that the fused image and the infrared light image have similar pixel intensity, and the fused image and the visible light image have similar gradient, and thus, the fused image can simultaneously retain the heat radiation information of the object in the infrared light image and the appearance information of the object in the visible light image.
In one possible implementation, the preset constraint condition may be described by a preset constraint function. Based on this, S23 can be specifically realized by S231 to S232 shown in fig. 4, which are detailed as follows:
s231: and determining the column vector to be adjusted of the fused image by adopting a preset constraint function based on the column vector of the infrared light image and the column vector of the de-noised image.
It is understood that the column vector of the infrared light image may be obtained by arranging the respective pixels in the infrared light image in order of small to large as a column.
Specifically, the preset constraint function may be:
MIX* = argmin( ||MIX* - InF||_2 + λ·||∇MIX* - ∇DeN*||_1 )
where MIX* is the column vector to be adjusted of the fused image, InF is the column vector of the infrared light image, ∇MIX* is the gradient vector of the fused image, DeN* is the column vector of the denoised image, ∇DeN* is the gradient vector of the denoised image, ||MIX* - InF||_2 denotes the L2 norm of MIX* - InF, ||∇MIX* - ∇DeN*||_1 denotes the L1 norm of ∇MIX* - ∇DeN*, and λ is the preset regularization weight.
It should be noted that the number of elements included in the gradient vector of the fused image is the same as the number of pixels included in the visible light image or the infrared light image, and each element in the gradient vector of the fused image may correspond to one pixel in the visible light image or the infrared light image.
Alternatively, the value of each element in the gradient vector of the fused image may be determined by the following formula:
MIX*_1 = [1/2(MIX*_i - MIX*_r(i)) + 1/2(MIX*_i - MIX*_l(i))]^2
MIX*_2 = [1/2(MIX*_i - MIX*_b(i)) + 1/2(MIX*_i - MIX*_o(i))]^2
∇MIX*(i) = (MIX*_1 + MIX*_2)^2
where MIX*_i is the value of the i-th element in the column vector to be adjusted of the fused image, i.e., the gray value of the pixel corresponding to the i-th element; MIX*_r(i) is the gray value of the pixel located to the right of and adjacent to the pixel corresponding to the i-th element; MIX*_l(i) is the gray value of the pixel located to the left of and adjacent to the pixel corresponding to the i-th element; MIX*_b(i) is the gray value of the pixel located below and adjacent to the pixel corresponding to the i-th element; and MIX*_o(i) is the gray value of the pixel located above and adjacent to the pixel corresponding to the i-th element.
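The patent does not state how the minimisation implied by the preset constraint function is carried out. As a hedged illustration, the sketch below only evaluates the objective for a candidate fused image, assuming the gradient of each image is built from the same second-order differences as above; any generic solver (for example scipy.optimize.minimize over the flattened candidate, starting from the infrared image) could then search for the MIX* that makes this value smallest.

```python
import numpy as np

def fusion_objective(mix, inf, den, lam=0.5):
    """||MIX - InF||_2 + λ·||∇MIX - ∇DeN||_1 for a candidate fused image.

    mix, inf, den: 2-D arrays (candidate fused image, infrared image, denoised image).
    The form of the gradient elements below mirrors the composite-gradient
    construction used for the visible-light image and is an assumption.
    """
    pixel_term = np.linalg.norm((mix - inf).ravel(), ord=2)  # L2 pixel difference
    mh, mv = second_order_gradients(mix)
    dh, dv = second_order_gradients(den)
    grad_mix = (mh + mv) ** 2  # assumed gradient-vector elements of MIX
    grad_den = (dh + dv) ** 2  # assumed gradient-vector elements of DeN
    gradient_term = np.linalg.norm((grad_mix - grad_den).ravel(), ord=1)  # L1 gradient difference
    return pixel_term + lam * gradient_term
```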
S232: and carrying out standardization processing on the column vector to be adjusted of the fusion image to obtain the column vector of the fusion image.
Because the values of each element in the column vector to be adjusted of the fused image obtained by the preset constraint function are not necessarily between 0 and 255, the vector to be adjusted of the fused image needs to be standardized, so that the values of all the elements in the finally obtained column vector of the fused image are within 0 to 255.
In a possible implementation manner, the terminal device may perform standardization processing on the to-be-adjusted column vector of the fused image based on a preset standardization formula to obtain the column vector of the fused image.
Specifically, the preset standardized formula is as follows:
MIX_i = 255 × (MIX*_i - min(MIX*)) / (max(MIX*) - min(MIX*))
where MIX*_i is the gray value of the pixel corresponding to the i-th element in the column vector to be adjusted of the fused image, and MIX_i is the gray value of the pixel corresponding to the i-th element in the column vector of the fused image.
It is understood that, after the terminal device obtains the column vector of the fused image, the column vector of the fused image may be restored to the fused image with the resolution of m × n.
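A common way to carry out this standardisation is min-max rescaling into the 0-255 gray range, sketched below; this particular rescaling is an assumption, since the patent only requires that all element values of the resulting column vector lie within 0-255.

```python
import numpy as np

def normalize_to_gray(mix_star, m, n):
    """Map the to-be-adjusted column vector into 0-255 and restore the m×n fused image."""
    x = np.asarray(mix_star, dtype=float).ravel()
    span = x.max() - x.min()
    if span == 0:
        scaled = np.zeros_like(x)              # degenerate case: constant image
    else:
        scaled = 255.0 * (x - x.min()) / span  # assumed min-max rescaling
    return scaled.round().astype(np.uint8).reshape(m, n)
```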
By performing the above steps of S21-S23, for each registered image group, a fused image corresponding to the registered image group can be obtained, that is, a fused image of each registered image group under a preset constraint condition can be obtained.
S24: and determining the behavior class of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition.
Since the plurality of registered image sets are temporally continuous, the fused images corresponding to each of the plurality of registered image sets are also temporally continuous.
Optionally, the terminal device may be configured with a contextual attention network and a behavior recognition model.
Wherein the contextual attention network is configured to determine a change in relative position between the target object and the environmental object in the registered set of images, thereby obtaining dynamic behavior data of the target object. That is, the dynamic behavior data of the target object may be described by the position change vector between the target object and the environmental object in each of the two adjacent fused images.
The behavior recognition model is used for determining the action type of the target object based on the dynamic behavior data of the target object. In a specific application, the behavior recognition model may be obtained by training a classification model based on a preset sample set by using a deep learning method. For example, the preset sample set may include a plurality of sample data, each of which may include dynamic behavior data and an action type of one sample object. When the classification model is trained, the dynamic behavior data of the sample object in each piece of sample data can be used as the input of the classification model, and the action type of the sample object in each piece of sample data can be used as the output of the classification model, so that the classification model learns the corresponding relation between the dynamic behavior data and the action type in the training process. The terminal device may determine the classification model trained by using the sample data as a behavior recognition model.
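As a rough, hedged illustration of training such a behavior recognition model on the preset sample set, the sketch below fits a small neural-network classifier to (dynamic behavior data, action type) pairs; the choice of scikit-learn's MLPClassifier and the fixed-length feature format are assumptions, since the patent only requires a classification model trained with a deep learning method.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # assumed library, illustration only

def train_behavior_model(sample_set):
    """sample_set: iterable of (dynamic_behavior_vector, action_type) pairs, where each
    dynamic behavior vector is assumed to be a fixed-length flattening of the position-change
    vectors between the target object and the environmental object across adjacent fused images."""
    X = np.array([features for features, _ in sample_set])
    y = np.array([action for _, action in sample_set])
    model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
    model.fit(X, y)
    return model
```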
Based on this, S24 may specifically include the following steps:
step a, importing all the fusion images into a context attention network to obtain dynamic behavior data of the target object in the registered image group.
And b, importing the dynamic behavior data into a behavior recognition model to obtain the action type of the target object.
In the context attention network, the terminal device performs object recognition, key point recognition, human body recognition and the like on each fused image. Object recognition identifies the environmental object in the fused image, human body recognition identifies the target object in the fused image, and key point recognition identifies the action changes of the target object; finally, context attention is applied through a convolutional neural network to obtain the dynamic behavior data of the target object.
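The following sketch reduces the dynamic-behavior extraction to its essentials: given a detected centre of the target object and of an environmental object in each fused image (the object, key-point and human-body recognition steps are omitted), it stacks the position-change vectors between every two adjacent frames into a feature vector. The representation is an assumption made for illustration only.

```python
import numpy as np

def dynamic_behavior_data(target_centers, env_centers):
    """Position-change vectors between the target object and the environmental object
    across every two adjacent fused images.

    target_centers, env_centers: arrays of shape (num_frames, 2) holding the (x, y)
    centre of each object per fused image (assumed to come from a detection stage)."""
    rel = np.asarray(target_centers, dtype=float) - np.asarray(env_centers, dtype=float)
    change = rel[1:] - rel[:-1]  # one position-change vector per pair of adjacent frames
    return change.ravel()        # flattened dynamic behavior data for the recognition model
```

The flattened vector could then be classified with the model from the earlier sketch, e.g. model.predict([dynamic_behavior_data(target_centers, env_centers)]).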
As can be seen from the above, the behavior identification method based on infrared light and visible light fusion provided in this embodiment obtains the denoised image of the visible light image in each registered image group by performing denoising processing on the visible light image in each registered image group; fusing the infrared light image in each registered image group with the de-noised image to obtain a fused image of each registered image group under a preset constraint condition; and finally, determining the behavior class of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition. Under the preset constraint condition, the pixel difference between the fused image and the infrared light image is minimum, and the gradient difference between the fused image and the de-noised image is minimum, so that the fused image and the infrared light image have similar pixel intensity, and the fused image and the visible light image have similar gradient (namely edge), so that the fused image can simultaneously keep the thermal radiation information of an object in the infrared light image and the appearance information of the object in the visible light image, namely the fused image can be regarded as the infrared light image with detailed scene description, and therefore, the behavior recognition accuracy of a target object can be improved based on the fused image.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Based on the behavior identification method based on the fusion of the infrared light and the visible light provided by the embodiment, the embodiment of the application further provides the embodiment of the terminal device for realizing the embodiment of the method. Please refer to fig. 5, which is a schematic structural diagram of a terminal device according to an embodiment of the present application. For convenience of explanation, only the portions related to the present embodiment are shown. As shown in fig. 5, the terminal device 50 may include: a first acquisition unit 51, an image denoising unit 52, an image fusion unit 53, and a behavior recognition unit 54. Wherein:
the first acquisition unit 51 is configured to acquire at least one registered image group to be identified; the registered image group includes a visible light image and an infrared light image which are registered.
The image denoising unit 52 is configured to perform denoising processing on the visible light image in the registered image group for each registered image group, so as to obtain a denoised image of the visible light image.
The image fusion unit 53 is configured to fuse the infrared light image and the denoised image to obtain a fusion image of the registered image group under a preset constraint condition; under the preset constraint condition, the pixel difference between the fused image and the infrared light image is minimum, and the gradient difference between the fused image and the de-noised image is minimum.
The behavior recognition unit 54 is configured to determine a behavior class of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition.
Optionally, the image denoising unit 52 may include a first determining unit, a first calculating unit, a second determining unit, and a third determining unit. Wherein:
the first determining unit is used for determining a horizontal second-order gradient and a vertical second-order gradient of each pixel in the visible light image by adopting a first gradient function based on the gray value of each pixel in the visible light image; the first gradient function is:
VIS_h(i) = [1/2(vis_i - vis_r(i)) + 1/2(vis_i - vis_l(i))]^2
VIS_v(i) = [1/2(vis_i - vis_b(i)) + 1/2(vis_i - vis_o(i))]^2
where VIS_h(i) is the horizontal second-order gradient of the i-th pixel in the visible light image, VIS_v(i) is the vertical second-order gradient of the i-th pixel, vis_i is the gray value of the i-th pixel, vis_r(i) is the gray value of the pixel located to the right of and adjacent to the i-th pixel, vis_l(i) is the gray value of the pixel located to the left of and adjacent to the i-th pixel, vis_b(i) is the gray value of the pixel located below and adjacent to the i-th pixel, and vis_o(i) is the gray value of the pixel located above and adjacent to the i-th pixel.
The first calculation unit is used for performing quadratic operation on the sum of the horizontal second-order gradient and the vertical second-order gradient of each pixel in the visible light image to obtain the comprehensive gradient of the pixel.
The second determining unit is used for determining the sum of the comprehensive gradients of all pixels in the visible light image as a denoising adjustment factor.
The third determining unit is used for determining the column vector of the de-noised image by adopting a preset de-noising function based on the column vector of the visible light image, the de-noising adjustment factor and a preset regularization weight; the preset denoising function is as follows:
DeN = Vis + λ·DeN_vis
where DeN is the column vector of the denoised image, Vis is the column vector of the visible light image, λ is the preset regularization weight, and DeN_vis is the denoising adjustment factor.
Alternatively, the image fusion unit 53 may include a fourth determination unit and a normalization unit. Wherein:
the third determining unit is used for determining the column vector to be adjusted of the fusion image by adopting a preset constraint function based on the column vector of the infrared light image and the column vector of the de-noised image; the preset constraint function is as follows:
MIX* = argmin( ||MIX* - InF||_2 + λ·||∇MIX* - ∇DeN*||_1 )
where MIX* is the column vector to be adjusted of the fused image, InF is the column vector of the infrared light image, ∇MIX* is the gradient vector of the fused image, DeN* is the column vector of the denoised image, ∇DeN* is the gradient vector of the denoised image, ||MIX* - InF||_2 denotes the L2 norm of MIX* - InF, ||∇MIX* - ∇DeN*||_1 denotes the L1 norm of ∇MIX* - ∇DeN*, and λ is the preset regularization weight;
the value of each element in the gradient vector of the fused image is determined by the following formula:
MIX*_1 = [1/2(MIX*_i - MIX*_r(i)) + 1/2(MIX*_i - MIX*_l(i))]^2
MIX*_2 = [1/2(MIX*_i - MIX*_b(i)) + 1/2(MIX*_i - MIX*_o(i))]^2
∇MIX*(i) = (MIX*_1 + MIX*_2)^2
where MIX*_i is the gray value of the pixel corresponding to the i-th element in the column vector to be adjusted of the fused image, MIX*_r(i) is the gray value of the pixel located to the right of and adjacent to the pixel corresponding to the i-th element, MIX*_l(i) is the gray value of the pixel located to the left of and adjacent to the pixel corresponding to the i-th element, MIX*_b(i) is the gray value of the pixel located below and adjacent to the pixel corresponding to the i-th element, and MIX*_o(i) is the gray value of the pixel located above and adjacent to the pixel corresponding to the i-th element.
The standardization unit is used for carrying out standardization processing on the column vector to be adjusted of the fusion image to obtain the column vector of the fusion image.
Optionally, the normalization unit is specifically configured to:
standardizing the column vector to be adjusted of the fused image based on a preset standardization formula to obtain the column vector of the fused image; the preset standardized formula is as follows:
MIX_i = 255 × (MIX*_i - min(MIX*)) / (max(MIX*) - min(MIX*))
where MIX*_i is the gray value of the pixel corresponding to the i-th element in the column vector to be adjusted of the fused image, and MIX_i is the gray value of the pixel corresponding to the i-th element in the column vector of the fused image.
Alternatively, the behavior recognizing unit may include a dynamic behavior determining unit and an action type determining unit. Wherein:
the dynamic behavior determining unit is used for importing all the fusion images into a context attention network to obtain dynamic behavior data of the target object in the registered image group; the dynamic behavior data is described by a position change vector between the target object and the environmental object in every two adjacent fusion images.
And the action type determining unit is used for importing the dynamic behavior data into a behavior recognition model to obtain the action type of the target object.
It should be noted that, for the information interaction, the execution process, and other contents between the above units, the specific functions and the technical effects brought by the method embodiments of the present application are based on the same concept, and specific reference may be made to the method embodiment part, which is not described herein again.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of each functional unit is merely illustrated, and in practical applications, the foregoing function distribution may be performed by different functional units according to needs, that is, the internal structure of the terminal device is divided into different functional units to perform all or part of the above-described functions. Each functional unit in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the application. The specific working process of the units in the system may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 6, the terminal device 6 provided in this embodiment may include: a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and executable on the processor 60, for example a program corresponding to the behavior recognition method based on the fusion of infrared light and visible light. When executing the computer program 62, the processor 60 implements the steps of the above embodiment of the behavior recognition method based on the fusion of infrared light and visible light, such as steps S21 to S24 shown in fig. 2. Alternatively, when executing the computer program 62, the processor 60 implements the functions of the modules/units in the above terminal device embodiments, such as the functions of the units 51 to 54 shown in fig. 5.
Illustratively, the computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer program 62 in the terminal device 6. For example, the computer program 62 may be divided into a first obtaining unit, an image denoising unit, an image fusion unit, and a behavior recognition unit; for the specific functions of each unit, reference may be made to the related description in the embodiment corresponding to fig. 5, which is not repeated here.
Those skilled in the art will appreciate that fig. 6 is merely an example of the terminal device 6 and does not constitute a limitation of the terminal device 6, which may include more or fewer components than those shown, a combination of certain components, or different components.
The processor 60 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or a memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (flash card) provided on the terminal device 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used to store the computer program and other programs and data required by the terminal device. The memory 61 may also be used to temporarily store data that has been output or is to be output.
The embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the steps in the above-mentioned method embodiments can be implemented.
Embodiments of the present application provide a computer program product, which, when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments.
In the above embodiments, the description of each embodiment has its own emphasis, and parts that are not described or illustrated in a certain embodiment may refer to the description of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (9)

1. A behavior identification method based on infrared light and visible light fusion is characterized by comprising the following steps:
acquiring at least one registered image group to be identified; the registered image group comprises a registered visible light image and an infrared light image;
for each registered image group, carrying out denoising treatment on a visible light image in the registered image group to obtain a denoising image of the visible light image;
fusing the infrared light image and the de-noised image to obtain a fused image of the registered image group under a preset constraint condition; under the preset constraint condition, the pixel difference between the fused image and the infrared light image is minimum, and the gradient difference between the fused image and the de-noised image is minimum;
and determining the behavior category of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition.
2. The behavior recognition method according to claim 1, wherein the denoising of the visible light image in the registered image group to obtain a denoised image of the visible light image comprises:
determining a horizontal second-order gradient and a vertical second-order gradient of each pixel in the visible light image by adopting a first gradient function based on the gray value of each pixel in the visible light image; the first gradient function is:
VIS_h(i) = [1/2(vis_i - vis_r(i)) + 1/2(vis_i - vis_l(i))]²
VIS_v(i) = [1/2(vis_i - vis_b(i)) + 1/2(vis_i - vis_o(i))]²
wherein VIS_h(i) is the horizontal second-order gradient of the i-th pixel in the visible light image, VIS_v(i) is the vertical second-order gradient of the i-th pixel in the visible light image, vis_i is the gray value of the i-th pixel in the visible light image, vis_r(i) is the gray value of the pixel located to the right of and adjacent to the i-th pixel, vis_l(i) is the gray value of the pixel located to the left of and adjacent to the i-th pixel, vis_b(i) is the gray value of the pixel located below and adjacent to the i-th pixel, and vis_o(i) is the gray value of the pixel located above and adjacent to the i-th pixel;
for each pixel in the visible light image, performing a quadratic operation on the sum of the horizontal second-order gradient and the vertical second-order gradient of the pixel to obtain a comprehensive gradient of the pixel;
determining the sum of the comprehensive gradients of all pixels in the visible light image as a denoising adjustment factor;
determining the column vector of the de-noised image by adopting a preset de-noising function based on the column vector of the visible light image, the de-noising adjustment factor and a preset regularization weight; the preset denoising function is as follows:
DeN = Vis + λ·DeN_vis
wherein DeN is the column vector of the denoised image, Vis is the column vector of the visible light image, λ is the preset regularization weight, and DeN_vis is the denoising adjustment factor.
3. The behavior recognition method according to claim 1, wherein the fusing of the infrared light image and the denoised image to obtain a fused image of the registered image group under a preset constraint condition comprises:
determining a column vector to be adjusted of the fused image by adopting a preset constraint function based on the column vector of the infrared light image and the column vector of the de-noised image; the preset constraint function is as follows:
[Preset constraint function, published only as formula image DEST_PATH_IMAGE001; it minimizes the L2 norm of MIX* - InF together with the λ-weighted L1 norm of ∇MIX* - DeN*, whose symbols are defined below.]
wherein MIX* is the column vector to be adjusted of the fused image, InF is the column vector of the infrared light image, ∇MIX* is the gradient vector of the fused image, DeN* is the gradient vector of the de-noised image, ||MIX* - InF||_2 denotes the L2 norm of MIX* - InF, ||∇MIX* - DeN*||_1 denotes the L1 norm of ∇MIX* - DeN*, and λ is a preset regularization weight;
the value of each element in the gradient vector of the fused image is determined by the following formula:
MIX*_1 = [1/2(MIX*_i - MIX*_r(i)) + 1/2(MIX*_i - MIX*_l(i))]²
MIX*_2 = [1/2(MIX*_i - MIX*_b(i)) + 1/2(MIX*_i - MIX*_o(i))]²
[Formula for the i-th element of the gradient vector ∇MIX*, combining MIX*_1 and MIX*_2; published only as formula image DEST_PATH_IMAGE002.]
wherein MIX*_i is the gray value of the pixel corresponding to the i-th element in the column vector to be adjusted of the fused image, MIX*_r(i) is the gray value of the pixel located to the right of and adjacent to the pixel corresponding to the i-th element, MIX*_l(i) is the gray value of the pixel located to the left of and adjacent to that pixel, MIX*_b(i) is the gray value of the pixel located below and adjacent to that pixel, and MIX*_o(i) is the gray value of the pixel located above and adjacent to that pixel;
and normalizing the column vector to be adjusted of the fused image to obtain the column vector of the fused image.
4. The behavior recognition method according to claim 3, wherein the normalizing of the to-be-adjusted column vector of the fused image to obtain the column vector of the fused image comprises:
normalizing the column vector to be adjusted of the fused image based on a preset normalization formula to obtain the column vector of the fused image; the preset normalization formula is as follows:
[Preset normalization formula, published only as formula image DEST_PATH_IMAGE003; it maps each element MIX*_i of the column vector to be adjusted to the corresponding element MIX_i of the column vector of the fused image.]
wherein MIX*_i is the gray value of the pixel corresponding to the i-th element in the column vector to be adjusted of the fused image, and MIX_i is the gray value of the pixel corresponding to the i-th element in the column vector of the fused image.
5. The behavior recognition method according to claim 4, wherein the determining of the behavior category of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition comprises:
importing all the fused images into a context attention network to obtain dynamic behavior data of the target object in the registered image group; the dynamic behavior data is described by a position change vector between the target object and the environmental object in every two adjacent fused images;
and importing the dynamic behavior data into a behavior recognition model to obtain the action type of the target object.
6. A terminal device, comprising:
a first acquisition unit for acquiring at least one registered image group to be identified; the registered image group comprises a visible light image and an infrared light image which are registered;
the image denoising unit is used for, for each registered image group, denoising the visible light image in the registered image group to obtain a denoised image of the visible light image;
the image fusion unit is used for fusing the infrared light image and the de-noising image to obtain a fusion image of the registered image group under a preset constraint condition; under the preset constraint condition, the pixel difference between the fused image and the infrared light image is minimum, and the gradient difference between the fused image and the de-noised image is minimum;
and the behavior recognition unit is used for determining the behavior category of the target object in the registered image group based on the fused images of all the registered image groups under the preset constraint condition.
7. A terminal device, characterized in that it comprises a processor, a memory and a computer program stored in the memory and executable on the processor, the processor implementing the behavior recognition method according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a method for behavior recognition according to any one of claims 1 to 5.
9. A behavior recognition system comprising an infrared camera, a visible light camera, and a terminal device according to claim 6 or 7, the infrared camera and the visible light camera being connected to the terminal device.
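For readers who want to prototype the pipeline stated in claims 2 to 4, the following NumPy sketch strings the denoising adjustment, the fusion objective, and the normalization together. The claims state objectives rather than a solver; the subgradient-descent loop, the step sizes, the squaring used for the comprehensive gradient, and the min-max normalization are all assumptions introduced here and are not part of the claims.

import numpy as np

def comprehensive_gradient_sum(img):
    # Claim 2: sum over all pixels of the 'comprehensive gradient', taken
    # here (as an assumption) to be the squared sum of the horizontal and
    # vertical second-order gradient terms; borders are edge-replicated.
    left  = np.pad(img, ((0, 0), (1, 0)), mode="edge")[:, :-1]
    right = np.pad(img, ((0, 0), (0, 1)), mode="edge")[:, 1:]
    above = np.pad(img, ((1, 0), (0, 0)), mode="edge")[:-1, :]
    below = np.pad(img, ((0, 1), (0, 0)), mode="edge")[1:, :]
    g_h = (0.5 * (img - right) + 0.5 * (img - left)) ** 2
    g_v = (0.5 * (img - below) + 0.5 * (img - above)) ** 2
    return np.sum((g_h + g_v) ** 2)

def denoise(vis, lam=0.1):
    # Claim 2: DeN = Vis + lambda * DeN_vis, with DeN_vis the scalar
    # denoising adjustment factor (taken literally, the same scalar is
    # added to every element of the column vector).
    vis = vis.astype(np.float64)
    return vis + lam * comprehensive_gradient_sum(vis)

def fuse(inf, den, lam=0.1, steps=200, lr=0.05):
    # Claim 3's objective: keep MIX* close to the infrared image (L2 term)
    # while keeping its gradients close to those of the denoised image
    # (L1 term). Plain subgradient descent is an assumed solver.
    inf = inf.astype(np.float64)
    den = den.astype(np.float64)
    mix = inf.copy()
    for _ in range(steps):
        grad = 2.0 * (mix - inf)                       # derivative of the L2 term
        for axis in (0, 1):
            d_mix = np.diff(mix, axis=axis, append=np.take(mix, [-1], axis=axis))
            d_den = np.diff(den, axis=axis, append=np.take(den, [-1], axis=axis))
            sign = np.sign(d_mix - d_den)              # subgradient of the L1 term
            grad += lam * (-np.diff(sign, axis=axis, prepend=0.0))  # adjoint of the difference
        mix -= lr * grad
    # Claim 4: normalize the adjusted result (assumed min-max mapping).
    lo, hi = mix.min(), mix.max()
    return (mix - lo) / (hi - lo + 1e-12) * 255.0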
CN202211013357.4A 2022-08-23 2022-08-23 Behavior identification method based on infrared light and visible light fusion and terminal equipment Active CN115082968B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211013357.4A CN115082968B (en) 2022-08-23 2022-08-23 Behavior identification method based on infrared light and visible light fusion and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211013357.4A CN115082968B (en) 2022-08-23 2022-08-23 Behavior identification method based on infrared light and visible light fusion and terminal equipment

Publications (2)

Publication Number Publication Date
CN115082968A true CN115082968A (en) 2022-09-20
CN115082968B CN115082968B (en) 2023-03-28

Family

ID=83244892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211013357.4A Active CN115082968B (en) 2022-08-23 2022-08-23 Behavior identification method based on infrared light and visible light fusion and terminal equipment

Country Status (1)

Country Link
CN (1) CN115082968B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989585A (en) * 2015-03-05 2016-10-05 深圳市朗驰欣创科技有限公司 Method and system for fusing infrared image and visible light image
CN107578432A (en) * 2017-08-16 2018-01-12 南京航空航天大学 Merge visible ray and the target identification method of infrared two band images target signature
CN107481214A (en) * 2017-08-29 2017-12-15 北京华易明新科技有限公司 A kind of twilight image and infrared image fusion method
CN110084774A (en) * 2019-04-11 2019-08-02 江南大学 A kind of method of the gradient transmitting and minimum total variation blending image of enhancing
CN112115979A (en) * 2020-08-24 2020-12-22 深圳大学 Fusion method and device of infrared image and visible image
CN114120176A (en) * 2021-11-11 2022-03-01 广州市高科通信技术股份有限公司 Behavior analysis method for fusion of far infrared and visible light video images
CN114862710A (en) * 2022-04-26 2022-08-05 中国人民解放军陆军工程大学 Infrared and visible light image fusion method and device
CN114818989A (en) * 2022-06-21 2022-07-29 中山大学深圳研究院 Gait-based behavior recognition method and device, terminal equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAO ZHANG et al.: "Rethinking the Image Fusion: A Fast Unified Image Fusion Network based on Proportional Maintenance of Gradient and Intensity", The Thirty-Fourth AAAI Conference on Artificial Intelligence *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115908221A (en) * 2023-03-08 2023-04-04 荣耀终端有限公司 Image processing method, electronic device, and storage medium
CN115908221B (en) * 2023-03-08 2023-12-08 荣耀终端有限公司 Image processing method, electronic device and storage medium
CN116844241A (en) * 2023-08-30 2023-10-03 武汉大水云科技有限公司 Coloring-based infrared video behavior recognition method and system and electronic equipment
CN116844241B (en) * 2023-08-30 2024-01-16 武汉大水云科技有限公司 Coloring-based infrared video behavior recognition method and system and electronic equipment

Also Published As

Publication number Publication date
CN115082968B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN115082968B (en) Behavior identification method based on infrared light and visible light fusion and terminal equipment
Li et al. Benchmarking single-image dehazing and beyond
CN111310775B (en) Data training method, device, terminal equipment and computer readable storage medium
CN108229490B (en) Key point detection method, neural network training method, device and electronic equipment
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN108229509B (en) Method and device for identifying object class and electronic equipment
CN111079764B (en) Low-illumination license plate image recognition method and device based on deep learning
CN111931764B (en) Target detection method, target detection frame and related equipment
CN110705419A (en) Emotion recognition method, early warning method, model training method and related device
CN113822951B (en) Image processing method, device, electronic equipment and storage medium
CN113052295B (en) Training method of neural network, object detection method, device and equipment
US20230401838A1 (en) Image processing method and related apparatus
CN110222718A (en) The method and device of image procossing
CN114529946A (en) Pedestrian re-identification method, device, equipment and storage medium based on self-supervision learning
CN111950700A (en) Neural network optimization method and related equipment
CN112507897A (en) Cross-modal face recognition method, device, equipment and storage medium
CN115577768A (en) Semi-supervised model training method and device
CN110390254B (en) Character analysis method and device based on human face, computer equipment and storage medium
CN111507288A (en) Image detection method, image detection device, computer equipment and storage medium
CN113642639A (en) Living body detection method, living body detection device, living body detection apparatus, and storage medium
CN112613373A (en) Image recognition method and device, electronic equipment and computer readable storage medium
CN111968154A (en) HOG-LBP and KCF fused pedestrian tracking method
CN115512203A (en) Information detection method, device, equipment and storage medium
CN115547488A (en) Early screening system and method based on VGG convolutional neural network and facial recognition autism
CN113361422A (en) Face recognition method based on angle space loss bearing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant