CN113902749A - Image processing method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN113902749A
CN113902749A (application CN202111165670.5A)
Authority
CN
China
Prior art keywords
image
grid
width
images
height
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111165670.5A
Other languages
Chinese (zh)
Inventor
张卿麒
张彬
吴阳平
许亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority to CN202111165670.5A
Publication of CN113902749A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Abstract

The present disclosure provides an image processing method, apparatus, computer device, and storage medium. The method comprises: acquiring an image of a target object as a first image; resizing the first image based on size information of the first image and a preset size conversion parameter to obtain a second image, wherein the size conversion parameter comprises a width coefficient and a height coefficient, the width of the second image is a first integral multiple of the width coefficient, and the height of the second image is a second integral multiple of the height coefficient; dividing the second image into a plurality of grid images, the width and height of each grid image being the first integer and the second integer, respectively; obtaining a processing result of the first image by performing preset conversion processing on each grid image; and determining attribute information and/or behavior information of the target object based on the processing result of the first image. Embodiments of the present disclosure can improve the flexibility of adjusting the size information of an image.

Description

Image processing method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing and model training technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
In vision technology based on deep learning, large numbers of images are processed by a deep learning model. Images acquired by different image acquisition devices differ considerably, and the sizes and positions of the target objects they contain are random, while a deep learning model generally needs the whole image, or the target-object region extracted from the image, to be unified in size before processing.
In the prior art, a common method for unifying sizes is to preset a target size. For example, in a deep learning task involving face images, when the image size of an acquired face sample image is smaller than the target size, pixel points are added to the face sample image by interpolation so that it reaches the target size; when the image size is larger than the target size, some pixel points are deleted by down-sampling so that it reaches the target size. However, interpolation may introduce interference information, and down-sampling may delete important useful information.
Disclosure of Invention
The embodiment of the disclosure at least provides an image processing method, an image processing device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring an image of a target object as a first image;
performing size adjustment on the first image based on size information of the first image and a preset size conversion parameter to obtain a second image, wherein the size conversion parameter comprises a width coefficient and a height coefficient, the width of the second image is a first integral multiple of the width coefficient, and the height of the second image is a second integral multiple of the height coefficient;
segmenting the second image into a plurality of grid images, the width and height of the grid images being the first integer and the second integer, respectively;
obtaining a processing result of the first image by performing preset conversion processing on each of the mesh images;
determining attribute information and/or behavior information of the target object based on a processing result of the first image.
In this embodiment, when adjustment is performed with the preset size conversion parameter, different size information may correspond to different numbers of filled pixels; for example, when the number of pixels corresponding to the size information is 497, the number of filled pixels is 3, and when it is 591, the number of filled pixels is 9. By adjusting the size information of the first image with the preset size conversion parameter, the size can be brought to one corresponding to the parameter while filling only a small number of pixel points. This improves the flexibility of size adjustment and effectively preserves the useful features of the first image, which in turn improves the accuracy of subsequently determining the attribute information and/or behavior information of the target object in the first image. Further, by segmenting the second image, the number of grid images in the height direction can equal the height coefficient and the number in the width direction can equal the width coefficient, so that second images with different size information can all be divided into a plurality of grid images matching the height coefficient and the width coefficient.
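The minimal-padding computation described above can be sketched as follows. The coefficient value of 100 is an assumption inferred from the 497 → 3 and 591 → 9 example (both padded sizes land on multiples of 100); the function name is illustrative, not from the patent.

```python
import math

def min_padding(size, coeff):
    """Smallest multiple of coeff that is >= size, and the pixels to fill."""
    padded = math.ceil(size / coeff) * coeff
    return padded, padded - size

# consistent with the example above, assuming a coefficient of 100
print(min_padding(497, 100))  # (500, 3)
print(min_padding(591, 100))  # (600, 9)
```

When the size is already an exact multiple, no pixels need to be filled, matching the goal of filling as few pixel points as possible.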
In a possible embodiment, the obtaining of the processing result of the first image by performing a preset conversion process on each of the mesh images includes:
fusing pixels in each grid image into a pixel point;
combining the pixel points obtained by fusing the pixels of each of the plurality of grid images into one row, obtaining a third image containing a single row of pixels, and using the third image as the processing result of the first image.
In this embodiment, by fusing the pixels in a grid image into one pixel point, the fused pixel point can characterize all pixels in that grid image; that is, the fused pixel point serves as the minimum unit representing its corresponding grid image. By combining the fused pixel points of all grid images into one row, the second image is converted into a third image whose size matches the product of the height coefficient and the width coefficient, i.e., an image of 1 pixel by (height coefficient × width coefficient) pixels. First images with different size information can therefore all be converted into third images of this uniform size. Meanwhile, fusing the pixels of each grid image effectively reduces the image size, reducing the resources occupied by image storage and subsequent processing.
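A minimal NumPy sketch of this conversion, assuming a single-channel second image and uniform (mean) fusion in place of the patent's weighted sum; the function and variable names are illustrative, not from the patent:

```python
import numpy as np

def second_to_third_image(second_image, width_coeff, height_coeff):
    """Split the second image into height_coeff x width_coeff grid images,
    fuse each grid into one pixel (mean as a stand-in for the weighted sum),
    and combine the fused pixels into a single row (the third image)."""
    h, w = second_image.shape
    gh = h // height_coeff            # the "second integer" (grid height)
    gw = w // width_coeff             # the "first integer" (grid width)
    grids = second_image.reshape(height_coeff, gh, width_coeff, gw)
    fused = grids.mean(axis=(1, 3))   # one pixel point per grid image
    return fused.reshape(1, -1)       # 1 x (height_coeff * width_coeff)

second = np.arange(60, dtype=float).reshape(6, 10)   # 10 wide, 6 high
third = second_to_third_image(second, width_coeff=5, height_coeff=3)
print(third.shape)  # (1, 15)
```

Whatever the size of the incoming second image, the third image always has height_coeff × width_coeff pixels, which is the size unification the paragraph above describes.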
In a possible embodiment, the fusing the pixels in each of the mesh images into one pixel point includes:
and carrying out weighted summation on all pixel points in the grid image by adopting a preset weight value.
In this embodiment, fusing all pixel points in a grid image according to the preset weight value merges the pixel information of all pixel points into the pixel information of a single pixel point. This ensures that the pixel information of every pixel point in each grid image is not lost, while allowing all pixel points in each grid image to be fused into the single pixel point corresponding to that grid image.
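The weighted summation can be sketched as below. Uniform weights of 1/N (so the weights sum to one and the result reduces to a mean) are an assumed default; the patent only requires a preset weight value, not a particular one.

```python
import numpy as np

def fuse_grid(grid_image, weights=None):
    """Weighted sum of all pixel points of one grid image into one value.
    Uniform 1/N weights are an assumption, not mandated by the patent."""
    grid_image = np.asarray(grid_image, dtype=float)
    if weights is None:
        weights = np.full(grid_image.shape, 1.0 / grid_image.size)
    return float((grid_image * weights).sum())

print(fuse_grid([[1, 3], [5, 7]]))  # 4.0
```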
In one possible embodiment, the method further comprises: and splicing the processing results of the first images of the frames of the preset number along the direction of pixel rows to obtain spliced images corresponding to the first images of the frames of the preset number.
In this embodiment, by stitching the processing results corresponding to the preset number of frames of first images, the multiple processing results are unified into one stitched image containing temporal information, for example a stitched image containing continuous face information, thereby improving the accuracy of the determined attribute information and/or behavior information of the target object.
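Stitching along the pixel-row direction amounts to stacking the single-row third images frame by frame. In this sketch, eight frames and a 35-pixel row (width coefficient 5 × height coefficient 7) are illustrative numbers only:

```python
import numpy as np

# eight frames' third images, each a 1 x 35 row of fused pixel points
rows = [np.full((1, 35), float(i)) for i in range(8)]
stitched = np.concatenate(rows, axis=0)   # rows stacked in temporal order
print(stitched.shape)  # (8, 35)
```

Row index then corresponds to time and column index to a grid position, giving the spatiotemporal stitched image described above.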
In one possible embodiment, the acquiring the image of the target object as the first image includes:
detecting the target object in each image frame in a video stream;
adding the image frame in which the target object is detected to an image sequence of the target object;
and extracting a preset number of image frames from the image sequence by adopting a sliding window, taking each image frame in the extracted image frames as a frame of the first image, wherein the width of the sliding window is equal to the preset number.
In this embodiment, by detecting each image frame in the video stream, the image frames including the target object can be accurately identified, ensuring that every image frame in the obtained image sequence includes the target object.
In a possible embodiment, the sliding step of the sliding window is smaller than the width of the sliding window.
In this embodiment, when the sliding step of the sliding window is smaller than the width of the sliding window, every image frame including the target object can be contained in a window obtained by sliding with that step, avoiding the omission of image frames including the target object in the video stream and thereby improving the rationality of the obtained image sequence.
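The sliding-window extraction over the image sequence might look like the sketch below; a step smaller than the window width makes consecutive windows overlap, so no frame falls between two windows. The function name and integer stand-ins for frames are illustrative.

```python
def sliding_windows(sequence, window_width, step):
    """Extract windows of window_width frames, advancing by step each time."""
    return [sequence[i:i + window_width]
            for i in range(0, len(sequence) - window_width + 1, step)]

frames = list(range(10))                      # stand-ins for detected image frames
windows = sliding_windows(frames, window_width=4, step=2)
print(windows[:2])  # [[0, 1, 2, 3], [2, 3, 4, 5]]
```

Each extracted window supplies the preset number of first images whose processing results are later stitched together.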
In a possible embodiment, the resizing the first image based on the size information of the first image and a preset size conversion parameter includes:
expanding the width and the height of the first image through pixel filling to a first integral multiple of the width coefficient and a second integral multiple of the height coefficient, respectively, wherein the first integral multiple of the width coefficient is larger than the width of the first image and their difference is smaller than the width coefficient, and the second integral multiple of the height coefficient is larger than the height of the first image and their difference is smaller than the height coefficient.
In this way, based on the determined first integral multiple and second integral multiple, first images with different size information correspond to different filled widths and heights, improving the flexibility of adjusting image size information. Because the first integral multiple of the width coefficient exceeds the width of the first image by less than the width coefficient, the number of pixel points filled in the width direction is guaranteed to be the minimum that matches the preset size conversion parameter; likewise, the number of pixel points filled in the height direction is also the minimum matching the parameter. The number of pixel points filled for the first image is therefore minimal, which effectively reduces the filling required and avoids damaging the image features of the first image, for example the features corresponding to the association between the face and the heartbeat frequency in the first image.
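Applied to a whole image, the expansion step can be sketched with zero-padding on the bottom and right edges; the fill value and the padding side are assumptions, since the patent specifies only the target size, not where or with what to fill. The coefficients of 100 echo the 497/591 example earlier.

```python
import numpy as np

def expand_to_multiples(image, width_coeff, height_coeff):
    """Pad so width and height become the smallest coefficient multiples
    not less than the original width and height."""
    h, w = image.shape[:2]
    new_h = -(-h // height_coeff) * height_coeff   # ceiling division
    new_w = -(-w // width_coeff) * width_coeff
    return np.pad(image, ((0, new_h - h), (0, new_w - w)))  # zero fill

image = np.ones((441, 365))   # height x width, as in the 365 x 441 example
padded = expand_to_multiples(image, width_coeff=100, height_coeff=100)
print(padded.shape)  # (500, 400)
```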
In a possible embodiment, the method further comprises the step of determining the preset weight value:
and determining a preset weight value corresponding to each grid image based on the number of the grid images.
In this embodiment, the preset weight value is determined based on the number of grid images, improving the flexibility of setting the preset weight value for first images with different size information.
In a possible embodiment, the fusing the pixels in each of the mesh images into one pixel point includes:
storing each grid image to a target computing device;
and fusing the pixels in each grid image in parallel with the target computing device, based on the preset weight value corresponding to each grid image, to obtain the fused pixel point corresponding to each grid image.
In this embodiment, by storing each grid image to the target computing device, the pixels of the plurality of grid images can be fused in parallel by the target computing device, improving the speed and efficiency of determining the fused pixel point corresponding to each grid image.
In a possible implementation manner, after obtaining a stitched image corresponding to a preset number of frames of the first image, the method further includes:
and using the target number of spliced images as a set of training data for training a model for attribute analysis and/or behavior analysis of the target object.
According to the embodiment, the target number of spliced images is used as a group of training data, so that the integrity of the training data is improved, and the prediction accuracy of the trained model is improved.
In a second aspect, an embodiment of the present disclosure further provides an image processing apparatus, including:
the acquisition module is used for acquiring an image of a target object as a first image;
the adjusting module is used for adjusting the size of the first image based on the size information of the first image and a preset size conversion parameter to obtain a second image, wherein the size conversion parameter comprises a width coefficient and a height coefficient, the width of the second image is a first integral multiple of the width coefficient, and the height of the second image is a second integral multiple of the height coefficient;
a dividing module, configured to divide the second image into a plurality of grid images, where widths and heights of the grid images are the first integer and the second integer, respectively;
the processing module is used for obtaining a processing result of the first image by executing preset conversion processing on each grid image;
a first determination module for determining attribute information and/or behavior information of the target object based on a processing result of the first image.
In a possible implementation manner, the processing module is configured to fuse pixels in each of the grid images into one pixel point;
combining the pixel points obtained by fusing the pixels of each of the plurality of grid images into one row, obtaining a third image containing a single row of pixels, and using the third image as the processing result of the first image.
In a possible implementation manner, the processing module is configured to perform weighted summation on all pixel points in the grid image by using a preset weight value.
In a possible embodiment, the apparatus further comprises:
and the splicing module is used for splicing the processing results of the first images of the frames with the preset number along the direction of pixel rows to obtain spliced images corresponding to the first images of the frames with the preset number.
In a possible implementation, the obtaining module is configured to detect the target object in each image frame in a video stream;
adding the image frame in which the target object is detected to an image sequence of the target object;
and extracting a preset number of image frames from the image sequence by adopting a sliding window, taking each image frame in the extracted image frames as a frame of the first image, wherein the width of the sliding window is equal to the preset number.
In a possible embodiment, the sliding step of the sliding window is smaller than the width of the sliding window.
In a possible implementation, the adjusting module is configured to expand the width and the height of the first image through pixel filling to a first integral multiple of the width coefficient and a second integral multiple of the height coefficient, respectively, where the first integral multiple of the width coefficient is greater than the width of the first image and their difference is smaller than the width coefficient, and the second integral multiple of the height coefficient is greater than the height of the first image and their difference is smaller than the height coefficient.
In a possible embodiment, the apparatus further comprises:
a second determining module, configured to determine the preset weight value according to the following steps:
and determining a preset weight value corresponding to each grid image based on the number of the grid images.
In a possible implementation, the processing module is configured to store each of the grid images to a target computing device;
and to fuse the pixels in each grid image in parallel with the target computing device, based on the preset weight value corresponding to each grid image, to obtain the fused pixel point corresponding to each grid image.
In a possible implementation manner, the stitching module is further configured to, after the stitched images corresponding to the preset number of frames of the first image are obtained, use a target number of stitched images as a set of training data for training a model for attribute analysis and/or behavior analysis of the target object.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the processor performs the steps in the first aspect or any one of its possible implementations.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed, performs the steps in the first aspect or any one of its possible implementations.
For the description of the effects of the image processing apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the image processing method, which is not repeated here.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required in the embodiments are briefly described below. The drawings here are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope, since those skilled in the art may derive additional related drawings from them without inventive effort.
Fig. 1 shows a flowchart of an image processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a mesh image corresponding to a second image provided by an embodiment of the disclosure;
fig. 3 is a schematic diagram illustrating the combination of the pixel points obtained by fusing the pixels of each mesh image into one row, according to an embodiment of the present disclosure;
fig. 4 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;
fig. 5 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure; obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments is not intended to limit the scope of the disclosure as claimed, but is merely representative of selected embodiments. All other embodiments obtained by a person skilled in the art from the embodiments of the disclosure without creative effort shall fall within the protection scope of the disclosure.
Furthermore, the terms "first," "second," and the like in the description and in the claims, and in the drawings described above, in the embodiments of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein.
Reference herein to "a plurality" or "a number" means two or more. "And/or" describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In vision technology based on deep learning, images are generally processed by a deep learning model, but images acquired by different image acquisition devices differ considerably, the size and position of the target object in an image are random, and a deep learning model generally needs the whole image, or the target-object region extracted from it, to be unified in size before processing. Face sample images therefore need to be unified in size before being used to train the model. In the prior art, a common method for unifying sizes is to preset a target size: after a face sample image is obtained, when its image size is smaller than the target size, pixel points are added by interpolation so that the image reaches the target size; when its image size is larger than the target size, some pixel points are deleted by down-sampling so that the image reaches the target size.
However, interpolation adds too much useless background information, and down-sampling may delete important pixel points; both destroy part of the associated information between the face and the heartbeat frequency, thereby reducing the prediction accuracy of the trained model.
Based on the above research, the present disclosure provides an image processing method, apparatus, computer device, and storage medium. When adjustment is performed with the preset size conversion parameter information, different size information may correspond to different numbers of filled pixels; for example, when the number of pixels corresponding to the size information is 497, the number of filled pixels is 3, and when it is 591, the number of filled pixels is 9. By adjusting the size information of the first image with the preset size conversion parameter, the size can be brought to one corresponding to the parameter while filling only a small number of pixel points. This improves the flexibility of size adjustment, avoids damaging the image features of the first image, for example the features corresponding to the association between the face and the heartbeat frequency, and in turn improves the accuracy of subsequently determining the attribute information and/or behavior information of the target object in the first image. Further, by segmenting the second image, the number of grid images in the height direction can equal the height coefficient and the number in the width direction can equal the width coefficient, so that second images with different size information can all be divided into a plurality of grid images matching the height coefficient and the width coefficient.
The above drawbacks were identified by the inventors after practical and careful study; the discovery of the above problems and the solutions proposed below for them should therefore both be regarded as contributions made by the inventors to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It should be noted that specific terms mentioned in the embodiments of the present disclosure include:
OpenCV: the software library is a cross-platform computer vision and machine learning software library issued based on BSD license (open source), and can run on Linux, Windows, Android and Mac OS operating systems. The method is light and efficient, is composed of a series of C functions and a small number of C + + classes, provides interfaces of languages such as Python, Ruby, MATLAB and the like, and realizes a plurality of general algorithms in the aspects of image processing and computer vision.
GPU: graphics Processing Unit, Graphics processor, also called display core, visual processor, display chip, is a special microprocessor for image and Graphics related operations on personal computers, workstations, game machines and some mobile devices (such as tablet computers, smart phones, etc.).
CUDA: computer Unified Device Architecture is an operating platform offered by the video card vendor NVIDIA (england).
To facilitate understanding of the present embodiment, first, an image processing method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the image processing method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and in some possible implementations, the image processing method may be implemented by a processor calling a computer readable instruction stored in a memory.
The following describes an image processing method provided by an embodiment of the present disclosure, taking an execution subject as a computer device as an example.
As shown in fig. 1, a flowchart of an image processing method provided in an embodiment of the present disclosure may include the following steps:
s101: an image of a target object is acquired as a first image.
Here, the first image includes the target object. The target object may be a target person, for example the driver of a vehicle; it may be a preset part of the target person, such as the face or head; or it may be another kind of object, such as a vehicle or a lane marking.
The first image may be a separately captured image including the target object, or an image frame selected from a captured video stream including the target object. The size information of different first images differs; for example, first image 1 may be 365 × 441 (width × height) and first image 2 may be 498 × 697.
For example, the first image may be a target face image that is not uniform in size and is used for training the model, and the target face image includes a face area of the vehicle driver.
S102: and carrying out size adjustment on the first image based on the size information of the first image and a preset size conversion parameter to obtain a second image, wherein the size conversion parameter comprises a width coefficient and a height coefficient, the width of the second image is a first integral multiple of the width coefficient, and the height of the second image is a second integral multiple of the height coefficient.
Here, the preset size conversion parameter is a parameter for resizing the first image, and the preset size conversion parameter may include a height coefficient for adjusting an image height of the first image and a width coefficient for adjusting an image width of the first image.
In this step, after the first image is acquired, size information of the first image may be determined, where the size information may include the height and the width of the first image. Then, the width of the first image may be adjusted by using the width coefficient, for example by filling part of the pixel points in the image width direction, to a width matching the width coefficient; specifically, the width of the first image may be adjusted to a first integral multiple of the width coefficient. Similarly, the height of the first image may be adjusted by using the height coefficient, for example by filling part of the pixel points in the image height direction, to a height matching the height coefficient; specifically, the height of the first image may be adjusted to a second integral multiple of the height coefficient. In this way, the first image may be adjusted to a second image, that is, a second image whose width is a first integral multiple of the width coefficient and whose height is a second integral multiple of the height coefficient. For example, if the first image is a 498 × 697 image, the width coefficient is 5, and the height coefficient is 7, the first image may be padded such that the first integer is 100 (first integral multiple 500) and the second integer is 100 (second integral multiple 700), so that the second image is a 500 × 700 image.
The first integer and the second integer may be the same or different, and are determined according to the width coefficient, the height coefficient, and the width and height of the first image. That is, when the preset size conversion parameter is used to resize first images with different size information, the number of pixel points that need to be filled is related to both the preset size conversion parameter and the size information, and the image size of the converted second image also differs; in other words, the first integral multiple and the second integral multiple corresponding to the adjusted second image differ for first images with different size information.
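The resizing step described above can be sketched in NumPy. The helper name `pad_to_multiples`, zero-valued padding, and padding only on the bottom/right edges are illustrative assumptions, not details fixed by the embodiment:

```python
import numpy as np

def pad_to_multiples(img, width_coeff, height_coeff):
    """Pad an H x W x C image so its width becomes a multiple of width_coeff
    and its height a multiple of height_coeff (zero padding on the
    bottom/right edges is an assumption for illustration)."""
    h, w = img.shape[:2]
    # smallest integers whose products with the coefficients cover the image
    first_int = -(-w // width_coeff)    # ceil(w / width_coeff)
    second_int = -(-h // height_coeff)  # ceil(h / height_coeff)
    pad_w = first_int * width_coeff - w
    pad_h = second_int * height_coeff - h
    padded = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    return padded, first_int, second_int

# the 498 x 697 (width x height) example from the text, coefficients 5 and 7
img = np.zeros((697, 498, 3), dtype=np.uint8)
padded, first_int, second_int = pad_to_multiples(img, 5, 7)
print(padded.shape, first_int, second_int)  # (700, 500, 3) 100 100
```

With these inputs the padded second image is 500 × 700, matching the example in the text.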
S103: and dividing the second image into a plurality of grid images, wherein the widths and the heights of the grid images are respectively a first integer and a second integer.
In a specific implementation, after the second image is obtained, the width of the second image may be divided by using the first integer, and the height of the second image may be divided by using the second integer, so as to obtain a plurality of grid images whose width is the first integer and whose height is the second integer. That is, the second image may be divided into width-coefficient × height-coefficient grid images, where the width of each grid image is the first integer and the height of each grid image is the second integer. Fig. 2 is a schematic diagram of the grid images corresponding to a second image according to an embodiment of the disclosure.
Continuing the above example, when the first integer is 100, the second integer is 100, the width of the second image is 500, and the height of the second image is 700, the width of the second image is divided by using the first integer and the height of the second image is divided by using the second integer. The obtained grid images are all 100 in width and 100 in height, different grid images correspond to different positions of the second image, and the image ranges corresponding to the grid images add up to the image range corresponding to the second image.
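The grid division above can be sketched as simple array slicing; the variable names and the row-major traversal order are illustrative assumptions:

```python
import numpy as np

# A 500 x 700 second image divided into width_coeff x height_coeff cells,
# each first_int wide and second_int high.
second_image = np.arange(700 * 500 * 3, dtype=np.float64).reshape(700, 500, 3)
width_coeff, height_coeff = 5, 7
second_int = second_image.shape[0] // height_coeff  # cell height, 100
first_int = second_image.shape[1] // width_coeff    # cell width, 100

grid_images = [
    second_image[r * second_int:(r + 1) * second_int,
                 c * first_int:(c + 1) * first_int]
    for r in range(height_coeff)
    for c in range(width_coeff)
]
print(len(grid_images), grid_images[0].shape)  # 35 (100, 100, 3)
```

The 35 cells together cover exactly the image range of the second image, as the text describes.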
S104: and obtaining a processing result of the first image by executing preset conversion processing on each grid image.
Here, the preset conversion process may include an operation of fusing all the pixel points in the mesh image.
In an embodiment, after the grid images corresponding to the second image are obtained, a preset conversion process may be performed on each grid image; that is, all pixel points in the grid image are fused into one pixel point, so as to obtain a fused pixel point corresponding to that grid image, and further obtain the fused pixel points corresponding to all the grid images. The fusion mode includes, but is not limited to, any one of weighted summation, weighted averaging, and plain averaging.
Then, the pixel points obtained after fusing the pixels of each of the plurality of grid images may be combined into a row; specifically, the fused pixel points corresponding to the grid images are arranged according to the position of each grid image in the second image, traversing row by row (rows before columns), so as to obtain a third image containing one row of pixels, and the third image is used as the processing result of the first image.
As shown in fig. 3, a schematic diagram is provided for combining pixel points obtained by fusing pixels of each grid image into a row according to the embodiment of the present disclosure.
Alternatively, after the fused pixel points corresponding to the grid images are obtained, they may be combined into a column according to the position of each grid image in the second image, traversing column by column (columns before rows), so as to obtain a third image containing one column of pixels, and the third image is used as the processing result of the first image.
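The fusion and column-assembly variant can be sketched as follows; plain averaging (one of the fusion modes named in the text) and the traversal order are illustrative choices:

```python
import numpy as np

# Fuse each grid cell into one pixel and stack the results into a single
# column of width_coeff * height_coeff pixels (the column variant).
width_coeff, height_coeff = 5, 7
second_image = np.random.rand(700, 500, 3)  # stand-in for a real second image
cell_h = second_image.shape[0] // height_coeff
cell_w = second_image.shape[1] // width_coeff

fused = []
for r in range(height_coeff):
    for c in range(width_coeff):
        cell = second_image[r * cell_h:(r + 1) * cell_h,
                            c * cell_w:(c + 1) * cell_w]
        fused.append(cell.mean(axis=(0, 1)))  # one fused pixel per cell

third_image = np.stack(fused).reshape(-1, 1, 3)  # 35 x 1 column image
print(third_image.shape)  # (35, 1, 3)
```

The resulting third image always has width-coefficient × height-coefficient pixels regardless of the size of the original first image.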
The grid images are divided after the size is adjusted by using the preset size conversion parameter, and each grid image is fused into one pixel point, so that a third image containing width-coefficient × height-coefficient pixel points is obtained; that is, the first image is converted into a third image with a fixed size. In this way, first images with different sizes can be converted into third images with a fixed size, realizing homogenization of the image size of the target object, while the effective information of the original image is effectively retained in the homogenized images.
S105: attribute information and/or behavior information of the target object is determined based on the processing result of the first image.
Here, the attribute information may represent an attribute of the target object, for example, the target object is a driver or a passenger in a vehicle, the attribute information may include a gender attribute, an age attribute, a heart rate attribute, and the like of the target object, and the behavior information may represent a behavior of the target object, for example, the behavior information may include a smoking behavior, a call making behavior, a dangerous driving behavior (e.g., one-handed holding of a steering wheel), and the like.
In specific implementation, the processing result of the first image (i.e., the third image) may be input into a pre-trained model for attribute analysis and/or behavior analysis of the target object, and the processing result of the first image may be analyzed by using the model, so as to output attribute information and/or behavior information of the target object.
In this way, when the size information is adjusted by using the preset size conversion parameter, first images with different size information may require different numbers of filling pixel points; for example, when the number of pixel points corresponding to the size information is 497, the number of filling pixel points is 3, and when it is 591, the number of filling pixel points is 9. By adjusting the size information of the first image through the preset size conversion parameter, the size can be adjusted to one corresponding to the preset size conversion parameter on the basis of filling only a small number of pixel points, which improves the flexibility of size adjustment, reduces the damage to the image features of the first image, and effectively retains the useful features in the first image. This in turn improves the accuracy of subsequently determining the attribute information and/or behavior information of the target object in the first image. Then, by segmenting the second image, the number of grid images in the image height direction can be made equal to the height coefficient and the number of grid images in the image width direction equal to the width coefficient, so that second images with different size information can all be divided into a plurality of grid images matching the height coefficient and the width coefficient.
In an embodiment, the step of fusing the pixels in each grid image into one pixel point may be implemented according to the following steps:
and carrying out weighted summation on all pixel points in the grid image by adopting a preset weight value.
Here, the weight values are used to fuse channel values of pixel points in the mesh image on each color channel, where the color channels may include an R channel, a G channel, and a B channel, and the first image may be an RGB image. In an embodiment, the preset weight value may be a preset fixed weight value, that is, the same weight value is used when pixels in the grid image in any one of the converted second images are fused.
In an embodiment, the preset weight value corresponding to each grid image may be further determined based on the number of the grid images in the converted second image. Specifically, based on the number of the grid images, the total weight value 1 is equally divided, and a weight value corresponding to each grid image is obtained. For example, in the case where the number of mesh images is 35, the weight value corresponding to each mesh image may be 1/35.
In an embodiment, the weight value used for the weighted summation of each grid image may also be determined according to the position of that grid image in the second image and a preset weight value matched with each position. Here, the preset weight values corresponding to different positions may differ; for example, a grid image in the central area of the second image contains more image information than one in the edge area, and therefore the preset weight value for the central area may be set higher than that for the edge area.
After the preset weight values corresponding to the grid images are determined, for each grid image, the initial channel values of the pixel points in the grid image on each color channel may be weighted and summed according to the preset weight value corresponding to that grid image, so as to obtain a target fusion value per channel. The target fusion values are used as the pixel value of the fused pixel point corresponding to the grid image. In this way, pixel fusion can be performed on each grid image to obtain the fused pixel point and pixel value corresponding to each grid image.
Further, based on the above steps, the fused pixel point corresponding to each grid image can be determined.
In one embodiment, since the first images may be used for training a model, the first images to be processed may include a preset number of frames, for example, 200 frames. The image processing method provided by the embodiment of the present disclosure may further, after the processing results corresponding to the preset number of frames of first images are obtained, stitch these processing results along the direction of the pixel rows to obtain a stitched image corresponding to the preset number of frames of first images. During stitching, the processing result corresponding to each frame of first image may be stitched along the direction of the pixel rows in chronological order according to the shooting time of each frame, so as to obtain the stitched image.
Here, since the processing result corresponding to each first image is a column of pixels whose length is the width coefficient multiplied by the height coefficient, stitching the processing results of the first images along the direction of the pixel rows yields a stitched image with (width coefficient × height coefficient) rows and a number of columns equal to the preset number of frames. For example, if the width coefficient is 5, the height coefficient is 7, and the preset number of frames is 200, the size of the obtained stitched image is 35 rows × 200 columns.
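The stitching of per-frame columns can be sketched as a concatenation along the pixel-row direction; random data stands in for real per-frame processing results, which is an assumption for illustration:

```python
import numpy as np

# One 35 x 1 column per frame (width_coeff * height_coeff = 35 pixels),
# assumed already ordered by shooting time, stitched side by side.
width_coeff, height_coeff, num_frames = 5, 7, 200
columns = [np.random.rand(width_coeff * height_coeff, 1, 3)
           for _ in range(num_frames)]
stitched = np.concatenate(columns, axis=1)
print(stitched.shape)  # (35, 200, 3) -- 35 rows x 200 columns
```

The stitched image thus has the fixed 35 × 200 size given in the text's example, independent of the original frame sizes.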
Therefore, the spliced images with fixed sizes can be obtained by splicing the third images corresponding to the first images of the preset number of frames, and size standardization processing of the image frames is realized. The spliced image can be used as standard training sample data and directly input into an attribute analysis model and/or a behavior analysis model of the target object for training or testing, so that a deep learning model for analyzing attribute information and/or behavior information of the target object is obtained. By performing the above size normalization process on a large number of image frames in a video stream or a large number of images acquired by different devices, a large number of sample data that can be used for training the attribute analysis model and/or the behavior analysis model of the target object can be quickly generated.
In one embodiment, for S101, the following steps may be performed:
s101-1: a target object is detected in each image frame in the video stream.
Here, the video stream may be a video obtained by shooting the target object. The video stream includes a number of image frames, and some of the image frames may not include the target object, or may include only an incomplete target object, for example, an image frame containing only 1/5 of a human face.
In a specific implementation, object detection may be performed on each image frame in the video stream in the order of their shooting times, so as to determine whether the target object exists in each image frame, that is, whether the target object can be detected in each image frame.
S101-2: the image frames in which the target object is detected are added to the image sequence of the target object.
Here, after the object detection is performed on each image frame, the image frame in which the target object is detected may be added to the image sequence of the target object, and finally, the image sequence corresponding to the video stream is obtained.
S101-3: the method comprises the steps of extracting a preset number of image frames from an image sequence by adopting a sliding window, taking each image frame in the extracted image frames as a first image, wherein the width of the sliding window is equal to the preset number.
The width of the sliding window is equal to the preset number, and the preset number is the number of the images corresponding to the spliced images.
In a specific implementation, a sliding window with a width equal to the preset number may be used to extract a preset number of image frames from the image sequence, and each of the extracted image frames may be used as a first image.
In one embodiment, the sliding window may be used to extract multiple image groups each including the preset number of first images, and the first images included in different image groups may not overlap at all or may partially overlap.
In a specific implementation, after an image group including the preset number of first images is extracted from the image sequence by using the sliding window, the sliding window may be slid by a sliding step length, and a new image group including the preset number of first images is selected based on the slid window. The sliding step length is smaller than the width of the sliding window, so that there is no gap between the head position of the sliding window after sliding and the tail position of the sliding window before sliding; that is, it can be ensured that no image frame in the image sequence is omitted when the first images are extracted from the image sequence.
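The sliding-window extraction can be sketched as follows; the function name and the small window/stride values are illustrative, while the stride-not-larger-than-width constraint follows the text:

```python
# Extract overlapping groups of `window` frames from an image sequence,
# advancing by `stride` each time; stride <= window guarantees no frame
# between consecutive groups is skipped.
def sliding_window_groups(sequence, window, stride):
    assert stride <= window, "a stride larger than the window would skip frames"
    groups = []
    for start in range(0, len(sequence) - window + 1, stride):
        groups.append(sequence[start:start + window])
    return groups

frames = list(range(10))  # stand-in for detected image frames, in time order
groups = sliding_window_groups(frames, window=4, stride=2)
print(groups)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Consecutive groups share `window - stride` frames, the partial-overlap case mentioned in the text.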
In one embodiment, for S102, the width and the height of the first image may be expanded by pixel filling to a first integral multiple of the width coefficient and a second integral multiple of the height coefficient, respectively, where the first integral multiple of the width coefficient is greater than the width of the first image and differs from it by less than the width coefficient, and the second integral multiple of the height coefficient is greater than the height of the first image and differs from it by less than the height coefficient.
Here, after the size information of the first image is determined, the width and the height of the first image may be respectively expanded by pixel filling based on the width coefficient and the height coefficient of the preset size conversion parameter. Specifically, based on the width of the first image and the width coefficient, the integer multiple of the width coefficient that is greater than the width of the first image and differs from it by less than the width coefficient may be determined as the first integral multiple; similarly, based on the height of the first image and the height coefficient, the integer multiple of the height coefficient that is greater than the height of the first image and differs from it by less than the height coefficient may be determined as the second integral multiple.
Then, pixel filling may be performed on the pixel points of the first image in the width direction of the first image, so that the width of the filled first image is a first integral multiple of the width coefficient, and pixel filling may be performed on the pixel points of the first image in the height direction of the first image, so that the height of the filled first image is a second integral multiple of the height coefficient. Therefore, the number of the filled pixel points corresponding to the first image can be minimized, the number of the pixel points needing to be filled is effectively reduced, and accordingly damage to the image characteristics of the first image is reduced.
In an embodiment, the image processing method mentioned in the embodiments of the present disclosure may be executed by a computer device using OpenCV. However, when OpenCV executes the step of fusing the pixels in each grid image into one pixel point, it can only process the grid images one by one in a loop to determine the target pixel point corresponding to each grid image. This loop processing increases the time taken to fuse the pixels in the grid images, thereby reducing the efficiency of image processing.
Therefore, in order to improve the efficiency of image processing, each grid image corresponding to the second image may be stored to a target computing device after the grid images are obtained. The target computing device may be a device capable of processing multiple images in parallel; specifically, the target computing device may be a GPU in the computer device.
In specific implementation, each grid image corresponding to the second image may be stored in a memory in the GPU, then each grid image is acquired from the memory by using the GPU, and the CUDA library is run, and pixels in each grid image are fused in parallel based on a preset weight value corresponding to each grid image, so that a fused pixel point corresponding to each grid image is obtained. In this way, the efficiency of image processing can be effectively improved by parallel processing for each mesh image.
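The effect of fusing all cells in parallel, rather than looping over them, can be illustrated on the CPU with a vectorized NumPy reshape; the GPU/CUDA path the text describes is analogous (e.g. the same reshape-and-reduce expressed in CuPy or PyTorch), and that correspondence is an assumption for illustration:

```python
import numpy as np

# Reshape the second image so each grid cell occupies its own axis pair,
# then reduce over those axes once -- all 35 cells fused in one operation
# instead of a per-cell loop.
height_coeff, width_coeff = 7, 5
second_image = np.random.rand(700, 500, 3)
cell_h = second_image.shape[0] // height_coeff  # 100
cell_w = second_image.shape[1] // width_coeff   # 100

cells = second_image.reshape(height_coeff, cell_h, width_coeff, cell_w, 3)
fused = cells.mean(axis=(1, 3))   # equal weights; one pixel per cell
print(fused.shape)  # (7, 5, 3)

column = fused.reshape(-1, 1, 3)  # row-major column form of the result
print(column.shape)  # (35, 1, 3)
```

A single bulk reduction like this is the kind of operation a GPU executes across many cells simultaneously, which is why the parallel path avoids the per-cell loop cost.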
In an embodiment, since the first images may be used to train the model, after the stitched images corresponding to the preset number of frames of the first images are obtained, the target number of stitched images may be used as a set of training data for training the model for attribute analysis and/or behavior analysis of the target object. Specifically, the target number of the stitched images may be stitched into a sample image, and the sample image may be used as training data.
And the training data also comprises a standard attribute analysis result and/or a standard behavior analysis result of the target object, wherein the standard attribute analysis result and/or the standard behavior analysis result are used as a true value of the training data.
Specifically, the target number may be set according to training needs and is not limited here; for example, if the target number is 9 and the preset number is 200, then each stitched image corresponding to the training data is generated based on 200 first images, and the sample image corresponding to the training data is generated based on 9 stitched images.
Different stitched images are generated based on different sets of images comprising a preset number of first images. The model for analyzing the attribute and/or behavior of the target object may be, for example, a model of heartbeat frequency information corresponding to a human face in a predicted image, a model of driving behavior of a driver in a predicted image, a model of whether a person in a predicted image performs a preset action (such as smoking, making a call, and the like), and the specific model is not limited in the embodiment of the present disclosure.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, an image processing apparatus corresponding to the image processing method is also provided in the embodiments of the present disclosure, and since the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the image processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 4, a schematic diagram of an image processing apparatus provided in an embodiment of the present disclosure includes:
an obtaining module 401, configured to obtain an image of a target object as a first image;
an adjusting module 402, configured to perform size adjustment on the first image based on size information of the first image and a preset size conversion parameter to obtain a second image, where the size conversion parameter includes a width coefficient and a height coefficient, a width of the second image is a first integer multiple of the width coefficient, and a height of the second image is a second integer multiple of the height coefficient;
a segmentation module 403, configured to segment the second image into a plurality of grid images, where widths and heights of the grid images are the first integer and the second integer, respectively;
a processing module 404, configured to obtain a processing result of the first image by performing preset conversion processing on each of the mesh images;
a first determining module 405, configured to determine attribute information and/or behavior information of the target object based on a processing result of the first image.
In a possible implementation manner, the processing module 404 is configured to fuse pixels in each of the grid images into one pixel point;
combining pixel points obtained after the pixels of each grid image in the plurality of grid images are fused into a row to obtain a third image containing a row of pixels, and using the third image as a processing result of the first image.
In a possible implementation manner, the processing module 404 is configured to perform weighted summation on all pixel points in the grid image by using a preset weight value.
In a possible embodiment, the apparatus further comprises:
and the splicing module 406 is configured to splice the processing results of the first images of the frames in the preset number along the direction of the pixel row to obtain a spliced image corresponding to the first image of the frames in the preset number.
In a possible implementation, the obtaining module 401 is configured to detect the target object in each image frame in a video stream;
adding the image frame in which the target object is detected to an image sequence of the target object;
and extracting a preset number of image frames from the image sequence by adopting a sliding window, taking each image frame in the extracted image frames as a frame of the first image, wherein the width of the sliding window is equal to the preset number.
In a possible embodiment, the sliding step of the sliding window is smaller than the width of the sliding window.
In a possible implementation, the adjusting module 402 is configured to expand the width and the height of the first image to a first integral multiple of the width coefficient and a second integral multiple of the height coefficient by pixel filling, respectively, where the first integral multiple of the width coefficient is greater than the width of the first image and differs from it by less than the width coefficient, and the second integral multiple of the height coefficient is greater than the height of the first image and differs from it by less than the height coefficient.
In a possible embodiment, the apparatus further comprises:
a second determining module 407, configured to determine the preset weight value according to the following steps:
and determining a preset weight value corresponding to each grid image based on the number of the grid images.
In a possible implementation, the processing module 404 is configured to store each of the mesh images to a target computing device;
and fuse the pixels in each grid image in parallel by using the target computing device based on the preset weight value corresponding to each grid image, so as to obtain the fused pixel point corresponding to each grid image.
In a possible implementation manner, the stitching module 406 is further configured to, after a preset number of frames of stitched images corresponding to the first image are obtained, use the stitched images of the target number as a set of training data for training a model for attribute analysis and/or behavior analysis of the target object.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 5, which is a schematic structural diagram of a computer device provided in an embodiment of the present disclosure, and includes:
a processor 51 and a memory 52; the memory 52 stores machine-readable instructions executable by the processor 51, and the processor 51 is configured to execute the machine-readable instructions stored in the memory 52. When the machine-readable instructions are executed by the processor 51, the processor 51 performs the following steps: S101: acquiring an image of a target object as a first image; S102: performing size adjustment on the first image based on size information of the first image and a preset size conversion parameter to obtain a second image, where the size conversion parameter includes a width coefficient and a height coefficient, the width of the second image is a first integral multiple of the width coefficient, and the height of the second image is a second integral multiple of the height coefficient; S103: dividing the second image into a plurality of grid images, where the widths and heights of the grid images are respectively the first integer and the second integer; S104: obtaining a processing result of the first image by performing a preset conversion process on each of the grid images; and S105: determining attribute information and/or behavior information of the target object based on the processing result of the first image.
The storage 52 includes a memory 521 and an external storage 522; the memory 521 is also referred to as an internal memory, and temporarily stores operation data in the processor 51 and data exchanged with an external memory 522 such as a hard disk, and the processor 51 exchanges data with the external memory 522 through the memory 521.
For the specific execution process of the instruction, reference may be made to the steps of the image processing method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the image processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the image processing method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute steps of the image processing method described in the above method embodiments, which may be referred to specifically for the above method embodiments, and are not described herein again.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implementing, and for example, a plurality of units or components may be combined, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such an understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may still, within the technical scope of the present disclosure, modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall be construed as falling within it. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

1. An image processing method comprising:
acquiring an image of a target object as a first image;
performing size adjustment on the first image based on size information of the first image and a preset size conversion parameter to obtain a second image, wherein the size conversion parameter comprises a width coefficient and a height coefficient, the width of the second image is a first integral multiple of the width coefficient, and the height of the second image is a second integral multiple of the height coefficient;
segmenting the second image into a plurality of grid images, wherein the width and the height of each grid image are the first integer and the second integer, respectively;
obtaining a processing result of the first image by performing preset conversion processing on each of the mesh images;
determining attribute information and/or behavior information of the target object based on a processing result of the first image.
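Read literally, claim 1 pads an image so that its sides become integral multiples of the coefficients, then cuts it into a width-coefficient-by-height-coefficient grid of cells. A minimal NumPy sketch of that reading follows; the function names, zero-valued pixel filling, and single-channel input are assumptions, not details taken from the patent:

```python
import numpy as np

def resize_to_multiples(first_image, width_coeff, height_coeff):
    """Expand a single-channel image so its sides are multiples of the coefficients."""
    h, w = first_image.shape[:2]
    first_integer = -(-w // width_coeff)    # ceil(w / width_coeff): grid-cell width
    second_integer = -(-h // height_coeff)  # ceil(h / height_coeff): grid-cell height
    second_image = np.zeros((second_integer * height_coeff,
                             first_integer * width_coeff), dtype=first_image.dtype)
    second_image[:h, :w] = first_image      # pixel filling on the right/bottom
    return second_image, first_integer, second_integer

def segment_into_grids(second_image, first_integer, second_integer):
    """Cut the padded image into cells of second_integer rows by first_integer columns."""
    h, w = second_image.shape[:2]
    return [second_image[r:r + second_integer, c:c + first_integer]
            for r in range(0, h, second_integer)
            for c in range(0, w, first_integer)]
```

On a 5x7 image with a width coefficient of 4 and a height coefficient of 3, this pads to 6x8 and yields 4x3 = 12 grid images of 2x2 pixels each.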
2. The method according to claim 1, wherein the obtaining of the processing result of the first image by performing a preset conversion process on each of the mesh images comprises:
fusing the pixels in each grid image into one pixel point;
arranging the pixel points obtained by fusing the pixels of each of the plurality of grid images into a row, obtaining a third image containing a single row of pixels, and taking the third image as the processing result of the first image.
3. The method of claim 2, wherein the fusing the pixels in each of the grid images into a pixel point comprises:
performing a weighted summation of all pixel points in the grid image using a preset weight value.
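Claims 2 and 3 together describe collapsing each grid cell to one pixel by a weighted sum and laying the results out as a single pixel row. With a uniform weight of 1/N (N pixels per cell), this reduces to average pooling flattened into one row; the uniform default below is an assumption, since the patent leaves the weight values to claim 8:

```python
import numpy as np

def fuse_grids_to_row(grid_images, weights=None):
    # Claim 3: weighted summation of all pixels in a grid image; uniform
    # weights of 1 / (pixels per grid) reduce this to plain averaging.
    fused = []
    for g in grid_images:
        w = np.full(g.shape, 1.0 / g.size) if weights is None else weights
        fused.append(float(np.sum(g.astype(float) * w)))
    # Claim 2: combine the fused pixel points into one row -> the "third image".
    return np.array(fused).reshape(1, -1)
```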
4. The method of claim 2, wherein the method further comprises: stitching the processing results of a preset number of frames of the first image along the pixel-row direction to obtain a stitched image corresponding to the preset number of frames of the first image.
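The stitching step of claim 4 stacks each frame's one-row result along the row direction: for n frames whose rows each hold m fused pixels, the stitched image is n by m. A sketch under the same assumed representation as above:

```python
import numpy as np

def stitch_rows(row_results):
    # Stack one-row processing results along the pixel-row direction:
    # n frames -> an (n, m) stitched image, m fused pixels per frame.
    return np.vstack(row_results)
```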
5. The method of claim 4, wherein the acquiring an image of a target object as a first image comprises:
detecting the target object in each image frame in a video stream;
adding the image frame in which the target object is detected to an image sequence of the target object;
extracting a preset number of image frames from the image sequence using a sliding window, and taking each of the extracted image frames as one frame of the first image, wherein the width of the sliding window is equal to the preset number.
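The sliding-window extraction of claim 5 can be sketched as plain list slicing; the concrete step size is an assumption, since claim 6 only requires it to be smaller than the window width (which makes consecutive windows overlap):

```python
def sliding_windows(image_sequence, window_width, step=1):
    # Each window yields `window_width` frames; a step smaller than the
    # window width (claim 6) makes consecutive windows share frames.
    return [image_sequence[i:i + window_width]
            for i in range(0, len(image_sequence) - window_width + 1, step)]
```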
6. The method of claim 5, wherein the sliding window has a sliding step size that is less than a width of the sliding window.
7. The method according to any one of claims 1 to 6, wherein the resizing the first image based on the size information of the first image and a preset size conversion parameter comprises:
expanding the width and the height of the first image to a first integral multiple of the width coefficient and a second integral multiple of the height coefficient, respectively, through pixel filling, wherein the first integral multiple of the width coefficient is larger than the width of the first image and differs from it by less than the width coefficient, and the second integral multiple of the height coefficient is larger than the height of the first image and differs from it by less than the height coefficient.
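The size constraint in claim 7 amounts to rounding each dimension up to the next multiple of its coefficient, so the pixel filling added along a side is always smaller than that side's coefficient. A small check of that arithmetic (the helper name is assumed for illustration):

```python
import math

def padded_dimension(original, coeff):
    # Smallest integral multiple of `coeff` not below `original`; the pixel
    # filling added is therefore strictly less than `coeff`.
    return math.ceil(original / coeff) * coeff
```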
8. The method according to claim 3, wherein the method further comprises the step of determining the preset weight value:
determining the preset weight value corresponding to each grid image based on the number of the grid images.
9. The method of claim 8, wherein the fusing pixels in each of the grid images into a pixel point comprises:
storing each grid image to a target computing device;
fusing, by the target computing device and in parallel, the pixels in each grid image based on the preset weight value corresponding to each grid image, to obtain the fused pixel point corresponding to each grid image.
10. The method according to claim 4, wherein after obtaining the stitched image corresponding to the first image for a preset number of frames, the method further comprises:
using a target number of the stitched images as a set of training data for training a model for attribute analysis and/or behavior analysis of the target object.
11. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring an image of a target object as a first image;
the adjusting module is used for adjusting the size of the first image based on the size information of the first image and a preset size conversion parameter to obtain a second image, wherein the size conversion parameter comprises a width coefficient and a height coefficient, the width of the second image is a first integral multiple of the width coefficient, and the height of the second image is a second integral multiple of the height coefficient;
a segmentation module, configured to segment the second image into a plurality of grid images, wherein the width and the height of each grid image are the first integer and the second integer, respectively;
the processing module is used for obtaining a processing result of the first image by executing preset conversion processing on each grid image;
a first determination module for determining attribute information and/or behavior information of the target object based on a processing result of the first image.
12. A computer device, comprising: a processor and a memory storing machine-readable instructions executable by the processor, the processor being configured to execute the machine-readable instructions stored in the memory and, when the machine-readable instructions are executed, to perform the steps of the image processing method according to any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the image processing method according to any one of claims 1 to 10.
CN202111165670.5A 2021-09-30 2021-09-30 Image processing method and device, computer equipment and storage medium Pending CN113902749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111165670.5A CN113902749A (en) 2021-09-30 2021-09-30 Image processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111165670.5A CN113902749A (en) 2021-09-30 2021-09-30 Image processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113902749A true CN113902749A (en) 2022-01-07

Family

ID=79190139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111165670.5A Pending CN113902749A (en) 2021-09-30 2021-09-30 Image processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113902749A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106664421A (en) * 2014-05-28 2017-05-10 联发科技股份有限公司 Video processing apparatus with at least one of transform unit size selection, mode information unit size selection, picture width decision and picture height decision, and related video processing method thereof
CN110660115A (en) * 2019-08-20 2020-01-07 海南车智易通信息技术有限公司 Method, device and system for generating advertisement picture
CN111047508A (en) * 2019-12-16 2020-04-21 北京奇艺世纪科技有限公司 Image processing method, image processing device, computer equipment and storage medium
US20200342276A1 (en) * 2019-04-29 2020-10-29 Lunit Inc. Normalization method for machine-learning and apparatus thereof
KR20200126327 * 2020-02-24 2020-11-06 Lunit Inc. Normalization method for machine-learning and apparatus thereof
CN112101305A (en) * 2020-05-12 2020-12-18 杭州宇泛智能科技有限公司 Multi-path image processing method and device and electronic equipment
CN112508793A (en) * 2020-12-22 2021-03-16 深圳开立生物医疗科技股份有限公司 Image scaling method and device, electronic equipment and storage medium
US20210209774A1 (en) * 2020-06-16 2021-07-08 Beijing Baidu Netcom Science And Technology Co., Ltd. Image adjustment method and apparatus, electronic device and storage medium
CN113159159A (en) * 2021-04-15 2021-07-23 东北大学 Small sample image classification method based on improved CNN

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU, YURAN: "Research on the Construction Method and Application of Multimodal Domain Knowledge Graphs", China Excellent Master's Theses Electronic Journal Network, no. 08, 15 August 2021 (2021-08-15), pages 138 - 741 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114871120A (en) * 2022-05-26 2022-08-09 江苏省徐州医药高等职业学校 Medicine determining and sorting method and device based on image data processing
CN114871120B (en) * 2022-05-26 2023-11-07 江苏省徐州医药高等职业学校 Medicine determining and sorting method and device based on image data processing

Similar Documents

Publication Publication Date Title
KR101640998B1 (en) Image processing apparatus and image processing method
US10402680B2 (en) Methods and apparatus for image salient object detection
EP2374107B1 (en) Devices and methods for processing images using scale space
EP3155593B1 (en) Method and device for color processing of digital images
CN109416727A (en) Glasses minimizing technology and device in a kind of facial image
CN112330696B (en) Face segmentation method, face segmentation device and computer-readable storage medium
CN108875931B (en) Neural network training and image processing method, device and system
CN108463823A (en) A kind of method for reconstructing, device and the terminal of user's Hair model
CN111275034B (en) Method, device, equipment and storage medium for extracting text region from image
EP4047509A1 (en) Facial parsing method and related devices
KR102311796B1 (en) Method and Apparatus for Deblurring of Human Motion using Localized Body Prior
JP3993029B2 (en) Makeup simulation apparatus, makeup simulation method, makeup simulation program, and recording medium recording the program
WO2020113989A1 (en) Method and system for image processing
CN112802081A (en) Depth detection method and device, electronic equipment and storage medium
CN113902749A (en) Image processing method and device, computer equipment and storage medium
CN114842035A (en) License plate desensitization method, device and equipment based on deep learning and storage medium
CN108764248B (en) Image feature point extraction method and device
CN113269752A (en) Image detection method, device terminal equipment and storage medium
JP2018180646A (en) Object candidate area estimation device, object candidate area estimation method and object candidate area estimation program
EP3038057A1 (en) Methods and systems for color processing of digital images
CN112651351B (en) Data processing method and device
CN114449362A (en) Video cover selecting method, device, equipment and storage medium
CN113469041A (en) Image processing method and device, computer equipment and storage medium
EP3038058A1 (en) Methods and systems for color processing of digital images
CN111797737A (en) Remote sensing target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination