CN112101287A - Image processing method, device, equipment and storage medium


Info

Publication number
CN112101287A
Authority
CN
China
Prior art keywords: feature, point, feature points, target, points
Legal status
Granted
Application number
CN202011025577.XA
Other languages
Chinese (zh)
Other versions
CN112101287B (en)
Inventor
王飞
钱晨
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202011025577.XA
Publication of CN112101287A
Application granted
Publication of CN112101287B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands


Abstract

The application provides an image processing method, an image processing apparatus, an image processing device and a storage medium. The method includes: determining the respective numbers and positions of multiple kinds of feature points associated with persons in an acquired target image, and the offsets between feature points belonging to the same person among the multiple kinds of feature points; determining, according to the offsets and the positions corresponding to the multiple kinds of feature points, the number of feature point pairs belonging to the same person among the multiple kinds of feature points; and determining the number of persons included in the target image based on the number of the feature point pairs and the respective numbers of the multiple kinds of feature points.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present application relates to computer technologies, and in particular, to an image processing method, apparatus, device, and storage medium.
Background
Currently, more and more scenarios require people counting on captured images or videos. For example, in a passenger flow analysis scenario, it is usually necessary to count the people in images collected by cameras deployed in a shopping mall, so as to analyze information such as the passenger flow in the mall at a given time. It can be seen that a method for accurately counting the number of people is needed.
Disclosure of Invention
In view of the above, the present application discloses at least an image processing method, including: determining the respective numbers and positions of multiple kinds of feature points associated with persons in an acquired target image, and the offsets between feature points belonging to the same person among the multiple kinds of feature points; determining, according to the offsets and the positions corresponding to the multiple kinds of feature points, the number of feature point pairs belonging to the same person among the multiple kinds of feature points; and determining the number of persons included in the target image based on the number of the feature point pairs and the respective numbers of the multiple kinds of feature points.
In some examples shown, the plurality of feature points include at least a first feature point and a second feature point respectively representing different parts of a person; the determining the respective numbers and positions of the multiple kinds of feature points associated with persons in the acquired target image and the offsets between feature points belonging to the same person among the multiple kinds of feature points includes: inputting the target image into an image processing model constructed based on a neural network to obtain a first number and a first position corresponding to the first feature points included in the target image, a second number and a second position corresponding to the second feature points included in the target image, and a target offset between a first feature point and a second feature point belonging to the same person.
In some examples shown, the determining the number of feature point pairs belonging to the same person among the multiple kinds of feature points according to the offsets and the positions corresponding to the multiple kinds of feature points includes: determining, according to the target offset, the first position and the second position, the number of feature point pairs formed by combining a first feature point and a second feature point belonging to the same person.
In some examples shown, the determining the number of persons included in the target image based on the number of the feature point pairs and the respective numbers of the multiple kinds of feature points includes: determining the number of persons included in the target image based on the first number, the second number, and the number of the feature point pairs.
In some examples shown, the determining the number of persons included in the target image based on the first number, the second number, and the number of the feature point pairs includes: determining a third number of first feature points which are not combined as feature point pairs from among the first feature points included in the obtained target image, based on the first number and the number of the feature point pairs; determining a fourth number of second feature points which are not combined as feature point pairs from among the second feature points included in the target image, based on the second number and the number of the feature point pairs; and determining the sum of the third number, the fourth number, and the number of the feature point pairs as the number of persons included in the target image.
In some examples shown, the inputting the target image into an image processing model constructed based on a neural network to obtain a first number and a first position corresponding to the first feature points included in the target image includes: performing feature extraction on the target image to obtain a first feature map corresponding to the first feature points; sliding a preset sliding window on the first feature map, and determining the target pixel point included in the preset sliding window after each sliding and the position corresponding to that target pixel point; and after multiple sliding operations are completed to obtain multiple target pixel points, determining the number of the multiple target pixel points as the first number of the first feature points, and determining the positions corresponding to the multiple target pixel points as the first positions of the first feature points.
In some examples shown, the target pixel point includes a pixel point with a maximum pixel value in the preset sliding window or a pixel point with a pixel value exceeding a first threshold.
In some examples, the determining the number of pairs of feature points in which the first feature point and the second feature point belonging to the same person are combined based on the target offset amount, the first position, and the second position includes: taking any one of a plurality of first feature points of the target image as a target first feature point, and transforming a first position corresponding to the target first feature point according to a target offset corresponding to the target first feature point to obtain a predicted position; determining a target area based on the predicted position, and determining whether a second characteristic point exists in the target area; when a second feature point exists in the target area, the second feature point closest to the predicted position is determined as a second feature point belonging to the same person as the target first feature point, and the counted number of the feature point pairs is updated.
In some examples shown, the first feature point or the second feature point includes a face feature point.
In some examples shown, the first feature points include face feature points, and the second feature points include torso feature points; or, the first feature points include trunk feature points, and the second feature points include face feature points.
In some examples shown, the image processing model includes a first predictor model, a second predictor model, and an offset predictor model; the first predictor model is used for predicting a first quantity and a first position corresponding to a first feature point included in the target image; the second predictor model is used for predicting a second quantity and a second position corresponding to a second feature point included in the target image; the offset amount prediction submodel is used for predicting a target offset amount between a first characteristic point and a second characteristic point belonging to the same person.
In some examples shown, the method for training the image processing model includes: acquiring a plurality of training samples including labeling information, where the labeling information includes a first sample feature point, a second sample feature point, and an offset between the first sample feature point and the second sample feature point belonging to the same person; determining joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model; and performing joint training on each sub-model included in the image processing model based on the joint learning loss information and the training samples until each sub-model converges.
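As an illustration of how such joint learning loss information might be assembled, the following is a minimal sketch assuming a PyTorch-style implementation; the individual loss types (binary cross-entropy for the two point-prediction sub-models, L1 for the offset sub-model) and the weighting factors are assumptions made for illustration and are not prescribed by this embodiment.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_first, pred_second, pred_offset,
               gt_first, gt_second, gt_offset, offset_mask,
               w1=1.0, w2=1.0, w3=1.0):
    """Combine the losses of the three sub-models into one joint objective (illustrative).

    pred_first / pred_second: predicted maps for the two kinds of feature points (logits).
    pred_offset / gt_offset:  predicted and labeled offsets between paired points.
    offset_mask:              1 at pixels where an offset label exists, 0 elsewhere.
    The loss types and weights below are assumptions, not the patent's prescription.
    """
    loss_first = F.binary_cross_entropy_with_logits(pred_first, gt_first)
    loss_second = F.binary_cross_entropy_with_logits(pred_second, gt_second)
    # Supervise the offset branch only at labeled positions.
    loss_offset = (F.l1_loss(pred_offset, gt_offset, reduction="none")
                   * offset_mask).sum() / offset_mask.sum().clamp(min=1)
    return w1 * loss_first + w2 * loss_second + w3 * loss_offset
```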
The present application also proposes an image processing apparatus, which includes: a first determining module, configured to determine the respective numbers and positions of multiple kinds of feature points associated with persons in an acquired target image, and the offsets between feature points belonging to the same person among the multiple kinds of feature points; a second determining module, configured to determine, according to the offsets and the respective corresponding positions of the multiple kinds of feature points, the number of feature point pairs belonging to the same person among the multiple kinds of feature points; and a third determining module, configured to determine, based on the number of the feature point pairs and the respective numbers of the multiple kinds of feature points, the number of people included in the target image.
In some examples shown, the plurality of feature points include at least a first feature point and a second feature point respectively representing different parts of the person; the first determining module includes: the image processing module is specifically configured to input the target image into an image processing model constructed based on a neural network, and obtain a first number and a first position corresponding to a first feature point included in the target image, a second number and a second position corresponding to a second feature point included in the target image, and a target offset between the first feature point and the second feature point belonging to the same person.
In some examples shown, the second determining module includes: and the characteristic point pair determining module is used for determining the number of characteristic point pairs formed by combining the first characteristic points and the second characteristic points belonging to the same person according to the target offset, the first position and the second position.
In some examples shown, the third determining module includes: and a number-of-persons determination module that determines the number of persons included in the target image based on the first number, the second number, and the number of the feature point pairs.
In some examples shown, the person number determination module is specifically configured to: determining a third number of first feature points which are not combined as feature point pairs from among the first feature points included in the obtained target image, based on the first number and the number of the feature point pairs; determining a fourth number of second feature points which are not combined as feature point pairs from among the second feature points included in the target image, based on the second number and the number of the feature point pairs; and determining the sum of the third number, the fourth number, and the number of the feature point pairs as the number of persons included in the target image.
In some examples shown, the image processing module is specifically configured to: perform feature extraction on the target image to obtain a first feature map corresponding to the first feature points; slide a preset sliding window on the first feature map, and determine the target pixel point included in the preset sliding window after each sliding and the position corresponding to that target pixel point; and after multiple sliding operations are completed to obtain multiple target pixel points, determine the number of the multiple target pixel points as the first number of the first feature points, and determine the positions corresponding to the multiple target pixel points as the first positions of the first feature points.
In some examples shown, the target pixel point includes a pixel point with a maximum pixel value in the preset sliding window or a pixel point with a pixel value exceeding a first threshold.
In some examples shown, the characteristic point pair determining module is specifically configured to: taking any one of a plurality of first feature points of the target image as a target first feature point, and transforming a first position corresponding to the target first feature point according to a target offset corresponding to the target first feature point to obtain a predicted position; determining a target area based on the predicted position, and determining whether a second characteristic point exists in the target area; when a second feature point exists in the target area, the second feature point closest to the predicted position is determined as a second feature point belonging to the same person as the target first feature point, and the counted number of the feature point pairs is updated.
In some examples shown, the first feature point or the second feature point includes a face feature point.
In some examples shown, the first feature points include face feature points, and the second feature points include torso feature points; or, the first feature points include trunk feature points, and the second feature points include face feature points.
In some examples shown, the image processing model includes a first predictor model, a second predictor model, and an offset predictor model; the first predictor model is used for predicting a first quantity and a first position corresponding to a first feature point included in the target image; the second predictor model is used for predicting a second quantity and a second position corresponding to a second feature point included in the target image; the offset amount prediction submodel is used for predicting a target offset amount between a first characteristic point and a second characteristic point belonging to the same person.
In some examples shown, the training apparatus corresponding to the training method for the image processing model includes: an acquisition module, configured to acquire a plurality of training samples including labeling information, where the labeling information includes a first sample feature point, a second sample feature point, and an offset between the first sample feature point and the second sample feature point belonging to the same person; a loss information determination module, configured to determine joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model; and a joint training module, configured to perform joint training on each sub-model included in the image processing model based on the joint learning loss information and the training samples until each sub-model converges.
The present application further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to call the executable instructions stored in the memory to implement the image processing method shown in any of the above embodiments.
The present application also proposes a computer-readable storage medium storing a computer program for executing the image processing method as shown in any of the above embodiments.
In the above aspect, the apparatus may determine, by performing overall analysis on the target image, the respective numbers of the multiple kinds of feature points associated with persons and the number of feature point pairs belonging to the same person among the multiple kinds of feature points. The apparatus may further determine the number of persons included in the target image based on the number of the feature point pairs and the respective numbers of the multiple kinds of feature points.
Therefore, when determining the number of persons, the apparatus not only counts the persons in the target image with each of the multiple kinds of feature points as a dimension, but also removes the persons that were counted repeatedly. On one hand, persons can be counted using multiple kinds of feature points associated with them, which avoids inaccurate counting caused by a single specific feature being occluded; on the other hand, the counting uses the global information of the target image, which improves the accuracy of people counting.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate one or more embodiments of the present application or the technical solutions in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments described in one or more embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a method flow diagram of an image processing method shown in the present application;
FIG. 2 is a schematic illustration of an offset shown in the present application;
FIG. 3 is a method flow diagram of an image processing method shown in the present application;
FIG. 4 is a schematic flow chart illustrating image processing of a target image according to the present application;
FIG. 5 is a schematic flow chart of branch one shown in the present application;
FIG. 6 is a schematic flow chart of branch two shown in the present application;
FIG. 7 is a schematic diagram illustrating a feature point pair number determination process according to the present application;
FIG. 8 is a method flow diagram of an image processing model training method illustrated herein;
FIG. 9 is a schematic diagram of an image processing apparatus shown in the present application;
FIG. 10 is a hardware configuration diagram of an electronic device according to the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as recited in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will also be understood that the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
In some people counting methods, a sliding window is generally slid over a target image, and the number of specific features of people (for example, face features or torso features) included in the sliding window is counted after each sliding step. After the sliding window has finished sliding, the numbers of specific features counted for the individual sliding steps are added to obtain the number of people in the target image.
In the method, on one hand, due to the fact that people are dense or an occlusion exists, a specific feature in a target image may be occluded, so that the counted number of people is not accurate; on the other hand, people counting using only a local region within the sliding window without using global information included in the target image may also result in inaccurate counted numbers of people.
In view of the above, the present application provides an image processing method. By performing overall analysis on the target image, the method can determine the respective numbers of the multiple kinds of feature points associated with persons and the number of feature point pairs belonging to the same person among the multiple kinds of feature points. The method may further determine the number of persons included in the target image based on the number of the feature point pairs and the respective numbers of the multiple kinds of feature points. Therefore, when determining the number of persons, the method not only counts the persons in the target image with each of the multiple kinds of feature points as a dimension, but also removes the persons that were counted repeatedly. On one hand, persons can be counted using multiple kinds of feature points associated with them, which avoids inaccurate counting caused by a single specific feature being occluded; on the other hand, the counting uses the global information of the target image, which improves the accuracy of people counting.
Referring to fig. 1, fig. 1 is a flowchart illustrating an image processing method according to the present application. As shown in fig. 1, the method may include:
s102, determining the number and the position corresponding to each of the multiple characteristic points associated with the person in the acquired target image and the offset between the characteristic points belonging to the same person in the multiple characteristic points.
And S104, determining the number of characteristic point pairs belonging to the same person in the plurality of characteristic points according to the offset and the corresponding positions of the plurality of characteristic points.
And S106, determining the number of the persons in the target image based on the number of the characteristic point pairs and the number corresponding to each of the plurality of characteristic points.
The image processing method can be applied to electronic equipment. The electronic device may execute the image processing method by installing a software system corresponding to the image processing method. The electronic device may be a notebook computer, a server, a mobile phone, a PAD terminal, etc., and is not particularly limited in this application.
It is understood that the above image processing method may be executed by only the terminal device or the server device, or may be executed by the terminal device and the server device in cooperation.
For example, the image processing method described above may be integrated in the client. After receiving the image processing request, the terminal device carrying the client can provide calculation power through the hardware environment of the terminal device to execute the image processing method.
For another example, the image processing method described above may be integrated into a system platform. After receiving the image processing request, the server device carrying the system platform can provide computing power to execute the image processing method through the hardware environment of the server device.
For still another example, the image processing method described above may be divided into two tasks: acquiring the target image and processing the target image. The acquisition task can be integrated in the client and carried on the terminal device, and the processing task can be integrated in the server and carried on the server device. The terminal device may initiate an image processing request to the server device after acquiring the target image, and the server device, after receiving the request, may perform the method on the target image in response to it.
The following description will be given taking an execution body as an electronic device (hereinafter simply referred to as a device) as an example.
The device may first execute S102 to determine the number and the position of the plurality of feature points associated with the person in the acquired target image, and the offset between the feature points belonging to the same person in the plurality of feature points.
The target image refers to an image acquired by an image acquisition device. Typically, the target image includes a plurality of people. For example, the target image may be a video or an image captured by a camera device deployed in a mall or on a street. It will be appreciated that the video is formed from a plurality of successive images and the method of determining the number of people in the video may be referred to the method of determining the number of people in an image. The following description will be given taking a method of determining the number of persons in an image as an example.
When the target image is acquired, the device can complete the input of the target image through interaction with a user. For example, the device may provide a window for a user to input a target image to be processed through a mounted interface, so that the user can input the image. The user can complete the input of the target image based on the window. After the target image is acquired, the device can input the image into an image processing model for calculation. Of course, the above-mentioned device may also directly acquire the image from the image acquisition device, and is not limited herein.
The above-mentioned multiple kinds of feature points specifically refer to feature points associated with persons in the target image. Any one of these kinds of feature points can be used as a dimension for counting the number of people in the target image.
In practical applications, the above-mentioned multiple feature points are usually feature points on a human body. For example, the various feature points may include face feature points, trunk feature points, limb feature points, and the like. Taking the face feature points as an example, the number of people counted with the face feature points as dimensions in the target image can be determined by determining the number of the face feature points included in the target image.
In some examples, the target image may be input into an image processing model that is supervised training for determination of feature points and quantity statistics in the target image. The image processing model may be obtained by training a plurality of truth maps labeled with feature points.
When a training sample is constructed, the feature points included in the original image can be labeled to obtain a true value image. For example, after the original image is obtained, whether each pixel point corresponding to the original image is a feature point or not may be determined, if the pixel point is a feature point, a label 1 may be given through the labeling software, otherwise, a label 0 is given.
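As a concrete illustration of this labeling scheme, the following is a minimal sketch, assuming the annotations are available as pixel coordinates and that a single-pixel 0/1 truth map is wanted; the function name and array layout are hypothetical.

```python
import numpy as np

def make_truth_map(height, width, points):
    """Build a binary truth map: 1 at annotated feature-point pixels, 0 elsewhere.

    points: iterable of (x, y) pixel coordinates of labeled feature points.
    """
    truth = np.zeros((height, width), dtype=np.float32)
    for x, y in points:
        truth[int(y), int(x)] = 1.0  # label 1 for a feature point, 0 otherwise
    return truth
```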
The positions corresponding to the multiple feature points are, specifically, positions where the feature points are located in the target image or a feature map corresponding to the target image (the feature map is an image obtained by performing feature extraction on the target image for multiple times). For example, the lower left corner of the target image may be used as the origin of coordinates, and the position of the pixel point corresponding to the feature point may be used as the position coordinate of the feature point.
The offset is specifically a positional offset between feature points belonging to the same person among the plurality of feature points. By the offset, the position of a known feature point can be converted to obtain a position corresponding to a feature point belonging to the same person as the feature point.
In some embodiments, the offset may be an offset vector. That is, the amount of positional shift of the feature point belonging to the same person on the x-axis and the y-axis.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating an offset according to the present application.
Fig. 2 shows a rectangular coordinate system constructed by taking the lower left corner of the target image as the origin of coordinates. Point A is a face feature point. Point B is a torso feature point belonging to the same person as point A. Point C is a limb feature point belonging to the same person as points A and B. The offset between point A and point B is (x2 - x1, y2 - y1). The offset between point A and point C is (x3 - x1, y3 - y1). The offset between point B and point C is (x3 - x2, y3 - y2). It can be understood that when the position of point A is known, the position corresponding to point B can be obtained through the offset between point A and point B.
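As a numerical illustration with hypothetical coordinates: if point A is located at (120, 340) and point B at (135, 280), the offset between point A and point B is (135 - 120, 280 - 340) = (15, -60), and adding this offset to point A's coordinates, (120 + 15, 340 + (-60)) = (135, 280), recovers the position of point B.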
When determining the offset included in the target image, the target image may be input into an image processing model that is supervised and trained for calculation. The image processing model is obtained by training a plurality of training samples marked with true values of offsets between the feature points.
After determining the number and the position of each of the plurality of types of feature points associated with the person in the acquired target image and the offset amount between the feature points belonging to the same person among the plurality of types of feature points, the apparatus may perform S104 to determine the number of the feature point pairs belonging to the same person among the plurality of types of feature points based on the offset amount and the position of each corresponding feature point of the plurality of types of feature points.
The feature point pairs are specifically formed by combining feature points belonging to the same person. In practical applications, when another feature point B actually exists at the position obtained by converting the position corresponding to feature point A by the offset corresponding to feature point A, feature point A and feature point B can be considered a feature point pair.
For example, with continued reference to fig. 2, points A, B and C belong to the same person, and the positions obtained by converting the position of point A by the offsets corresponding to point A are exactly the positions of point B and point C, so point A and point B form one feature point pair, and point A and point C also form one feature point pair. It is understood that, for similar reasons, point B and point C in fig. 2 also form a feature point pair, and points A, B and C can likewise be grouped together as belonging to the same person.
The number of the characteristic point pairs is specifically the number of characteristic point pairs formed by combining characteristic points belonging to the same person. It is to be understood that the number of pairs of feature points may indicate the number of persons repeatedly counted if the numbers of persons counted with each of the plurality of kinds of feature points as dimensions are added.
For example, NA represents the number of persons in the target image counted with feature point A as the dimension, and NB represents the number of persons in the target image counted with feature point B as the dimension. It is easy to see that if the number of persons in the target image is calculated as NA plus NB, part of the persons are counted twice: the persons for whom both feature point A and feature point B were detected are counted repeatedly. The number of such persons is exactly the number of feature point pairs formed by combining feature point A and feature point B. It can thus be seen that the number of feature point pairs indicates the number of persons counted repeatedly when the numbers of persons counted with each kind of feature point as a dimension are added.
A counter may be maintained in the system when counting the number of pairs of characteristic points. The counter may be incremented by one each time a pair of feature point pairs is determined.
After determining the number of pairs of feature points, the apparatus may perform S106 to determine the number of persons included in the target image based on the number of pairs of feature points and the number of respective correspondences of the plurality of types of feature points.
In this step, the numbers of people counted with each of the multiple kinds of feature points as a dimension may be added to obtain an addition result; the repeatedly counted persons are then subtracted to obtain the number of persons in the target image. That is, the number of the feature point pairs is subtracted from the addition result to obtain the number of persons included in the target image.
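The bookkeeping described above can be summarized in a short sketch; the helper name and the sample numbers below are hypothetical.

```python
def count_people(num_first, num_second, num_pairs):
    """People count = unmatched first points + unmatched second points + matched pairs.

    Equivalent to num_first + num_second - num_pairs.
    """
    unmatched_first = num_first - num_pairs    # the third number
    unmatched_second = num_second - num_pairs  # the fourth number
    return unmatched_first + unmatched_second + num_pairs

# Hypothetical example: 8 face points, 7 torso points, 6 matched pairs -> 9 people.
assert count_people(8, 7, 6) == 9
```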
In the above aspect, the apparatus may determine, by performing overall analysis on the target image, the respective numbers of the multiple kinds of feature points associated with persons and the number of feature point pairs belonging to the same person among the multiple kinds of feature points. The apparatus may further determine the number of persons included in the target image based on the number of the feature point pairs and the respective numbers of the multiple kinds of feature points.
Therefore, when determining the number of persons, the apparatus not only counts the persons in the target image with each of the multiple kinds of feature points as a dimension, but also removes the persons that were counted repeatedly. On one hand, persons can be counted using multiple kinds of feature points associated with them, which avoids inaccurate counting caused by a single specific feature being occluded; on the other hand, the counting uses the global information of the target image, which improves the accuracy of people counting.
In some embodiments, the plurality of feature points may include at least a first feature point and a second feature point that respectively represent different parts of the person. The first feature point and the second feature point are both feature points associated with people.
In some examples, since facial features are distinct in the acquired target image, the first feature point or the second feature point may be set as a face feature point. Because face feature points are used as one kind of feature point and the corresponding features are relatively distinct in the target image, more accurate people counting can be performed.
In some examples, since the trunk feature is also apparent in the acquired target image, the first feature point includes a face feature point, and the second feature point includes a trunk feature point; or, the first feature points include trunk feature points, and the second feature points include face feature points.
Because face feature points and torso feature points are each used as one kind of feature point, and their corresponding features are relatively distinct in the target image, more accurate people counting can be performed.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method of image processing according to the present application.
As shown in fig. 3, when the apparatus executes the step S102, the apparatus may execute a step S302 of inputting the target image into an image processing model constructed based on a neural network, and obtaining a first number and a first position corresponding to a first feature point included in the target image, a second number and a second position corresponding to a second feature point included in the target image, and a target offset between the first feature point and the second feature point belonging to the same person.
The first number specifically refers to the number of first feature points included in the target image. When the first number is determined, first feature points may be extracted from the target image, and then the corresponding number of the extracted first feature points is the first number.
The first position is specifically a position of the first feature point in the target image or a target feature map corresponding to the target image (where the target feature map is a feature map obtained by processing the target image through a backbone network, for example, the target image is an 800 × 600 image, and the target feature map may be an 80 × 60 image). When the first position is determined, the first feature point may be extracted from the target image, and then the position of the extracted first feature point in the target feature map may be determined as the first position.
It should be noted that the explanation of the second number and the second position can refer to the first number and the first position, and will not be described in detail herein.
The image processing model may be a model constructed based on a deep convolutional neural network.
Referring to fig. 4, fig. 4 is a schematic flow chart illustrating image processing performed on a target image according to the present application. As shown in fig. 4, the image processing model may include three branches, which may share the same backbone network. The first branch can be used for predicting a first quantity and a first position corresponding to the first characteristic point, the second branch can be used for predicting a second quantity and a second position corresponding to the second characteristic point, and the third branch can be used for predicting a target offset between the first characteristic point and the second characteristic point which belong to the same person. It should be noted that the structure of the image processing model shown in fig. 4 is only a schematic illustration, and in practical applications, the structure of the model may be set up according to actual situations.
The image processing model may be trained based on a plurality of training samples labeled with truth values. The truth value may include a first feature point, a second feature point, and a target offset between the first feature point and the second feature point belonging to the same person.
The backbone network is specifically used for feature prediction of a target image. For example, the backbone network may be a feature extraction network such as VGG or ResNet, and is not particularly limited herein. After the target feature map corresponding to the target image is predicted by the backbone network, the target feature map may be input into each of the three branches, and further prediction may be performed.
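For illustration only, the following is a minimal PyTorch-style sketch of a model with one shared backbone and three prediction branches as described above; the toy backbone, layer sizes and output channel choices are assumptions and do not reproduce the actual model of this application.

```python
import torch
import torch.nn as nn

class ThreeBranchCounter(nn.Module):
    """Shared backbone with three prediction branches (illustrative sketch only).

    The backbone below is a toy stand-in; in practice a feature extractor such as
    VGG or ResNet would be used, as described above.
    """

    def __init__(self, channels=64):
        super().__init__()
        self.backbone = nn.Sequential(  # e.g. downsamples an 800 x 600 image by 8x
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        def head(out_channels):
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, out_channels, 1),
            )
        self.first_branch = head(1)   # branch one: map for the first kind of feature point
        self.second_branch = head(1)  # branch two: map for the second kind of feature point
        self.offset_branch = head(2)  # branch three: (dx, dy) offset between paired points

    def forward(self, image):
        features = self.backbone(image)  # shared target feature map
        return (self.first_branch(features),
                self.second_branch(features),
                self.offset_branch(features))
```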
The first branch can output the first number and the first position.
Referring to fig. 5, fig. 5 is a schematic flow chart of the present application. The structure of branch one shown in fig. 5 is only schematic. In practical application, the structure of the first branch can be adjusted according to practical requirements.
As shown in fig. 5, S502 may be executed to perform feature extraction on the target image to obtain a first feature map corresponding to the first feature points.
It is understood that, in order to ensure that branch one can output the first feature map corresponding to the first feature points, truth images labeled with the first feature points can be used as training samples when training the image processing model.
In this step, a target feature map corresponding to the target image may be input to the feature extraction network and calculated to obtain a first feature map corresponding to the first feature point.
In some examples, after obtaining the first feature map, in order to perform feature alignment on each pixel included in the first feature map, S504 may be performed to perform normalization processing on the first feature map, so as to obtain a normalized first feature map.
In this step, the first feature map may be input to a normalization processing unit to perform normalization operation, so as to obtain a normalized first feature map.
Wherein the normalization processing unit may be a processing unit comprising a sigmoid normalization algorithm, for example. The processing unit may perform normalization processing on each pixel in the first feature map to exclude dimensional influences between different pixels.
After obtaining the first feature map, S506 may be executed to slide a preset sliding window on the normalized first feature map, and determine a target pixel point included in the preset sliding window after the sliding window is slid and a position corresponding to the target pixel point.
In some examples, the target pixel point may include a pixel point within the preset sliding window whose pixel value is greater than a first threshold (empirical threshold). In some examples, to improve the statistical accuracy of the number, the target pixel point may include a pixel point with a largest pixel value within the predetermined sliding window.
In this step, the normalized first feature map may be input to the target pixel point determining unit to be processed to obtain a plurality of target pixel points and positions corresponding to the target pixel points.
The target pixel point determination unit may include a pooling layer. Wherein the pooling layer may be maximally pooled using a preset sliding window. The size of the preset sliding window is not particularly limited in the present application. For example, the preset sliding window may be a window of 3 × 3 size.
In the process of maximum pooling, the preset sliding window may be slid on the normalized first feature map, and after each sliding, a maximum value included in the preset sliding window is determined, and a position corresponding to the maximum value is recorded.
It should be noted that, in order to ensure the accuracy of people counting, in some embodiments, the step length of the sliding of the preset sliding window on the normalized first feature map is 1. Since the step length corresponding to the sliding operation is 1, all pixels of the first feature map can be scanned, that is, all the information in the first feature map is used to determine the first number, thereby ensuring that an accurate first number is determined and further ensuring the accuracy of people counting.
After the multiple sliding operations are completed to obtain multiple target pixel points, S508 may be executed, where the number of the multiple target pixel points is determined as the first number of the first feature points, and the positions corresponding to the multiple target pixel points are determined as the first positions of the first feature points.
In this step, the target pixel points may be input to a determining unit for processing, so as to obtain a plurality of first feature points.
The judging unit may include a screening logic configured to screen out target pixels, of the plurality of target pixels, whose pixel values reach the second threshold, to obtain a plurality of first feature points. The second threshold may be a value set by a developer based on experience. The point that reaches the second threshold value may be regarded as the first feature point.
It can be understood that, because only a unique first feature point may exist at the same position, before the screening operation is performed by the determining unit, target pixel points at the same position in the plurality of target pixel points may be merged, and only a unique target pixel point is reserved.
After obtaining the plurality of first feature points, the first number corresponding to the counted first feature points may be output by the number determination unit, and the first position corresponding to each first feature point may be output by the position determination unit.
Wherein, the number determination unit may include a counter. The counter may indicate the number of first feature points. For example, the count of the counter may be incremented by one each time a first feature point is determined. When all the first feature points have been determined, the first number can be obtained from the counter.
The position determination unit may maintain a correspondence relationship between the first characteristic point and the first position. For example, after each first feature point is determined, the position determining unit may associate and store an identifier (which may be a unique identifier corresponding to the first feature point, such as a position coordinate or a pixel number) corresponding to the first feature point and a coordinate position of a target pixel point corresponding to the first feature point. After the target pixel points are judged, the first positions corresponding to the first characteristic points can be obtained through the position determining unit.
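The peak-extraction procedure of S506 and S508 can be illustrated with a short sketch, assuming the normalized first feature map is available as a PyTorch tensor; combining the stride-1 max pooling with the threshold screening in a single pass, and the threshold value itself, are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def extract_points(heatmap, window=3, threshold=0.3):
    """Find feature points as local maxima of a normalized feature map.

    heatmap: tensor of shape (H, W) with values in [0, 1] (after normalization).
    A window x window max pool with stride 1 keeps only pixels that are the maximum
    of their window; those that also exceed the threshold are kept as feature points.
    """
    h = heatmap.unsqueeze(0).unsqueeze(0)                        # (1, 1, H, W)
    pooled = F.max_pool2d(h, window, stride=1, padding=window // 2)
    is_peak = (h == pooled) & (h > threshold)                    # local maximum and above threshold
    ys, xs = torch.nonzero(is_peak[0, 0], as_tuple=True)
    positions = list(zip(xs.tolist(), ys.tolist()))              # (x, y) coordinates
    return len(positions), positions                             # first number, first positions
```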
The above describes the scheme of obtaining the first number and the first position through branch one.
The second branch can output the second number and the second position.
Referring to fig. 6, fig. 6 is a schematic flow chart of branch two according to the present application. The structure of branch two shown in fig. 6 is merely schematic. In practical application, the structure of branch two can be adjusted according to practical requirements.
As shown in fig. 6, S602 may be executed to perform feature extraction on the target image to obtain a second feature map corresponding to the second feature points.
It is to be understood that, in order to ensure that branch two can output the second feature map corresponding to the second feature points, truth images labeled with the second feature points can be used as training samples when training the image processing model.
In this step, a target feature map corresponding to the target image may be input to the feature extraction network and calculated to obtain a second feature map corresponding to the second feature point.
In some examples, after obtaining the second feature map, in order to perform feature alignment on each pixel included in the second feature map, S604 may be performed to perform normalization processing on the second feature map, so as to obtain a normalized second feature map.
In this step, the second feature map may be input to a normalization processing unit to perform normalization operation, so as to obtain a normalized second feature map.
Wherein the normalization processing unit may be a processing unit comprising, for example, a sigmoid normalization algorithm. The processing unit may perform normalization processing on each pixel in the second feature map to exclude dimensional influences between different pixels.
After obtaining the second feature map, S606 may be executed to slide a preset sliding window on the normalized second feature map, and determine a target pixel point included in the preset sliding window after the sliding window is slid and a position corresponding to the target pixel point.
In some examples, the target pixel point may include a pixel point within the preset sliding window whose pixel value is greater than a third threshold (empirical threshold). In some examples, to improve the statistical accuracy of the number, the target pixel point may include a pixel point with a largest pixel value within the predetermined sliding window.
In this step, the normalized second feature map may be input to the target pixel point determining unit to be processed to obtain a plurality of target pixel points and positions corresponding to the target pixel points.
The target pixel point determination unit may include a pooling layer. Wherein the pooling layer may be maximally pooled using a preset sliding window. The size of the preset sliding window is not particularly limited in the present application. For example, the preset sliding window may be a window of 3 × 3 size.
In the process of maximum pooling, the preset sliding window may be slid on the normalized second feature map, and after each sliding, a maximum value included in the preset sliding window is determined, and a position corresponding to the maximum value is recorded.
It should be noted that, in order to ensure the accuracy of people counting, in some embodiments, the step length of the sliding of the preset sliding window on the normalized second feature map is 1. Since the step length corresponding to the sliding operation is 1, all pixels of the second feature map can be scanned, that is, all the information in the second feature map is used to determine the second number, thereby ensuring that an accurate second number is determined and further ensuring the accuracy of people counting.
After the plurality of target pixel points are obtained by the sliding operation, S608 may be executed to determine the number of the plurality of target pixel points as the second number of the second feature points, and determine the positions corresponding to the plurality of target pixel points as the second positions of the second feature points.
In this step, the target pixel points may be input to the determining unit for processing, so as to obtain a plurality of second feature points.
The determination unit may include a screening logic configured to screen out target pixels, of the plurality of target pixels, whose pixel values reach a fourth threshold value, to obtain a plurality of second feature points. The fourth threshold may be a value set by a developer based on experience. The point reaching the fourth threshold value may be regarded as the second feature point.
It can be understood that, because only a unique second feature point may exist at the same position, before the screening operation is performed by the determining unit, target pixel points at the same position in the plurality of target pixel points may be merged, and only a unique target pixel point is reserved.
After the plurality of second feature points are obtained, the counted second number corresponding to the second feature points can be output through the number determining unit, and the second positions corresponding to the second feature points can be output through the position determining unit.
Wherein, the number determination unit may include a counter. The counter may indicate the number of second feature points. For example, the count of the counter may be incremented by one each time a second feature point is determined. When all the second feature points have been determined, the second number can be obtained from the counter.
The position determination unit may maintain a correspondence relationship between the second feature points and the second positions. For example, after each second feature point is determined, the position determining unit may associate and store an identifier corresponding to the second feature point (which may be an identifier unique to the second feature point, such as a position coordinate or a pixel number) and the coordinate position of the target pixel point corresponding to the second feature point. After the target pixel points have been judged, the second positions corresponding to the second feature points can be obtained from the position determination unit.
The scheme of obtaining the second number and the second position through the second branch is described.
The third branch can specifically predict the target offset between a first feature point and a second feature point belonging to the same person. Using the target offset, the position of the first feature point can be transformed to obtain the position corresponding to the second feature point belonging to the same person as that first feature point.
It is to be understood that, in order to ensure that the third branch can output the offset, a truth-value image marked with the offsets may be used as a training sample when training the image processing model.
After obtaining the first position, the first number, the second position, the second number, and the target offset amount, S304 may be executed to determine the number of pairs of feature points in which a first feature point and a second feature point belonging to the same person are combined, based on the target offset amount, the first position, and the second position.
In practical applications, the first feature points determined in S508 and the second feature points determined in S608 may be combined in pairs, and it may be determined whether the offset within each combination matches the target offset determined in S302. If they match, the first feature point and the second feature point in that combination are determined as a feature point combination belonging to the same person, and the counted number of feature point pairs is updated.
After the above combination judgment is completed, the number of feature point pairs in which the first feature point and the second feature point belonging to the same person are combined may be determined.
In some embodiments, in order to accurately determine the number of feature point pairs, a statistical algorithm is employed. The algorithm traverses each first feature point and, based on the target offset corresponding to the traversed first feature point, predicts the position of the second feature point belonging to the same person as that first feature point. Then, if a second feature point really exists at the predicted position, it is determined as the second feature point belonging to the same person as the first feature point, and the counted number of feature point pairs is updated, thereby achieving an accurate count.
Of course, it is understood that, in the above statistical algorithm, the second feature points may instead be traversed, with the position of the first feature point predicted according to the target offset corresponding to each traversed second feature point, so as to determine the first feature point belonging to the same person as that second feature point and update the count accordingly; this is not described in detail here.
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a characteristic point pair quantity determination process according to the present application.
When counting the number of feature point pairs, S702 may be executed first, and for each target first feature point included in the target image, a first position corresponding to the target first feature point may be transformed according to a target offset corresponding to the target first feature point, so as to obtain a predicted position.
The predicted position is specifically a predicted position of a second feature point belonging to the same person as the first feature point, which is predicted according to the coordinates of the first feature point and the target offset corresponding to the first feature point determined in the statistics step S302. This position may be understood as the position of a possible second feature point. By determining whether the second feature point really exists in the predicted position in the second feature map output in S602, it can be determined whether a second feature point belonging to the same person as the first feature point really exists in the target image, and the number of the statistical feature point pairs is updated when the second feature point really exists.
In this step, the predicted position may be obtained by transforming the position of the first feature point by using a spatial transform network, mathematical mapping transform, or other methods.
After obtaining the predicted position, S704 may be executed to determine a target area based on the predicted position, and determine whether a second feature point exists in the target area.
The target area may be a closed area. The parameters such as the shape and size of the target region may be set according to actual conditions, and are not limited herein. It will be appreciated that the target area may be set according to the actual size of the target image or target feature map.
For example, the target area may be a circular region centered on the predicted position. The radius may be, for example, 5 pixels.
After the target area is determined, it may be determined whether a second feature point exists in the target area.
In this step, for each second feature point determined in S608, it may be determined whether the position corresponding to that second feature point falls within the target area; if so, the second feature point may be considered to be in the target area.
In some examples, there may be a plurality of second feature points within the target region. In this case, S706 may be performed to determine a second feature point closest to the predicted position as a second feature point belonging to the same person as the target first feature point, and update the counted number of the feature point pairs.
In this step, the plurality of second feature points may be sorted by their distance to the predicted position, the closest second feature point may be determined as the second feature point belonging to the same person as the target first feature point, and the counted number of the feature point pairs may be updated.
It will be appreciated that, in order to avoid the situation where the same second feature point is paired with a plurality of target first feature points, in some embodiments, after a pair of feature point pairs is determined, the second feature point in the pair may not participate in other feature point combination pairs, such as marking or deleting the second feature point.
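A minimal sketch of the pairing procedure of S702 to S706 is given below, assuming the first positions, their target offsets, and the second positions are already available as coordinate lists; the function name and the default radius are illustrative assumptions, not the patent's implementation.

```python
import math

def count_feature_point_pairs(first_points, offsets, second_points, radius=5.0):
    """first_points: [(y, x)]; offsets: [(dy, dx)] aligned with first_points; second_points: [(y, x)]."""
    unused = list(second_points)          # second feature points not yet paired
    pair_count = 0
    for (y, x), (dy, dx) in zip(first_points, offsets):
        py, px = y + dy, x + dx           # transform the first position by the target offset
        in_area = []
        for p in unused:                  # collect second points inside the circular target area
            d = math.hypot(p[0] - py, p[1] - px)
            if d <= radius:
                in_area.append((p, d))
        if in_area:
            nearest, _ = min(in_area, key=lambda t: t[1])
            unused.remove(nearest)        # a second feature point joins at most one pair
            pair_count += 1
    return pair_count
```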
After determining the number of pairs of feature points of the target combination, S306 may be executed to determine the number of persons included in the target image based on the first number, the second number, and the number of pairs of feature points.
In this way, persons in the target image are counted separately with the first feature points and the second feature points as dimensions, and persons counted twice are removed, so that the number of persons included in the target image is counted accurately.
To this end, when determining the number of persons included in the target image based on the first number, the second number, and the number of the feature point pairs, the following steps may be adopted:
S3062, determining, based on the first number and the number of the feature point pairs, a third number of first feature points, among the first feature points included in the target image, that are not combined into feature point pairs.
S3064, determining, based on the second number and the number of the feature point pairs, a fourth number of second feature points, among the second feature points included in the target image, that are not combined into feature point pairs.
S3066, determining the sum of the third number, the fourth number, and the number of the feature point pairs as the number of persons included in the target image.
Specifically, when S3062 to S3066 are executed, the number of persons may be obtained by adding the first number to the second number and then subtracting the number of the feature point pairs. Alternatively, the number of the feature point pairs may be subtracted from each of the first number and the second number, and the two results are then added to the number of the feature point pairs to obtain the number of persons.
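As a worked illustration of S3062 to S3066 (the numbers below are made up, not taken from the application):

```python
def count_persons(first_number, second_number, pair_number):
    # third number = first_number - pair_number, fourth number = second_number - pair_number,
    # and persons = third + fourth + pairs, which simplifies to the expression below
    return first_number + second_number - pair_number

# e.g. 10 first feature points, 8 second feature points, 6 matched pairs -> 12 persons
assert count_persons(10, 8, 6) == 12
```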
In the above aspect, the apparatus may determine, by performing an overall analysis of the target image, the number of each of the plurality of kinds of feature points associated with persons and the number of feature point pairs belonging to the same person among those feature points. The apparatus may further determine the number of persons included in the target image based on the number of the feature point pairs and the numbers of the plurality of kinds of feature points. Therefore, when determining the number of persons, the system can count the persons in the target image separately with the multiple kinds of feature points as dimensions and remove the persons counted twice. On the one hand, persons can be counted using multiple kinds of feature points associated with a person, which alleviates the problem of inaccurate person counting caused by a single specific feature being occluded; on the other hand, person counting can be performed using the global information of the target image, which improves the accuracy of the person counting.
The above is an introduction to the statistical approach for the number of people shown in the present application, and the following is an introduction to the training method of the image processing model.
In the present application, in order to improve the prediction accuracy of the image processing model and the generalization capability of the model, a multi-task joint training mode is adopted when training the image processing model.
The image processing model can comprise a first predictor model, a second predictor model and an offset predictor model;
the first predictor model is used for predicting a first quantity and a first position corresponding to a first feature point included in the target image; the second predictor model is used for predicting a second quantity and a second position corresponding to a second feature point included in the target image; the offset amount prediction submodel is used for predicting a target offset amount between a first characteristic point and a second characteristic point belonging to the same person.
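A minimal sketch of such a three-branch model with a shared backbone is shown below; the layer sizes, class name, and use of PyTorch are assumptions for illustration and do not reflect the patent's actual architecture.

```python
import torch.nn as nn

class PersonCountingModel(nn.Module):
    """Shared backbone with a first-feature-point branch, a second-feature-point branch,
    and an offset branch (illustrative placeholder architecture)."""
    def __init__(self, in_channels=3, feat_channels=64):
        super().__init__()
        self.backbone = nn.Sequential(                        # backbone shared by the sub-models
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
        )
        self.first_head = nn.Conv2d(feat_channels, 1, 1)      # first feature point heatmap
        self.second_head = nn.Conv2d(feat_channels, 1, 1)     # second feature point heatmap
        self.offset_head = nn.Conv2d(feat_channels, 2, 1)     # per-pixel (dy, dx) target offset

    def forward(self, image):
        feats = self.backbone(image)
        return self.first_head(feats), self.second_head(feats), self.offset_head(feats)
```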
In some embodiments, in order to increase the supervision information during the training of the image processing model and thereby improve its prediction accuracy, constraints may be introduced in the form of a truth map labeled with the first feature points, a truth map labeled with the second feature points, and a truth map labeled with the offsets between first feature points and second feature points belonging to the same person.
Referring to fig. 8, fig. 8 is a flowchart illustrating a method of training an image processing model according to the present application.
As shown in fig. 8, the method comprises:
S802, acquiring a plurality of training samples comprising labeling information; the labeling information comprises a first sample feature point, a second sample feature point, and an offset between a first sample feature point and a second sample feature point belonging to the same person.
When this step is executed, the original image may be annotated with truth values by manual annotation or machine-assisted annotation. For example, after the original image is obtained, on the one hand, each pixel included in the original image may be labeled, using image annotation software, as belonging to a first sample feature point, a second sample feature point, or the background; on the other hand, the offset corresponding to the second sample feature point belonging to the same person as a first sample feature point may be marked at that first sample feature point. After the annotation operation on the original images is completed, a plurality of training samples can be obtained. Note that the labeling may use an encoding such as one-hot encoding; the specific labeling method is not limited in the present application.
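By way of illustration only, one possible shape of the labeling information for a single training sample is sketched below; every field name and value here is a hypothetical example, not the application's labeling format.

```python
sample_annotation = {
    "image_path": "train/000001.jpg",                 # hypothetical original image
    "first_sample_points": [(120, 84), (96, 300)],    # e.g. face feature points, (y, x)
    "second_sample_points": [(180, 86), (160, 298)],  # e.g. torso feature points, (y, x)
    # offset from each first sample point to the second sample point of the same person
    "offsets": [(60, 2), (64, -2)],
}
```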
S804, determining joint learning loss information based on the loss information corresponding to each sub-model included in the image processing model.
In executing this step, loss information corresponding to each submodel may be determined. In order to improve the prediction accuracy of the sub-models, in the present application, the loss information corresponding to each sub-model is cross entropy loss information.
After the loss information corresponding to each sub-model is determined, the joint learning loss information may be determined based on the loss information corresponding to each sub-model included in the image processing model. For example, the joint learning loss information may be obtained by adding loss information corresponding to each of the submodels.
In the present application, a regularization term may be added to the joint learning loss information, and is not particularly limited herein.
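A minimal sketch of combining the per-sub-model losses into the joint learning loss is given below; note that the offset branch is written here with an L1 regression term purely for illustration, whereas the text above uses cross entropy loss information for each sub-model, and all function names are assumptions.

```python
import torch.nn.functional as F

def joint_learning_loss(pred_first, pred_second, pred_offset,
                        gt_first, gt_second, gt_offset):
    # per-branch losses for the two feature point sub-models
    loss_first = F.binary_cross_entropy_with_logits(pred_first, gt_first)
    loss_second = F.binary_cross_entropy_with_logits(pred_second, gt_second)
    # an L1 regression term is substituted for the offset branch in this sketch
    loss_offset = F.l1_loss(pred_offset, gt_offset)
    # joint learning loss as the sum of the sub-model losses (a regularization term could be added)
    return loss_first + loss_second + loss_offset
```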
After determining the joint learning loss information and the training samples, S806 may be performed to perform joint training on each sub-model included in the image processing model based on the joint learning loss information and the training samples until the sub-models converge.
In training the model, hyper-parameters such as the learning rate and the number of training cycles may be specified first. After the hyper-parameters are determined, the image processing model may be trained in a supervised manner based on the plurality of training samples comprising the labeling information.
In the process of supervised training, after a computation result is obtained through forward propagation of the image processing model, the error between the labeling information and the computation result is evaluated based on the determined joint learning loss information. After the error is obtained, the descent gradient can be determined using stochastic gradient descent. After determining the descent gradient, the model parameters of the image processing model may be updated through back propagation. This process is repeated until the sub-models converge. The conditions for model convergence are not particularly limited in the present application.
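The supervised joint training loop described above might be sketched as follows, reusing the joint_learning_loss and PersonCountingModel sketches from earlier; the hyper-parameter values and the data loader format are assumptions made for illustration.

```python
import torch

def train(model, dataloader, epochs=50, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)    # stochastic gradient descent
    for _ in range(epochs):                                   # training cycles (hyper-parameter)
        for image, gt_first, gt_second, gt_offset in dataloader:
            pred_first, pred_second, pred_offset = model(image)           # forward propagation
            loss = joint_learning_loss(pred_first, pred_second, pred_offset,
                                       gt_first, gt_second, gt_offset)    # evaluate the error
            optimizer.zero_grad()
            loss.backward()                                               # back propagation
            optimizer.step()                                              # update model parameters
    return model
```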
When the image processing model is trained, the supervised joint training method described above is adopted, so that the three sub-models included in the image processing model can be trained simultaneously. On the one hand, the sub-models can constrain and promote each other during training, which improves the convergence efficiency of the image processing model; on the other hand, the backbone network shared by the sub-models can learn to predict features more beneficial to person counting, which improves the accuracy of the person counting.
Corresponding to any one of the above embodiments, the present application also provides an image processing apparatus.
Referring to fig. 9, fig. 9 is a schematic diagram of an image processing apparatus according to the present application.
As shown in fig. 9, the apparatus 900 includes:
a first determining module 910, configured to determine the number and the position of each of multiple feature points associated with a person in an acquired target image, and an offset between feature points belonging to the same person in the multiple feature points;
a second determining module 920, configured to determine, according to the offset and the corresponding positions of the multiple feature points, the number of feature point pairs belonging to the same person in the multiple feature points;
a third determining module 930, configured to determine the number of people included in the target image based on the number of the feature point pairs and the number of the plurality of feature points corresponding to each of the plurality of feature points.
In some examples shown, the plurality of feature points include at least a first feature point and a second feature point respectively representing different parts of the person; the first determining module 910 includes:
the image processing module is specifically configured to input the target image into an image processing model constructed based on a neural network, and obtain a first number and a first position corresponding to a first feature point included in the target image, a second number and a second position corresponding to a second feature point included in the target image, and a target offset between the first feature point and the second feature point belonging to the same person.
In some examples shown, the second determining module 920 includes:
and the characteristic point pair determining module is used for determining the number of characteristic point pairs formed by combining the first characteristic points and the second characteristic points belonging to the same person according to the target offset, the first position and the second position.
In some examples shown, the third determining module 930 includes:
and a number-of-persons determination module that determines the number of persons included in the target image based on the first number, the second number, and the number of the feature point pairs.
In some examples shown, the person number determination module is specifically configured to:
determining a third number of first feature points which are not combined as feature point pairs from among the first feature points included in the obtained target image, based on the first number and the number of the feature point pairs;
determining a fourth number of second feature points which are not combined as feature point pairs from among the second feature points included in the target image, based on the second number and the number of the feature point pairs;
and determining the sum of the third number, the fourth number, and the number of the feature point pairs as the number of persons included in the target image.
In some examples shown, the image processing module is specifically configured to:
extracting features of the target image to obtain a first feature map corresponding to the first feature points;
sliding a preset sliding window on the first feature map, and determining the target pixel points included in the preset sliding window after each sliding and the positions corresponding to the target pixel points;
after the multiple sliding operations are completed to obtain multiple target pixel points, determining the number corresponding to the multiple target pixel points as the first number of the first feature points, and determining the positions corresponding to the multiple target pixel points as the first positions of the first feature points.
In some examples shown, the target pixel point includes a pixel point with a maximum pixel value in the preset sliding window or a pixel point with a pixel value exceeding a first threshold.
In some examples shown, the characteristic point pair determining module is specifically configured to:
taking any one of a plurality of first feature points of the target image as a target first feature point, and transforming a first position corresponding to the target first feature point according to a target offset corresponding to the target first feature point to obtain a predicted position;
determining a target area based on the predicted position, and determining whether a second characteristic point exists in the target area;
when a second feature point exists in the target area, the second feature point closest to the predicted position is determined as a second feature point belonging to the same person as the target first feature point, and the counted number of the feature point pairs is updated.
In some examples shown, the first feature point or the second feature point includes a face feature point.
In some examples shown, the first feature points include face feature points, and the second feature points include torso feature points; or, the first feature points include trunk feature points, and the second feature points include face feature points.
In some examples shown, the image processing model includes a first predictor model, a second predictor model, and an offset predictor model;
the first predictor model is used for predicting a first quantity and a first position corresponding to a first feature point included in the target image; the second predictor model is used for predicting a second quantity and a second position corresponding to a second feature point included in the target image; the offset amount prediction submodel is used for predicting a target offset amount between a first characteristic point and a second characteristic point belonging to the same person.
In some examples shown, the training apparatus 1000 corresponding to the above-described method for training an image processing model includes:
an obtaining module 1010, configured to obtain a plurality of training samples including label information; the labeling information comprises a first sample characteristic point, a second sample characteristic point and an offset between the first sample characteristic point and the second sample characteristic point which belong to the same person;
a loss information determination module 1020 that determines joint learning loss information based on loss information corresponding to each of the submodels included in the image processing model;
and a joint training module 1030, configured to perform joint training on each sub-model included in the image processing model based on the joint learning loss information and the training samples until each sub-model converges.
The embodiment of the image processing apparatus shown in the present application can be applied to an electronic device. Accordingly, the present application discloses an electronic device, which may comprise: a processor.
A memory for storing processor-executable instructions.
Wherein the processor is configured to call the executable instructions stored in the memory to implement the image processing method as shown in any of the above embodiments.
Referring to fig. 10, fig. 10 is a hardware structure diagram of an electronic device shown in the present application.
As shown in fig. 10, the electronic device may include a processor for executing instructions, a network interface for making a network connection, a memory for storing operation data for the processor, and a nonvolatile memory for storing the image processing apparatus.
The embodiment of the image processing apparatus may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical device, the apparatus is formed by the processor of the electronic device in which it is located reading corresponding computer program instructions from the nonvolatile memory into the memory and running them. From a hardware aspect, as shown in fig. 10, which is a hardware structure diagram of an electronic device shown in this application, besides the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 10, the electronic device in which the apparatus of the embodiment is located may also include other hardware according to the actual function of the electronic device, which is not described again here.
It is to be understood that, in order to increase the processing speed, the image processing apparatus may also be stored in the memory, which is not limited herein.
The present application also proposes a computer-readable storage medium storing a computer program for executing the image processing method as shown in any of the above embodiments.
One skilled in the art will recognize that one or more embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
"and/or" in this application means having at least one of the two, for example, "a and/or B" may include three schemes: A. b, and "A and B".
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this application may be implemented in the following: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware that may include the structures disclosed in this application and their structural equivalents, or combinations of one or more of them. Embodiments of the subject matter described in this application can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows described above can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs may include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data can include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this application contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as merely describing features of particular disclosed embodiments. Certain features that are described in this application in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the present application and is not intended to limit the present application to the particular embodiments of the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principles of the present application should be included within the scope of the present application.

Claims (16)

1. An image processing method, characterized in that the method comprises:
determining the number and the position of various feature points which are associated with people in the obtained target image and the offset between the feature points belonging to the same person in the various feature points;
determining the number of characteristic point pairs belonging to the same person in the multiple characteristic points according to the offset and the corresponding positions of the multiple characteristic points;
and determining the number of people included in the target image based on the number of the characteristic point pairs and the number corresponding to each of the plurality of characteristic points.
2. The method of claim 1, wherein the plurality of feature points includes at least a first feature point and a second feature point that respectively characterize different parts of the person;
the determining the number and the position corresponding to each of the plurality of feature points associated with the person in the obtained target image and the offset between the feature points belonging to the same person in the plurality of feature points includes:
and inputting the target image into an image processing model constructed based on a neural network to obtain a first quantity and a first position corresponding to a first characteristic point in the target image, a second quantity and a second position corresponding to a second characteristic point in the target image, and a target offset between the first characteristic point and the second characteristic point belonging to the same person.
3. The method according to claim 2, wherein the determining the number of pairs of feature points belonging to the same person from the plurality of feature points according to the offset and the corresponding positions of the plurality of kinds of feature points comprises:
and determining the number of characteristic point pairs formed by combining the first characteristic points and the second characteristic points belonging to the same person according to the target offset, the first position and the second position.
4. The method according to claim 3, wherein the determining the number of persons included in the target image based on the number of pairs of feature points and the number of respective correspondences of the plurality of kinds of feature points includes:
determining the number of persons included in the target image based on the first number, the second number, and the number of pairs of feature points.
5. The method of claim 4, wherein the determining the number of people included in the target image based on the first number, the second number, and the number of pairs of feature points comprises:
determining a third number of the obtained first feature points which are not combined as feature point pairs in the first feature points included in the target image based on the first number and the number of the feature point pairs;
determining a fourth number corresponding to second feature points which are not combined as feature point pairs in the obtained second feature points included in the target image based on the second number and the number of the feature point pairs;
and determining the sum of the third number, the fourth number and the number of the feature point pairs as the number of people included in the target image.
6. The method according to any one of claims 2 to 5, wherein the inputting the target image into an image processing model constructed based on a neural network to obtain a first number and a first position corresponding to a first feature point included in the target image comprises:
extracting features of the target image to obtain a first feature map corresponding to the first feature point;
sliding a preset sliding window on the first feature map, and determining target pixel points included in the preset sliding window after sliding and positions corresponding to the target pixel points;
after the multiple sliding operations are completed to obtain multiple target pixel points, determining the number corresponding to the multiple target pixel points as the first number of the first feature points, and determining the positions corresponding to the multiple target pixel points as the first positions of the first feature points.
7. The method according to claim 6, wherein the target pixel point comprises a pixel point with a maximum pixel value within the preset sliding window or a pixel point with a pixel value exceeding a first threshold.
8. The method according to any one of claims 3 to 7, wherein the determining the number of pairs of feature points, which are a combination of a first feature point and a second feature point belonging to the same person, according to the target offset amount, the first position, and the second position comprises:
taking any one of the first feature points of the target image as a target first feature point, and transforming a first position corresponding to the target first feature point according to a target offset corresponding to the target first feature point to obtain a predicted position;
determining a target area based on the predicted position, and determining whether a second feature point exists in the target area;
and when a second feature point exists in the target area, determining the second feature point closest to the predicted position as a second feature point belonging to the same person as the target first feature point, and updating the counted number of the feature point pairs.
9. The method of any of claims 2-8, wherein the first feature points or the second feature points comprise human face feature points.
10. The method of claim 9, wherein the first feature points comprise face feature points and the second feature points comprise torso feature points;
or, the first feature points comprise torso feature points, and the second feature points comprise face feature points.
11. The method of any of claims 2-10, wherein the image processing model comprises a first predictor model, a second predictor model, and an offset predictor model;
the first prediction submodel is used for predicting a first quantity and a first position corresponding to a first feature point included in the target image; the second predictor model is used for predicting a second quantity and a second position corresponding to a second feature point included in the target image; the offset prediction submodel is used for predicting a target offset between a first characteristic point and a second characteristic point which belong to the same person.
12. The method of claim 11, wherein the method of training the image processing model comprises:
acquiring a plurality of training samples comprising marking information; the labeling information comprises a first sample characteristic point, a second sample characteristic point and an offset between the first sample characteristic point and the second sample characteristic point which belong to the same person;
determining joint learning loss information based on loss information respectively corresponding to each sub-model included in the image processing model;
and carrying out joint training on each sub-model included in the image processing model based on the joint learning loss information and the training samples until each sub-model converges.
13. An image processing apparatus, characterized in that the apparatus comprises:
the first determining module is used for determining the number and the positions of various feature points which are associated with people in the obtained target image and the offset between the feature points which belong to the same person in the various feature points;
the second determining module is used for determining the number of characteristic point pairs belonging to the same person in the multiple characteristic points according to the offset and the positions corresponding to the multiple characteristic points respectively;
and a third determining module, configured to determine, based on the number of the feature point pairs and the number of the multiple feature points corresponding to each other, the number of people included in the target image.
14. The apparatus of claim 13, wherein the plurality of feature points includes at least a first feature point and a second feature point respectively characterizing different parts of the person;
the first determining module includes:
and the image processing module is used for inputting the target image into an image processing model constructed based on a neural network to obtain a first quantity and a first position corresponding to a first characteristic point included in the target image, a second quantity and a second position corresponding to a second characteristic point included in the target image, and a target offset between the first characteristic point and the second characteristic point belonging to the same person.
15. An electronic device, characterized in that the device comprises:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to invoke executable instructions stored in the memory to implement the image processing method of any of claims 1 to 12.
16. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the image processing method of any one of claims 1 to 12.
CN202011025577.XA 2020-09-25 2020-09-25 Image processing method, device, equipment and storage medium Active CN112101287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011025577.XA CN112101287B (en) 2020-09-25 2020-09-25 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011025577.XA CN112101287B (en) 2020-09-25 2020-09-25 Image processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112101287A true CN112101287A (en) 2020-12-18
CN112101287B CN112101287B (en) 2023-11-28

Family

ID=73756524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011025577.XA Active CN112101287B (en) 2020-09-25 2020-09-25 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112101287B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN108830145A (en) * 2018-05-04 2018-11-16 深圳技术大学(筹) A kind of demographic method and storage medium based on deep neural network
CN109325450A (en) * 2018-09-25 2019-02-12 Oppo广东移动通信有限公司 Image processing method, device, storage medium and electronic equipment
CN110263703A (en) * 2019-06-18 2019-09-20 腾讯科技(深圳)有限公司 Personnel's flow statistical method, device and computer equipment
CN111160243A (en) * 2019-12-27 2020-05-15 深圳云天励飞技术有限公司 Passenger flow volume statistical method and related product

Also Published As

Publication number Publication date
CN112101287B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
CN108985259B (en) Human body action recognition method and device
CN110378264B (en) Target tracking method and device
CN108205655B (en) Key point prediction method and device, electronic equipment and storage medium
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
US9792697B2 (en) State estimation apparatus, state estimation method, and integrated circuit
CN110245645B (en) Face living body identification method, device, equipment and storage medium
WO2022160591A1 (en) Crowd behavior detection method and apparatus, and electronic device, storage medium and computer program product
CN109977832B (en) Image processing method, device and storage medium
CN112149585A (en) Image processing method, device, equipment and storage medium
US11756205B2 (en) Methods, devices, apparatuses and storage media of detecting correlated objects involved in images
CN115205330B (en) Track information generation method and device, electronic equipment and computer readable medium
KR20220130568A (en) Methods, apparatuses, devices, and storage medium for predicting correlation between objects
CN115439927A (en) Gait monitoring method, device, equipment and storage medium based on robot
Wei et al. Calibrating recurrent neural networks on smartphone inertial sensors for location tracking
CN109961103B (en) Training method of feature extraction model, and image feature extraction method and device
CN111126159A (en) Method, apparatus, electronic device, and medium for tracking pedestrian in real time
CN113557546B (en) Method, device, equipment and storage medium for detecting associated objects in image
CN113689372A (en) Image processing method, apparatus, storage medium, and program product
KR100635883B1 (en) System for real-time objects tracking
CN112101287B (en) Image processing method, device, equipment and storage medium
CN114038067B (en) Coal mine personnel behavior detection method, equipment and storage medium
CN111695404A (en) Pedestrian falling detection method and device, electronic equipment and storage medium
CN113221920B (en) Image recognition method, apparatus, device, storage medium, and computer program product
CN116052088B (en) Point cloud-based activity space measurement method, system and computer equipment
Gui et al. Advanced Baseline for 3D Human Pose Estimation: A Two-Stage Approach

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant