CN111275002A - Image processing method and device and electronic equipment - Google Patents

Image processing method and device and electronic equipment

Info

Publication number
CN111275002A
Authority
CN
China
Prior art keywords
target object
detection frame
determining
detection
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010098809.8A
Other languages
Chinese (zh)
Inventor
王飞
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority to CN202010098809.8A
Publication of CN111275002A
Priority to KR1020217034494A (published as KR20210140758A)
Priority to PCT/CN2020/136216 (published as WO2021164395A1)
Priority to JP2021557462A (published as JP7235892B2)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses an image processing method, an image processing device and electronic equipment. The method comprises the following steps: obtaining an image to be detected; respectively determining a first detection frame representing the face of a target object and a second detection frame representing the body of the target object in the image, where the number of first detection frames is M, the number of second detection frames is N, and M and N are non-negative integers; determining, among the M first detection frames and the N second detection frames, K first detection frames and K second detection frames that satisfy a matching relationship, where K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; and determining the number of target objects in the image based on M, N and K.

Description

Image processing method and device and electronic equipment
Technical Field
The invention relates to an image analysis technology, in particular to an image processing method and device and electronic equipment.
Background
At present, the number of people in a vehicle can be counted by means of face detection. However, if a person in the vehicle is blocked by a seat or the face is rotated at a large angle, detections are missed, so the accuracy of counting the number of people in the vehicle is low.
Disclosure of Invention
In order to solve the existing technical problems, embodiments of the present invention provide an image processing method and apparatus, and an electronic device.
In order to achieve the above purpose, the technical solution of the embodiment of the present invention is realized as follows:
the embodiment of the invention provides an image processing method, which comprises the following steps:
obtaining an image to be detected;
respectively determining a first detection frame representing the face of a target object and a second detection frame representing the body of the target object in the image; the number of the first detection frames is M, the number of the second detection frames is N, and M and N are non-negative integers;
determining, among the M first detection frames and the N second detection frames, K first detection frames and K second detection frames that satisfy a matching relationship; K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N;
determining the number of target objects based on M, N and K.
In some optional embodiments of the invention, the determining K first detection boxes and K second detection boxes, which satisfy a matching relationship, among the M first detection boxes and the N second detection boxes includes:
traversing the M first detection frames, and determining the intersection ratio of each first detection frame and each second detection frame;
and determining the first detection frame and the second detection frame which meet the matching relation based on the intersection ratio of each first detection frame and each second detection frame.
In some optional embodiments of the present invention, the determining, based on the intersection ratio of each first detection frame and each second detection frame, the first detection frame and the second detection frame that satisfy the matching relationship includes:
determining the maximum intersection ratio in the intersection ratios of each first detection frame and each second detection frame;
judging whether the maximum intersection ratio is greater than a preset threshold value or not;
and responding to the condition that the maximum intersection ratio is larger than the preset threshold value, and determining that the first detection frame and the second detection frame corresponding to the maximum intersection ratio meet the matching relation.
In some optional embodiments of the invention, the method further comprises:
obtaining body key points of each target object in the image;
determining a position classification category corresponding to the body key point; the location classification category indicates that the location of the body keypoint is within a particular one of a plurality of particular regions in the image;
and determining the region of each target object based on the position classification category corresponding to each body key point.
In some alternative embodiments of the present invention,
under the condition that each specific area is each seat in the cabin, the position classification category corresponding to the body key point is the seat corresponding to the body key point; the determining the region of each target object based on the position classification category corresponding to each body key point comprises: determining a seat where a target object is located based on seats corresponding to body key points of the target object;
the method further comprises the following steps: and determining the state of each seat in the cabin according to the seat where each target object in the cabin is located.
In some optional embodiments of the present invention, the determining the seat where the target object is located based on the seats corresponding to the body key points of the target object includes:
counting the number of body key points corresponding to the same seat in a plurality of body key points of one target object;
and determining the seat corresponding to the maximum number of the body key points as the seat where the target object is located.
An embodiment of the present invention further provides an image processing apparatus. The apparatus includes: an acquisition unit, a first determination unit, a second determination unit and a matching unit; wherein,
the acquisition unit is used for acquiring an image to be detected;
the first determination unit is used for determining a first detection frame for representing the face of a target object in the image; the number of the first detection frames is M;
the second determination unit is used for determining a second detection frame for representing the body of the target object in the image; the number of the second detection frames is N; wherein M and N are both non-negative integers;
the matching unit is used for determining K first detection frames and K second detection frames which meet the matching relation in the M first detection frames and the N second detection frames; k is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; the number of target objects in the image is determined based on M, N and K.
In some optional embodiments of the present invention, the matching unit is configured to traverse the M first detection frames, and determine an intersection ratio of each first detection frame to each second detection frame; and determining the first detection frame and the second detection frame which meet the matching relation based on the intersection ratio of each first detection frame and each second detection frame.
In some optional embodiments of the present invention, the matching unit is configured to determine a maximum intersection ratio of intersection ratios of each first detection frame and each second detection frame; judging whether the maximum intersection ratio is greater than a preset threshold value or not; and responding to the condition that the maximum intersection ratio is larger than the preset threshold value, and determining that the first detection frame and the second detection frame corresponding to the maximum intersection ratio meet the matching relation.
In some optional embodiments of the invention, the apparatus further comprises a classification unit and a third determination unit; wherein,
the second determination unit is further used for obtaining body key points of each target object in the image;
the classification unit is used for determining a position classification category corresponding to the body key point; the location classification category indicates that the location of the body keypoint is within a particular one of a plurality of particular regions in the image;
and the third determining unit is used for determining the region where each target object is located based on the position classification category corresponding to each body key point.
In some optional embodiments of the present invention, in the case that each specific area is each seat in the cabin, the location classification category corresponding to the body key point is the seat corresponding to the body key point; the third determining unit is used for determining the seat where the target object is located based on the seat corresponding to the body key point of the target object; and the state of each seat in the cabin is determined according to the seat in which each target object in the cabin is positioned.
In some optional embodiments of the invention, the third determining unit is configured to count the number of body key points corresponding to the same seat in a plurality of body key points of one target object; and determining the seat corresponding to the maximum number of the body key points as the seat where the target object is located.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image processing method according to the embodiments of the present invention.
The embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the steps of the image processing method according to the embodiment of the present invention are implemented.
The embodiment of the invention provides an image processing method, an image processing device and electronic equipment, wherein the method comprises the following steps: obtaining an image to be detected; respectively determining a first detection frame representing the face of a target object and a second detection frame representing the body of the target object in the image, where the number of first detection frames is M, the number of second detection frames is N, and M and N are non-negative integers; determining, among the M first detection frames and the N second detection frames, K first detection frames and K second detection frames that satisfy a matching relationship, where K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; and determining the number of target objects in the image based on M, N and K. By adopting the technical scheme of the embodiment of the invention, the number of human faces in the image is detected by face detection, the number of human bodies in the image is detected by human body detection, and the number of persons in the image is determined by matching faces with bodies, so that missed detections caused by a target object being blocked or its face being rotated at a large angle are compensated for, and the accuracy of counting the number of persons in the image is improved.
Drawings
FIG. 1 is a first flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 2 is a second flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a network structure of an image processing method according to an embodiment of the present invention;
FIG. 4 is a first schematic diagram illustrating a composition structure of an image processing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a second exemplary embodiment of an image processing apparatus;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides an image processing method. FIG. 1 is a first flowchart illustrating an image processing method according to an embodiment of the present invention; as shown in fig. 1, the method includes:
step 101: obtaining an image to be detected;
step 102: respectively determining a first detection frame representing the face of a target object and a second detection frame representing the body of the target object in the image; the number of the first detection frames is M, the number of the second detection frames is N, and M and N are non-negative integers;
step 103: determining, among the M first detection frames and the N second detection frames, K first detection frames and K second detection frames that satisfy the matching relationship; K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N;
step 104: the number of target objects in the image is determined based on M, N and K.
In this embodiment, the image processing method is applied to an image processing apparatus, and the image processing apparatus may be located in a mobile terminal such as a mobile phone, a tablet computer, and a notebook computer, or may be located in an electronic device such as a desktop computer, an all-in-one computer, and a server.
In this embodiment, the image to be detected (hereinafter, simply referred to as an image) may include a target object; the target object may be a real person, and in other embodiments, the target object may also be a virtual person, such as a cartoon character image. Of course, the target object may also be other types of objects, which is not limited in this embodiment.
In some optional embodiments, the target object is a target object in an interior environment of a vehicle. For example, if the vehicle is a five-seat vehicle and there are three people in the vehicle, the image of the interior of the vehicle is taken at the front of the vehicle, and the obtained image may include a part of the environment in the vehicle and three people sitting on the seats, and the captured image may be used as the image in this embodiment, and three people in the image may be used as the target objects in this embodiment.
In some optional embodiments of the present invention, the respectively determining a first detection frame characterizing the face of a target object and a second detection frame characterizing the body of the target object in the image includes: performing feature extraction on the image through a first network, and determining a first detection frame characterizing the face of the target object in the image based on the extracted features; and performing feature extraction on the image through a second network, and determining a second detection frame characterizing the body of the target object in the image based on the extracted features.
In this embodiment, a face in an image may be detected through a first network, and M first detection frames in the image may be determined. The first network may adopt any network structure capable of detecting a face, which is not limited in this embodiment.
In this embodiment, the target object in the image may be detected through the second network, for example, the body in the image is detected, and the N second detection frames in the image are determined. The second network may adopt any network structure (e.g., a human body detection network) capable of detecting a target object, which is not limited in this embodiment.
In some optional embodiments, determining a second detection box in the image that characterizes the body of the target object comprises: extracting the features of the image through a second network, and determining key points of the target object based on the extracted features, namely determining the position information of the key points; and determining a second detection frame for representing the target object based on the determined key points of the target object. Wherein the position information of the key points can be represented by coordinates of the key points. The method comprises the steps of determining all key points belonging to the same target object, and determining a second detection frame of the target object based on position information of all key points belonging to the same target object, so that the area where the second detection frame is located comprises all key points of the target object, and the area where the second detection frame is located is the minimum area comprising all key points of the target object. As an example, the second detection frame may be a rectangular frame.
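As an illustration only, the rule described above can be expressed in a short Python sketch: the second detection frame is taken as the smallest axis-aligned rectangle containing every detected key point of one target object. The function name and the (x, y) coordinate layout are assumptions for this sketch, not part of the claimed networks.

```python
import numpy as np

def box_from_keypoints(keypoints: np.ndarray):
    """keypoints: array of shape (num_keypoints, 2) holding (x, y) positions of one target object."""
    # The minimum area containing all key points is the axis-aligned bounding rectangle.
    x_min, y_min = keypoints.min(axis=0)
    x_max, y_max = keypoints.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)  # (left, top, right, bottom)
```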
The key points of the target object can comprise skeleton key points and/or contour key points, the contour key points represent the contour edge of the target object, and it can be understood that the contour edge of the target object can be formed through the position information of the contour key points; the bone key points represent key points of bones of the target object, and it can be understood that main bones of the target object can be formed through position information of the bone key points. Wherein the contour keypoints may comprise at least one of: an arm contour keypoint, a hand contour keypoint, a shoulder contour keypoint, a leg contour keypoint, a foot contour keypoint, a waist contour keypoint, a head contour keypoint, a hip contour keypoint, a chest contour keypoint. The skeletal keypoints may comprise at least one of: arm bone key points, hand bone key points, shoulder bone key points, leg bone key points, foot bone key points, waist bone key points, head bone key points, hip bone key points, and chest bone key points.
In some optional embodiments, determining a second detection box in the image that characterizes the body of the target object comprises: feature extraction is performed on the image through a second network, a center point of the target object and a length and a width of a second detection frame corresponding to the target object can be determined based on the extracted features, and the second detection frame of the body of the target object is determined according to the center point, the length and the width.
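A corresponding sketch for this variant, again purely illustrative, converts a predicted center point together with a predicted width and length (height) into the corner coordinates of the second detection frame:

```python
def box_from_center(cx: float, cy: float, width: float, height: float):
    # Center point plus width and height define the rectangular second detection frame.
    return cx - width / 2.0, cy - height / 2.0, cx + width / 2.0, cy + height / 2.0
```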
In some optional embodiments of the invention, the determining K first detection boxes and K second detection boxes, which satisfy a matching relationship, among the M first detection boxes and the N second detection boxes includes: traversing the M first detection frames, and determining the intersection ratio of each first detection frame and each second detection frame; and determining the first detection frame and the second detection frame which meet the matching relation based on the intersection ratio of each first detection frame and each second detection frame.
In this embodiment, for each first detection frame, the intersection ratio of the first detection frame to each second detection frame is determined. The intersection ratio (i.e., the intersection over union, IoU) represents the ratio of the intersection to the union of the area where the first detection frame is located and the area where the second detection frame is located. It is understood that the intersection ratio represents the degree of association between the corresponding first detection frame and second detection frame, that is, the degree of association between the corresponding face and target object. For example, a larger intersection ratio indicates a higher degree of association between the corresponding first detection frame and second detection frame, that is, a higher degree of association between the corresponding face and target object; correspondingly, a smaller intersection ratio indicates a lower degree of association. In this embodiment, the first detection frame and the second detection frame that satisfy the matching relationship may be determined based on the intersection ratio of each first detection frame to each second detection frame; the first detection frame and the second detection frame that satisfy the matching relationship belong to the same target object.
In some optional embodiments of the present invention, the determining, based on the intersection ratio of each first detection frame and each second detection frame, the first detection frame and the second detection frame that satisfy the matching relationship includes: determining the maximum intersection ratio in the intersection ratios of each first detection frame and each second detection frame; judging whether the maximum intersection ratio is greater than a preset threshold value or not; and responding to the condition that the maximum intersection ratio is larger than the preset threshold value, and determining that the first detection frame and the second detection frame corresponding to the maximum intersection ratio meet the matching relation.
In this embodiment, for the intersection ratio between each first detection frame and each second detection frame, the second detection frame corresponding to the maximum intersection ratio is determined; when the maximum intersection ratio is larger than the preset threshold value, it can be determined that the first detection frame and the second detection frame corresponding to the maximum intersection ratio satisfy the matching relationship, that is, it can be determined that the first detection frame and the second detection frame corresponding to the maximum intersection ratio belong to the same target object.
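The matching and counting procedure described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: boxes are assumed to be (left, top, right, bottom) tuples, the threshold value of 0.5 is an arbitrary example, and the restriction that one second detection frame matches at most one first detection frame is an assumption added here for simplicity.

```python
def iou(box_a, box_b):
    """Intersection ratio (intersection over union) of two rectangles given as (left, top, right, bottom)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def count_target_objects(face_boxes, body_boxes, threshold=0.5):
    matched_bodies = set()
    k = 0
    for face in face_boxes:                                   # traverse the M first detection frames
        if not body_boxes:
            break
        ious = [iou(face, body) for body in body_boxes]
        j = max(range(len(ious)), key=ious.__getitem__)       # second frame with the maximum intersection ratio
        if ious[j] > threshold and j not in matched_bodies:
            matched_bodies.add(j)
            k += 1                                            # one more matched pair
    m, n = len(face_boxes), len(body_boxes)
    return k + (m - k) + (n - k)                              # i.e. M + N - K target objects
```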
In step 104 of this embodiment, the number of the target objects may satisfy: k + (M-K) + (N-K).
In this embodiment, the number of first detection frames and second detection frames that satisfy the matching relationship is K, the number of first detection frames that do not satisfy the matching relationship is M-K, and the number of second detection frames that do not satisfy the matching relationship is N-K. The unmatched detection frames typically correspond to target objects whose body or face is not detected because of blocking or a large rotation angle, so these first detection frames and second detection frames are also counted; the total number of target objects is therefore K + (M-K) + (N-K), that is, M + N - K.
By adopting the technical scheme of the embodiment of the invention, the number of the human faces in the image is detected in a human face detection mode, the number of the human bodies in the image is detected in a human body detection mode, and the number of the personnel in the image is determined in a human face and human body matching mode, so that the problem of missed detection caused by the fact that the target object is shielded or the face of the target object rotates at a large angle in the image is solved, and the accuracy of the personnel number statistics in the image is improved.
Based on the foregoing embodiment, the embodiment of the present invention further provides an image processing method. FIG. 2 is a second flowchart illustrating an image processing method according to an embodiment of the present invention; as shown in fig. 2, the method includes:
step 201: obtaining an image to be detected;
step 202: obtaining body key points of each target object in the image;
step 203: determining a position classification category corresponding to the body key point; the location classification category indicates that the location of the body keypoint is within a particular one of a plurality of particular regions in the image;
step 204: and determining the region of each target object based on the position classification category corresponding to each body key point.
In some optional embodiments of the invention, the obtaining of the body key points of the respective target objects in the image comprises: and performing feature extraction on the image through a second network, and determining a second detection frame for representing the body of the target object and body key points of each target object in the image based on the extracted features.
In this embodiment, the body key points of each target object in the image may be obtained based on the second network in the foregoing embodiments; it is understood that the image is subjected to feature extraction through the second network, and based on the extracted features, on one hand, key points of the body of each target object can be obtained, and on the other hand, a second detection frame for characterizing the body of the target object can be determined based on the extracted features. It can be understood that the image is input into the second network, the body key points of each target object can be output, and meanwhile, the second detection frame of each target object body can also be output; or, the image is input into the second network, the position information of the body key point of each target object can be output, and the position information of the second detection frame of the body of each target object can also be output.
In some optional embodiments, feature extraction is performed on the image through a second network, based on the extracted features, on one hand, a body key point of each target object can be obtained, and on the other hand, a central point of the target object and a length and a width of a second detection frame corresponding to the target object can be determined based on the extracted features, and the second detection frame of the body of each target object is determined according to the central point, the length and the width; and further determining body key points in the area of the second detection frame of each target object according to the area of the second detection frame of each target object and each body key point, and determining the body key points in the area of each second detection frame as the body key points of the target object corresponding to each second detection frame.
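A brief sketch of the key-point assignment step described above, under the stated assumption that a body key point belongs to the target object whose second detection frame contains it; the data layout is illustrative only:

```python
def assign_keypoints_to_boxes(keypoints, body_boxes):
    """keypoints: iterable of (x, y); body_boxes: list of (left, top, right, bottom)."""
    assignments = {i: [] for i in range(len(body_boxes))}
    for x, y in keypoints:
        for i, (left, top, right, bottom) in enumerate(body_boxes):
            if left <= x <= right and top <= y <= bottom:
                assignments[i].append((x, y))   # key point lies in the region of this second detection frame
                break
    return assignments
```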
In some optional embodiments of the invention, the determining the location classification category corresponding to the body key point comprises: determining a position classification category corresponding to the body key point through a third network; the third network is obtained based on sample image training containing position information of body key points and labeling information of specific areas.
In this embodiment, the location classification category corresponding to each body key point may be determined by the third network. It will be appreciated that the third network may be any classification network, the determined location classification category indicating that the location of the keypoint is within a particular one of a plurality of particular regions in the image.
In this embodiment, one or more specific regions may be included in the image, and the specific regions are related to the classification task of the third network.
For example, if the target object is a target object in an internal environment of the vehicle, and the classification task of the third network is used to determine whether the body key point of each target object is in a seat area in the vehicle, the specific area may be a seat in the vehicle. For example, if the vehicle is a five-seat vehicle, the third network may determine the location of each body key point in which region the seat is located, so that the state of each seat may be determined.
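The patent does not fix the architecture of the third network, so the following is only an assumed sketch: a small classifier that maps a body key point's normalized (x, y) position to one of five seat labels (0-4 in the five-seat example above), trained from sample images annotated with key-point positions and seat regions.

```python
import torch
import torch.nn as nn

class SeatRegionClassifier(nn.Module):
    """Hypothetical third network: key point position -> position classification category (seat label)."""
    def __init__(self, num_regions: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 64),
            nn.ReLU(),
            nn.Linear(64, num_regions),
        )

    def forward(self, keypoint_xy: torch.Tensor) -> torch.Tensor:
        # keypoint_xy: tensor of shape (batch, 2); returns logits over the specific regions (seats).
        return self.net(keypoint_xy)
```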
In some optional embodiments of the present invention, in the case that each specific area is each seat in the cabin, the location classification category corresponding to the body key point is the seat corresponding to the body key point; the determining the region of each target object based on the position classification category corresponding to each body key point comprises: determining a seat where a target object is located based on seats corresponding to body key points of the target object; the method further comprises the following steps: and determining the state of each seat in the cabin according to the seat where each target object in the cabin is located.
Wherein the state of the seat may comprise an idle state or a non-idle state; the idle state may indicate that the corresponding seat has no target object, i.e. the seat is not occupied; accordingly, the non-idle state indicates that the corresponding seat has the target object, i.e., the seat is occupied.
In this embodiment, based on the location classification categories corresponding to the body key points of each target object, the specific areas corresponding to the body key points of each target object are determined, that is, the seats corresponding to the body key points of each target object are determined; if the body key point of one target object corresponds to one seat, the state of the seat can be shown to be a non-idle state, namely the seat is occupied; if a seat does not correspond to a body key point of any target object, the state of the seat can be indicated as an idle state, namely the seat is not occupied.
In some optional embodiments of the present invention, the determining the seat where the target object is located based on the seats corresponding to the body key points of the target object includes: counting the number of key points corresponding to the same seat in a plurality of body key points of one target object; and determining the seat corresponding to the maximum key point number as the seat where the target object is located.
In practical applications, the number of the body key points of the target object may be multiple, and the target object is likely not strictly in a specific area (i.e. a seat), then in some alternative embodiments, the position classification categories of all the body key points belonging to the same target object are determined, and the number of the body key points belonging to the same position classification category is counted, that is, the number of the body key points corresponding to the same seat is counted; determining the number of the maximum body key points, and taking the specific area corresponding to the position classification category corresponding to the number of the maximum body key points as the area corresponding to the target object, namely taking the seat corresponding to the number of the maximum body key points as the seat where the target object is located; correspondingly, the state of the seat corresponding to the maximum number of body key points is determined as a non-idle state, i.e. the seat is occupied.
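The majority-vote rule above can be sketched as follows; seat labels are assumed to be integers, and the helper names are illustrative:

```python
from collections import Counter

def seat_of_target(keypoint_seat_labels):
    """Seat label receiving the maximum number of a target object's body key points."""
    seat, _ = Counter(keypoint_seat_labels).most_common(1)[0]
    return seat

def seat_states(per_target_labels, all_seats):
    """per_target_labels: one list of key-point seat labels per target object in the cabin."""
    occupied = {seat_of_target(labels) for labels in per_target_labels if labels}
    # A seat with at least one target object is non-idle (occupied); otherwise it is idle.
    return {seat: ("non-idle" if seat in occupied else "idle") for seat in all_seats}
```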
Compared with the traditional technology, that is, compared with implementations that judge whether a seat is occupied by arranging a sensor on the seat, the technical solution of this embodiment requires no additional sensors, thereby reducing the detection cost. Moreover, when seat occupancy is judged by a pressure sensor, an object placed on the seat may be mistakenly regarded as an occupant; the technical solution of this embodiment avoids such misjudgment and greatly improves the detection accuracy.
The following describes an image processing method according to an embodiment of the present invention with reference to a specific example.
FIG. 3 is a schematic diagram of a network structure of an image processing method according to an embodiment of the present invention; as shown in fig. 3, images are input into the first network and the second network, respectively; the first network may be a face detection network, and the second network may be a human body detection network.
In this embodiment, the image may be an image of the internal environment of the vehicle. Taking a five-seat vehicle as an example, the specific regions in the image may include five specific regions, that is, the regions in which the five seats are located in the image; for example, the specific regions may include a main driver seat region, a front passenger seat region, a rear left region, a rear middle region and a rear right region, and the labels of these specific regions may be defined as 0, 1, 2, 3 and 4, respectively, in the above order. It is assumed that the main driver seat region, the front passenger seat region and the rear left region in the image are all occupied.
In a first aspect, the first network is used to extract features of the image and obtain the face detection frames (i.e., the first detection frames in the above embodiments) based on the extracted features, and the second network is used to extract features of the image and obtain the human body detection frames (i.e., the second detection frames in the above embodiments) based on the extracted features. Assume that the number of extracted face detection frames is M and the number of human body detection frames is N; because of blocking or a large rotation angle, M may be less than or equal to 3 and N may be less than or equal to 3, that is, a case may occur in which not all 3 face detection frames and/or 3 human body detection frames are detected. The intersection ratio of each face detection frame with each human body detection frame is calculated; for each face detection frame, the human body detection frame with the maximum intersection ratio is determined, and it is judged whether the maximum intersection ratio is greater than a preset threshold value. If the maximum intersection ratio is greater than the preset threshold value, the face detection frame is judged to match the human body detection frame with the maximum intersection ratio, that is, the two belong to the same person, and the number K of matched face detection frames and human body detection frames is further determined. The number of persons in the vehicle is then determined based on M, N and K, with K + (M-K) + (N-K) taken as the number of persons in the vehicle.
In a second aspect, feature extraction is performed on the image through a second network, and based on the extracted features, on one hand, a human body detection frame in the image is obtained, and on the other hand, body key point information in the image can be obtained, and the body key point information can include coordinates of each body key point.
For example, feature extraction may be performed on the image through a second network, based on the extracted features, on one hand, a body key point of each target object may be obtained, and on the other hand, a center point of the target object and a length and a width of a human body detection frame corresponding to the target object may be determined based on the extracted features, and the human body detection frame of each target object may be determined according to the center point, the length and the width; and further determining body key points in the region of the human body detection frame of each target object according to the region of the human body detection frame of each target object and each body key point, and determining the body key points in the region of each human body detection frame as the body key points of the target object corresponding to each human body detection frame.
For example, assume that the image is represented as 3 × I × I, where 3 represents the number of channels; in this example, the image may be an RGB color image whose three channels are the Red, Green and Blue channel data, and I × I represents the size of the image. Feature extraction is performed on the image through the second network to obtain a C × F × F feature map, where C represents the number of channels and F × F represents the size of the feature map. The feature map is then processed by a convolution layer of a specific size (for example, a 1 × 1 convolution layer) to obtain an H × F × F feature map, where H represents the number of channels; each channel determines one key point, so H key points can be obtained. The H key points are determined by identifying the Gaussian response peak in each channel of the H × F × F feature map and taking the coordinates of the peak as the coordinates of the key point.
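The peak-picking step described above can be illustrated with the following numpy sketch; treating the per-channel maximum as the Gaussian peak and rescaling from the F × F feature map back to the I × I image are assumptions added here:

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps: np.ndarray, image_size: int) -> np.ndarray:
    """heatmaps: array of shape (H, F, F), one channel per key point; returns (H, 2) (x, y) coordinates."""
    num_keypoints, f, _ = heatmaps.shape
    coords = np.zeros((num_keypoints, 2), dtype=np.float32)
    for c in range(num_keypoints):
        flat_index = int(np.argmax(heatmaps[c]))              # peak of the Gaussian response in this channel
        y, x = np.unravel_index(flat_index, (f, f))
        coords[c] = (x * image_size / f, y * image_size / f)  # map back to image coordinates
    return coords
```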
Further, the obtained body key points are input into a third network, so that the position classification category of each body key point is determined. For example, taking the five specific areas as an example, the present embodiment determines the tags of the specific areas corresponding to the key points through the third network.
The embodiment of the invention also provides an image processing device. FIG. 4 is a first schematic diagram illustrating a composition structure of an image processing apparatus according to an embodiment of the present invention; as shown in fig. 4, the apparatus includes an acquisition unit 31, a first determination unit 32, a second determination unit 33, and a matching unit 34; wherein,
the acquiring unit 31 is used for acquiring an image to be detected;
the first determining unit 32 is configured to determine a first detection frame in the image, which characterizes a face of a target object; the number of the first detection frames is M;
the second determining unit 33 is configured to determine a second detection frame representing a body of the target object in the image; the number of the second detection frames is N; wherein M and N are both non-negative integers;
the matching unit 34 is configured to determine K first detection frames and K second detection frames that satisfy a matching relationship among the M first detection frames and the N second detection frames; k is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; the number of target objects in the image is determined based on M, N and K.
In some optional embodiments of the present invention, the matching unit 34 is configured to traverse the M first detection frames, and determine an intersection ratio of each first detection frame to each second detection frame; and determining the first detection frame and the second detection frame which meet the matching relation based on the intersection ratio of each first detection frame and each second detection frame.
In some optional embodiments of the present invention, the matching unit 34 is configured to determine a maximum intersection ratio of the intersection ratios of each first detection frame and each second detection frame; judging whether the maximum intersection ratio is greater than a preset threshold value or not; and responding to the condition that the maximum intersection ratio is larger than the preset threshold value, and determining that the first detection frame and the second detection frame corresponding to the maximum intersection ratio meet the matching relation.
In some optional embodiments of the present invention, the first determining unit 32 is configured to perform feature extraction on the image through a first network, and determine a first detection box in the image, which characterizes a face of the target object, based on the extracted features;
the second determining unit 33 is configured to perform feature extraction on the image through a second network, and determine a second detection frame in the image, which characterizes the body of the target object, based on the extracted features.
In some alternative embodiments of the present invention, as shown in fig. 5, the apparatus further comprises a classification unit 35 and a third determination unit 36; wherein,
the second determining unit 33 is further configured to obtain body key points of each target object in the image;
the classification unit 35 is configured to determine a location classification category corresponding to the body key point; the location classification category indicates that the location of the body keypoint is within a particular one of a plurality of particular regions in the image;
the third determining unit 36 is configured to determine, based on the location classification categories corresponding to the body key points, the regions where the target objects are located.
In some optional embodiments of the present invention, the second determining unit 33 is configured to perform feature extraction on the image through a second network, and determine, based on the extracted features, a second detection frame in the image, which characterizes the body of the target object, and body key points of each target object.
In some optional embodiments of the present invention, the classifying unit 35 is configured to determine, through a third network, a location classification category corresponding to the key point; the third network is obtained based on sample image training containing position information of body key points and labeling information of specific areas.
In some optional embodiments of the present invention, in the case that each specific area is each seat in the cabin, the location classification category corresponding to the body key point is the seat corresponding to the body key point; the third determining unit 36 is configured to determine a seat where a target object is located based on a seat corresponding to a body key point of the target object; and the state of each seat in the cabin is determined according to the seat in which each target object in the cabin is positioned.
In some optional embodiments of the present invention, the third determining unit 36 is configured to count the number of body key points corresponding to the same seat in a plurality of body key points of one target object; and determining the seat corresponding to the maximum number of the body key points as the seat where the target object is located.
In the embodiment of the present invention, the obtaining unit 31, the first determining unit 32, the second determining unit 33, the matching unit 34, the classifying unit 35, and the third determining unit 36 in the image processing apparatus may each be implemented, in practical applications, by a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), or a Field Programmable Gate Array (FPGA) in the apparatus.
It should be noted that: the image processing apparatus provided in the above embodiment is exemplified by the division of each program module when performing image processing, and in practical applications, the processing may be distributed to different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the image processing apparatus and the image processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
The embodiment of the invention also provides the electronic equipment. Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 6, the electronic device 40 includes a memory 42, a processor 41, and a computer program stored in the memory 42 and executable on the processor 41, and when the processor 41 executes the computer program, the steps of the image processing method according to the embodiment of the present invention are implemented.
Optionally, various components within electronic device 40 may be coupled together by a bus system 43. It will be appreciated that the bus system 43 is used to enable communications among the components. The bus system 43 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are labeled as bus system 43 in fig. 6.
It will be appreciated that the memory 42 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 42 described in connection with the embodiments of the invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present invention may be applied to the processor 41, or implemented by the processor 41. The processor 41 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 41. The Processor 41 described above may be a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. Processor 41 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed by the embodiment of the invention can be directly implemented by a hardware decoding processor, or can be implemented by combining hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in memory 42, where processor 41 reads the information in memory 42 and in combination with its hardware performs the steps of the method described above.
In an exemplary embodiment, the electronic Device 40 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), FPGAs, general purpose processors, controllers, MCUs, microprocessors (microprocessors), or other electronic components for performing the aforementioned methods.
In an exemplary embodiment, the present invention further provides a computer readable storage medium, such as the memory 42 comprising a computer program, which is executable by the processor 41 of the electronic device 40 to perform the steps of the aforementioned method. The computer readable storage medium can be Memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface Memory, optical disk, or CD-ROM; or may be a variety of devices including one or any combination of the above memories, such as a mobile phone, computer, tablet device, personal digital assistant, etc.
The computer readable storage medium provided by the embodiment of the present invention stores thereon a computer program, which when executed by a processor implements the steps of the image processing method according to the embodiment of the present invention.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be accomplished by program instructions and related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes a removable storage device, a ROM, a RAM, a magnetic disk, an optical disc, or various other media that can store program code.
Alternatively, if the above integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a removable storage device, a ROM, a RAM, a magnetic disk, an optical disc, or various other media that can store program code.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can easily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (14)

1. An image processing method, characterized in that the method comprises:
obtaining an image to be detected;
respectively determining a first detection frame representing the face of a target object and a second detection frame representing the body of the target object in the image; the number of the first detection frames is M, the number of the second detection frames is N, and M and N are non-negative integers;
determining K first detection frames and K second detection frames which meet the matching relation among the M first detection frames and the N second detection frames; K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; and
determining the number of target objects in the image based on M, N and K.
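[Editorial note, not part of the claims: the counting rule in the last step of claim 1 is left abstract ("based on M, N and K"). A minimal sketch of one plausible reading is given below, in which each matched face/body pair counts as one person and each unmatched detection frame counts as an additional person; the formula M + N - K is an assumption, not a statement of the claimed method.]

```python
def count_target_objects(m: int, n: int, k: int) -> int:
    """Hypothetical counting rule: K matched face/body pairs, plus (M - K)
    unmatched face frames, plus (N - K) unmatched body frames = M + N - K."""
    if not (0 <= k <= min(m, n)):
        raise ValueError("K must satisfy 0 <= K <= min(M, N)")
    return m + n - k


# Example: 3 face frames, 4 body frames, 3 matched pairs -> 4 target objects
print(count_target_objects(3, 4, 3))
```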
2. The method according to claim 1, wherein the determining K first detection frames and K second detection frames which meet the matching relation among the M first detection frames and the N second detection frames comprises:
traversing the M first detection frames, and determining the intersection ratio of each first detection frame and each second detection frame;
and determining the first detection frame and the second detection frame which meet the matching relation based on the intersection ratio of each first detection frame and each second detection frame.
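[Editorial note, not part of the claims: the sketch below illustrates the "intersection ratio" step of claim 2, assuming the ratio is the standard intersection-over-union (IoU) of two axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates. The claim itself does not fix the exact ratio, and a face-versus-body comparison may in practice use a different normalisation (e.g. intersection over the smaller box), so treat this as an assumption.]

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pairwise_intersection_ratio(face_frames: List[Box], body_frames: List[Box]) -> List[List[float]]:
    """Traverse the M face frames and compute the ratio against each of the N body frames."""
    return [[iou(f, b) for b in body_frames] for f in face_frames]
```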
3. The method according to claim 2, wherein the determining the first detection frame and the second detection frame which meet the matching relation based on the intersection ratio of each first detection frame and each second detection frame comprises:
determining the maximum intersection ratio in the intersection ratios of each first detection frame and each second detection frame;
judging whether the maximum intersection ratio is greater than a preset threshold value or not;
and in response to the maximum intersection ratio being greater than the preset threshold value, determining that the first detection frame and the second detection frame corresponding to the maximum intersection ratio meet the matching relation.
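[Editorial note, not part of the claims: the following sketch ties claims 2 and 3 together. For each first detection frame, the second detection frame with the maximum intersection ratio is taken, and the pair is accepted only if that maximum exceeds a preset threshold; K is the number of accepted pairs. The threshold value and the one-to-one restriction on matches are assumptions added for illustration.]

```python
from typing import Dict, List


def match_frames(ratio_matrix: List[List[float]], threshold: float = 0.5) -> Dict[int, int]:
    """Return {face_index: body_index} for pairs whose maximum intersection
    ratio exceeds the threshold; K = len(result)."""
    matches: Dict[int, int] = {}
    used_bodies = set()  # assumption: keep the matching one-to-one
    for i, row in enumerate(ratio_matrix):
        if not row:
            continue
        j = max(range(len(row)), key=lambda c: row[c])  # body frame with maximum ratio
        if row[j] > threshold and j not in used_bodies:
            matches[i] = j
            used_bodies.add(j)
    return matches
```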
4. The method according to any one of claims 1 to 3, further comprising:
obtaining body key points of each target object in the image;
determining a position classification category corresponding to the body key point; wherein the position classification category indicates that the position of the body key point is within a specific one of a plurality of specific regions in the image;
and determining the region where each target object is located based on the position classification category corresponding to each body key point.
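[Editorial note, not part of the claims: claim 4 does not specify how a body key point is mapped to a position classification category; a trained classifier is one option, simple geometric containment is another. The sketch below uses the latter purely for illustration, with named rectangular regions standing in for the plurality of specific regions.]

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]                  # (x, y) key-point coordinates in the image
Region = Tuple[float, float, float, float]   # (x1, y1, x2, y2) rectangular region


def classify_keypoint(pt: Point, regions: Dict[str, Region]) -> Optional[str]:
    """Return the name of the specific region containing the key point, or None."""
    x, y = pt
    for name, (x1, y1, x2, y2) in regions.items():
        if x1 <= x <= x2 and y1 <= y <= y2:
            return name
    return None
```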
5. The method according to claim 4, wherein, in a case where each specific region is a seat in the cabin, the position classification category corresponding to a body key point is the seat corresponding to the body key point; and the determining the region where each target object is located based on the position classification category corresponding to each body key point comprises:
determining a seat where a target object is located based on seats corresponding to body key points of the target object;
the method further comprises the following steps:
and determining the state of each seat in the cabin according to the seat where each target object in the cabin is located.
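[Editorial note, not part of the claims: for the final step of claim 5, a hedged sketch is given below in which a seat is marked occupied when at least one target object is located on it and empty otherwise. The two state labels are assumptions; the claim only requires that a state be determined for each seat.]

```python
from typing import Dict, Iterable, Optional


def seat_states(all_seats: Iterable[str],
                occupied_seats: Iterable[Optional[str]]) -> Dict[str, str]:
    """Map every cabin seat to 'occupied' or 'empty', given the seat of each target object."""
    occupied = {s for s in occupied_seats if s is not None}
    return {seat: ("occupied" if seat in occupied else "empty") for seat in all_seats}
```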
6. The method of claim 5, wherein determining the seat of the target object based on the corresponding seats of the body key points of the target object comprises:
counting the number of body key points corresponding to the same seat in a plurality of body key points of one target object;
and determining the seat corresponding to the maximum number of the body key points as the seat where the target object is located.
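[Editorial note, not part of the claims: claim 6 describes a concrete vote, counting how many of a target object's body key points fall on each seat and taking the seat with the largest count. A minimal sketch of that vote follows; the tie-breaking behaviour (the seat first encountered among those with the maximal count) is an assumption.]

```python
from collections import Counter
from typing import List, Optional


def seat_of_target(keypoint_seats: List[Optional[str]]) -> Optional[str]:
    """keypoint_seats holds the seat assigned to each body key point
    (None for key points that fall outside every seat region)."""
    votes = Counter(s for s in keypoint_seats if s is not None)
    if not votes:
        return None
    seat, _ = votes.most_common(1)[0]
    return seat


# Example: three key points vote rear_left, one votes rear_middle -> rear_left
print(seat_of_target(["rear_left", "rear_left", "rear_middle", "rear_left", None]))
```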
7. An image processing apparatus, characterized in that the apparatus comprises: an acquisition unit, a first determination unit, a second determination unit and a matching unit; wherein:
the acquisition unit is used for acquiring an image to be detected;
the first determination unit is used for determining a first detection frame for representing the face of a target object in the image; the number of the first detection frames is M;
the second determination unit is used for determining a second detection frame for representing the body of the target object in the image; the number of the second detection frames is N; wherein M and N are both non-negative integers;
the matching unit is used for determining K first detection frames and K second detection frames which meet the matching relation in the M first detection frames and the N second detection frames; k is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; the number of target objects in the image is determined based on M, N and K.
8. The apparatus according to claim 7, wherein the matching unit is configured to traverse the M first detection frames, and determine an intersection ratio of each first detection frame to each second detection frame; and determining the first detection frame and the second detection frame which meet the matching relation based on the intersection ratio of each first detection frame and each second detection frame.
9. The apparatus according to claim 8, wherein the matching unit is configured to determine a maximum intersection ratio of the intersection ratios of each first detection frame and each second detection frame; judging whether the maximum intersection ratio is greater than a preset threshold value or not; and responding to the condition that the maximum intersection ratio is larger than the preset threshold value, and determining that the first detection frame and the second detection frame corresponding to the maximum intersection ratio meet the matching relation.
10. The apparatus according to any one of claims 7 to 9, characterized in that the apparatus further comprises a classification unit and a third determination unit; wherein:
the second determination unit is further used for obtaining body key points of each target object in the image;
the classification unit is used for determining a position classification category corresponding to the body key point; the position classification category indicates that the position of the body key point is within a specific one of a plurality of specific regions in the image;
and the third determining unit is used for determining the region where each target object is located based on the position classification category corresponding to each body key point.
11. The apparatus according to claim 10, wherein, in a case where each specific region is a seat in the cabin, the position classification category corresponding to a body key point is the seat corresponding to the body key point;
the third determining unit is used for determining the seat where a target object is located based on the seats corresponding to the body key points of the target object, and for determining the state of each seat in the cabin according to the seat where each target object in the cabin is located.
12. The apparatus according to claim 11, wherein the third determining unit is configured to count the number of body key points corresponding to the same seat in a plurality of body key points of one target object; and determining the seat corresponding to the maximum number of the body key points as the seat where the target object is located.
13. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1 to 6 are implemented when the program is executed by the processor.
CN202010098809.8A 2020-02-18 2020-02-18 Image processing method and device and electronic equipment Pending CN111275002A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202010098809.8A CN111275002A (en) 2020-02-18 2020-02-18 Image processing method and device and electronic equipment
KR1020217034494A KR20210140758A (en) 2020-02-18 2020-12-14 Image processing methods, devices, electronic devices and computer program products
PCT/CN2020/136216 WO2021164395A1 (en) 2020-02-18 2020-12-14 Image processing method and apparatus, electronic device, and computer program product
JP2021557462A JP7235892B2 (en) 2020-02-18 2020-12-14 Image processing method, apparatus, electronic equipment and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010098809.8A CN111275002A (en) 2020-02-18 2020-02-18 Image processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111275002A true CN111275002A (en) 2020-06-12

Family

ID=71003930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010098809.8A Pending CN111275002A (en) 2020-02-18 2020-02-18 Image processing method and device and electronic equipment

Country Status (4)

Country Link
JP (1) JP7235892B2 (en)
KR (1) KR20210140758A (en)
CN (1) CN111275002A (en)
WO (1) WO2021164395A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814612A (en) * 2020-06-24 2020-10-23 浙江大华技术股份有限公司 Target face detection method and related device thereof
CN113196292A (en) * 2020-12-29 2021-07-30 商汤国际私人有限公司 Object detection method and device and electronic equipment
WO2021164395A1 (en) * 2020-02-18 2021-08-26 上海商汤临港智能科技有限公司 Image processing method and apparatus, electronic device, and computer program product
CN113544701A (en) * 2020-12-29 2021-10-22 商汤国际私人有限公司 Method and device for detecting associated object
CN114312580A (en) * 2021-12-31 2022-04-12 上海商汤临港智能科技有限公司 Method and device for determining seats of passengers in vehicle and vehicle control method and device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674719B (en) * 2019-09-18 2022-07-26 北京市商汤科技开发有限公司 Target object matching method and device, electronic equipment and storage medium
KR102570386B1 (en) * 2023-03-17 2023-08-28 (주)지앤티솔루션 Service providing system and method for detecting the number of passengers of vehicle in high occupancy vehicle lane
CN117132590B (en) * 2023-10-24 2024-03-01 威海天拓合创电子工程有限公司 Image-based multi-board defect detection method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609517A (en) * 2017-09-15 2018-01-19 华中科技大学 A kind of classroom behavior detecting system based on computer vision
CN110008800A (en) * 2017-12-04 2019-07-12 Aptiv技术有限公司 The system and method for generating the value of the confidence of at least one state of vehicle interior
CN110059547A (en) * 2019-03-08 2019-07-26 北京旷视科技有限公司 Object detection method and device
CN110287892A (en) * 2019-06-26 2019-09-27 海尔优家智能科技(北京)有限公司 Vacancy recognition methods and device
CN110532985A (en) * 2019-09-02 2019-12-03 北京迈格威科技有限公司 Object detection method, apparatus and system

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3508334B2 (en) * 1995-10-18 2004-03-22 三菱電機株式会社 Object detection method
JP2007022401A (en) 2005-07-19 2007-02-01 Takata Corp Occupant information detection system, occupant restraint device and vehicle
JP5059551B2 (en) 2007-10-31 2012-10-24 株式会社デンソー Vehicle occupant detection device
JP2011070629A (en) 2009-08-25 2011-04-07 Dainippon Printing Co Ltd Advertising effect measurement system and advertising effect measurement device
JP5978639B2 (en) 2012-02-06 2016-08-24 ソニー株式会社 Image processing apparatus, image processing method, program, and recording medium
EP2674914B1 (en) * 2012-06-11 2018-08-08 Volvo Car Corporation Method for determining a body parameter of a person
JP6087615B2 (en) 2012-12-19 2017-03-01 キヤノン株式会社 Image processing apparatus and control method therefor, imaging apparatus, and display apparatus
CN103679212A (en) * 2013-12-06 2014-03-26 无锡清华信息科学与技术国家实验室物联网技术中心 Method for detecting and counting personnel based on video image
JP6387877B2 (en) 2015-03-25 2018-09-12 株式会社Jvcケンウッド Seating determination device, seating determination method, and program
JP2017021557A (en) 2015-07-10 2017-01-26 ソニー株式会社 Image processor, image processing method and program
JP6485709B2 (en) 2016-05-26 2019-03-20 パナソニックIpマネジメント株式会社 Seat monitoring device, seat monitoring system, and seat monitoring method
JP7069667B2 (en) 2017-11-30 2022-05-18 富士通株式会社 Estimating program, estimation system, and estimation method
CN109190454A (en) 2018-07-17 2019-01-11 北京新唐思创教育科技有限公司 The method, apparatus, equipment and medium of target person in video for identification
JP7283037B2 (en) 2018-07-26 2023-05-30 ソニーグループ株式会社 Information processing device, information processing method, and program
CN110427908A (en) 2019-08-08 2019-11-08 北京百度网讯科技有限公司 A kind of method, apparatus and computer readable storage medium of person detecting
CN111275002A (en) * 2020-02-18 2020-06-12 上海商汤临港智能科技有限公司 Image processing method and device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609517A (en) * 2017-09-15 2018-01-19 华中科技大学 A kind of classroom behavior detecting system based on computer vision
CN110008800A (en) * 2017-12-04 2019-07-12 Aptiv技术有限公司 The system and method for generating the value of the confidence of at least one state of vehicle interior
CN110059547A (en) * 2019-03-08 2019-07-26 北京旷视科技有限公司 Object detection method and device
CN110287892A (en) * 2019-06-26 2019-09-27 海尔优家智能科技(北京)有限公司 Vacancy recognition methods and device
CN110532985A (en) * 2019-09-02 2019-12-03 北京迈格威科技有限公司 Object detection method, apparatus and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021164395A1 (en) * 2020-02-18 2021-08-26 上海商汤临港智能科技有限公司 Image processing method and apparatus, electronic device, and computer program product
CN111814612A (en) * 2020-06-24 2020-10-23 浙江大华技术股份有限公司 Target face detection method and related device thereof
CN113196292A (en) * 2020-12-29 2021-07-30 商汤国际私人有限公司 Object detection method and device and electronic equipment
CN113544701A (en) * 2020-12-29 2021-10-22 商汤国际私人有限公司 Method and device for detecting associated object
CN114312580A (en) * 2021-12-31 2022-04-12 上海商汤临港智能科技有限公司 Method and device for determining seats of passengers in vehicle and vehicle control method and device
CN114312580B (en) * 2021-12-31 2024-03-22 上海商汤临港智能科技有限公司 Method and device for determining seats of passengers in vehicle and vehicle control method and device

Also Published As

Publication number Publication date
JP2022526347A (en) 2022-05-24
KR20210140758A (en) 2021-11-23
WO2021164395A1 (en) 2021-08-26
JP7235892B2 (en) 2023-03-08

Similar Documents

Publication Publication Date Title
CN111275002A (en) Image processing method and device and electronic equipment
CN110348294B (en) Method and device for positioning chart in PDF document and computer equipment
CN110874594B (en) Human body appearance damage detection method and related equipment based on semantic segmentation network
CN110197146A (en) Facial image analysis method, electronic device and storage medium based on deep learning
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
WO2021051547A1 (en) Violent behavior detection method and system
CN110796659B (en) Target detection result identification method, device, equipment and storage medium
CN112651953B (en) Picture similarity calculation method and device, computer equipment and storage medium
CN112149570B (en) Multi-person living body detection method, device, electronic equipment and storage medium
CN113255516A (en) Living body detection method and device and electronic equipment
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN110298302B (en) Human body target detection method and related equipment
CN114638294A (en) Data enhancement method and device, terminal equipment and storage medium
CN111461202A (en) Real-time thyroid nodule ultrasonic image identification method and device
CN112488054B (en) Face recognition method, device, terminal equipment and storage medium
CN111680670B (en) Cross-mode human head detection method and device
CN112686122A (en) Human body and shadow detection method, device, electronic device and storage medium
CN113228105A (en) Image processing method and device and electronic equipment
WO2020244076A1 (en) Face recognition method and apparatus, and electronic device and storage medium
CN110610178A (en) Image recognition method, device, terminal and computer readable storage medium
CN114297720A (en) Image desensitization method and device, electronic equipment and storage medium
CN116246298A (en) Space occupation people counting method, terminal equipment and storage medium
CN115497092A (en) Image processing method, device and equipment
CN112801987A (en) Mobile phone part abnormity detection method and equipment
CN113536868A (en) Circuit board fault identification method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination