CN109685041B - Image analysis method and device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN109685041B (granted); CN109685041A (application publication)
Application number: CN201910063728.1A
Authority: CN (China)
Prior art keywords: level, result, target object, relationship, attitude
Legal status: Active
Other languages: Chinese (zh)
Inventors: 冯伟, 刘文韬, 李通, 钱晨
Assignee: Beijing Sensetime Technology Development Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition


Abstract

The present disclosure relates to an image analysis method and apparatus, an electronic device, and a storage medium. The method includes: performing feature extraction on an image to be analyzed to obtain feature information of a target object in the image to be analyzed, where the feature information includes a behavior feature and a posture feature; and performing relationship recognition on the target object according to the feature information to obtain a relationship recognition result of the target object, where the relationship recognition result includes at least one of behavior information and position information of an object related to the behavior. According to the embodiments of the present disclosure, relationship recognition is performed on the target object through the behavior feature and the posture feature of the target object in the image, which improves the accuracy of the relationship recognition.

Description

Image analysis method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image analysis method and apparatus, an electronic device, and a storage medium.
Background
In the fields of image understanding, human-computer interaction and the like, human-object relationship recognition and human body posture estimation are widely applied. However, traditional human-object relationship recognition methods rely only on the appearance features of the person, so the recognition result is easily affected by changes in appearance; and traditional human posture estimation methods usually predict each human key point independently, ignoring the positional relationships among the key points, so they are easily affected by factors such as occlusion and false detection.
Disclosure of Invention
The present disclosure provides a technical solution for image analysis.
According to a first aspect of the present disclosure, there is provided an image analysis method, the method comprising: extracting features of an image to be analyzed to obtain feature information of a target object in the image to be analyzed, wherein the feature information comprises behavior features and posture features; and performing relationship identification on the target object according to the characteristic information to obtain a relationship identification result of the target object, wherein the relationship identification result comprises at least one of behavior information and position information of an object related to the behavior.
In one possible implementation, the method further includes: and performing attitude estimation on the target object according to the relationship recognition result and the attitude characteristics to obtain an attitude estimation result of the target object, wherein the attitude estimation result comprises attitude information of the target object.
In a possible implementation manner, the relationship recognition result includes N-level relationship recognition results, the posture estimation result includes N-level posture estimation results, N is an integer greater than 1, where performing relationship recognition on the target object according to the feature information to obtain the relationship recognition result of the target object includes: according to the characteristic information, carrying out relationship identification on the target object to obtain a first-level relationship identification result; and under the condition that N is equal to 2, performing relation recognition on the target object according to the relation recognition result of the first stage and the posture estimation result of the first stage to obtain a relation recognition result of the second stage.
In a possible implementation manner, performing relationship recognition on the target object according to the feature information to obtain the relationship recognition result of the target object further includes: when N is greater than 2, performing relationship recognition on the target object according to the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level to obtain the relationship recognition result of the n-th level, where n is an integer and 1 < n < N; and performing relationship recognition on the target object according to the relationship recognition result of the (N-1)-th level and the posture estimation result of the (N-1)-th level to obtain the relationship recognition result of the N-th level.
In a possible implementation manner, performing attitude estimation on the target object according to the relationship recognition result and the attitude feature to obtain an attitude estimation result of the target object, includes: carrying out attitude estimation on the target object according to the first-stage relationship recognition result and the attitude characteristics to obtain a first-stage attitude estimation result; and under the condition that N is equal to 2, carrying out attitude estimation on the target object according to the second-stage relationship recognition result and the first-stage attitude estimation result to obtain a second-stage attitude estimation result.
In a possible implementation manner, performing posture estimation on the target object according to the relationship recognition result and the posture feature to obtain the posture estimation result of the target object further includes: when N is greater than 2, performing posture estimation on the target object according to the relationship recognition result of the n-th level and the posture estimation result of the (n-1)-th level to obtain the posture estimation result of the n-th level, where n is an integer and 1 < n < N; and performing posture estimation on the target object according to the relationship recognition result of the N-th level and the posture estimation result of the (N-1)-th level to obtain the posture estimation result of the N-th level.
In a possible implementation manner, performing relationship recognition on the target object according to the relationship recognition result of the (n-1) th level and the posture estimation result of the (n-1) th level to obtain a relationship recognition result of the (n) th level, including: carrying out full connection processing on the n-1 level relation recognition result and the n-1 level attitude estimation result to obtain the n level connection characteristic; and performing behavior recognition processing on the connection characteristics of the nth level to obtain behavior information of the nth level.
In a possible implementation manner, the performing relationship recognition on the target object according to the relationship recognition result of the (n-1) th level and the posture estimation result of the (n-1) th level to obtain a relationship recognition result of the (n) th level further includes: and performing relation identification processing on the connection characteristics of the nth level according to the behavior information of the nth level to obtain position information of the nth level.
In a possible implementation manner, the nth-level relationship recognition result further includes an nth-level intermediate relationship feature, where the relationship recognition is performed on the target object according to the nth-1-level relationship recognition result and the nth-1-level posture estimation result to obtain the nth-level relationship recognition result, and the method further includes: and performing full connection and convolution processing on the n-1 level relationship identification result and the n-1 level attitude estimation result to obtain the n level intermediate relationship characteristic.
In a possible implementation manner, performing attitude estimation on the target object according to the relationship recognition result of the nth level and the attitude estimation result of the (n-1) th level to obtain an attitude estimation result of the nth level, including: performing convolution and activation processing on the relationship identification result of the nth level based on an attention mechanism to obtain an attention diagram of the nth level; performing point multiplication on the attention diagram of the nth level and the attitude estimation result of the (n-1) th level to obtain the input characteristic of the nth level; and carrying out attitude estimation on the input features of the nth level to obtain attitude information of the nth level.
In a possible implementation manner, the n-th-level posture estimation result further includes an n-th-level intermediate posture feature, wherein the posture estimation is performed on the target object according to the n-th-level relationship recognition result and the n-1-th-level posture estimation result to obtain an n-th-level posture estimation result, and the method further includes: and performing convolution processing on the input feature of the nth level to obtain the intermediate attitude feature of the nth level.
In a possible implementation manner, the feature information further includes an appearance feature, where performing relationship identification on the target object according to the feature information to obtain a relationship identification result of the target object includes: and performing relationship recognition on the target object according to the appearance characteristic, the behavior characteristic and the posture characteristic to obtain a relationship recognition result of the target object.
In a possible implementation manner, performing attitude estimation on the target object according to the relationship recognition result and the attitude feature to obtain an attitude estimation result of the target object, includes: and performing attitude estimation on the target object according to the appearance characteristics, the relationship identification result and the attitude characteristics to obtain an attitude estimation result of the target object.
In a possible implementation manner, the method is implemented by a neural network, and the neural network includes a relationship recognition network and an attitude estimation network, wherein the relationship recognition network is used for performing relationship recognition on the feature information, and the attitude estimation network is used for performing attitude estimation on the relationship recognition result and the attitude feature.
In one possible implementation manner, the method is implemented by a neural network, and the neural network includes an N-level relationship recognition network and an N-level posture estimation network, where the n-th level relationship recognition network is used for performing relationship recognition on the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level, and the n-th level posture estimation network is used for performing posture estimation on the relationship recognition result of the n-th level and the posture estimation result of the (n-1)-th level.
In one possible implementation, the method is implemented by a neural network, which includes a feature extraction network for performing feature extraction on an image to be analyzed.
In one possible implementation, the method further includes: and training the neural network according to a preset training set.
In one possible implementation, the behavior information includes a confidence level of the current behavior of the target object.
According to a second aspect of the present disclosure, there is provided an image analysis apparatus comprising: the characteristic extraction module is used for extracting characteristics of an image to be analyzed to obtain characteristic information of a target object in the image to be analyzed, wherein the characteristic information comprises behavior characteristics and posture characteristics; and the relationship identification module is used for carrying out relationship identification on the target object according to the characteristic information to obtain a relationship identification result of the target object, wherein the relationship identification result comprises at least one of behavior information and position information of an object related to the behavior.
In one possible implementation, the apparatus further includes: and the first attitude estimation module is used for carrying out attitude estimation on the target object according to the relationship recognition result and the attitude characteristics to obtain an attitude estimation result of the target object, wherein the attitude estimation result comprises attitude information of the target object.
In one possible implementation manner, the relationship recognition result includes N-level relationship recognition results, the posture estimation result includes N-level posture estimation results, N is an integer greater than 1, and the relationship recognition module includes: the first relation identification submodule is used for carrying out relation identification on the target object according to the characteristic information to obtain a first-level relation identification result; and the second relation identification submodule is used for carrying out relation identification on the target object according to the relation identification result of the first level and the posture estimation result of the first level under the condition that the N is equal to 2 so as to obtain a relation identification result of the second level.
In a possible implementation manner, the relationship recognition module further includes: a third relationship recognition submodule, configured to, when N is greater than 2, perform relationship recognition on the target object according to the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level to obtain the relationship recognition result of the n-th level, where n is an integer and 1 < n < N; and a fourth relationship recognition submodule, configured to perform relationship recognition on the target object according to the relationship recognition result of the (N-1)-th level and the posture estimation result of the (N-1)-th level to obtain the relationship recognition result of the N-th level.
In one possible implementation, the first posture estimation module includes: the first attitude estimation submodule is used for carrying out attitude estimation on the target object according to a first-stage relationship recognition result and the attitude characteristics to obtain a first-stage attitude estimation result; and the second attitude estimation submodule is used for carrying out attitude estimation on the target object according to the second-stage relationship recognition result and the first-stage attitude estimation result under the condition that the N is equal to 2, so as to obtain a second-stage attitude estimation result.
In one possible implementation manner, the first posture estimation module further includes: a third posture estimation submodule, configured to, when N is greater than 2, perform posture estimation on the target object according to the relationship recognition result of the n-th level and the posture estimation result of the (n-1)-th level to obtain the posture estimation result of the n-th level, where n is an integer and 1 < n < N; and a fourth posture estimation submodule, configured to perform posture estimation on the target object according to the relationship recognition result of the N-th level and the posture estimation result of the (N-1)-th level to obtain the posture estimation result of the N-th level.
In one possible implementation, the third relationship identification submodule is configured to: carrying out full connection processing on the n-1 level relation recognition result and the n-1 level attitude estimation result to obtain the n level connection characteristic; and performing behavior recognition processing on the connection characteristics of the nth level to obtain behavior information of the nth level.
In one possible implementation, the third relationship identification submodule is further configured to: and performing relation identification processing on the connection characteristics of the nth level according to the behavior information of the nth level to obtain position information of the nth level.
In a possible implementation manner, the relationship identification result of the nth level further includes an intermediate relationship feature of the nth level, wherein the third relationship identification submodule is further configured to: and performing full connection and convolution processing on the n-1 level relationship identification result and the n-1 level attitude estimation result to obtain the n level intermediate relationship characteristic.
In one possible implementation, the third pose estimation sub-module is configured to: performing convolution and activation processing on the relationship identification result of the nth level based on an attention mechanism to obtain an attention diagram of the nth level; performing point multiplication on the attention diagram of the nth level and the attitude estimation result of the (n-1) th level to obtain the input characteristic of the nth level; and carrying out attitude estimation on the input features of the nth level to obtain attitude information of the nth level.
In one possible implementation, the posture estimation result of the nth stage further includes an intermediate posture feature of the nth stage, where the third posture estimation submodule is further configured to: and performing convolution processing on the input feature of the nth level to obtain the intermediate attitude feature of the nth level.
In one possible implementation, the feature information further includes an appearance feature.
In one possible implementation, the apparatus further includes: and the second attitude estimation module is used for carrying out attitude estimation on the target object according to the appearance characteristics, the relationship recognition result and the attitude characteristics to obtain an attitude estimation result of the target object.
In one possible implementation manner, the apparatus includes a neural network, and the neural network includes a relationship recognition network and an attitude estimation network, wherein the relationship recognition network is configured to perform relationship recognition on the feature information, and the attitude estimation network is configured to perform attitude estimation on the relationship recognition result and the attitude feature.
In one possible implementation manner, the device comprises a neural network, wherein the neural network comprises an N-level relationship recognition network and an N-level posture estimation network, the n-th level relationship recognition network is used for performing relationship recognition on the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level, and the n-th level posture estimation network is used for performing posture estimation on the relationship recognition result of the n-th level and the posture estimation result of the (n-1)-th level.
In one possible implementation, the apparatus includes a neural network including a feature extraction network, where the feature extraction network is configured to perform feature extraction on an image to be analyzed.
In one possible implementation, the apparatus further includes: and the training module is used for training the neural network according to a preset training set.
In one possible implementation, the behavior information includes a confidence level of the current behavior of the target object.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: the above-described image analysis method is performed.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described image analysis method.
In the embodiment of the disclosure, the relationship recognition is performed on the target object through the behavior feature and the posture feature of the target object in the image, so as to obtain the behavior score of the target object and the position information of the object related to the behavior, thereby improving the accuracy of the relationship recognition.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow diagram of an image analysis method according to an embodiment of the present disclosure.
Fig. 2 shows a flow diagram of an image analysis method according to an embodiment of the present disclosure.
Fig. 3 shows a schematic structural diagram of a neural network according to an embodiment of the present disclosure.
Fig. 4 shows a schematic structural diagram of a relationship recognition network according to an embodiment of the present disclosure.
Fig. 5 shows a schematic structural diagram of an attitude estimation network according to an embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of an image analysis apparatus according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
FIG. 8 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of an image analysis method according to an embodiment of the present disclosure, as shown in fig. 1, the image analysis method including:
in step S11, performing feature extraction on an image to be analyzed, and acquiring feature information of a target object in the image to be analyzed, where the feature information includes a behavior feature and an attitude feature;
in step S12, performing relationship recognition on the target object according to the feature information to obtain a relationship recognition result of the target object, where the relationship recognition result includes at least one of behavior information and position information of an object related to the behavior.
According to the embodiment of the disclosure, the target object is subjected to relationship recognition through the behavior feature and the posture feature of the target object in the image, so that the behavior score of the target object and the position information of the object related to the behavior are obtained, and the accuracy of relationship recognition is improved.
In one possible implementation, the image analysis method may be performed by an electronic device such as a terminal device or a server, the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling a computer readable instruction stored in a memory. Alternatively, the method may be performed by a server.
In a possible implementation manner, the image to be analyzed may be an image pre-stored in a terminal or a server or an image downloaded from a network, or may be an image captured by a shooting component. For example, during human-computer interaction or interactive entertainment, images can be captured by a camera (e.g., a video camera). There may be one or more target objects (persons or objects) in the image to be analyzed and one or more objects interacting with the target objects. The present disclosure does not limit the type and acquisition manner of the image to be analyzed. In one possible implementation, the target object may be a human or animal object, and the feature information may include a behavior feature (preliminary behavior feature) and a posture feature (preliminary posture feature). The behavioral and pose characteristics may, for example, include, among other things, the locations of a plurality of human keypoints of the target object.
In a possible implementation manner, in step S11, feature extraction may be performed on the image to be analyzed, so as to obtain feature information of the target object in the image to be analyzed. The present disclosure does not limit the manner of feature extraction.
In a possible implementation manner, when the image analysis method according to the embodiment of the present disclosure is implemented by a neural network, feature extraction may be performed on an image to be analyzed by a feature extraction network. It should be understood that the feature extraction network may be, for example, a convolutional neural network, a deep neural network, etc., and the specific type of feature extraction network is not limited by this disclosure.
For example, the feature extraction network may include, for example, a plurality of convolutional layers, a fully-connected layer, and the like. The image to be analyzed can be input into the feature extraction network, a feature map of the image to be analyzed is extracted, and one or more human body regions are determined, wherein each human body region can contain a target object (human body); and then, carrying out different processing on the characteristic diagram of the human body region to respectively obtain the behavior characteristic (obtained by convolution and full-connection processing for example) and the posture characteristic (obtained by convolution processing for example) of the target object.
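For illustration only, the split into a behavior branch (convolution plus full connection) and a posture branch (convolution only) can be sketched as follows. This is a minimal PyTorch-style sketch under assumed layer sizes and tensor shapes; the class name HumanRegionFeatureExtractor, the channel counts and the pooling step are assumptions and not part of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HumanRegionFeatureExtractor(nn.Module):
    """Hypothetical sketch: turns a cropped human-region feature map into a
    preliminary behavior feature (vector) and a preliminary posture feature (map)."""

    def __init__(self, in_channels=256, behavior_dim=512, pose_channels=128):
        super().__init__()
        # behavior branch: convolution followed by a fully connected layer
        self.behavior_conv = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)
        self.behavior_fc = nn.Linear(256, behavior_dim)
        # posture branch: convolution only, keeps the spatial layout for keypoints
        self.pose_conv = nn.Conv2d(in_channels, pose_channels, kernel_size=3, padding=1)

    def forward(self, region_feat):                      # region_feat: (B, C, H, W)
        b = F.relu(self.behavior_conv(region_feat))
        b = F.adaptive_avg_pool2d(b, 1).flatten(1)       # (B, 256)
        behavior_feat = self.behavior_fc(b)              # (B, behavior_dim)
        pose_feat = F.relu(self.pose_conv(region_feat))  # (B, pose_channels, H, W)
        return behavior_feat, pose_feat
```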
In a possible implementation manner, based on the feature information of the target object, in step S12, the relationship identification of the target object may be performed, so as to obtain a relationship identification result of the target object. The relationship recognition result includes at least one of behavior information and position information of an object related to the behavior.
In one possible implementation, the behavior information may include a confidence level of the current behavior of the target object. For example, the confidence (e.g., 60%) that the person currently behaves in the image to be analyzed as "playing football". It should be understood that the behavior information may also include information such as a probability or a score that the target object performs a certain behavior currently, and the specific content of the behavior information is not limited by the present disclosure.
In one possible implementation, the position information of the object related to the behavior may include an image position (area) of the object in the image to be analyzed, which is associated with the current behavior of the target object. For example, in the case where the current behavior of the person in the image to be analyzed is "playing football", the object related to the behavior is "football", and the position information may be determined according to the position of "football" in the image to be analyzed.
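As a concrete illustration, one entry of such a relationship recognition result could be held in a structure like the following; the field names and the example numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class RelationResult:
    behavior: str                           # e.g. "playing football"
    confidence: float                       # e.g. 0.6 for a 60% confidence
    object_box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) of the related object, e.g. the "football"

# one target person may yield several such entries, one per recognized behavior
example = RelationResult("playing football", 0.60, (412, 530, 468, 586))
```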
In a possible implementation manner, when the image analysis method according to the embodiment of the present disclosure is implemented by a neural network, the relationship recognition may be performed on the target object by a relationship recognition network. The relationship recognition network may be, for example, a convolutional neural network CNN, and the specific type of relationship recognition network is not limited by this disclosure.
For example, the feature information (behavior feature and posture feature) may be input into the relationship recognition network for behavior recognition, so as to obtain behavior information (e.g., confidence) of each behavior action of the target object; according to the behavior information and the position of the human body region, the position information of the object related to the behavior can be obtained. In the image to be analyzed, each behavior of the target object has a related object corresponding to it, and therefore, the relationship recognition result may include a plurality of sets of behavior information and position information of the objects related to the behavior.
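A minimal sketch of such a first-stage relationship recognition head is given below; it assumes the behavior feature is a vector and the posture feature a spatial map, and regressing one candidate object box per behavior class is only one plausible choice, not necessarily the one used in the embodiment.

```python
import torch
import torch.nn as nn

class RelationRecognitionHead(nn.Module):
    """Hypothetical first-stage head: behavior confidences plus a related-object box per behavior."""

    def __init__(self, behavior_dim=512, pose_channels=128, num_behaviors=80):
        super().__init__()
        self.pose_pool = nn.AdaptiveAvgPool2d(1)
        fused_dim = behavior_dim + pose_channels
        self.behavior_cls = nn.Linear(fused_dim, num_behaviors)    # confidence per behavior
        self.object_loc = nn.Linear(fused_dim, num_behaviors * 4)  # one box offset per behavior

    def forward(self, behavior_feat, pose_feat):
        pose_vec = self.pose_pool(pose_feat).flatten(1)
        fused = torch.cat([behavior_feat, pose_vec], dim=1)
        behavior_conf = torch.sigmoid(self.behavior_cls(fused))     # (B, num_behaviors)
        object_offsets = self.object_loc(fused).view(-1, behavior_conf.size(1), 4)
        return behavior_conf, object_offsets
```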
Fig. 2 shows a flow diagram of an image analysis method according to an embodiment of the present disclosure. As shown in fig. 2, the method may further include:
in step S13, performing attitude estimation on the target object according to the relationship recognition result and the attitude feature to obtain an attitude estimation result of the target object, where the attitude estimation result includes attitude information of the target object. That is, the relationship recognition result and the posture feature may be simultaneously used as input to perform posture estimation, so as to obtain a posture estimation result of the target object.
In one possible implementation, when the image analysis method according to the embodiment of the present disclosure is implemented by a neural network, the target object may be subjected to pose estimation by a pose estimation network. The pose estimation network may be, for example, a convolutional neural network CNN or the like, and the specific type of pose estimation network is not limited by this disclosure.
For example, the posture estimation network may process (for example, by convolution and activation) the input relationship recognition result (the behavior information and/or the position information of the object related to the behavior) and the posture feature to obtain a posture estimation result, where the posture estimation result includes at least the posture information of the target object (e.g., the positions of the human body key points).
In this way, the relationship recognition result is used as an input to the posture estimation, which makes the positional relationships between the human body key points more definite and improves the accuracy of the posture estimation.
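A rough sketch of a posture estimation head that takes both the relationship recognition result and the posture feature as inputs is shown below; broadcasting the relationship result over the posture feature map is an assumed fusion choice made for illustration, as are the class name, layer sizes and keypoint count.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseEstimationHead(nn.Module):
    """Hypothetical head: fuses the relationship result with the posture feature map
    and predicts one heatmap per human keypoint."""

    def __init__(self, pose_channels=128, relation_dim=84, num_keypoints=17):
        super().__init__()
        self.fuse = nn.Conv2d(pose_channels + relation_dim, 128, kernel_size=3, padding=1)
        self.heatmap = nn.Conv2d(128, num_keypoints, kernel_size=1)

    def forward(self, pose_feat, relation_vec):
        # tile the relationship result over the spatial grid and concatenate with the posture feature
        b, _, h, w = pose_feat.shape
        rel_map = relation_vec.view(b, -1, 1, 1).expand(b, relation_vec.size(1), h, w)
        x = F.relu(self.fuse(torch.cat([pose_feat, rel_map], dim=1)))
        return self.heatmap(x)               # (B, num_keypoints, H, W) keypoint heatmaps
```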
In one possible implementation, relationship recognition and pose estimation may be combined and performed on the image to be analyzed multiple times in a turbo learning manner (turbo learning architecture). The result of the pose estimation can be used as an input for relationship recognition, and the result of the relationship recognition can then be used as an input for pose estimation, iterating multiple times. The process is analogous to the turbocharger of an engine, where the output (exhaust gas) is fed back to the input to improve the engine's efficiency: in this feedback process, the results of the two tasks are gradually refined, and the accuracy of both relationship recognition and pose estimation is improved.
In one possible implementation, the relationship recognition result includes N-level relationship recognition results, and the pose estimation result includes N-level pose estimation results. N is an integer greater than 1, e.g., N = 3. The present disclosure is not limited to specific values of N.
In one possible implementation, step S12 may include: according to the characteristic information, carrying out relationship identification on the target object to obtain a first-level relationship identification result;
and under the condition that N is equal to 2, performing relation recognition on the target object according to the relation recognition result of the first stage and the posture estimation result of the first stage to obtain a relation recognition result of the second stage.
For example, N-level relationship identification may be performed on the image to be analyzed. In the first-stage relation recognition, the relation recognition can be carried out on the target object according to the characteristic information (behavior characteristics and attitude characteristics) to obtain a first-stage relation recognition result (behavior information and/or position information of an object related to the behavior); in the second-stage relationship recognition, the relationship recognition can be performed on the target object according to the first-stage relationship recognition result and the first-stage attitude estimation result to obtain a second-stage relationship recognition result. In the case where N is equal to 2, the relationship recognition result of the second stage may be taken as the final output result.
In one possible implementation, step S12 may further include:
under the condition that N is greater than 2, carrying out relationship recognition on the target object according to the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level to obtain the relationship recognition result of the n-th level, wherein n is an integer and 1 < n < N;
and carrying out relation recognition on the target object according to the relation recognition result of the (N-1) th level and the posture estimation result of the (N-1) th level to obtain a relation recognition result of the (N) th level.
For example, when N is greater than 2, in the n-th level (1 < n < N) relationship recognition, the relationship recognition may be performed on the target object according to the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level, so as to obtain the relationship recognition result of the n-th level, where the n-th level may be any one of the second level to the (N-1)-th level; in the N-th level relationship recognition, the relationship recognition of the target object can be performed according to the relationship recognition result of the (N-1)-th level and the posture estimation result of the (N-1)-th level, so as to obtain the relationship recognition result of the N-th level.
In one possible implementation, when the image analysis method according to the embodiment of the present disclosure is implemented by a neural network, the neural network may include an N-level relationship recognition network for performing relationship recognition on the target object. The present disclosure does not limit the specific type of N-level relationship recognition network.
By the mode, the posture estimation result can be used as the input of the relation recognition, so that the relation recognition process not only depends on the appearance characteristics of people, but also depends on more detailed human posture characteristics, the influence of the appearance difference of people is avoided, and the precision of the relation recognition is improved.
In one possible implementation, step S13 may include: carrying out attitude estimation on the target object according to the first-stage relationship recognition result and the attitude characteristics to obtain a first-stage attitude estimation result;
and under the condition that N is equal to 2, carrying out attitude estimation on the target object according to the second-stage relationship recognition result and the first-stage attitude estimation result to obtain a second-stage attitude estimation result.
For example, N-level pose estimation may be performed on the image to be analyzed. In the first-stage attitude estimation, the attitude estimation of the target object can be performed according to the first-stage relationship recognition result (behavior information and/or position information of an object related to the behavior) and the attitude characteristics to obtain a first-stage attitude estimation result (attitude information); in the second-stage attitude estimation, the attitude estimation of the target object can be performed according to the second-stage relationship recognition result and the first-stage attitude estimation result, so as to obtain a second-stage attitude estimation result. In the case where N is equal to 2, the attitude estimation result of the second stage may be taken as the final output result.
In one possible implementation, step S13 may further include:
under the condition that N is greater than 2, carrying out posture estimation on the target object according to the relationship recognition result of the n-th level and the posture estimation result of the (n-1)-th level to obtain the posture estimation result of the n-th level, wherein n is an integer and 1 < n < N;
and carrying out attitude estimation on the target object according to the Nth-level relation recognition result and the N-1 th-level attitude estimation result to obtain the Nth-level attitude estimation result.
For example, in the case that N is greater than 2, in the n-th level (1 < n < N) posture estimation, the posture of the target object may be estimated according to the relationship recognition result of the n-th level and the posture estimation result of the (n-1)-th level, so as to obtain the posture estimation result of the n-th level, where the n-th level may be any one of the second level to the (N-1)-th level; in the N-th level posture estimation, the posture estimation may be performed on the target object according to the relationship recognition result of the N-th level and the posture estimation result of the (N-1)-th level to obtain the posture estimation result of the N-th level.
In one possible implementation, when the image analysis method according to the embodiment of the present disclosure is implemented by a neural network, the neural network may include an N-level pose estimation network for pose estimation of the target object. The present disclosure does not limit the specific type of the N-level pose estimation network.
In this way, the posture estimation process depends not only on the appearance features of the person but also on the person's behavior and the position information of the object related to the behavior, so that the positional relationships among the human body key points are more definite and the accuracy of the posture estimation is improved.
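Putting the N-level recursion together, the alternation described above can be sketched as a simple loop; relation_nets and pose_nets stand for the per-level networks, and the call signatures are assumptions made for illustration.

```python
def turbo_inference(behavior_feat, pose_feat, relation_nets, pose_nets):
    """Hypothetical sketch of the N-level alternation: each level's relationship
    recognition feeds the same level's posture estimation, and both feed the next level."""
    relation = relation_nets[0](behavior_feat, pose_feat)   # first-level relationship result
    pose = pose_nets[0](relation, pose_feat)                # first-level posture estimate
    for rel_net, pose_net in zip(relation_nets[1:], pose_nets[1:]):
        relation = rel_net(relation, pose)                  # level n uses the level n-1 outputs
        pose = pose_net(relation, pose)                     # level n posture uses the level n relation
    return relation, pose                                   # N-th level results
```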
Fig. 3 shows a schematic structural diagram of a neural network according to an embodiment of the present disclosure. As illustrated in fig. 3, the neural network may include an N-level relationship recognition network: the first-stage relationship recognition network 31, the second-stage relationship recognition network 32, …, and the N-th-stage relationship recognition network 33. The neural network may also include an N-stage pose estimation network: a first-stage pose estimation network 61, a second-stage pose estimation network 62, …, and an N-th-stage pose estimation network 63.
In one possible implementation, at the first stage of relationship recognition, the preliminary behavior feature and the preliminary posture feature of the target object, obtained from the image to be analyzed by the feature extraction network 34, may be input into the first-stage relationship recognition network 31 for processing to obtain the first-stage relationship recognition result. The first-stage relationship recognition result includes the behavior information of the target object and the position information of the object related to the behavior.
In one possible implementation, at the second stage of relationship recognition, the relationship recognition result of the first stage and the posture estimation result of the first stage can be input into the second-stage relationship recognition network 32 for processing to obtain the second-stage relationship recognition result. At the n-th stage of relationship recognition, the relationship recognition result of the (n-1)-th stage and the posture estimation result of the (n-1)-th stage can be input into the relationship recognition network of the n-th stage for relationship recognition, so as to obtain the relationship recognition result of the n-th stage.
In one possible implementation, at the N-th stage of relationship recognition, the relationship recognition result of the (N-1)-th stage and the posture estimation result of the (N-1)-th stage can be input into the relationship recognition network 33 of the N-th stage for processing to obtain the relationship recognition result of the N-th stage, comprising the N-th-stage behavior information and the N-th-stage position information of the object related to the behavior; this behavior information and position information are taken as the output of the image analysis.
In this way, the recognition result of the human-object relationship recognition is iteratively optimized in a turbo learning mode by utilizing the high correlation between the human-object relationship recognition and the human body posture estimation, and the recognition accuracy can be gradually improved.
Fig. 4 shows a schematic structural diagram of a relationship recognition network according to an embodiment of the present disclosure. As shown in fig. 4, in a possible implementation manner, the relationship recognition network may include operations such as full connection and convolution, and may perform behavior recognition and relationship recognition on the target object.
In a possible implementation manner, the performing relationship recognition on the target object according to the relationship recognition result of the (n-1) th level and the posture estimation result of the (n-1) th level to obtain the relationship recognition result of the (n) th level may include:
carrying out full connection processing on the n-1 level relation recognition result and the n-1 level attitude estimation result to obtain the n level connection characteristic;
and performing behavior recognition processing on the connection characteristics of the nth level to obtain behavior information of the nth level.
As shown in FIG. 4, in the relationship recognition network, full connection processing may first be performed on the input relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level to obtain the connection feature of the n-th level (shown as 41 in fig. 4); then, behavior recognition processing is performed on the connection feature of the n-th level to obtain the behavior information of the n-th level. The connection feature h can be obtained by the following formula (1):

h = I([r^(n-1), P^(n-1)])    (1)

In formula (1), [ ] represents the concatenation of feature maps, I(·) represents the full-connection operation, r^(n-1) represents the relationship recognition result of the (n-1)-th level, and P^(n-1) represents the posture estimation result of the (n-1)-th level.
In this way, behavior recognition can be performed by using the connected relationship recognition result and posture estimation result, improving the accuracy of the behavior recognition.
In a possible implementation manner, the step of performing relationship recognition on the target object according to the relationship recognition result of the (n-1) th level and the posture estimation result of the (n-1) th level to obtain a relationship recognition result of the (n) th level may further include:
and performing relation identification processing on the connection characteristics of the nth level according to the behavior information of the nth level to obtain position information of the nth level.
As shown in fig. 4, in the relationship recognition network, the relationship recognition processing may be performed on the connection feature of the n-th level according to the behavior information of the n-th level to obtain the position information of the n-th level.
In this way, the connected relationship recognition result and posture estimation result can be used to perform relationship recognition, improving the precision of the relationship recognition.
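A minimal sketch of formula (1) together with the behavior recognition and position prediction steps follows; flattening the inputs before the fully connected layer, the hidden size, and predicting a single box conditioned on the behavior information are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ConnectionFeatureBlock(nn.Module):
    """Hypothetical sketch of formula (1): concatenate the (n-1)-th level relationship result
    with the (n-1)-th level posture result, apply a fully connected layer, then predict the
    n-th level behavior information and position information."""

    def __init__(self, relation_dim, pose_dim, hidden_dim=512, num_behaviors=80):
        # relation_dim and pose_dim are the flattened sizes of the previous-level results
        super().__init__()
        self.fc = nn.Linear(relation_dim + pose_dim, hidden_dim)        # I(.) in formula (1)
        self.behavior_head = nn.Linear(hidden_dim, num_behaviors)       # n-th level behavior info
        self.position_head = nn.Linear(hidden_dim + num_behaviors, 4)   # n-th level position info

    def forward(self, relation_prev, pose_prev):
        h = torch.relu(self.fc(torch.cat([relation_prev.flatten(1),
                                          pose_prev.flatten(1)], dim=1)))  # connection feature h
        behavior = torch.sigmoid(self.behavior_head(h))
        # position information is predicted from the connection feature conditioned on the behavior info
        position = self.position_head(torch.cat([h, behavior], dim=1))
        return h, behavior, position
```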
In a possible implementation manner, the relationship recognition result of the n-th level further comprises an intermediate relationship feature of the n-th level, where performing relationship recognition on the target object according to the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level to obtain the relationship recognition result of the n-th level further includes:
performing full connection and convolution processing on the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level to obtain the intermediate relationship feature of the n-th level.
As shown in FIG. 4, in the relationship recognition network, full connection and convolution processing may be performed on the relationship recognition result of the (n-1)-th level and the posture estimation result of the (n-1)-th level to obtain the intermediate relationship feature of the n-th level. According to the recognition results of the n-th level (including the behavior information of the n-th level and the position information of the n-th level) and the intermediate relationship feature of the n-th level, the relationship recognition result of the n-th level is obtained.
By the method, the previous-stage recognition result (intermediate relation characteristic) can be added into the relation recognition result, so that the characteristic loss is avoided, and the precision of relation recognition is further improved.
In one possible implementation, as shown in fig. 3, the posture estimation may be performed by an N-stage posture estimation network. At the first stage of posture estimation, the first-stage relationship recognition result of the target object and the preliminary posture feature may be input into the first-stage posture estimation network 61 for processing to obtain the first-stage posture estimation result. At the second stage of posture estimation, the second-stage relationship recognition result and the first-stage posture estimation result may be input into the second-stage posture estimation network 62 for processing to obtain the second-stage posture estimation result.
In one possible implementation, at the N-th stage of posture estimation, the relationship recognition result of the N-th stage and the posture estimation result of the (N-1)-th stage may be input into the N-th-stage posture estimation network 63 for processing to obtain the posture estimation result of the N-th stage, and the posture estimation result of the N-th stage is taken as the output of the image analysis.
By the method, the recognition result of the posture estimation is iteratively optimized by utilizing the high correlation between the human-object relationship recognition and the human body posture estimation, and the recognition precision can be gradually improved.
Fig. 5 shows a schematic structural diagram of an attitude estimation network according to an embodiment of the present disclosure. As shown in fig. 5, in one possible implementation, the posture estimation network may include operations such as full connection, convolution, activation, and dot multiplication, and may perform posture estimation on the target object.
In a possible implementation manner, the step of performing attitude estimation on the target object according to the relationship recognition result of the nth level and the attitude estimation result of the (n-1) th level to obtain the attitude estimation result of the nth level may include:
performing convolution and activation processing on the relationship identification result of the nth level based on an attention mechanism to obtain an attention diagram of the nth level;
performing point multiplication on the attention diagram of the nth level and the attitude estimation result of the (n-1) th level to obtain the input characteristic of the nth level;
and carrying out attitude estimation on the input features of the nth level to obtain attitude information of the nth level.
For example, in the posture estimation network, convolution and activation processing may first be performed on the relationship recognition result of the n-th level based on the attention mechanism to obtain the attention map Att_action of the n-th level (as shown at 51 in fig. 5), as shown in formula (2):

Att_action = sigmoid(R(r^(n)))    (2)

In formula (2), sigmoid represents the activation function, R(·) represents a morphing (reshaping) operation, and r^(n) represents the relationship recognition result of the n-th level.
In one possible implementation, the attention map of the n-th level may be point-multiplied with the posture estimation result of the (n-1)-th level to obtain the input feature p of the n-th level, as shown in formula (3):

p = Att_action ⊙ P^(n-1)    (3)

In formula (3), ⊙ represents the element-wise (point) multiplication and P^(n-1) represents the posture estimation result of the (n-1)-th level. In a possible implementation manner, in the posture estimation network, posture estimation can be performed on the input feature p of the n-th level to obtain the posture information of the n-th level.
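A sketch of formulas (2) and (3) is given below; the 1x1 convolution used to project the relationship result and the reshape to a 1x1 spatial map standing in for the morphing operation R(·) are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PoseAttentionGate(nn.Module):
    """Hypothetical sketch of formulas (2) and (3): build an attention map from the n-th level
    relationship result and gate the (n-1)-th level posture result with it."""

    def __init__(self, relation_dim, pose_channels):
        super().__init__()
        self.project = nn.Conv2d(relation_dim, pose_channels, kernel_size=1)  # the convolution step

    def forward(self, relation_vec, pose_prev):
        # relation_vec: (B, relation_dim); pose_prev: (B, pose_channels, H, W)
        r = relation_vec.view(relation_vec.size(0), -1, 1, 1)  # R(.): reshape to a 1x1 spatial map
        att = torch.sigmoid(self.project(r))                   # formula (2): Att_action, (B, C, 1, 1)
        p = att * pose_prev                                     # formula (3): broadcast element-wise product
        return p                                                # n-th level input feature p
```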
In one possible implementation, the attitude estimation result of the nth stage further includes an intermediate attitude feature of the nth stage,
wherein, the step of performing attitude estimation on the target object according to the nth level relationship recognition result and the nth-1 level attitude estimation result to obtain the nth level attitude estimation result may further include:
and performing convolution processing on the input feature of the nth level to obtain the intermediate attitude feature of the nth level.
That is, the input feature p of the nth stage is convolved to obtain the intermediate attitude feature of the nth stage
Figure BDA0001954992390000161
Attitude information according to nth order
Figure BDA0001954992390000162
And intermediate attitude feature of nth order
Figure BDA0001954992390000163
Obtaining the attitude estimation result of the nth level
Figure BDA0001954992390000164
As described above, each stage of the turbine learning architecture may include one level of relationship recognition network and one level of pose estimation network, and the output of the previous stage can be taken as the input of the next stage. The input t_n of the nth stage can be expressed as in equation (4):

t_n = (a^(n-1), P^(n-1))    (4)

that is, the (n-1)th-level relationship recognition result together with the (n-1)th-level pose estimation result.
Through such multiple iterations, the results of the relationship recognition task and the pose estimation task can be progressively refined, improving the accuracy of both relationship recognition and pose estimation.
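To make the iteration explicit, the following Python sketch chains the N levels, with the output of each level feeding the next, as in equation (4). The names feature_extractor, rel_nets, pose_nets, behavior_feat, and pose_feat are placeholders, not identifiers from the disclosure.

```python
def analyze_image(feature_extractor, rel_nets, pose_nets, image):
    """Run N coupled levels of relationship recognition and pose estimation.

    rel_nets / pose_nets are lists of N per-level networks; the output of
    level n-1 is used as the input of level n.
    """
    behavior_feat, pose_feat = feature_extractor(image)

    # Level 1 uses the extracted feature information directly.
    rel_result = rel_nets[0](behavior_feat, pose_feat)
    pose_result = pose_nets[0](rel_result, pose_feat)

    # Levels 2..N refine the previous level's results.
    for n in range(1, len(rel_nets)):
        rel_result = rel_nets[n](rel_result, pose_result)   # uses (n-1)th-level outputs
        pose_result = pose_nets[n](rel_result, pose_result) # uses nth-level relation + (n-1)th-level pose
    return rel_result, pose_result
```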
In one possible implementation, the neural network may be trained prior to image analysis of the image to be analyzed using the neural network (including the feature extraction network, the N-level relationship recognition network, and the N-level pose estimation network).
In one possible implementation, the method further includes: training the neural network according to a preset training set. During training, the network parameter values can be adjusted in the direction that minimizes the loss function; when the loss function has decreased to a certain degree or converged within a certain threshold, the adjustment is stopped and the adjusted neural network is obtained. The present disclosure does not limit the loss function used in the training process.
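A minimal training loop consistent with this description might look like the sketch below. The optimizer choice, learning rate, and the loss_threshold stopping criterion are assumptions for illustration; loss_fn could be, for example, a loss of the form shown in equation (5) below.

```python
import torch

def train(model, train_loader, loss_fn, epochs=50, lr=1e-4, loss_threshold=1e-3):
    """Adjust network parameters in the direction that minimizes the loss,
    stopping once the epoch loss falls below a chosen threshold."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        epoch_loss = 0.0
        for images, targets in train_loader:
            optimizer.zero_grad()
            outputs = model(images)
            loss = loss_fn(outputs, targets)
            loss.backward()      # gradients point toward decreasing loss
            optimizer.step()     # parameter update
            epoch_loss += loss.item()
        epoch_loss /= max(len(train_loader), 1)
        if epoch_loss < loss_threshold:   # treated here as the convergence condition
            break
    return model
```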
In one possible implementation, the neural network may be trained using the following loss function:
L = L_det + Σ_{i=1}^{N} ( λ_pose^i · L_pose^i + λ_rel^i · L_rel^i )    (5)

In equation (5), L represents the total loss of the neural network, N represents the number of levels of the relationship recognition network and the pose estimation network, L_pose^i represents the loss of the ith-level pose estimation network, λ_pose^i represents the weight of the ith-level pose estimation network, L_rel^i represents the loss of the ith-level relationship recognition network, λ_rel^i represents the weight of the ith-level relationship recognition network, 1 ≤ i ≤ N, and L_det represents the loss of the target detection network. The target detection network is used to determine the region of the target object in the image to be analyzed; the present disclosure does not limit the network structure of the target detection network.
Training the neural network with the loss function of equation (5) yields the adjusted neural network, which can improve the performance of the neural network in image analysis.
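The total loss of equation (5) could be assembled as in the sketch below; the helper name and argument layout (total_loss, per-level loss and weight lists) are illustrative assumptions.

```python
def total_loss(det_loss, pose_losses, rel_losses, pose_weights, rel_weights):
    """Weighted sum of the detection loss and the per-level pose / relationship
    losses, following the form of equation (5)."""
    assert len(pose_losses) == len(rel_losses) == len(pose_weights) == len(rel_weights)
    loss = det_loss
    for l_pose, l_rel, w_pose, w_rel in zip(pose_losses, rel_losses, pose_weights, rel_weights):
        loss = loss + w_pose * l_pose + w_rel * l_rel
    return loss
```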
In one possible implementation, the feature information may further include an appearance feature, where the appearance feature may include, for example, features of a person's clothing, appearance, and the like. When performing feature extraction on the image to be analyzed in step S11, the appearance feature of the target object may be extracted. For example, a feature map of the image to be analyzed may be extracted, one or more human body regions may be determined, and the human body regions may be analyzed to obtain the appearance feature of the target object.
In one possible implementation, step S12 may include: and performing relationship recognition on the target object according to the appearance characteristic, the behavior characteristic and the posture characteristic to obtain a relationship recognition result of the target object.
For example, the appearance feature may be added as an input when performing relationship recognition on the target object. When the N levels of relationship recognition are performed on the image to be analyzed, the appearance feature can be used as an input of each level of relationship recognition. For example, in the nth-level relationship recognition, relationship recognition may be performed on the target object according to the appearance feature, the (n-1)th-level relationship recognition result, and the (n-1)th-level pose estimation result, so as to obtain the nth-level relationship recognition result, where the nth level may be any one of the second level to the (N-1)th level.
In this way, appearance information of the target object (person) can be introduced during relationship recognition, reducing the influence of differences in people's appearance on the relationship recognition result and further improving the accuracy of relationship recognition.
In one possible implementation, step S13 may include: and performing attitude estimation on the target object according to the appearance characteristics, the relationship identification result and the attitude characteristics to obtain an attitude estimation result of the target object.
For example, the appearance feature may be added as an input when performing pose estimation on the target object. When the N levels of pose estimation are performed on the image to be analyzed, the appearance feature can be used as an input of each level of pose estimation. For example, in the nth-level pose estimation, pose estimation may be performed on the target object according to the appearance feature, the nth-level relationship recognition result, and the (n-1)th-level pose estimation result, so as to obtain the nth-level pose estimation result, where the nth level may be any one of the second level to the (N-1)th level.
In this way, appearance information of the target object (person) can be introduced during pose estimation, reducing the influence of differences in people's appearance on the pose estimation result and further improving the accuracy of pose estimation.
Likewise, where the feature information includes the appearance feature and the turbine learning architecture is employed for simultaneous relationship recognition and pose estimation, the input t_n of the nth stage of the turbine learning architecture can be expressed as in equation (6):

t_n = (f_app, a^(n-1), P^(n-1))    (6)

that is, the appearance feature f_app together with the (n-1)th-level relationship recognition result and the (n-1)th-level pose estimation result.
In this way, the appearance feature can be introduced in the iterative process, reducing the influence of differences in people's appearance on the results and further improving the accuracy of relationship recognition and pose estimation.
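One way to fold the appearance feature into every level's input, per equations (4) and (6), is simple concatenation, as in the sketch below; the function name and the assumption that all inputs are flattened (B, D) feature vectors are illustrative only.

```python
import torch

def stage_input(prev_rel_result, prev_pose_result, appearance_feat=None):
    """Build the input t_n of the nth level: the previous level's relationship
    recognition and pose estimation results (eq. (4)), optionally with the
    appearance feature prepended (eq. (6)). All tensors are assumed flattened
    to shape (batch, features)."""
    parts = [prev_rel_result, prev_pose_result]
    if appearance_feat is not None:
        parts.insert(0, appearance_feat)
    return torch.cat(parts, dim=1)   # concatenate along the feature dimension
```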
According to the image analysis method of the embodiments of the present disclosure, the positions of all persons (target objects) and objects in the image, the positional relationship between each person and the object being interacted with, and the human body key point positions (pose features) of each person can be obtained. Performing relationship recognition between persons and objects together with human pose estimation enhances robustness to differences in people's appearance, so a more accurate recognition result can be obtained. The embodiments of the present disclosure can be applied to products such as human-computer interaction and interactive entertainment products and their corresponding usage scenarios, improving the accuracy of interactive behavior recognition.
It can be understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the underlying principle and logic; due to space limitations, details are not repeated in the present disclosure.
Fig. 6 shows a block diagram of an image analysis apparatus according to an embodiment of the present disclosure, which includes, as shown in fig. 6:
the feature extraction module 71 is configured to perform feature extraction on an image to be analyzed, and acquire feature information of a target object in the image to be analyzed, where the feature information includes a behavior feature and an attitude feature;
and the relationship identification module 72 is configured to perform relationship identification on the target object according to the feature information to obtain a relationship identification result of the target object, where the relationship identification result includes at least one of behavior information and position information of an object related to a behavior.
In one possible implementation, the apparatus further includes: and the first attitude estimation module is used for carrying out attitude estimation on the target object according to the relationship recognition result and the attitude characteristics to obtain an attitude estimation result of the target object, wherein the attitude estimation result comprises attitude information of the target object.
In one possible implementation, the relationship recognition result includes N levels of relationship recognition results, the posture estimation result includes N levels of posture estimation results, N is an integer greater than 1, and the relationship recognition module 72 includes: the first relation identification submodule is used for carrying out relation identification on the target object according to the characteristic information to obtain a first-level relation identification result; and the second relation identification submodule is used for carrying out relation identification on the target object according to the relation identification result of the first level and the posture estimation result of the first level under the condition that the N is equal to 2 so as to obtain a relation identification result of the second level.
In a possible implementation manner, the relationship identifying module 72 further includes: the third relation identification submodule is used for carrying out relation identification on the target object according to the relation identification result of the (n-1)th level and the attitude estimation result of the (n-1)th level under the condition that N is larger than 2, to obtain the relation identification result of the nth level, wherein n is an integer and 1 < n < N; and the fourth relation recognition submodule is used for carrying out relation recognition on the target object according to the relation recognition result of the (N-1)th level and the posture estimation result of the (N-1)th level to obtain the relation recognition result of the Nth level.
In one possible implementation, the first posture estimation module includes: the first attitude estimation submodule is used for carrying out attitude estimation on the target object according to a first-stage relationship recognition result and the attitude characteristics to obtain a first-stage attitude estimation result; and the second attitude estimation submodule is used for carrying out attitude estimation on the target object according to the second-stage relationship recognition result and the first-stage attitude estimation result under the condition that the N is equal to 2, so as to obtain a second-stage attitude estimation result.
In one possible implementation manner, the first posture estimation module further includes: a third attitude estimation submodule, configured to perform attitude estimation on the target object according to the nth-level relationship recognition result and the (n-1)th-level attitude estimation result when N is greater than 2, to obtain the nth-level attitude estimation result, where n is an integer and 1 < n < N; and a fourth attitude estimation submodule, configured to perform attitude estimation on the target object according to the Nth-level relationship recognition result and the (N-1)th-level attitude estimation result to obtain the Nth-level attitude estimation result.
In one possible implementation, the third relationship identification submodule is configured to: carrying out full connection processing on the n-1 level relation recognition result and the n-1 level attitude estimation result to obtain the n level connection characteristic; and performing behavior recognition processing on the connection characteristics of the nth level to obtain behavior information of the nth level.
In one possible implementation, the third relationship identification submodule is further configured to: and performing relation identification processing on the connection characteristics of the nth level according to the behavior information of the nth level to obtain position information of the nth level.
In a possible implementation manner, the relationship identification result of the nth level further includes an intermediate relationship feature of the nth level, wherein the third relationship identification submodule is further configured to: and performing full connection and convolution processing on the n-1 level relationship identification result and the n-1 level attitude estimation result to obtain the n level intermediate relationship characteristic.
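As a companion to the pose estimation sketch above, the following PyTorch-style sketch illustrates one possible form of the nth-level relationship recognition described by these submodules (full connection of the previous level's results, a behavior branch, a position branch, and an intermediate relationship feature). The module name, layer dimensions, and output sizes are assumptions for illustration, not taken from the disclosure.

```python
import torch
import torch.nn as nn

class RelationshipStage(nn.Module):
    """One level of relationship recognition: fully connected fusion of the
    previous level's results, then behavior and position branches."""

    def __init__(self, rel_dim, pose_dim, hidden_dim=512, num_behaviors=10):
        super().__init__()
        # Full connection of the (n-1)th-level relationship and pose results -> connection feature.
        self.fuse = nn.Linear(rel_dim + pose_dim, hidden_dim)
        # Behavior recognition branch -> behavior information (e.g. confidences).
        self.behavior_head = nn.Linear(hidden_dim, num_behaviors)
        # Position branch: connection feature + behavior information -> position of the related object (a box).
        self.position_head = nn.Linear(hidden_dim + num_behaviors, 4)
        # Full connection + convolution producing the intermediate relationship feature.
        self.inter_fc = nn.Linear(hidden_dim, 16 * 16)
        self.inter_conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)

    def forward(self, prev_rel, prev_pose):
        # prev_rel: (B, rel_dim), prev_pose: (B, pose_dim) flattened (n-1)th-level results
        x = torch.relu(self.fuse(torch.cat([prev_rel, prev_pose], dim=1)))  # connection feature
        behavior = torch.sigmoid(self.behavior_head(x))                     # behavior confidences
        position = self.position_head(torch.cat([x, behavior], dim=1))      # position information
        inter = self.inter_conv(self.inter_fc(x).view(-1, 1, 16, 16))       # intermediate relationship feature
        return behavior, position, inter
```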
In one possible implementation, the third pose estimation sub-module is configured to: performing convolution and activation processing on the relationship identification result of the nth level based on an attention mechanism to obtain an attention diagram of the nth level; performing point multiplication on the attention diagram of the nth level and the attitude estimation result of the (n-1) th level to obtain the input characteristic of the nth level; and carrying out attitude estimation on the input features of the nth level to obtain attitude information of the nth level.
In one possible implementation, the posture estimation result of the nth stage further includes an intermediate posture feature of the nth stage, where the third posture estimation submodule is further configured to: and performing convolution processing on the input feature of the nth level to obtain the intermediate attitude feature of the nth level.
In one possible implementation, the feature information further includes an appearance feature.
In one possible implementation, the apparatus further includes: and the second attitude estimation module is used for carrying out attitude estimation on the target object according to the appearance characteristics, the relationship recognition result and the attitude characteristics to obtain an attitude estimation result of the target object.
In one possible implementation manner, the apparatus includes a neural network, and the neural network includes a relationship recognition network and an attitude estimation network, wherein the relationship recognition network is configured to perform relationship recognition on the feature information, and the attitude estimation network is configured to perform attitude estimation on the relationship recognition result and the attitude feature.
In one possible implementation manner, the device comprises a neural network, wherein the neural network comprises N levels of relationship recognition networks and N levels of pose estimation networks, the nth-level relationship recognition network is used for performing relationship recognition on the (n-1)th-level relationship recognition result and the (n-1)th-level pose estimation result, and the nth-level pose estimation network is used for performing pose estimation on the nth-level relationship recognition result and the (n-1)th-level pose estimation result.
In one possible implementation, the apparatus includes a neural network including a feature extraction network, where the feature extraction network is configured to perform feature extraction on an image to be analyzed.
In one possible implementation, the apparatus further includes: and the training module is used for training the neural network according to a preset training set.
In one possible implementation, the behavior information includes a confidence level of the current behavior of the target object.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured as the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 7 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to fig. 7, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 8 is a block diagram illustrating an electronic device 1900 in accordance with an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 8, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (36)

1. A method of image analysis, the method comprising:
extracting features of an image to be analyzed to obtain feature information of a target object in the image to be analyzed, wherein the feature information comprises behavior features and posture features;
according to the characteristic information, carrying out relationship identification on the target object to obtain a relationship identification result of the target object, wherein the relationship identification result comprises at least one of behavior information and position information of an object related to the behavior;
and performing attitude estimation on the target object according to the relationship recognition result and the attitude characteristics to obtain an attitude estimation result of the target object, wherein the attitude estimation result comprises attitude information of the target object.
2. The method of claim 1, wherein the relationship recognition result comprises N levels of relationship recognition results, wherein the pose estimation result comprises N levels of pose estimation results, wherein N is an integer greater than 1,
according to the feature information, performing relationship identification on the target object to obtain a relationship identification result of the target object, including:
according to the characteristic information, carrying out relationship identification on the target object to obtain a first-level relationship identification result;
and under the condition that N is equal to 2, performing relation recognition on the target object according to the relation recognition result of the first stage and the posture estimation result of the first stage to obtain a relation recognition result of the second stage.
3. The method according to claim 2, wherein the relationship recognition is performed on the target object according to the feature information to obtain a relationship recognition result of the target object, and further comprising:
under the condition that N is larger than 2, carrying out relation recognition on the target object according to the relation recognition result of the (n-1)th level and the attitude estimation result of the (n-1)th level to obtain a relation recognition result of the nth level, wherein n is an integer and 1 < n < N;
and carrying out relation recognition on the target object according to the relation recognition result of the (N-1) th level and the posture estimation result of the (N-1) th level to obtain a relation recognition result of the (N) th level.
4. The method according to claim 2, wherein performing pose estimation on the target object according to the relationship recognition result and the pose feature to obtain a pose estimation result of the target object comprises:
carrying out attitude estimation on the target object according to the first-stage relationship recognition result and the attitude characteristics to obtain a first-stage attitude estimation result;
and under the condition that N is equal to 2, carrying out attitude estimation on the target object according to the second-stage relationship recognition result and the first-stage attitude estimation result to obtain a second-stage attitude estimation result.
5. The method according to claim 4, wherein performing pose estimation on the target object according to the relationship recognition result and the pose feature to obtain a pose estimation result of the target object, further comprising:
under the condition that N is larger than 2, carrying out attitude estimation on the target object according to the relationship recognition result of the nth level and the attitude estimation result of the (n-1)th level to obtain the attitude estimation result of the nth level, wherein n is an integer and 1 < n < N;
and carrying out attitude estimation on the target object according to the Nth-level relation recognition result and the N-1 th-level attitude estimation result to obtain the Nth-level attitude estimation result.
6. The method according to claim 3, wherein performing relationship recognition on the target object according to the relationship recognition result of the (n-1) th level and the attitude estimation result of the (n-1) th level to obtain a relationship recognition result of the (n) th level, comprises:
carrying out full connection processing on the n-1 level relation recognition result and the n-1 level attitude estimation result to obtain the n level connection characteristic;
and performing behavior recognition processing on the connection characteristics of the nth level to obtain behavior information of the nth level.
7. The method according to claim 6, wherein the performing relationship recognition on the target object according to the relationship recognition result of the (n-1) th level and the posture estimation result of the (n-1) th level to obtain a relationship recognition result of the (n) th level, further comprises:
and performing relation identification processing on the connection characteristics of the nth level according to the behavior information of the nth level to obtain position information of the nth level.
8. The method of claim 6, wherein the n-th level of relationship identification results further comprises n-th level of intermediate relationship features,
wherein, according to the n-1 level relationship recognition result and the n-1 level attitude estimation result, the relationship recognition is performed on the target object to obtain the n level relationship recognition result, and the method further comprises the following steps:
and performing full connection and convolution processing on the n-1 level relationship identification result and the n-1 level attitude estimation result to obtain the n level intermediate relationship characteristic.
9. The method according to claim 5, wherein performing attitude estimation on the target object according to the relationship recognition result of the nth level and the attitude estimation result of the (n-1) th level to obtain an attitude estimation result of the nth level comprises:
performing convolution and activation processing on the relationship identification result of the nth level based on an attention mechanism to obtain an attention diagram of the nth level;
performing point multiplication on the attention diagram of the nth level and the attitude estimation result of the (n-1) th level to obtain the input characteristic of the nth level;
and carrying out attitude estimation on the input features of the nth level to obtain attitude information of the nth level.
10. The method of claim 9, wherein the attitude estimation result of the nth stage further includes an intermediate attitude feature of the nth stage,
wherein, according to the nth level of the relationship recognition result and the nth-1 level of the attitude estimation result, the attitude estimation is performed on the target object to obtain the nth level of the attitude estimation result, and the method further comprises:
and performing convolution processing on the input feature of the nth level to obtain the intermediate attitude feature of the nth level.
11. The method of claim 1, wherein the feature information further includes an appearance feature,
according to the feature information, performing relationship identification on the target object to obtain a relationship identification result of the target object, including:
and performing relationship recognition on the target object according to the appearance characteristic, the behavior characteristic and the posture characteristic to obtain a relationship recognition result of the target object.
12. The method according to claim 11, wherein performing pose estimation on the target object according to the relationship recognition result and the pose feature to obtain a pose estimation result of the target object comprises:
and performing attitude estimation on the target object according to the appearance characteristics, the relationship identification result and the attitude characteristics to obtain an attitude estimation result of the target object.
13. The method of claim 1, wherein the method is implemented by a neural network comprising a relationship recognition network and an attitude estimation network, wherein the relationship recognition network is used for performing relationship recognition on the feature information, and the attitude estimation network is used for performing attitude estimation on the relationship recognition result and the attitude feature.
14. The method according to claim 3, wherein the method is implemented by a neural network comprising N levels of relation recognition networks and N levels of posture estimation networks, wherein the nth-level relation recognition network is used for carrying out relation recognition on the (n-1)th-level relation recognition result and the (n-1)th-level posture estimation result, and the nth-level posture estimation network is used for carrying out posture estimation on the nth-level relation recognition result and the (n-1)th-level posture estimation result.
15. The method of claim 1, wherein the method is implemented by a neural network comprising a feature extraction network for feature extraction of the image to be analyzed.
16. The method according to any one of claims 13-15, further comprising:
and training the neural network according to a preset training set.
17. The method of any one of claims 1-15, wherein the behavior information includes a confidence level of a current behavior of the target object.
18. An image analysis apparatus, comprising:
the characteristic extraction module is used for extracting characteristics of an image to be analyzed to obtain characteristic information of a target object in the image to be analyzed, wherein the characteristic information comprises behavior characteristics and posture characteristics;
the relation identification module is used for carrying out relation identification on the target object according to the characteristic information to obtain a relation identification result of the target object, and the relation identification result comprises at least one of behavior information and position information of an object related to the behavior;
and the first attitude estimation module is used for carrying out attitude estimation on the target object according to the relationship recognition result and the attitude characteristics to obtain an attitude estimation result of the target object, wherein the attitude estimation result comprises attitude information of the target object.
19. The apparatus of claim 18, wherein the relationship recognition result comprises N levels of relationship recognition results, wherein the pose estimation result comprises N levels of pose estimation results, wherein N is an integer greater than 1,
wherein, the relationship identification module comprises:
the first relation identification submodule is used for carrying out relation identification on the target object according to the characteristic information to obtain a first-level relation identification result;
and the second relation identification submodule is used for carrying out relation identification on the target object according to the relation identification result of the first level and the posture estimation result of the first level under the condition that the N is equal to 2 so as to obtain a relation identification result of the second level.
20. The apparatus of claim 19, wherein the relationship identification module further comprises:
the third relation identification submodule is used for carrying out relation identification on the target object according to the relation identification result of the (n-1)th level and the attitude estimation result of the (n-1)th level under the condition that N is larger than 2, to obtain the relation identification result of the nth level, wherein n is an integer and 1 < n < N;
and the fourth relation recognition submodule is used for carrying out relation recognition on the target object according to the relation recognition result of the (N-1) th level and the posture estimation result of the (N-1) th level to obtain the relation recognition result of the (N) th level.
21. The apparatus of claim 19, wherein the first pose estimation module comprises:
the first attitude estimation submodule is used for carrying out attitude estimation on the target object according to a first-stage relationship recognition result and the attitude characteristics to obtain a first-stage attitude estimation result;
and the second attitude estimation submodule is used for carrying out attitude estimation on the target object according to the second-stage relationship recognition result and the first-stage attitude estimation result under the condition that the N is equal to 2, so as to obtain a second-stage attitude estimation result.
22. The apparatus of claim 21, wherein the first pose estimation module further comprises:
a third attitude estimation submodule, configured to perform attitude estimation on the target object according to the nth-level relationship recognition result and the (n-1)th-level attitude estimation result when N is greater than 2, to obtain the nth-level attitude estimation result, where n is an integer and 1 < n < N;
and the fourth attitude estimation submodule is used for carrying out attitude estimation on the target object according to the Nth-level relation recognition result and the Nth-1-level attitude estimation result to obtain the Nth-level attitude estimation result.
23. The apparatus of claim 20, wherein the third relationship identification submodule is configured to:
carrying out full connection processing on the n-1 level relation recognition result and the n-1 level attitude estimation result to obtain the n level connection characteristic;
and performing behavior recognition processing on the connection characteristics of the nth level to obtain behavior information of the nth level.
24. The apparatus of claim 23, wherein the third relationship identification submodule is further configured to:
and performing relation identification processing on the connection characteristics of the nth level according to the behavior information of the nth level to obtain position information of the nth level.
25. The apparatus of claim 23, wherein the n-th level relationship identification result further comprises an n-th level intermediate relationship feature,
wherein the third relationship identification submodule is further configured to:
and performing full connection and convolution processing on the n-1 level relationship identification result and the n-1 level attitude estimation result to obtain the n level intermediate relationship characteristic.
26. The apparatus of claim 22, wherein the third pose estimation sub-module is configured to:
performing convolution and activation processing on the relationship identification result of the nth level based on an attention mechanism to obtain an attention diagram of the nth level;
performing point multiplication on the attention diagram of the nth level and the attitude estimation result of the (n-1) th level to obtain the input characteristic of the nth level;
and carrying out attitude estimation on the input features of the nth level to obtain attitude information of the nth level.
27. The apparatus of claim 26, wherein the attitude estimation result of the nth stage further comprises an intermediate attitude feature of the nth stage,
wherein the third pose estimation submodule is further configured to:
and performing convolution processing on the input feature of the nth level to obtain the intermediate attitude feature of the nth level.
28. The apparatus of claim 18, wherein the feature information further comprises an appearance feature.
29. The apparatus of claim 28, further comprising:
and the second attitude estimation module is used for carrying out attitude estimation on the target object according to the appearance characteristics, the relationship recognition result and the attitude characteristics to obtain an attitude estimation result of the target object.
30. The apparatus of claim 18, wherein the apparatus comprises a neural network, and wherein the neural network comprises a relationship recognition network and an attitude estimation network, wherein the relationship recognition network is configured to perform relationship recognition on the feature information, and the attitude estimation network is configured to perform attitude estimation on the relationship recognition result and the attitude feature.
31. The apparatus of claim 20, comprising a neural network comprising N levels of relationship recognition networks and N levels of pose estimation networks, wherein the nth-level relationship recognition network is configured to perform relationship recognition on the (n-1)th-level relationship recognition result and the (n-1)th-level pose estimation result, and the nth-level pose estimation network is configured to perform pose estimation on the nth-level relationship recognition result and the (n-1)th-level pose estimation result.
32. The apparatus of claim 18, wherein the apparatus comprises a neural network comprising a feature extraction network for feature extraction of an image to be analyzed.
33. The apparatus of any one of claims 30-32, further comprising:
and the training module is used for training the neural network according to a preset training set.
34. The apparatus of any of claims 18-32, wherein the behavior information comprises a confidence level of a current behavior of the target object.
35. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: performing the method of any one of claims 1 to 17.
36. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 17.
CN201910063728.1A 2019-01-23 2019-01-23 Image analysis method and device, electronic equipment and storage medium Active CN109685041B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910063728.1A CN109685041B (en) 2019-01-23 2019-01-23 Image analysis method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910063728.1A CN109685041B (en) 2019-01-23 2019-01-23 Image analysis method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109685041A CN109685041A (en) 2019-04-26
CN109685041B true CN109685041B (en) 2020-05-15

Family

ID=66194386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910063728.1A Active CN109685041B (en) 2019-01-23 2019-01-23 Image analysis method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109685041B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348335B (en) * 2019-06-25 2022-07-12 平安科技(深圳)有限公司 Behavior recognition method and device, terminal equipment and storage medium
CN110705564B (en) * 2019-09-09 2023-04-18 华为技术有限公司 Image recognition method and device
CN111104925B (en) * 2019-12-30 2022-03-11 上海商汤临港智能科技有限公司 Image processing method, image processing apparatus, storage medium, and electronic device
CN111601088B (en) * 2020-05-27 2021-12-21 大连成者科技有限公司 Sitting posture monitoring system based on monocular camera sitting posture identification technology

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089556B1 (en) * 2017-06-12 2018-10-02 Konica Minolta Laboratory U.S.A., Inc. Self-attention deep neural network for action recognition in surveillance videos
CN108010078B (en) * 2017-11-29 2020-06-26 中国科学技术大学 Object grabbing detection method based on three-level convolutional neural network
CN109241835A (en) * 2018-07-27 2019-01-18 上海商汤智能科技有限公司 Image processing method and device, electronic equipment and storage medium
CN109255296A (en) * 2018-08-06 2019-01-22 广东工业大学 A kind of daily Human bodys' response method based on depth convolutional neural networks
CN109034124A (en) * 2018-08-30 2018-12-18 成都考拉悠然科技有限公司 A kind of intelligent control method and system
CN109145867B (en) * 2018-09-07 2021-08-10 北京旷视科技有限公司 Human body posture estimation method, device, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109685041A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN111753822B (en) Text recognition method and device, electronic equipment and storage medium
CN113538519B (en) Target tracking method and device, electronic equipment and storage medium
CN110889469B (en) Image processing method and device, electronic equipment and storage medium
CN109685041B (en) Image analysis method and device, electronic equipment and storage medium
CN111539410B (en) Character recognition method and device, electronic equipment and storage medium
CN109615006B (en) Character recognition method and device, electronic equipment and storage medium
CN111242303B (en) Network training method and device, and image processing method and device
CN110781813B (en) Image recognition method and device, electronic equipment and storage medium
CN111881956A (en) Network training method and device, target detection method and device and electronic equipment
CN109165738B (en) Neural network model optimization method and device, electronic device and storage medium
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
CN113361540A (en) Image processing method and device, electronic equipment and storage medium
CN114338083A (en) Controller local area network bus abnormality detection method and device and electronic equipment
CN112001364A (en) Image recognition method and device, electronic equipment and storage medium
CN111259967A (en) Image classification and neural network training method, device, equipment and storage medium
CN110633715B (en) Image processing method, network training method and device and electronic equipment
CN109447258B (en) Neural network model optimization method and device, electronic device and storage medium
CN113065361B (en) Method and device for determining user intimacy, electronic equipment and storage medium
CN112559673A (en) Language processing model training method and device, electronic equipment and storage medium
CN113807253A (en) Face recognition method and device, electronic equipment and storage medium
CN110929545A (en) Human face image sorting method and device
CN110659625A (en) Training method and device of object recognition network, electronic equipment and storage medium
CN113506325B (en) Image processing method and device, electronic equipment and storage medium
CN113506324B (en) Image processing method and device, electronic equipment and storage medium
CN114842404A (en) Method and device for generating time sequence action nomination, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant