CN112906467A

CN112906467A - Group photo image generation method and device, electronic device and storage medium

Info

Publication number: CN112906467A
Application number: CN202110056318.1A
Authority: CN
Inventors: 薛地; 王权
Original assignee: Shenzhen TetrasAI Technology Co Ltd
Current assignee: Shenzhen TetrasAI Technology Co Ltd
Priority date: 2021-01-15
Filing date: 2021-01-15
Publication date: 2021-06-04
Also published as: WO2022151687A1; JP2023533022A

Abstract

The present disclosure relates to a group photo image generation method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a live-action image; recognizing the live-action image, and determining a target object in the live-action image and a first position of the target object in an Augmented Reality (AR) scene; displaying an AR preview image according to the first position and a second position of an AR object in the AR scene; and generating an AR group photo image of the target object and the AR object in response to a group photo operation on the AR preview image. Embodiments of the present disclosure can improve realism and immersion in AR group photo scenes.

Description

Group photo image generation method and device, electronic device and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a group photo image, an electronic device, and a storage medium.

Background

Augmented Reality (AR) is a human-computer interaction technology capable of simulating real scenes. AR group photos are an important application of AR technology; however, AR group photo approaches in the related art suffer from poor immersion and realism.

Disclosure of Invention

The present disclosure provides a technical solution for generating group photo images.

According to an aspect of the present disclosure, there is provided a group photo image generation method, including: acquiring a live-action image; recognizing the live-action image, and determining a target object in the live-action image and a first position of the target object in an Augmented Reality (AR) scene; displaying an AR preview image according to the first position and a second position of an AR object in the AR scene; and generating an AR group photo image of the target object and the AR object in response to a group photo operation on the AR preview image. According to the embodiments of the present disclosure, an occlusion effect between the virtual object and the real person can be achieved based on their relative positions in the AR group photo, improving realism and immersion in the AR group photo scene.

In one possible implementation, recognizing the live-action image and determining the target object in the live-action image and the first position of the target object in the AR scene includes: performing human body recognition on the live-action image to determine a human body region where the target object in the live-action image is located; and determining, according to the human body region, a first depth of the target object in the AR scene, the first position including the first depth. According to the embodiments of the present disclosure, the first depth of the target object in the AR scene can be accurately determined from the human body region, so that, when the AR preview image is generated, the front-to-back position of the target object can be rendered accurately according to the first depth, achieving an occlusion effect.

In one possible implementation, the second position of the AR object includes a second depth of the AR object in the AR scene, and displaying the AR preview image according to the first position and the second position of the AR object in the AR scene includes: rendering the AR object in the live-action image according to the relative relationship between the first depth and the second depth, to generate and display the AR preview image. According to the embodiments of the present disclosure, the front-to-back positional relationship between the target object and the AR object can be presented according to the relative relationship between the first depth and the second depth, so that displaying the AR preview image generated from this relationship improves realism and immersion in the AR group photo.

In one possible implementation, the method further includes: performing human body key point detection on the live-action image to determine the human body posture of the target object in the live-action image; and displaying the AR preview image according to the first position and the second position of the AR object in the AR scene includes: displaying the AR preview image according to the first position, the second position, and the human body posture of the target object. According to the embodiments of the present disclosure, the AR preview image can be displayed according to the human body posture of the target object together with the first position and the second position, which improves the interaction between the AR object and the target object in the AR group photo.

In one possible implementation, displaying the AR preview image according to the first position, the second position, and the human body posture of the target object includes: determining a first posture of the AR object according to the human body posture of the target object, the first posture of the AR object being symmetrical to the human body posture of the target object; and generating and displaying a first AR preview image according to the first posture, the first position, and the second position, the first AR preview image including the AR object in the first posture. According to the embodiments of the present disclosure, the AR object can be driven to perform an action symmetrical to that of the target object while the occlusion effect between the AR object and the target object is presented, improving the sense of interaction, realism, and immersion during the AR group photo.
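One plausible reading of "symmetrical" is a mirror image: the target object's 2D key points are reflected about a vertical line (for example, a line between the person and the AR object), and left/right labels are swapped because reflection flips handedness. The sketch below illustrates that reading only. The key point names, the left/right pairs, the function name `mirror_pose`, and the choice of mirror axis are all assumptions, not details taken from the disclosure.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels

# Hypothetical left/right key point pairs (loosely COCO-style naming).
LR_PAIRS = [
    ("left_shoulder", "right_shoulder"),
    ("left_elbow", "right_elbow"),
    ("left_wrist", "right_wrist"),
    ("left_hip", "right_hip"),
    ("left_knee", "right_knee"),
    ("left_ankle", "right_ankle"),
]

def mirror_pose(pose: Dict[str, Point], axis_x: float) -> Dict[str, Point]:
    """Reflect a 2D pose about the vertical line x = axis_x and swap left/right labels."""
    swap: Dict[str, str] = {}
    for a, b in LR_PAIRS:
        swap[a] = b
        swap[b] = a
    mirrored: Dict[str, Point] = {}
    for name, (x, y) in pose.items():
        # A reflected right wrist lands where the AR object's left wrist would be.
        mirrored[swap.get(name, name)] = (2 * axis_x - x, y)
    return mirrored
```

The mirrored key points would then drive the AR object's skeleton. How a given renderer retargets 2D key points to a 3D rig is outside what the text describes.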

In one possible implementation, displaying the AR preview image according to the first position, the second position, and the human body posture of the target object includes: determining, according to the human body posture of the target object, an action category corresponding to that posture; determining a second posture of the AR object according to the action category, the second posture of the AR object matching the action category; and generating and displaying a second AR preview image according to the second posture, the first position, and the second position, the second AR preview image including the AR object in the second posture. According to the embodiments of the present disclosure, the AR object can be driven to present a posture matching the action of the target object while the occlusion effect between the AR object and the target object is presented, improving the sense of interaction, realism, and immersion during the AR group photo.
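The two-stage idea (posture to action category, then action category to a matching AR posture) can be sketched with a toy rule-based classifier and a lookup table. Everything here is hypothetical: the category names, the rules, the AR posture names, and the functions `classify_action` and `ar_posture_for`. A practical system would more likely use a trained action classifier.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels; y grows downward

# Hypothetical mapping from action category to a preset AR posture name.
ACTION_TO_AR_POSTURE = {
    "hands_up": "cheer",
    "wave": "wave_back",
    "stand": "idle",
}

def classify_action(pose: Dict[str, Point]) -> str:
    """Toy rules: a wrist is 'raised' if it is above the nose (smaller y)."""
    head_y = pose["nose"][1]
    raised = [pose[k][1] < head_y for k in ("left_wrist", "right_wrist")]
    if all(raised):
        return "hands_up"
    if any(raised):
        return "wave"
    return "stand"

def ar_posture_for(pose: Dict[str, Point]) -> str:
    """Pick the AR object's posture that matches the recognized action category."""
    return ACTION_TO_AR_POSTURE[classify_action(pose)]
```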

In one possible implementation, the AR preview image includes a plurality of frames of AR preview images, and generating the AR group photo image of the target object and the AR object in response to the group photo operation on the AR preview image includes: generating an AR group photo video of the target object and the AR object in response to a group photo operation on the plurality of frames of AR preview images. According to the embodiments of the present disclosure, an AR group photo video can be generated that dynamically shows the positional relationship and postures of the target object and the AR object, improving the sense of interaction and immersion.

According to an aspect of the present disclosure, there is provided a group photo image generation apparatus, including: an acquisition module configured to acquire a live-action image; a determining module configured to recognize the live-action image and determine a target object in the live-action image and a first position of the target object in an Augmented Reality (AR) scene; a display module configured to display an AR preview image according to the first position and a second position of an AR object in the AR scene; and a generating module configured to generate an AR group photo image of the target object and the AR object in response to a group photo operation on the AR preview image.

In one possible implementation, the determining module includes: a human body region determining submodule configured to perform human body recognition on the live-action image and determine a human body region where the target object in the live-action image is located; and a first depth determining submodule configured to determine, according to the human body region, a first depth of the target object in the AR scene, the first position including the first depth.

In one possible implementation, the second position of the AR object includes a second depth of the AR object in the AR scene, and the display module includes: a first display submodule configured to render the AR object in the live-action image according to the relative relationship between the first depth and the second depth, and to generate and display the AR preview image.

In one possible implementation, the apparatus further includes: a posture determining module configured to perform human body key point detection on the live-action image and determine the human body posture of the target object in the live-action image; and the display module includes: a second display submodule configured to display the AR preview image according to the first position, the second position, and the human body posture of the target object.

In one possible implementation, the second display submodule includes: a first posture determining unit configured to determine a first posture of the AR object according to the human body posture of the target object, the first posture of the AR object being symmetrical to the human body posture of the target object; and a first image display unit configured to generate and display a first AR preview image according to the first posture, the first position, and the second position, the first AR preview image including the AR object in the first posture.

In one possible implementation, the second display submodule includes: an action category determining unit configured to determine, according to the human body posture of the target object, an action category corresponding to that posture; a second posture determining unit configured to determine a second posture of the AR object according to the action category, the second posture of the AR object matching the action category; and a second image display unit configured to generate and display a second AR preview image according to the second posture, the first position, and the second position, the second AR preview image including the AR object in the second posture.

In one possible implementation, the AR preview image includes a plurality of frames of AR preview images, and the generating module includes: a group photo video generation submodule configured to generate an AR group photo video of the target object and the AR object in response to a group photo operation on the plurality of frames of AR preview images.

According to an aspect of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.

According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.

In the embodiments of the present disclosure, the AR preview image can be displayed according to the first position of the target object and the second position of the AR object in the AR scene, and the AR group photo image is generated in response to the group photo operation on the AR preview image. In this way, an occlusion effect between the virtual object and the real person is achieved based on their relative positions in the AR group photo, improving realism and immersion in the AR group photo scene.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 shows a flowchart of a group photo image generation method according to an embodiment of the present disclosure.

Fig. 2a shows a first depth schematic diagram according to an embodiment of the present disclosure.

Fig. 2b shows a first schematic diagram of an AR preview image according to an embodiment of the present disclosure.

Fig. 3a shows a second depth schematic diagram according to an embodiment of the present disclosure.

Fig. 3b shows a second schematic diagram of an AR preview image according to an embodiment of the present disclosure.

Fig. 4 shows a third schematic diagram of an AR preview image according to an embodiment of the present disclosure.

Fig. 5 illustrates a flowchart of an AR preview image generation method according to an embodiment of the present disclosure.

Fig. 6a illustrates a schematic diagram of an AR preview image according to the related art.

Fig. 6b shows a schematic diagram of an AR preview image according to an embodiment of the present disclosure.

Fig. 7 shows a block diagram of a group photo image generation apparatus according to an embodiment of the present disclosure.

Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the group consisting of A, B, and C.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.

Fig. 1 shows a flowchart of a group photo image generation method according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes:

In step S10, a live-action image is acquired.

In step S11, the live-action image is recognized, and the target object in the live-action image and the first position of the target object in the AR scene are determined.

In step S12, an AR preview image is displayed according to the first position and the second position of the AR object in the AR scene.

In step S13, in response to a group photo operation on the AR preview image, an AR group photo image of the target object and the AR object is generated.
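Steps S10 to S13 can be pictured as a per-frame loop that previews until the group photo operation occurs. The sketch below is a minimal illustration under stated assumptions, not the disclosed implementation. The callables `acquire`, `recognize`, `render`, `show`, and `shutter_pressed` are hypothetical stand-ins for the camera, the recognition model, the AR renderer, the display, and the photographing key. The first and second positions are reduced to single depth values.

```python
from typing import Any, Callable, Optional, Tuple

Frame = Any  # hypothetical placeholder for an image type

def run_group_photo(
    acquire: Callable[[], Frame],                     # S10: get a live-action frame
    recognize: Callable[[Frame], Tuple[str, float]],  # S11: (target id, first depth)
    render: Callable[[Frame, float, float], Frame],   # S12: build the AR preview
    show: Callable[[Frame], None],                    # S12: display the preview
    shutter_pressed: Callable[[], bool],              # S13: group photo operation?
    ar_depth: float,                                  # preset second depth of the AR object
    max_frames: int,
) -> Optional[Frame]:
    """Preview frames until the group photo operation fires; return the captured preview."""
    for _ in range(max_frames):
        frame = acquire()
        _target, first_depth = recognize(frame)
        preview = render(frame, first_depth, ar_depth)
        show(preview)
        if shutter_pressed():
            return preview  # the AR group photo image
    return None  # no group photo operation within max_frames
```

Returning the displayed preview as the group photo follows the screenshot-style capture that this description mentions as one option.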

In one possible implementation, the group photo image generation method may be performed by an AR group photo device that supports AR technology. The AR group photo device may include, for example, a terminal device such as User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. The method may be implemented by a processor of the terminal device invoking computer-readable instructions stored in a memory.

In one possible implementation, in step S10, the live-action image may be captured in real time, for example, by an image acquisition device provided on the AR group photo device. The live-action image may also be transmitted to the AR group photo device by another device, or obtained by the AR group photo device from a local album; this is not limited in the present disclosure. In these cases, the generated AR group photo image of the target object and the AR object may be, for example, an image generated by superimposing a special effect of the AR object on the live-action image.

In one possible implementation, in step S10, one or more live-action images may be acquired. Multiple live-action images may be, for example, consecutive frames in a video stream, or frames obtained by sampling the video stream at intervals; this is not limited in the present disclosure.
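Interval sampling can be as simple as keeping every n-th frame of the stream. The helper below is a hypothetical illustration. The name `sample_frames` and the fixed stride are assumptions, because the disclosure does not specify a sampling rule.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def sample_frames(frames: Sequence[T], interval: int) -> List[T]:
    """Keep every `interval`-th frame, starting from the first one."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return list(frames[::interval])
```

With `interval=1` this degenerates to using all consecutive frames, the other option mentioned in the text.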

In one possible implementation, as described above, the AR group photo device is provided with an image acquisition device. The image acquisition device can capture live-action images within its field of view in real time and transmit them to the processor of the AR group photo device as a video stream. The processor generates an AR preview image of the target object and the AR object for each frame of the video stream and displays it on the display interface of the AR group photo device, so that the AR preview image of the target object and the AR object is presented in real time. Because the position and posture of the target object may change continuously across frames, the position and posture of the AR object may change correspondingly for different live-action images, producing a dynamic display effect.

In one possible implementation, live-action images containing an object may be identified by recognizing the live-action images, and the target object in such a live-action image can then be determined. The target object may be set according to actual needs. Examples include, but are not limited to: all objects in the live-action image are set as target objects; the object in the middle area of the live-action image is set as the target object; or, after the objects in the live-action image are recognized, the user selects the target object. This is not limited in the embodiments of the present disclosure.
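The "object in the middle area" policy could be realized by picking the detected body whose bounding-box center is nearest the image center. The sketch below assumes detections arrive as axis-aligned pixel boxes. The box format and the function name `pick_center_target` are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def pick_center_target(boxes: List[Box], width: int, height: int) -> Optional[int]:
    """Return the index of the box whose center is closest to the image center."""
    if not boxes:
        return None
    cx, cy = width / 2, height / 2

    def dist2(b: Box) -> float:
        bx = (b[0] + b[2]) / 2
        by = (b[1] + b[3]) / 2
        return (bx - cx) ** 2 + (by - cy) ** 2

    return min(range(len(boxes)), key=lambda i: dist2(boxes[i]))
```

The "all objects" policy would simply keep every index. The user-selection policy would map a tap position to the box containing it.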

In one possible implementation, the image acquisition device may include, but is not limited to, an RGB (Red Green Blue) camera or a TOF (Time of Flight) camera.

In one possible implementation, in step S11, the live-action image may be recognized by performing human body recognition on it using any known human body recognition technique, so as to determine the object in the live-action image. Human body recognition techniques include, but are not limited to, portrait segmentation and human body key point detection. Those skilled in the art may select one or more human body recognition techniques, as long as the object in the live-action image can be recognized; this is not limited in the embodiments of the present disclosure.

In one possible implementation, the AR scene may include a virtual scene that is pre-constructed according to actual requirements. An AR scene may include AR objects such as virtual characters, virtual animals, etc. The AR scene may be constructed based on a camera coordinate system of the image acquisition device, and a position of the AR object in the AR scene may be preset based on a three-dimensional registration technology.

In one possible implementation, data packets corresponding to different AR scenes may be stored in the AR group photo device, or transmitted to it by other devices (e.g., a cloud server). By using different data packets, different AR scenes can be presented on the AR group photo device.

In one possible implementation, the AR scene may be determined in response to a user's selection operation. For example, selectable AR scenes may be presented on the display interface of the AR group photo device, so that the user can select an AR scene by clicking or a similar operation. The embodiments of the present disclosure do not limit how the AR scene is determined.

In one possible implementation, as described above, the AR scene may be constructed based on the camera coordinate system of the image acquisition device. In step S11, after the target object in the live-action image is determined, the position of the target object in the camera coordinate system, that is, the first position of the target object in the AR scene, may be determined from the position of the target object in the live-action image based on the imaging principle of the image acquisition device.
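Under the standard pinhole camera model, a pixel (u, v) with known depth Z maps to camera coordinates X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy. The sketch assumes known intrinsics fx, fy, cx, cy (for example, from calibration) and depth measured along the optical axis. The disclosure refers only to "the imaging principle" and does not fix these details. The function name `back_project` is hypothetical.

```python
from typing import Tuple

def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Map pixel (u, v) at the given depth to (X, Y, Z) in the camera coordinate system."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

For example, with fx = fy = 500, (cx, cy) = (320, 240), a pixel at (420, 240) seen at depth 2.0 lies 0.4 units to the right of the optical axis.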

In one possible implementation, in step S12, because the position of the AR object in the AR scene is preset as described above, the second position of the AR object in the AR scene may be the position of the AR object in the camera coordinate system. In other words, the second position of the AR object in the AR scene is known.

In one possible implementation, in step S12, an occlusion effect between the AR object and the target object may be rendered in the live-action image according to the first position of the target object and the second position of the AR object, so as to generate an AR preview image, which is then displayed on the AR group photo device.

In one possible implementation, the group photo operation on the AR preview image may include, but is not limited to: an operation triggered through a touch key or physical key provided on the AR group photo device (for example, the user clicks a photographing key displayed on the display interface); or an operation triggered remotely (for example, by recognizing a user gesture). The embodiments of the present disclosure do not limit the form of the group photo operation.

In one possible implementation, the AR group photo image of the target object and the AR object may be generated by capturing and saving the AR preview image displayed on the display interface. For example, when the user clicks the photographing key on the display interface, a screenshot is taken. The captured image is the AR group photo image, and it is saved locally for the user to retrieve.

In one possible implementation, in step S11, recognizing the live-action image and determining the target object in the live-action image and the first position of the target object in the AR scene may include:

performing human body recognition on the live-action image to determine the human body region where the target object in the live-action image is located; and

determining, according to the human body region, a first depth of the target object in the AR scene, the first position including the first depth.

In one possible implementation, when human body recognition is performed on the live-action image to determine the human body region where the target object is located, the human body region may be segmented using a portrait segmentation technique. Alternatively, key points of the human body contour may be detected using a human body key point detection technique, the human body contour determined from these key points, and the area enclosed by the contour used as the human body region. This is not limited in the embodiments of the present disclosure.
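For the contour-based variant, the area enclosed by the contour key points can be turned into a pixel mask with a point-in-polygon test. The sketch below uses the standard even-odd ray-casting rule and samples each pixel at its center. It assumes the contour key points are ordered around the body. The functions `point_in_polygon` and `contour_to_mask` are hypothetical names, and a production system would more likely use an optimized polygon-fill routine from an imaging library.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels

def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd ray casting: count crossings of a ray going right from (x, y)."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        if (yi > y) != (yj > y):  # edge straddles the horizontal line through y
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside

def contour_to_mask(contour: List[Point], width: int, height: int) -> List[List[bool]]:
    """Human body region mask: True where the pixel center lies inside the contour."""
    return [[point_in_polygon(c + 0.5, r + 0.5, contour) for c in range(width)]
            for r in range(height)]
```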

In one possible implementation, as described above, after the target object in the live-action image is determined, the position of the target object in the camera coordinate system, that is, the first position of the target object in the AR scene, may be determined based on the position of the target object in the live-action image. The position of the target object in the camera coordinate system may include the depth of the target object in the camera coordinate system. This depth can be understood as the distance between the target object and the image acquisition device in the real scene: the farther the target object is, the greater the depth; the closer it is, the smaller the depth.

In a possible implementation manner, after the human body region of the target object is determined, a distance between the target object and the image acquisition device may be determined based on pixel coordinates of key points of the human body region in the live-action image and an imaging principle (e.g., a pinhole imaging principle) of the image acquisition device, and the distance between the target object and the image acquisition device is used as a depth of the target object in a camera coordinate system, that is, a first depth of the target object in the AR scene.
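With a single RGB camera, one way to apply the pinhole principle is by similar triangles. A person of real height H who spans h pixels vertically is at a distance of roughly Z = fy * H / h. This requires an assumed real-world height, which the disclosure does not mention. The sketch uses a hypothetical default of 1.7 m and treats the vertical extent of the detected key points as the person's pixel height. That is only an approximation: it assumes an upright person roughly facing the camera, and key points rarely reach the top of the head. The function name `depth_from_height` is hypothetical.

```python
from typing import Dict, Tuple

def depth_from_height(keypoints: Dict[str, Tuple[float, float]], fy: float,
                      assumed_height_m: float = 1.7) -> float:
    """Estimate the first depth (metres) from the key points' vertical pixel extent."""
    ys = [y for _, y in keypoints.values()]
    pixel_height = max(ys) - min(ys)
    if pixel_height <= 0:
        raise ValueError("key points must span a non-zero vertical extent")
    # Similar triangles: pixel_height / fy = assumed_height_m / depth.
    return fy * assumed_height_m / pixel_height
```

For example, key points spanning 340 pixels with fy = 1000 give an estimated depth of about 5 m.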

As described above, the image acquisition device may include a TOF camera. In one possible implementation, when the image acquisition device is a TOF camera, the image data it acquires already contains the distance between the target object in the real scene and the TOF camera. Therefore, once the human body region of the target object in the image is determined, the distance between the target object and the TOF camera, that is, the first depth of the target object in the AR scene, can be obtained directly.
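With a TOF camera, obtaining the first depth reduces to reading the depth map inside the human body region. The sketch assumes the depth map is pixel-aligned with the mask, is given in metres, and uses 0 to mark pixels with no valid return. These are common conventions but assumptions here. Plain nested lists stand in for image arrays, and the function name `depths_in_region` is hypothetical.

```python
from typing import List

def depths_in_region(depth_map: List[List[float]], mask: List[List[bool]]) -> List[float]:
    """Collect valid depth values (assumed metres, 0 = invalid) under the human body mask."""
    values: List[float] = []
    for depth_row, mask_row in zip(depth_map, mask):
        for d, m in zip(depth_row, mask_row):
            if m and d > 0:
                values.append(d)
    return values
```

The resulting list can then be reduced to a single first depth using one of the strategies described in this section (average, minimum, maximum, or all values).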

In one possible implementation, the first depth of the target object may be the depth, in the camera coordinate system, of key points of the human body contour. It may also be the depth of key points at the human joints, or the depth of all key points within the human body region. The number of key points may be determined according to actual requirements, the computing performance of the processor, and the like; this is not limited in the embodiments of the present disclosure.

In one possible implementation, the average of the depths of the plurality of key points may be used as the first depth. Alternatively, the minimum and/or maximum of these depths may be used, or the depths of all key points in the human body region may be used as the first depth. The choice may depend on the actual application scenario, the computing performance of the processor, and the like; this is not limited in the embodiments of the present disclosure.
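The aggregation options listed above can be collected in one helper. The mode names and the function name `aggregate_first_depth` are hypothetical labels for the strategies the text enumerates. The sketch only shows how each option reduces the per-key-point depths.

```python
from statistics import mean
from typing import List, Sequence, Tuple, Union

Aggregate = Union[float, Tuple[float, float], List[float]]

def aggregate_first_depth(depths: Sequence[float], mode: str = "mean") -> Aggregate:
    """Reduce per-key-point depths to the 'first depth' using one listed strategy."""
    if not depths:
        raise ValueError("at least one key point depth is required")
    if mode == "mean":
        return mean(depths)
    if mode == "min":
        return min(depths)
    if mode == "max":
        return max(depths)
    if mode == "min_max":
        return (min(depths), max(depths))
    if mode == "all":
        return list(depths)
    raise ValueError(f"unknown mode: {mode}")
```

A single averaged value is cheap to compare, while keeping all depths preserves more information about a person who leans toward or away from the camera.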

In the embodiments of the present disclosure, the first depth of the target object in the AR scene can be accurately determined from the human body region. As a result, when the AR preview image is generated, the front-to-back position of the target object can be rendered accurately according to the first depth, achieving the occlusion effect.

As described above, the AR scene may be constructed based on the camera coordinate system of the image acquisition device, and the position of the AR object in the AR scene may be preset. The first position includes the first depth, and the second position of the AR object may include a second depth of the AR object in the AR scene. In one possible implementation, displaying the AR preview image according to the first position and the second position of the AR object in the AR scene may include:

rendering the AR object in the live-action image according to the relative relationship between the first depth and the second depth, to generate and display the AR preview image.

In one possible implementation, the relative relationship between the first depth and the second depth may be that the first depth is greater than the second depth, or that the first depth is less than the second depth.

In one possible implementation, the AR object may be three-dimensional in the AR scene. In that case, the AR object may have a plurality of second depths; that is, its second depth may be a depth interval. It should be understood that the AR object may also be planar in the AR scene, in which case it may have a single second depth.

In one possible implementation, as described above, the target object may have one or more first depths. When the AR object has a plurality of second depths, the first depth may be determined to be less than the second depth if all first depths of the target object are less than the minimum of the second depths of the AR object. When the AR object has a single second depth, the first depth may be determined to be less than the second depth if all first depths of the target object are less than that second depth.
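This comparison rule, together with the symmetric "greater than" rule stated later in this section, can be sketched as follows. The sketch assumes that depths are plain floats in the same camera-space units. The function name and the return labels are hypothetical. The disclosure does not say what happens when the two depth ranges overlap; the sketch simply reports that case as "undetermined".

```python
from typing import Sequence

def depth_relation(first_depths: Sequence[float], second_depths: Sequence[float]) -> str:
    """Compare the target object's first depth(s) with the AR object's second depth(s)."""
    if not first_depths or not second_depths:
        raise ValueError("both depth lists must be non-empty")
    # "All first depths < minimum second depth" is equivalent to max(first) < min(second).
    if max(first_depths) < min(second_depths):
        return "target_in_front"  # first depth < second depth: AR object rendered behind
    # "All first depths > maximum second depth" is equivalent to min(first) > max(second).
    if min(first_depths) > max(second_depths):
        return "ar_in_front"      # first depth > second depth: AR object rendered in front
    return "undetermined"         # overlapping ranges: not specified by the text
```

A planar AR object with a single second depth is just a one-element list, so its minimum and maximum coincide. Using an averaged first depth, as in the figure examples, means passing a one-element list such as `[a1]`.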

In one possible implementation, when the first depth is less than the second depth, the target object can be considered closer to the image acquisition device than the AR object; that is, the AR object is behind the target object relative to the image acquisition device. In this case, the AR object may be rendered in the live-action image behind the target object, and an AR preview image may be generated and displayed.

Fig. 2a shows a first depth schematic diagram according to an embodiment of the present disclosure. Fig. 2b shows a first schematic diagram of an AR preview image according to an embodiment of the present disclosure. As shown in Fig. 2a, the average a1 of the first depths of the target object a is less than the minimum b1 of the second depths of the AR object b. The AR object b is therefore rendered behind the target object a in the live-action image, generating the AR preview image shown in Fig. 2b, in which the target object a occludes the AR object b.

In one possible implementation, as described above, the target object may have one or more first depths. In a case where the AR object has a plurality of second depths, it may be determined that the first depth is greater than the second depth when all the first depths of the target object are greater than the maximum of the second depths of the AR object; in a case where the AR object has a single second depth, it may be determined that the first depth is greater than the second depth when all the first depths of the target object are greater than that second depth.

In one possible implementation, when the first depth is greater than the second depth, the target object may be considered farther from the image acquisition device than the AR object, that is, the AR object is located in front of the target object with respect to the image acquisition device. In this case, the AR object may be rendered in the live-action image on the front side of the target object, and an AR preview image may be generated and displayed.
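The two depth rules and the resulting render order can be combined in a small sketch, assuming depths are floats measured from the image acquisition device. The function name `render_order` and its return labels are hypothetical. The disclosure does not state what to do when the depth ranges overlap, so the sketch returns `None` in that case.

```python
from typing import List, Optional, Sequence, Union

Depth = Union[float, Sequence[float]]


def as_depth_list(depth: Depth) -> List[float]:
    """Normalize one depth value or a collection of depth values to a non-empty list."""
    if isinstance(depth, (int, float)):
        return [float(depth)]
    values = [float(d) for d in depth]
    if not values:
        raise ValueError("at least one depth value is required")
    return values


def render_order(first_depth: Depth, second_depth: Depth) -> Optional[str]:
    """Decide on which side of the target object the AR object is rendered."""
    firsts = as_depth_list(first_depth)
    seconds = as_depth_list(second_depth)
    if max(firsts) < min(seconds):
        # Target object nearer the camera: render the AR object on the rear side.
        return "ar_behind"
    if min(firsts) > max(seconds):
        # Target object farther from the camera: render the AR object on the front side.
        return "ar_in_front"
    # Depth ranges overlap: not covered by the two rules above.
    return None
```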

Fig. 3a shows a second depth schematic diagram according to an embodiment of the present disclosure. Fig. 3b shows a second schematic diagram of an AR preview image according to an embodiment of the present disclosure. As shown in Fig. 3a, the average value a2 of the first depths of the target object a is greater than the maximum value b2 of the second depths of the AR object b; the AR object b is therefore rendered on the front side of the target object a in the live-action image, generating the AR preview image shown in Fig. 3b, in which the AR object b occludes the target object a.

It should be noted that "front side" and "rear side" in the embodiments of the present disclosure are relative terms: for example, an object closer to the image capturing device is on the front side of an object located farther from the image capturing device. Those skilled in the art will understand the meaning of front and rear, or will at least understand it after reading the embodiments of this disclosure.

As described above, the live-action image may be recognized using a portrait segmentation technique. In one possible implementation, the human body region and the background region in the live-action image may be separated using portrait segmentation, and the AR object may then be rendered on the rear side or the front side of the target object based on the separated human body region and background region, so as to generate the AR preview image.
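Mask-based layering can be sketched as follows, assuming the portrait segmentation step yields a per-pixel boolean person mask and the AR object has already been rasterized into a layer where `None` marks transparent pixels. Images are tiny nested lists of RGB tuples to keep the example dependency-free; `composite_preview` is a hypothetical name, not part of the disclosure.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]


def composite_preview(
    live: Image,
    person_mask: List[List[bool]],
    ar_layer: List[List[Optional[RGB]]],
    ar_in_front: bool,
) -> Image:
    """Overlay the AR layer on the live-action image, letting the human body
    region occlude the AR object when the AR object is on the rear side."""
    out: Image = []
    for y, row in enumerate(live):
        out_row: List[RGB] = []
        for x, live_px in enumerate(row):
            ar_px = ar_layer[y][x]
            if ar_px is None:
                out_row.append(live_px)   # AR object not present at this pixel
            elif ar_in_front or not person_mask[y][x]:
                out_row.append(ar_px)     # AR covers background (or everything if in front)
            else:
                out_row.append(live_px)   # human body region occludes the AR object
        out.append(out_row)
    return out
```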

In one possible implementation, displaying the AR preview image may include drawing the AR preview image on the display interface of the AR group photo device.

In the embodiments of the present disclosure, the front-rear positional relationship between the target object and the AR object can be presented according to the relative relationship between the first depth and the second depth, so that displaying the AR preview image generated from this relationship improves the sense of realism and immersion in the AR group photo.

In one possible implementation, to improve the sense of interaction during the AR group photo, the group photo image generation method may further include: performing human body key point detection on the live-action image to determine the human body posture of the target object in the live-action image. In this case, displaying the AR preview image according to the first position and the second position of the AR object in the AR scene may include: displaying the AR preview image according to the first position, the second position, and the human body posture of the target object.

In one possible implementation, any known human body key point detection method may be used to perform key point detection on the live-action image. For example, key points at human body joints (for example, 13 joints) in the live-action image may be extracted; the number and positions of the key points may be determined according to actual requirements, which is not limited in the embodiments of the present disclosure.

In one possible implementation, the posture information of the target object may be determined from the detected human body key points. When the posture information is two-dimensional, it may include the two-dimensional coordinates of a plurality of human body key points of the target object in the live-action image; based on these coordinates, the key points are then connected according to the structure of the human body to obtain the two-dimensional posture of the target object.
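Connecting key points "according to the structure of the human body" amounts to walking a fixed list of skeleton edges. The sketch below assumes a hypothetical 13-keypoint layout (head plus shoulder, elbow, wrist, hip, knee and ankle on each side); the disclosure does not fix a layout, and `SKELETON_EDGES` and `build_2d_pose` are illustrative names only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical 13-keypoint skeleton: head + 6 joints per side.
SKELETON_EDGES: List[Tuple[str, str]] = [
    ("head", "left_shoulder"), ("head", "right_shoulder"),
    ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist"),
    ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
    ("left_shoulder", "left_hip"), ("right_shoulder", "right_hip"),
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
]


def build_2d_pose(keypoints: Dict[str, Point]) -> List[Tuple[Point, Point]]:
    """Return the limb segments of the 2D posture as pairs of image coordinates."""
    segments: List[Tuple[Point, Point]] = []
    for a, b in SKELETON_EDGES:
        # Skip limbs whose endpoints were not detected in this frame.
        if a in keypoints and b in keypoints:
            segments.append((keypoints[a], keypoints[b]))
    return segments
```

The same edge list applies unchanged to three-dimensional coordinates in the camera coordinate system; only the point type changes.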

In one possible implementation, when the posture information is three-dimensional, it may include the three-dimensional coordinates of a plurality of human body key points of the target object in the camera coordinate system; based on these coordinates, the key points are then connected according to the structure of the human body to obtain the three-dimensional posture of the target object.

In one possible implementation, human body key point detection may be performed on the live-action image as part of identifying the live-action image in step S11. Human body recognition and key point detection may be performed simultaneously; alternatively, human body recognition may be performed first and, once the human body region in the live-action image is determined, key points of the joints within that region may be detected. The order may be chosen according to the processing capability of the device performing recognition, its resource usage, the latency constraints of the application, and other factors that may affect the recognition and detection order, which is not limited in the embodiments of the present disclosure.

In the embodiments of the present disclosure, the AR preview image can be displayed according to the first position, the second position, and the human body posture of the target object, which improves the interaction between the AR object and the target object in the AR group photo.

In one possible implementation, presenting the AR preview image according to the first position, the second position, and the human posture of the target object may include:

determining a first posture of the AR object according to the human body posture of the target object, wherein the first posture of the AR object is symmetrical to the human body posture of the target object;

and generating and displaying a first AR preview image according to the first posture, the first position and the second position, wherein the first AR preview image comprises the AR object with the first posture.

In one possible implementation, after the human body posture of the target object is determined, a first posture symmetrical to it may be determined. Fig. 4 shows a schematic diagram of an AR preview image according to an embodiment of the present disclosure. As shown in Fig. 4, the target object and the AR object perform a symmetric "finger heart" gesture, that is, the human body postures of the target object and the AR object are symmetrical.
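One way to obtain a symmetrical first posture is to reflect the target object's key points across a vertical axis. The sketch assumes 2D key points keyed by names with `left_`/`right_` prefixes (a hypothetical convention) and a chosen mirror axis `axis_x`, for example the midline between the person and the AR object. Because a reflection flips handedness, the left/right labels are swapped so the result is a valid skeleton for the mirrored figure.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def swap_side(name: str) -> str:
    """Swap the left_/right_ prefix of a key point name (other names unchanged)."""
    if name.startswith("left_"):
        return "right_" + name[len("left_"):]
    if name.startswith("right_"):
        return "left_" + name[len("right_"):]
    return name


def mirror_pose(keypoints: Dict[str, Point], axis_x: float) -> Dict[str, Point]:
    """Reflect each key point across the vertical line x = axis_x."""
    return {swap_side(n): (2 * axis_x - x, y) for n, (x, y) in keypoints.items()}
```

The mirrored key points would then drive the AR object's skeleton; how that binding is done depends on the rendering engine and is not shown.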

In one possible implementation, the first posture of the AR object may also be the same as or opposite to the human body posture of the target object. For example, if the human body posture is "lift left hand", the same first posture is "lift left hand" and the opposite first posture is "lift right hand".

In one possible implementation manner, the AR object with the first posture may be rendered on a rear side or a front side relative to the target object based on the first position of the target object and the second position of the AR object, a first AR preview image is generated, and the generated first AR preview image is drawn on a display interface of the AR group photo device for presentation.

In one possible implementation, as described above, there may be one or more target objects, as set according to actual needs. With one target object, the first posture of the AR object may be determined from that object's human body posture. With a plurality of target objects, the first posture may be determined from the human body posture of a target object selected by the user, or from that of the target object located in the central area of the live-action image, which is not limited in the embodiments of the present disclosure.
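The choice of which target object drives the AR object can be sketched as below, assuming each detected target object is described by a bounding box `(x_min, y_min, x_max, y_max)` in pixels and "central area" is interpreted as the box whose center is nearest the image center. `select_driving_target` and the box format are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def select_driving_target(
    boxes: List[Box],
    image_width: float,
    image_height: float,
    user_choice: Optional[int] = None,
) -> int:
    """Return the index of the target object whose posture drives the AR object."""
    if not boxes:
        raise ValueError("no target object detected")
    if user_choice is not None:
        # A target object explicitly selected by the user takes priority.
        if not 0 <= user_choice < len(boxes):
            raise IndexError("user_choice out of range")
        return user_choice
    cx, cy = image_width / 2, image_height / 2

    def squared_distance_to_center(i: int) -> float:
        x0, y0, x1, y1 = boxes[i]
        bx, by = (x0 + x1) / 2, (y0 + y1) / 2
        return (bx - cx) ** 2 + (by - cy) ** 2

    # Otherwise pick the target object closest to the middle of the live-action image.
    return min(range(len(boxes)), key=squared_distance_to_center)
```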

In the embodiments of the present disclosure, the AR object can be driven, according to the posture of the target object, to perform an action symmetrical to that of the target object, and the occlusion effect between the AR object and the target object is presented, improving the sense of interaction, realism, and immersion during the AR group photo.

In a possible implementation manner, the displaying the AR preview image according to the first position, the second position, and the human body posture of the target object may further include:

determining the action category corresponding to the human body posture of the target object;

determining a second posture of the AR object according to the action category, wherein the second posture of the AR object is matched with the action category;

and generating and displaying a second AR preview image according to the second posture, the first position, and the second position, where the second AR preview image includes the AR object in the second posture.

In one possible implementation, the action categories may be preset categories used to determine the posture of the AR object, that is, each action category may be set to correspond to a different second posture. For example, the action category "arm spread" may correspond to the second posture "holding the elbow between the arms", the action category "raise hands" may correspond to the second posture "raise legs", and so on. The types and number of action categories, and the second postures corresponding to them, may be set according to actual requirements, which is not limited in the embodiments of the present disclosure.

In one possible implementation, the action of the target object may be determined from its human body posture. When the action belongs to a preset action category, the second posture corresponding to that category can be obtained. Continuing the above example, if the action of the target object is determined from its human body posture to be "raise hands", the action category is "raise hands" and the second posture of the AR object may be "raise legs".
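Determining the action category from key points can be as simple as a few geometric rules. The following is a rule-based sketch only, assuming image coordinates with y increasing downward and the hypothetical key point names used earlier; the thresholds are illustrative, and a trained action classifier could be used instead.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def classify_action(kp: Dict[str, Point]) -> Optional[str]:
    """Map a 2D posture to a preset action category, or None if none matches.
    Assumes image coordinates with y increasing downward."""
    required = ("head", "left_shoulder", "right_shoulder", "left_wrist", "right_wrist")
    if any(name not in kp for name in required):
        return None  # key points missing: cannot classify
    head_y = kp["head"][1]
    ls, rs = kp["left_shoulder"], kp["right_shoulder"]
    lw, rw = kp["left_wrist"], kp["right_wrist"]
    # "raise hands": both wrists above the head (smaller y means higher in the image).
    if lw[1] < head_y and rw[1] < head_y:
        return "raise hands"
    # "arm spread": wrists far apart and roughly at shoulder height (illustrative thresholds).
    shoulder_width = abs(ls[0] - rs[0])
    wrists_wide = abs(lw[0] - rw[0]) > 2 * shoulder_width
    at_shoulder_height = (abs(lw[1] - ls[1]) < shoulder_width / 2
                          and abs(rw[1] - rs[1]) < shoulder_width / 2)
    if wrists_wide and at_shoulder_height:
        return "arm spread"
    return None
```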

In one possible implementation, a default posture may also be set for the AR object, and the second posture of the AR object is determined to be the default posture when the action of the target object does not belong to any preset action category.
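The category-to-posture correspondence with a default fallback is essentially a lookup table. The sketch below reuses the example pairs from the text; the default posture name `"idle"` and the function name `second_pose_for` are hypothetical.

```python
from typing import Dict, Optional

# Example correspondence taken from the text; real tables are set per AR scene.
SECOND_POSE_BY_ACTION: Dict[str, str] = {
    "arm spread": "holding the elbow between the arms",
    "raise hands": "raise legs",
}
DEFAULT_POSE = "idle"  # hypothetical default posture of the AR object


def second_pose_for(
    action: Optional[str],
    table: Dict[str, str] = SECOND_POSE_BY_ACTION,
    default: str = DEFAULT_POSE,
) -> str:
    """Return the AR object's second posture for a detected action category."""
    if action is None:
        return default  # no action recognized in this frame
    # Actions outside the preset categories fall back to the default posture.
    return table.get(action, default)
```

Calling this per frame also covers the posture-change case described below: when the detected action moves into a preset category, the returned posture switches from the default to the matching one.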

In one possible implementation, when the posture of the target object changes, the posture of the AR object changes correspondingly. For example, when the target object changes from an action outside the preset action categories to an action within one of them, the AR object may be driven to change from the default posture to the posture matching that action category.

In one possible implementation manner, the AR object with the second posture may be rendered on the rear side or the front side of the target object based on the first position of the target object and the second position of the AR object, a second AR preview image is generated, and the generated second AR preview image is drawn on the display interface of the AR group photo device for displaying.

In one possible implementation, a plurality of action categories and their matching second postures may be set for different AR scenes, so that the second posture of the AR object follows changes in the posture of the target object, making the AR group photo more fun and interactive.

In one possible implementation, as described above, there may be one or more target objects, as set according to actual needs. With one target object, the second posture of the AR object may be determined from that object's human body posture. With a plurality of target objects, the second posture may be determined from the human body posture of a target object selected by the user; from that of the target object located in the central area of the live-action image; or, when the action of a certain target object is detected to belong to a preset action category, from the action category corresponding to that target object's human body posture, which is not limited in the embodiments of the present disclosure.

In the embodiments of the present disclosure, the AR object can be driven to present a posture matching the action of the target object, and the occlusion effect between the AR object and the target object is presented, improving the sense of interaction, realism, and immersion during the AR group photo.

As described above, the image capturing device may capture live-action images within its field of view in real time and transmit them as a video stream to the processor of the AR group photo device; the processor generates an AR preview image of the target object and the AR object from each frame of the video stream and displays it on the display interface of the AR group photo device, so that the display interface presents the AR preview image of the target object and the AR object in real time.

It is to be appreciated that the generated AR preview image may include a plurality of frames of AR preview images. In one possible implementation, an AR group-photo video of the target object and the AR object may be generated in response to a group-photo operation for a plurality of frames of AR preview images.

In one possible implementation, the group photo operation for multiple frames of AR preview images may be triggered, without limitation, through a touch key or physical key provided by the AR group photo device (for example, the user taps a video recording key displayed on the display interface), or remotely (for example, by recognizing a user gesture). The embodiments of the present disclosure do not limit the form of the group photo operation.

It is understood that the group photo operation for multiple frames of AR preview images may include at least starting and ending the group photo, or starting, pausing, and ending the group photo, as set according to actual requirements, the functions supported by the AR group photo device, and the like, which is not limited in the embodiments of the present disclosure.

In one possible implementation, the AR group photo video of the target object and the AR object may be generated by recording and storing a video. For example, screen recording starts when the user taps to start the group photo and ends when the user taps to end it. The recorded video is the AR group photo video, which may be stored locally for the user to retrieve.
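The start / pause / end operations can be modelled as a small state machine that keeps only the AR preview frames shown while recording. This is a sketch under the assumption that the preview loop calls a hook for every displayed frame; `GroupPhotoRecorder` and its methods are hypothetical, and encoding the frames into a video file is left out.

```python
from typing import Any, List


class GroupPhotoRecorder:
    """Collects AR preview frames between 'start' and 'end' group photo operations."""

    def __init__(self) -> None:
        self.state = "idle"                 # one of: "idle", "recording", "paused"
        self._frames: List[Any] = []        # frames of the clip currently being recorded
        self.videos: List[List[Any]] = []   # finished AR group photo videos

    def start(self) -> None:
        # Starting from idle begins a new clip; starting from paused resumes it.
        if self.state == "idle":
            self._frames = []
        self.state = "recording"

    def pause(self) -> None:
        if self.state == "recording":
            self.state = "paused"

    def end(self) -> None:
        # Ending stores the clip; ending while idle stores nothing.
        if self.state in ("recording", "paused"):
            self.videos.append(self._frames)
            self._frames = []
        self.state = "idle"

    def on_preview_frame(self, frame: Any) -> None:
        # Called for every displayed AR preview frame; kept only while recording.
        if self.state == "recording":
            self._frames.append(frame)
```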

In the embodiments of the present disclosure, an AR group photo video can be generated that dynamically shows the positional relationship and postures of the target object and the AR object, improving the sense of interaction, dynamism, and immersion.

Fig. 5 illustrates a flowchart of an AR preview image generation method according to an embodiment of the present disclosure. As shown in fig. 5, the AR preview image generating method may include:

acquiring video stream data; performing human body recognition on each frame of the video stream data; outputting the position information and posture information of the real person; determining the action of a preset AR character according to the posture information of the real person; driving the AR character to perform the corresponding action, and rendering the real person and the AR character according to their relative positions; and generating an AR preview image.
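The flow of Fig. 5 can be expressed as a per-frame generator with the recognition, action selection, and rendering stages passed in as callables. This is a structural sketch only: `ar_preview_stream` and the three stage callables are hypothetical placeholders for whatever recognition and rendering components a device actually uses.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple


def ar_preview_stream(
    frames: Iterable[Any],
    recognize_person: Callable[[Any], Tuple[Any, Any]],
    choose_ar_action: Callable[[Any], Any],
    render: Callable[[Any, Any, Any], Any],
) -> Iterator[Any]:
    """Yield one AR preview image per live-action frame of the video stream."""
    for frame in frames:
        # Human body recognition: position and posture information of the real person.
        position, posture = recognize_person(frame)
        # Determine the preset AR character's action from the real person's posture.
        action = choose_ar_action(posture)
        # Drive the AR character and render both according to their relative positions.
        yield render(frame, position, action)
```

Being a generator, the sketch processes frames as they arrive, which matches the real-time preview described earlier.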

In one possible implementation, video stream data may be collected by an RGB or time-of-flight (TOF) camera. Using human body recognition, data such as the position of the real person, the position of the AR character, and the skeleton and posture of the real person in the AR scene may be determined and returned as digital signals to the AR group photo device (for example, a mobile terminal). After acquiring the data, the AR group photo device analyzes and visually presents the positional relationship between the real person and the AR character; meanwhile, it analyzes the skeleton and posture of the real person and drives the AR character to act, creating a playful group photo experience.

In one possible implementation, the action of the preset AR character may be determined in either of the following manners:

symmetrical action: the human body posture is analyzed and the AR character is driven to perform the mirrored action, so that the two complete the AR group photo together, for example with the currently popular finger-heart gesture;

paired action: the human body posture and action category of the real person are analyzed and the AR character is driven to perform the corresponding action, so that the two complete a two-person pose together, forming a familiar interaction between the real person and the AR character.

In one possible implementation, the AR group photo image generation method of the embodiments of the present disclosure may be applied to Internet APPs, for example map applications or application products featuring an IP character or a spokesperson, or to functional APPs, for example those used in most AR marketing products.

In one possible implementation, the AR character is driven to change its action in real time according to posture and action detection of the real person, creating an interactive experience. That is, by combining human skeleton detection and similar techniques, the body movements of the real person trigger feedback from the AR character, forming an interaction.

In the related art, because portrait segmentation is lacking, the virtual object is always in the foreground of the AR scene and lacks realism. Fig. 6a shows a schematic diagram of an AR preview image according to the related art: as shown in Fig. 6a, the virtual object is in the foreground and the positional relationship between the virtual object and the real person is not reflected. In the embodiments of the present disclosure, portrait segmentation is used to highlight the front-rear positional relationship between the real person and the virtual object in the AR scene, making the whole AR scene more realistic. Fig. 6b shows a schematic diagram of an AR preview image according to an embodiment of the present disclosure: as shown in Fig. 6b, the real person is on the front side of the virtual object, which reflects their positional relationship and forms an occlusion effect.

In the related art, the AR group photo feels stiff because the AR character is essentially static. In the embodiments of the present disclosure, human skeleton detection (human body key point detection) is used in the AR group photo scene to detect the skeleton and posture of the real person and drive the AR character to perform a corresponding action, for example a symmetrical action, making the whole interaction more vivid.

The embodiments of the present disclosure can bring users a brand-new interactive experience in the AR scene, making it more interesting, realistic, and fun. Meanwhile, position information is analyzed from the image data, and the front-rear positional relationship between the real person and the AR character is displayed in real time. Through AR application software, a user can walk back and forth in front of the camera or strike a pose, triggering the three-dimensional virtual character to respond in real time while perceiving a realistic physical positional relationship.

An embodiment of the present disclosure provides a scheme for taking interactive souvenir group photos in an AR scene: the user can walk back and forth in front of the lens to experience a realistic front-rear positional relationship with the virtual character, and can also strike poses that trigger the AR character to respond; the user can then quickly and easily output photos and interactive videos for social sharing and the like.

The embodiments of the present disclosure solve the problem of virtual materials always being in the foreground of the AR scene: adding portrait occlusion highlights the positional relationship between virtual and real objects in the AR scene and realizes the occlusion effect, significantly improving the realism of the AR experience.

The embodiments of the present disclosure also solve the problem of stiff interaction with the AR character in the AR scene: based on changes in the real person's actions, the AR character can be driven through motion transfer, using techniques such as skeleton binding, human body key point detection, and posture detection, to give feedback such as a mirrored action.

It is understood that the above method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from their principles and logic; due to space limitations, details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible internal logic.

In addition, the present disclosure also provides a group photo image generation apparatus, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any group photo image generation method provided by the present disclosure. For the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section; details are not repeated here.

Fig. 7 shows a block diagram of a group photo image generation apparatus according to an embodiment of the present disclosure. As shown in Fig. 7, the apparatus includes:

an obtaining module 21, configured to obtain a live-action image;

a determining module 22, configured to identify the live-action image, and determine a target object in the live-action image and a first position of the target object in the augmented reality AR scene;

a display module 23, configured to display an AR preview image according to the first position and a second position of the AR object in the AR scene;

a generating module 24, configured to generate an AR group photo image of the target object and the AR object in response to a group photo operation on the AR preview image.

In one possible implementation, the determining module 22 includes: a human body region determination submodule, configured to perform human body recognition on the live-action image and determine the human body region where the target object in the live-action image is located; and a first depth determination submodule, configured to determine a first depth of the target object in the AR scene according to the human body region, where the first position includes the first depth.

In one possible implementation, the second position of the AR object includes a second depth of the AR object in the AR scene, and the display module 23 includes: a first display submodule, configured to render the AR object in the live-action image according to the relative relationship between the first depth and the second depth, and to generate and display the AR preview image.

In one possible implementation, the apparatus further includes a posture determination module, configured to perform human body key point detection on the live-action image and determine the human body posture of the target object in the live-action image; and the display module 23 includes a second display submodule, configured to display the AR preview image according to the first position, the second position, and the human body posture of the target object.

In one possible implementation, the AR preview image includes multiple frames of AR preview images, and the generating module 24 includes: a group photo video generation submodule, configured to generate an AR group photo video of the target object and the AR object in response to a group photo operation for the multiple frames of AR preview images.

In the embodiments of the present disclosure, the AR preview image can be displayed according to the first position of the target object and the second position of the AR object in the AR scene, and the AR group photo image is generated in response to a group photo operation for the AR preview image. The occlusion effect between the virtual object and the real person is thus realized based on their relative positional relationship in the AR group photo, improving the sense of realism and immersion.

In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.

Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.

An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.

The disclosed embodiments also provide a computer program product comprising computer readable code, when the computer readable code runs on a device, a processor in the device executes instructions for implementing the group photo image generation method provided in any of the above embodiments.

The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the group photo image generation method provided in any of the above embodiments.

The electronic device may be provided as a terminal or other modality of device.

Fig. 8 illustrates a block diagram of an electronic device 800 according to an embodiment of the present disclosure. For example, the electronic device 800 may be an AR group photo device that supports AR technology.

Referring to fig. 8, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.

The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 814 includes one or more sensors for providing state assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as Wi-Fi, second-generation mobile communication technology (2G), third-generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.

The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.

The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses passing through a fiber optic cable), or electrical signals transmitted through wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.

Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The computer program product may be implemented in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

Embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method for generating a group photo image, comprising:

acquiring a live-action image;

identifying the live-action image, and determining a target object in the live-action image and a first position of the target object in an Augmented Reality (AR) scene;

displaying an AR preview image according to the first position and a second position of an AR object in the AR scene;

generating an AR group photo image of the target object and the AR object in response to a group photo operation for the AR preview image.

2. The method of claim 1, wherein the identifying the live-action image and determining a target object in the live-action image and a first position of the target object in an Augmented Reality (AR) scene comprises:

determining a body region of the target object in the live-action image; and

determining a first depth of the target object in the AR scene according to the body region, wherein the first position comprises the first depth.

3. The method of claim 2, wherein the second position of the AR object comprises a second depth of the AR object in the AR scene, and

the displaying an AR preview image according to the first position and a second position of an AR object in the AR scene comprises:

rendering the AR object in the live-action image according to the relative relationship between the first depth and the second depth, to generate and display the AR preview image.

4. The method of claim 1, further comprising:

performing human body keypoint detection on the live-action image to determine a human body pose of the target object in the live-action image,

wherein the AR preview image is displayed according to the first position, the second position, and the human body pose of the target object.

5. The method of claim 4, wherein displaying the AR preview image according to the first position, the second position, and the human body pose of the target object comprises:

determining a first pose of the AR object according to the human body pose of the target object, wherein the first pose of the AR object is symmetrical to the human body pose of the target object; and

generating and displaying a first AR preview image according to the first pose, the first position, and the second position, wherein the first AR preview image comprises the AR object in the first pose.

6. The method of claim 4, wherein displaying the AR preview image according to the first position, the second position, and the human body pose of the target object comprises:

determining, according to the human body pose of the target object, an action category corresponding to that human body pose;

determining a second pose of the AR object according to the action category, wherein the second pose of the AR object matches the action category; and

generating and displaying a second AR preview image according to the second pose, the first position, and the second position, wherein the second AR preview image comprises the AR object in the second pose.

7. The method of any one of claims 1 to 6, wherein the AR preview image comprises a plurality of frames of AR preview images, and the generating an AR group photo image of the target object and the AR object in response to a group photo operation for the AR preview image comprises:

generating an AR group photo video of the target object and the AR object in response to a group photo operation for the plurality of frames of AR preview images.

8. A group photo image generation device, comprising:

an acquisition module, configured to acquire a live-action image;

a determining module, configured to identify the live-action image and determine a target object in the live-action image and a first position of the target object in an Augmented Reality (AR) scene;

a display module, configured to display an AR preview image according to the first position and a second position of an AR object in the AR scene; and

a generating module, configured to generate an AR group photo image of the target object and the AR object in response to a group photo operation for the AR preview image.

9. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 7.

10. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1 to 7.
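The depth comparison recited in claims 2 and 3 can be illustrated with a short sketch. The claims do not state how the first depth is derived from the body region, so the function below assumes a hypothetical pinhole-camera heuristic (focal length in pixels times an assumed body height, divided by the pixel height of the body region). The compositing step then shows one way the occlusion effect might be realised: wherever the target object and the AR object both cover a pixel, whichever has the smaller depth is shown. The function names, the default body height, and the list-of-lists image layout are illustrative assumptions, not part of the disclosure.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]
Mask = List[List[bool]]


def estimate_depth_from_body_region(bbox_height_px: float,
                                    focal_length_px: float,
                                    assumed_body_height_m: float = 1.7) -> float:
    """Approximate camera-to-person distance in metres.

    Hypothetical pinhole-camera heuristic: depth = f * H / h, where f is the
    focal length in pixels, H an assumed real body height and h the pixel
    height of the detected body region. Not the method of the disclosure.
    """
    if bbox_height_px <= 0:
        raise ValueError("body region must have a positive pixel height")
    return focal_length_px * assumed_body_height_m / bbox_height_px


def composite_with_occlusion(live: Image, person_mask: Mask, person_depth: float,
                             ar_layer: Image, ar_mask: Mask, ar_depth: float) -> Image:
    """Draw the AR layer into the live-action frame with depth-based occlusion.

    Where the AR object covers a pixel, it is drawn unless the target person
    also covers that pixel and is closer to the camera (smaller depth), in
    which case the live-action (person) pixel is kept.
    """
    out: Image = []
    for y, row in enumerate(live):
        out_row: List[RGB] = []
        for x, live_px in enumerate(row):
            px = live_px
            if ar_mask[y][x]:
                person_in_front = person_mask[y][x] and person_depth < ar_depth
                if not person_in_front:
                    px = ar_layer[y][x]
            out_row.append(px)
        out.append(out_row)
    return out


# Usage: a 1 x 2 frame in which the person covers only the left pixel.
live = [[(10, 10, 10), (20, 20, 20)]]
person_mask = [[True, False]]
ar_layer = [[(255, 0, 0), (255, 0, 0)]]
ar_mask = [[True, True]]

person_depth = estimate_depth_from_body_region(850, 1000)  # 2.0 m
behind = composite_with_occlusion(live, person_mask, person_depth, ar_layer, ar_mask, 3.0)
in_front = composite_with_occlusion(live, person_mask, person_depth, ar_layer, ar_mask, 1.0)
```

With the AR object at 3.0 m the person occludes it on the left pixel; at 1.0 m the AR object is drawn over the person. Equal depths are resolved in favour of the AR object here, which is an arbitrary choice of this sketch; a practical renderer would more likely compare against a per-pixel depth map than a single depth per object.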