CN116823869A - Background replacement method and electronic equipment - Google Patents

Background replacement method and electronic equipment

Info

Publication number
CN116823869A
Authority
CN
China
Prior art keywords
image
segmentation
segmentation result
gesture
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210271308.4A
Other languages
Chinese (zh)
Inventor
李炜
黄睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210271308.4A priority Critical patent/CN116823869A/en
Priority to PCT/CN2023/079248 priority patent/WO2023174063A1/en
Publication of CN116823869A publication Critical patent/CN116823869A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Processing (AREA)

Abstract

The application provides a background replacement method in which a first image is segmented based on an image segmentation lightweight model to determine the region where a target object in the first image is located and the region where the background content is located; when the user is not satisfied with the background replacement effect, the image segmentation lightweight model is trained through a background replacement training operation until a background replacement effect that satisfies the user is reached. Therefore, when the user is not satisfied with the replacement effect of background replacement based on the current image segmentation lightweight model, training of the image segmentation lightweight model can be started through a user operation, and the segmentation accuracy of the image segmentation lightweight model is improved, so that the accuracy of background replacement is improved, the probability of replacement errors is reduced, and the user experience is improved.

Description

Background replacement method and electronic equipment
Technical Field
The application relates to the field of terminals, in particular to a background replacement method and electronic equipment.
Background
Background replacement is becoming desirable in many video applications, such as video calls, video conferences and video customer service, in order to reduce the environmental restrictions on the application and to protect the privacy of the user. Background replacement is a technique of replacing the content of the background area contained in a video or image with specified background content. In background replacement, a central step is image segmentation, i.e. the input image is segmented into a target region and a background region by means of an image segmentation model.
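As an illustration of the compositing step (not part of the claimed method), the following minimal Python sketch shows how, once a segmentation model has produced a foreground mask, the background content of an image can be replaced with specified background content; the function and parameter names are hypothetical.

```python
import numpy as np

def replace_background(frame: np.ndarray, mask: np.ndarray, new_background: np.ndarray) -> np.ndarray:
    """Composite the target region of `frame` onto `new_background`.

    frame:           H x W x 3 input image (e.g. the first image).
    mask:            H x W array in [0, 1]; 1 marks the target (foreground) region,
                     0 marks the background region, as produced by a segmentation model.
    new_background:  H x W x 3 replacement background content.
    """
    alpha = mask.astype(np.float32)[..., None]           # H x W x 1, broadcast over channels
    composited = alpha * frame + (1.0 - alpha) * new_background
    return composited.astype(frame.dtype)
```

In practice the mask is usually a soft probability map rather than a hard 0/1 mask, so the same alpha-blending also handles fuzzy boundaries such as raised hair.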
At present, an image segmentation model is trained by collecting a large amount of historical image or video data of a target object in advance to construct an image dataset, and then training the image segmentation model with the image dataset to obtain an optimized image segmentation model for the target object. The accuracy of image segmentation therefore depends on the quality of the image dataset used in the training process, but a pre-collected image dataset cannot include all individual features. When the image dataset is of low quality, that is, when it does not include some individual features of the target object, and an image containing those individual features is segmented by the trained image segmentation model, the part of the target region containing the individual features is regarded as background region. As a result, the accuracy of image segmentation is poor, which in turn causes replacement errors when the image background is replaced and affects the user experience.
Disclosure of Invention
The application provides a background replacement method and electronic equipment, and the implementation of the method can improve the image segmentation accuracy of a background segmentation model, further improve the background replacement accuracy and improve the user experience.
In a first aspect, an embodiment of the present application provides a method for replacing a background, where the method includes displaying, by an electronic device, a second image obtained by performing first background replacement on a first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image comprises an area where the target object is located and an area where the first background content is located; the second image is obtained by replacing the first background content in the first image with the second background content; the second background content is different from the first background content; responding to background replacement training operation of a user, and displaying a first gesture graph by the electronic equipment, wherein the first gesture graph is used for indicating the user to make a first gesture; the electronic equipment acquires a third image, wherein the gesture of a user in the third image is the first gesture; the electronic equipment displays a fourth image obtained by carrying out second background replacement on the first image; the second background replacement is performed based on a target image segmentation lightweight model, wherein the target image segmentation lightweight model is obtained based on third image training; the fourth image is obtained by replacing the first background content in the first image with the second background content.
It will be appreciated that upon detecting the background replacement training operation, the electronic device displays a first gesture graph indicating that the user should make the first gesture, and then obtains a third image of the user in which the user's gesture is the first gesture. By using the third image as training data for the first image segmentation lightweight model, the electronic device can acquire the latest individual characteristics of the user. In this way, the target image segmentation lightweight model obtained by training with the third image segments images with higher accuracy; that is, it can more accurately separate the target object and the background content in an image. Therefore, comparing the second image, obtained by performing the first background replacement on the first image based on the first image segmentation lightweight model, with the fourth image, obtained by performing the second background replacement on the first image based on the target image segmentation lightweight model, the size of the region where the target object is located in the fourth image is closer to the size of the region where the target object is located in the first image. That is, the accuracy of the background replacement of the fourth image is higher than that of the second image, and the segmentation accuracy of the target image segmentation lightweight model is higher than the segmentation accuracy of the first image segmentation lightweight model.
According to the method, the electronic device uses the image segmentation lightweight model to segment the first image and determines the area where the target object in the first image is located and the area where the background content is located. When the user is not satisfied with the effect of background replacement, the user can train the image segmentation lightweight model through the background replacement training operation until a background replacement effect that satisfies the user is reached. Therefore, when the user is not satisfied with the effect of background replacement based on the current image segmentation lightweight model, training of the image segmentation lightweight model can be started through a user operation, the segmentation accuracy of the image segmentation lightweight model is improved, the accuracy of background replacement is improved, and the user experience is improved.
With reference to the first aspect, in some implementations, before the electronic device displays the fourth image obtained by performing the second background replacement on the first image, the method further includes: the electronic device inputs the third image into the image segmentation full model to obtain a first segmentation result; the electronic device inputs the third image into the first image segmentation lightweight model to obtain a second segmentation result; the number of model parameters in the image segmentation full model is greater than the number of model parameters in the first image segmentation lightweight model; the first segmentation result and the second segmentation result are used for indicating the region where the target object in the third image is located and the region where the third background content is located; the electronic device trains the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain a second image segmentation lightweight model; the electronic device inputs the third image into the second image segmentation lightweight model to obtain a third segmentation result;
Under the condition that the first segmentation result and the third segmentation result are different, the electronic equipment determines a second gesture graph based on the first segmentation result and the third segmentation result, wherein the second gesture graph is used for indicating a user to make a second gesture, the second gesture graph comprises a first limb, and the area of the first limb in the first segmentation result is different from the area of the first limb in the third segmentation result;
the electronic device trains the second image segmentation lightweight model based on the fifth image to obtain a target image segmentation lightweight model, and the gesture of the target object in the fifth image is the second gesture.
The image segmentation lightweight model is obtained by cutting and quantizing based on an image segmentation full model, and the number of model parameters in the image segmentation full model is larger than that in the image segmentation lightweight model. In general, the segmentation result of the image segmentation full model is more accurate than that of the image segmentation light model, but the calculation amount of the image segmentation full model is relatively large. In order to reduce the model calculation amount, an image segmentation lightweight model with fewer model parameters is generally deployed on an electronic device for image segmentation, but the segmentation effect of the image segmentation lightweight model is not as good as that of the image segmentation full model, so that in the training process of the image segmentation lightweight model, the image lightweight model to be trained is guided and trained by the image segmentation full model, so that the performance of the image segmentation lightweight model is close to that of the image segmentation full model, namely, the segmentation effect of the image segmentation lightweight model is close to that of the image segmentation full model. That is, the electronic device segments the third image using the image segmentation full model and the image segmentation lightweight model, respectively, to obtain the first segmentation result and the second segmentation result. Based on the error between the first segmentation result and the second segmentation result, model parameters of the image segmentation lightweight model are adjusted so that the segmentation effect of the image lightweight model is closer to that of the image segmentation full-scale model. Thus, the calculation amount of the electronic equipment can be reduced under the condition of ensuring the image segmentation effect.
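The guided (teacher-student) training described above can be pictured with the following sketch. It assumes a PyTorch-style setup in which `full_model` and `light_model` return per-pixel class logits; these names, and the use of a KL-divergence loss, are illustrative assumptions rather than the patent's prescribed loss.

```python
import torch
import torch.nn.functional as F

def distillation_step(full_model, light_model, optimizer, image_batch):
    """One guided-training step: the full model's output (first segmentation
    result) supervises the lightweight model's output (second segmentation result)."""
    with torch.no_grad():
        teacher_logits = full_model(image_batch)          # first segmentation result
    student_logits = light_model(image_batch)             # second segmentation result

    # The error between the two segmentation results drives the parameter update
    # of the lightweight model, pulling its behaviour toward the full model.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=1),
        F.softmax(teacher_logits, dim=1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```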
In this way, after the electronic device trains the first image segmentation lightweight model using the first segmentation result and the second segmentation result to obtain the second image segmentation lightweight model, the third image is used to verify the trained second image segmentation lightweight model. When the third segmentation result and the first segmentation result are different, the electronic device determines a gesture graph based on the segmentation results and instructs the user to make the gesture shown in the gesture graph, which guides the acquisition of training data for the next round of training. The training data are thus actively screened, and the quality of the training data is improved. Training the model with the screened training data improves the training effect of the model and further improves the segmentation accuracy of the model.
With reference to the first aspect, in some implementations, the electronic device trains the second image segmentation lightweight model based on the fifth image to obtain a target image segmentation lightweight model, which specifically includes:
the electronic device acquires the fifth image; the electronic device inputs the fifth image into the image segmentation full model to obtain a fourth segmentation result; the electronic device inputs the fifth image into the second image segmentation lightweight model to obtain a fifth segmentation result; the fourth segmentation result and the fifth segmentation result are used for indicating the area where the target object is located and the area where the fourth background content is located in the fifth image; the electronic device trains the second image segmentation lightweight model based on the fourth segmentation result and the fifth segmentation result to obtain a third image segmentation lightweight model; the electronic device inputs the fifth image into the third image segmentation lightweight model to obtain a sixth segmentation result;
And when the fourth segmentation result and the sixth segmentation result do not meet the first preset condition, the third image segmentation lightweight model is a target image segmentation lightweight model.
In this way, after the electronic device trains to obtain the third image segmentation lightweight model, it verifies the trained model using the fifth image and judges whether the segmentation result of the third image segmentation lightweight model and the segmentation result of the image segmentation full model meet the first preset condition; if the first preset condition is not met, the electronic device determines that the third image segmentation lightweight model is the target image segmentation lightweight model and stops training. That is, after one round of training, the electronic device may determine whether the segmentation result of the image segmentation lightweight model meets the requirement, and stop training if the segmentation result meets the requirement, so that unnecessary training can be avoided.
With reference to the first aspect, in some implementations, after the electronic device inputs the fifth image into the third image segmentation lightweight model to obtain the sixth segmentation result, the method further includes: in the case that the fourth segmentation result and the sixth segmentation result meet the first preset condition, the electronic device determines a third gesture graph based on the fourth segmentation result and the sixth segmentation result, wherein the third gesture graph is used for indicating the user to make a third gesture, the third gesture graph includes a second limb, and the area of the second limb in the fourth segmentation result is different from the area of the second limb in the sixth segmentation result;
The electronic device trains the third image segmentation lightweight model based on the sixth image to obtain a target image segmentation lightweight model, and the posture of the target object in the sixth image is the third posture.
In this way, in the case where the fourth segmentation result and the sixth segmentation result do not satisfy the requirement, that is, in the case where the segmentation result of the third image segmentation lightweight model and the segmentation result of the image segmentation full model do not satisfy the requirement, the electronic device determines a third posture graph for instructing the user to make a third posture. Thus, the electronic device continuously screens the training data through each round of training until the segmentation result of the model meets the requirement. Therefore, the quality of the training data can be improved, the training effect of the model can be improved, waste of training data can be avoided, and overlong training caused by invalid training data is avoided.
With reference to the first aspect, in some implementations, in a case where the first segmentation result and the third segmentation result are different, the electronic device determines a second pose graph based on the first segmentation result and the third segmentation result, including:
In the case that the difference between the first segmentation result and the third segmentation result meets a first preset condition, the electronic equipment determines a target area of the first image based on the difference between the first segmentation result and the third segmentation result; the electronic device determines the second gesture map based on the target region.
The target area is a poorly segmented area. The electronic device determines, based on the first segmentation result and the third segmentation result, whether a poorly segmented area exists in the third segmentation result. If a poorly segmented area exists in the third segmentation result, the electronic device determines, from the representative gesture library, a gesture graph containing the poorly segmented area, and the gesture graph is used for indicating the user to make the gesture contained in the gesture graph, i.e. a gesture involving the poorly segmented area. In this way, when the image segmentation lightweight model is trained in the next round, user images containing the poorly segmented area are obtained and the poorly segmented area is trained with emphasis, that is, the individual features of the target object receive personalized training, so that the training effect of the model is improved and the segmentation accuracy of the model is further improved.
With reference to the first aspect, in some implementations, the first segmentation result and the third segmentation result include pixel information of a pixel point in the third image;
In the case that the difference between the first segmentation result and the third segmentation result meets a first preset condition, the electronic device determines a target area of the first image based on the difference between the first segmentation result and the third segmentation result, and specifically includes: the electronic equipment determines a first target pixel point in the third image based on the difference value between the pixel information of the pixel point in the first segmentation result and the pixel information of the pixel point in the third segmentation result, wherein the first target pixel point is a pixel point with the difference value between the pixel information of the first segmentation result and the pixel information of the pixel point in the third segmentation result being larger than a first threshold value; the electronic device determines an area of the third image where one or more limbs of the target object are located, wherein the one or more limbs comprise the third limb; and under the condition that the number of the first target pixel points in the area where the third limb is located in the third image is larger than the second threshold value, the electronic equipment determines the area where the third limb is located in the third image as a target area.
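A minimal sketch of this target-area determination, assuming the two segmentation results are per-pixel foreground maps and the limb regions are boolean masks (the function name, data layout and return convention are assumptions, not the patent's prescribed form):

```python
import numpy as np

def find_target_limbs(seg_full, seg_light, limb_regions, first_threshold, second_threshold):
    """Locate poorly segmented limb regions.

    seg_full, seg_light: H x W arrays of pixel information (e.g. foreground
                         probabilities) from the full model (first segmentation
                         result) and the retrained lightweight model (third
                         segmentation result).
    limb_regions:        dict mapping a limb name to an H x W boolean mask of
                         the area where that limb is located in the image.
    Returns the limb names whose area contains more than `second_threshold`
    first target pixel points.
    """
    # First target pixel points: pixels whose difference exceeds the first threshold.
    target_pixels = np.abs(seg_full - seg_light) > first_threshold

    target_limbs = []
    for limb_name, region_mask in limb_regions.items():
        if np.count_nonzero(target_pixels & region_mask) > second_threshold:
            target_limbs.append(limb_name)
    # An empty list corresponds to the case where the first preset condition
    # is not met, i.e. training can stop.
    return target_limbs
```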
In some implementations, the limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, left foot, and other limbs.
With reference to the first aspect, in some implementations, the electronic device determines a second gesture graph based on the target area, specifically includes: the electronic equipment determines a third limb of a target object contained in the target area; the electronic device determines a second gesture map that includes a third limb.
Therefore, the electronic equipment can screen out the gesture graph containing the limb with poor segmentation, and in the next round of training, the region with poor segmentation is trained in a key way, so that the training effect of the model can be improved.
With reference to the first aspect, in some implementations, the electronic device determines a second gesture map including the third limb, which includes: the electronic device determines a plurality of gesture graphs including the third limb; the electronic device determines the second gesture image from the plurality of gesture images, wherein the second gesture image is the gesture image, among the plurality of gesture images, with the largest number of first target pixel points in the area where the third limb is located.
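Continuing the sketch above, the choice of the second gesture map among the candidate gesture maps containing the third limb could look like this (the per-pose-map `limb_regions` structure is an assumption made for illustration):

```python
def select_pose_map(candidate_pose_maps, target_pixels, limb_name):
    """Pick, from the pose maps that contain the third limb, the one whose
    third-limb area covers the most first target pixel points.

    candidate_pose_maps: list of dicts, each holding per-limb boolean region
                         masks under the key 'limb_regions' (hypothetical layout).
    target_pixels:       H x W boolean mask of first target pixel points.
    """
    def score(pose_map):
        region = pose_map["limb_regions"][limb_name]
        return int((target_pixels & region).sum())

    return max(candidate_pose_maps, key=score)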
In some implementations, the pose map includes a contour map and a human body region map; before the electronic device detects the first user operation, the method further comprises: the electronic equipment acquires a training data set which comprises a plurality of images; the electronic equipment inputs the image dataset into a human body posture estimation model to obtain a plurality of human body posture vectors corresponding to the image dataset; the electronic equipment inputs a plurality of human body posture vectors corresponding to the image data set into a clustering model to obtain one or more representative posture vectors; the electronic equipment inputs one or more representative gesture vectors into the human body contour detection model to obtain contour diagrams corresponding to the one or more representative gesture vectors; the electronic equipment inputs the image data set into a human body region detection model to obtain one or more limb region diagrams corresponding to the image data set.
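One way to realize the clustering step is sketched below with scikit-learn's KMeans and a hypothetical `pose_estimator` callable standing in for the human body posture estimation model; the number of clusters and the nearest-sample selection are illustrative choices, not taken from the patent.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_representative_pose_library(images, pose_estimator, n_representatives=8):
    """Cluster human pose vectors from a training image set and return one
    representative pose vector per cluster.

    pose_estimator: callable mapping an image to a pose vector (for example a
                    flattened list of skeletal key-point coordinates).
    """
    pose_vectors = np.stack([pose_estimator(img) for img in images])    # N x D
    kmeans = KMeans(n_clusters=n_representatives, n_init=10).fit(pose_vectors)

    representatives = []
    for center in kmeans.cluster_centers_:
        # Use the real pose vector closest to each cluster centre as the
        # representative pose, so a genuine contour map can be drawn for it.
        nearest = np.argmin(np.linalg.norm(pose_vectors - center, axis=1))
        representatives.append(pose_vectors[nearest])
    return representatives
```

Each representative pose vector would then be passed to the human body contour detection model to produce the contour map stored in the representative gesture library.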
In combination with the first aspect, in some implementations, when the first segmentation result and the third segmentation result do not meet the first preset condition, the electronic device replaces the third background content in the area where the background content in the third image is located with the preset background content based on the third segmentation result, so as to obtain a seventh image;
the electronic device displays a seventh image, a first control and first prompt information, wherein the first prompt information is used for prompting training of the second image segmentation lightweight model.
In this way, when the segmentation result of the image segmentation lightweight model meets the requirement, the electronic device can replace the background content in the segmentation result with the preset background content to obtain a replaced image, and display the replaced image on the display screen. And a retraining control may be displayed through which the user may retrain the image segmentation lightweight model when the user is not satisfied with the effect of the background replacement. The user experience can be improved.
With reference to the first aspect, in some implementations, after the electronic device displays the seventh image, the first control, and the first prompt, the method further includes: the electronic equipment detects an operation acting on the first control, and determines a third threshold value which is smaller than the first threshold value;
Under the condition that the difference value between the first segmentation result and the third segmentation result meets a second preset condition, the electronic equipment determines a fourth gesture graph based on the first segmentation result and the third segmentation result, wherein the fourth gesture graph is used for indicating a fourth gesture; the second preset condition is: the number of second target pixel points in the region where the fourth limb is located in the first image is larger than a second threshold; the second target pixel point is a pixel point of which the difference value between the pixel information of the pixel point in the first segmentation result and the pixel information of the pixel point in the second segmentation result is larger than a third threshold value; the electronic device trains the second image segmentation lightweight model based on the eighth image to obtain a target image segmentation lightweight model, and the gesture of the target object in the eighth image is a fourth gesture.
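The effect of the retraining operation on the threshold can be summarized by a trivial sketch; the shrink factor is a hypothetical value, the patent only requires that the third threshold be smaller than the first threshold.

```python
def tightened_threshold(first_threshold: float, shrink_factor: float = 0.5) -> float:
    """When the user taps the retrain control, use a stricter (smaller) pixel
    difference threshold so that subtler segmentation errors also count as
    second target pixel points. The 0.5 shrink factor is illustrative only."""
    return first_threshold * shrink_factor
```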
With reference to the first aspect, in some implementations, before the electronic device displays the first gesture image in response to the user's background replacement training operation, the method further includes:
when the electronic equipment detects that the using time length of the first image segmentation lightweight model is longer than the first time length, the electronic equipment displays second prompt information and a second control, wherein the second prompt information is used for prompting training of the first image segmentation lightweight model; the background replacement training operation is an operation that acts on the second control.
In a second aspect, embodiments of the present application provide a background replacement device comprising respective units for performing the method of the first aspect or any of the possible implementations of the first aspect.
In a third aspect, embodiments of the present application provide an electronic device comprising one or more processors and one or more memories; wherein the one or more memories are coupled to the one or more processors, the one or more memories being operable to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method as described in the first aspect and any possible implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a chip system for application to an electronic device, the chip system comprising one or more processors for invoking computer instructions to cause the electronic device to perform a method as described in the first aspect and any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer readable storage medium comprising instructions which, when executed on an electronic device, cause the electronic device to perform a method as described in the first aspect and any possible implementation manner of the first aspect.
It will be appreciated that the background replacement apparatus provided in the second aspect, the electronic device provided in the third aspect, the chip system provided in the fourth aspect, and the computer storage medium provided in the fifth aspect described above are all used to perform the method provided by the embodiments of the present application. Therefore, for the advantages they achieve, reference may be made to the advantages of the corresponding method, which will not be described herein again.
Drawings
FIGS. 1A-1B are schematic views of a user interface for a video conference of an electronic device according to an embodiment of the present application;
FIG. 2A is a flow chart of a method of background replacement provided by an embodiment of the present application;
FIG. 2B is a flow chart of a method of background replacement provided by an embodiment of the present application;
FIGS. 3A-3F are schematic views of some user interfaces provided by embodiments of the present application;
FIG. 4 is a schematic diagram of a segmentation result provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a training process of an image segmentation lightweight model provided by an embodiment of the application;
FIG. 6 is a flowchart of a process for determining a target area by an electronic device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of another user interface provided by an embodiment of the present application;
FIG. 8 is a schematic diagram in which an electronic device segments an original image using the image segmentation lightweight model 1 and the target image segmentation lightweight model and then performs replacement to obtain replaced images, according to an embodiment of the present application;
FIG. 9 is a flowchart of an electronic device building a representative gesture library provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a set of skeletal keypoint data provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of a process of clustering electronic devices to obtain representative gesture vectors according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a contour map obtained from a representative gesture vector provided by an embodiment of the present application;
FIG. 13 is a schematic illustration of a limb area provided by an embodiment of the present application;
fig. 14 is a schematic software structure of an electronic device 100 according to an embodiment of the present application;
FIG. 15 is a schematic diagram of the cooperative relationship of modules in an electronic device according to an embodiment of the present application;
FIG. 16 is a schematic diagram of a background replacement system provided by an embodiment of the present application;
FIG. 17A is a schematic diagram of a background replacement device according to an embodiment of the present application;
FIG. 17B is a schematic diagram of another background replacement device provided by an embodiment of the present application;
fig. 18 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 19 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly and thoroughly described below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. The term "and/or" merely describes an association relation between associated objects and indicates that three relations may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist together, or B exists alone. Furthermore, in the description of the embodiments of the present application, "plural" means two or more than two.
The terms "first," "second," and the like, are used below for descriptive purposes only and are not to be construed as implying or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature, and in the description of embodiments of the application, unless otherwise indicated, the meaning of "a plurality" is two or more.
Currently, when a user holds a video conference through an electronic device, the electronic device acquires a video containing the user and displays video image frames in real time. Each video image frame includes an image of the user and an image of the environmental background around the user, so the privacy of the user may be leaked during the video conference. To protect the privacy of the user, the user can choose to replace the environmental background image in the video image frame with a preset background image.
By way of example, FIG. 1A illustrates a user interface 110 for a video conference of an electronic device in an embodiment of the application. As shown in fig. 1A, a video image frame 111 and a background replacement control 112 displayed in real-time may be included in the user interface 110 of the video conference. Each video image frame may include an area where a background is located and an area where a target object is located. The region in which the target object is located may be referred to as a foreground region, and the region in which the background content is located may be referred to as a background region. As shown in fig. 1A (a), the video image frame 111 includes an area 1111 and an area 1112 therein. As shown in fig. 1A (b), the region 1111 is a background region, and the region 1112 is a foreground region, i.e., a region where the target object is located. The user may click on the background replacement control 112, in response to which the electronic device may replace the background content in the background area and display the user interface 120.
As shown in fig. 1B, the electronic device displays a user interface 120. A replaced video image frame 121 and a background replacement control 122 may be included in the user interface 120. The region 1211 included in the video image frame 121 is the background region after replacement, and the region 1212 is the foreground region. It can be seen that the raised hair portions of the target object are missing in the region 1211A and the region 1211B of the video image frame 121; that is, some limb parts or portions of some limb parts of the target object were replaced as background regions when the electronic device performed background replacement, resulting in replacement errors. In background replacement, an image segmentation model is first used to segment the foreground region and the background region in a video or image to obtain the target object and the initial background content, and then the initial background content of the background region in the video or image is replaced with preset background content to obtain a replaced image or video. In the process of dividing the foreground region and the background region, the image segmentation model mistakenly takes some limb parts or portions of some limb parts of the target object as initial background content, that is, these limb parts are segmented into the background region, so the image segmentation is inaccurate. Further, in the background replacement process, some limb parts or portions of some limb parts of the target object are replaced as background content. This may result in incomplete limb parts of the target object in the replaced image, that is, missing limb parts or missing portions of limb parts of the target object, for example, the raised hair portions in the region 1211A and the region 1211B in FIG. 1B.
In the background replacement technology, the accuracy of image segmentation affects the effect of background replacement. Generally, the more accurate the image segmentation, the less the electronic device will have a probability of erroneously replacing the content in the target object region with the preset background content when performing background replacement.
Currently, image segmentation is to pre-train an image segmentation model and then use the image segmentation model to segment a video image frame. Specifically, first, the electronic device needs to collect a large number of images or videos including the target object to construct an image dataset. Then, the electronic equipment trains the initial image segmentation model by utilizing the image data set to obtain an optimized image segmentation model. The optimized image segmentation model can more accurately segment the target object from the target image. The accuracy of image segmentation depends on the quality of the image dataset used in the training of the image segmentation model. In general, a pre-collected image dataset cannot encompass all individual features. For example, the hairstyle, headwear, appearance of the target object, etc., which may change over time.
When the quality of the image dataset used during training is low, that is, when some individual features of the target object are not included, and the trained image segmentation model is used to segment an image that contains those individual features, the part of the target region containing the individual features is regarded as a background region. As a result, the accuracy of image segmentation is poor and the user experience is affected. For example, if the images of the target object collected when the image segmentation model was trained show short hair, and the hairstyle of the target object changes over time, then when the original image segmentation model is used for image segmentation, the region where the hairstyle of the target object has changed is easily mis-segmented into the background region, so that the segmentation is inaccurate and the resulting replacement errors affect the user experience.
Accordingly, embodiments of the present application provide a background replacement method in which: first, the electronic device can perform replacement on the acquired first image based on the first image segmentation lightweight model, replacing the first background content in the first image with the second background content to obtain the second image. The electronic device displays, on a display screen, the second image in which the background of the first image has been replaced. When the user considers that the background replacement accuracy of the second image is not high, the user can choose to retrain the first image segmentation lightweight model. That is, in response to the user's background replacement training operation, the electronic device displays a first gesture graph on the display screen, where the first gesture graph is used for indicating a first gesture and guides the user to record a video or image according to the first gesture contained in the first gesture graph. Then, the electronic device may acquire a third image containing the user, and may retrain the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model. Finally, the electronic device segments the first image using the target image segmentation lightweight model to obtain a segmentation result, and replaces the first background content in the segmentation result with the second background content to obtain a fourth image. That is, the electronic device uses the image segmentation lightweight model to segment the first image and determines the area where the target object in the first image is located and the area where the background content is located; when the user is not satisfied with the effect of background replacement, the user can train the image segmentation lightweight model through the background replacement training operation until a background replacement effect that satisfies the user is reached. Therefore, when the user is not satisfied with the effect of background replacement based on the current image segmentation lightweight model, training of the image segmentation lightweight model can be started through a user operation, and the segmentation accuracy of the image segmentation lightweight model is improved, so that the accuracy of background replacement is improved when background replacement is performed, and the user experience is improved.
In some implementations, the electronic device may train the image segmentation lightweight model as follows. The electronic device displays a pose map P1 on the display screen, where the pose map P1 is used to instruct the target object to record a video or image according to a pose G1 contained in the pose map P1. Then, the electronic device acquires an image T1 containing the target object, and inputs the image T1 into the image segmentation full model and the image segmentation lightweight model M1, respectively, to obtain a segmentation result 1 and a segmentation result 2. The electronic device trains the image segmentation lightweight model M1 based on the segmentation result 1 and the segmentation result 2 to obtain an image segmentation lightweight model M2. The electronic device inputs the image T1 into the image segmentation lightweight model M2 to obtain a segmentation result 3. Then, the electronic device judges whether the error between the segmentation result 1 and the segmentation result 3 meets a preset condition D2. If the condition is met, the image segmentation lightweight model M2 is the target image segmentation lightweight model, that is, the image segmentation lightweight model M2 can be used for subsequent image segmentation. If the condition is not met, the electronic device determines a poorly segmented area in the segmentation result 3, which can serve as an area requiring emphasis in the next round of training, and then determines a gesture map P2 containing the poorly segmented area. The electronic device displays the gesture map P2 on the display screen, and the gesture map P2 is used for guiding the user to shoot a video or image according to a gesture G2 contained in the gesture map P2. The electronic device acquires an image T2 and, following the above steps, uses the image T2 to continue training the image segmentation lightweight model M2, until the segmentation result of the image segmentation full model and the segmentation result of the image segmentation lightweight model meet the preset condition D2.
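The round-by-round flow described in this paragraph can be summarized with the following orchestration sketch; every callable is a placeholder for a device-side component, and the loop bound, names and stop test are assumptions made for illustration.

```python
def personalized_training_loop(full_model, light_model, initial_pose_map,
                               capture_image, select_next_pose_map,
                               train_one_round, needs_more_training, max_rounds=5):
    """High-level sketch of the iterative guided-training flow.

    capture_image(pose_map)            -> image of the user in the indicated pose
    train_one_round(light, img)        -> lightweight model trained on this image
                                          under guidance of the full model
    needs_more_training(res_full, res_light) -> True if the error between the two
                                          segmentation results still requires
                                          another training round
    select_next_pose_map(res_full, res_light) -> pose map covering the poorly
                                          segmented area
    """
    pose_map = initial_pose_map
    for _ in range(max_rounds):
        image = capture_image(pose_map)                   # image T1, T2, ...
        light_model = train_one_round(light_model, image)
        result_full = full_model(image)                   # segmentation result of the full model
        result_light = light_model(image)                 # segmentation result of the trained lightweight model
        if not needs_more_training(result_full, result_light):
            break                                         # segmentation effect is good enough
        pose_map = select_next_pose_map(result_full, result_light)
    return light_model                                    # target image segmentation lightweight model
```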
The image segmentation lightweight model is obtained by cutting and quantizing based on an image segmentation full model, and the number of model parameters in the image segmentation full model is larger than that in the image segmentation lightweight model. In general, the segmentation result of the image segmentation full model is more accurate than that of the image segmentation light model, but the calculation amount of the image segmentation full model is relatively large. In order to reduce the model calculation amount, an image segmentation lightweight model with fewer model parameters is generally deployed on an electronic device for image segmentation, but the segmentation effect of the image segmentation lightweight model is not as good as that of the image segmentation full model, so that in the training process of the image segmentation lightweight model, the image lightweight model to be trained is guided and trained by the image segmentation full model, so that the performance of the image segmentation lightweight model is close to that of the image segmentation full model, namely, the segmentation effect of the image segmentation lightweight model is close to that of the image segmentation full model. Thus, the calculation amount of the electronic equipment can be reduced under the condition of ensuring the image segmentation effect. The training of the image segmentation lightweight model is specifically described in the following examples, and is not repeated here.
The segmentation result 1 and the segmentation result 3 are used for indicating the region where the target object is located and the region where the background content is located in the image T1. The region of the background content indicated in the segmentation result 1 is different from the region of the background content indicated in the segmentation result 3.
The electronic device determines the poorly segmented area based on the segmentation result 1 and the segmentation result 3, that is, it determines the poorly segmented area based on the segmentation result of the previous round of the image segmentation lightweight model, and then screens out, from the representative gesture library, the gesture graph containing the poorly segmented area as guidance for the next round of training. The training data are thus actively screened, and the quality of the training data is improved. When the image segmentation lightweight model is trained in the next round, user images containing the poorly segmented area are acquired and the poorly segmented area is trained with emphasis, that is, the individual features of the target object receive personalized training, so the training effect of the model is improved and the segmentation accuracy of the model is further improved. This reduces the probability of replacement errors when background replacement is performed. In addition, after one round of training, the electronic device can judge whether the segmentation result of the image segmentation lightweight model meets the preset condition, and stop training if the preset condition is met, so that unnecessary training can be avoided.
In the embodiments of the present application, the above-mentioned pose map P1 may also be referred to as the first gesture graph, the gesture G1 as the first gesture, the image T1 as the third image, the image segmentation lightweight model M1 as the first image segmentation lightweight model, the segmentation result 1 as the first segmentation result, the segmentation result 2 as the second segmentation result, the segmentation result 3 as the third segmentation result, and the image segmentation lightweight model M2 as the second image segmentation lightweight model.
The following describes an exemplary background replacement method according to an embodiment of the present application with reference to fig. 2A to 13.
FIG. 2A shows a flowchart of a background replacement method according to an embodiment of the application. As shown in FIG. 2A, the method includes steps S101 to S104.
S101, the electronic equipment displays a second image obtained by performing first background replacement on the first image.
The first image may be uploaded by a user; for example, if the user needs to replace the background of a certain picture, the user may upload the picture whose background needs to be replaced on the electronic device. Alternatively, the first image may be an image taken by the electronic device or a frame of a video. For example, the electronic device may capture an image of the user while the user is using video conferencing software for a video conference, or while the user is in a video call with other people.
Specifically, the first image includes an area where the target object is located and an area where the first background is located. The electronic device can determine an area where the target object in the first image is located and an area where the first background content is located based on the first image segmentation lightweight model, and the electronic device replaces the first background content of the area where the first background content is located in the first image with the second background content to obtain the second image. The first background content and the second background content are different, and the second background content can be preset background content by a user or default background content set by the electronic equipment in a factory.
S102, responding to a background replacement training operation of a user, and displaying a first gesture graph by the electronic equipment.
Wherein the first gesture graph is used for indicating that the user makes a first gesture. The context replacement training operation may be an operation that acts on the context replacement training control. For example, after the electronic device displays the second image, a background replacement training control is displayed on the display screen. When the user is not satisfied with the replacement effect of the second image obtained by performing the first background replacement based on the first image segmentation lightweight model, the replacement training control can be clicked, and the first image segmentation lightweight model training is started. The background replacement training operation may also be a voice instruction or a key press operation. The application is not limited in this regard.
In some embodiments, in a video conference scenario, responding to the user's background replacement training operation may also be interpreted as responding to a sequence of operations of clicking the background replacement training control and then ending the video conference. For example, after detecting the user operation acting on the background replacement training control, the electronic device may display the first gesture graph after the video conference is finished, and then start training the image segmentation lightweight model.
S103, the electronic device acquires a third image, wherein the gesture of the user in the third image is the first gesture.
The third image may be an image of the user captured while the user is in the first posture indicated in the first gesture graph. It can be understood that the third image may also be one frame of a video: the target object records a video according to the first gesture indicated in the first gesture graph, the electronic device acquires the video, and one frame of the video is captured as the third image.
In some alternative embodiments, the third image may be an image set comprising a plurality of images. For example, the electronic device may capture multiple images at a time, or capture multiple images from the recorded video, and any one of the multiple images may be used as the third image.
S104, the electronic device displays a fourth image obtained by performing second background replacement on the first image.
Specifically, the electronic device may train the first image segmentation lightweight model based on the third image to obtain the target image segmentation lightweight model, input the first image into the target image segmentation lightweight model for image segmentation, and determine the area where the target object in the first image is located and the area where the first background content is located. The electronic device then replaces the first background content of the area where the first background content is located in the first image with the second background content to obtain a replaced fourth image, and displays the fourth image.
It will be appreciated that upon detecting the background replacement training operation, the electronic device displays the first gesture graph indicating that the user should make the first gesture, and then obtains the third image of the user in which the user's gesture is the first gesture. By using the third image as training data for the first image segmentation lightweight model, the electronic device can acquire the latest individual characteristics of the user. In this way, the target image segmentation lightweight model obtained by training with the third image segments images with higher accuracy; that is, it can more accurately separate the target object and the background content in an image. Therefore, comparing the second image, obtained by performing the first background replacement on the first image based on the first image segmentation lightweight model, with the fourth image, obtained by performing the second background replacement on the first image based on the target image segmentation lightweight model, the size of the region where the target object is located in the fourth image is closer to the size of the region where the target object is located in the first image. That is, the accuracy of the background replacement of the fourth image is higher than that of the second image.
The specific training process of the electronic device to obtain the target image lightweight model based on the third image training may refer to the description in the embodiment of fig. 2B, which is not described herein.
In one possible implementation, FIG. 2B illustrates a process of how an electronic device trains to obtain a target image segmentation lightweight model. As shown in fig. 2B, the process of training the electronic device to obtain the target image segmentation lightweight model is as follows:
s201, based on the first operation, the electronic apparatus displays a posture chart P1, the posture chart P1 indicating a posture G1.
In one possible implementation, before step S101, in a case where the electronic device meets a preset condition D1, the electronic device displays a user interface A, in which a prompt box may be displayed, and the prompt box is used to prompt the target object whether to retrain the image segmentation lightweight model M1. The electronic device is preconfigured with the image segmentation lightweight model M1. The preset condition D1 may include:
the electronic device detects that the use duration of the image segmentation lightweight model M1 exceeds a preset duration; or the electronic device periodically detects the segmentation effect of the image segmentation lightweight model M1 and detects that the segmentation effect of the image segmentation lightweight model M1 meets a preset condition D2.
The electronic device may be configured with a preset duration, which may be one month, one year, etc. The embodiment of the application does not limit the specific value of the preset time length. That is, when the electronic device detects that the time of use of the image segmentation lightweight model M1 is longer than one month, the electronic device may restart training of the image segmentation lightweight model M1.
Regarding the preset condition D2, reference may be made to the following description in step S106, which is not repeated here.
Illustratively, the user interface A may be as the user interface 210 shown in FIG. 3A. As shown in fig. 3A, the user interface 210 may include: status bar 211, calendar indicator 212, weather indicator 213, prompt box 214.
Wherein: status bar 211 may include one or more signal strength indicators of a mobile communication signal, one or more signal strength indicators of a wireless fidelity (WiFi) signal, a battery status indicator, and a time indicator. Calendar indicator 212 may be used to indicate the current time. Weather indicator 213 may be used to indicate a weather type.
Prompt box 214 includes a prompt 214a, a determination control 214b, and a cancel control 214c. The prompt 214a is used to prompt the target object whether to retrain the image segmentation lightweight model M1; as shown in fig. 3A, the prompt 214a may be, for example, "Start the background replacement engine retraining process?". The determination control 214b is used to confirm retraining of the image segmentation lightweight model M1, and the cancel control 214c is used to cancel retraining of the image segmentation lightweight model M1. It is understood that the prompt box 214 may be displayed in a stack on the user interface 210.
It should be understood that the shape of the prompt box 214 and the specific contents of the prompt box 214 are not limited in the embodiment of the present application.
In some alternative embodiments, the user interface A may be the user interface 220 shown in FIG. 3B. As shown in fig. 3B, the user interface 220 includes a prompt box 221. For the relevant description of the prompt box 221, refer to the relevant description of the prompt box 214, which is not repeated herein.
In some alternative embodiments, the user interface A may be the user interface 230 shown in FIG. 3C. As shown in fig. 3C, the user interface 230 includes a calendar indicator 232 and a prompt box 231. For the relevant description of the calendar indicator 232, refer to the relevant description of the calendar indicator 212; for the relevant description of the prompt box 231, refer to the relevant description of the prompt box 214; neither is repeated herein.
The first operation may be a touch operation applied to the determination control 214b in fig. 3A, and in response to the touch operation, the electronic device displays the gesture map P1. The gesture map P1 is used to indicate a gesture G1, where the gesture G1 may be, for example, a gesture of arranging headphones with both hands. It should be understood that the gesture G1 above is merely illustrative; in practical applications, the gesture G1 may be another gesture, for example, raising a hand or holding the head, and the present application does not limit the specific gesture form. In embodiments of the present application, the first operation may be referred to as a background replacement training operation. In an alternative implementation, the first operation may also be an operation acting on a background replacement training control. For example, the electronic device may segment an image using the image segmentation lightweight model M1, perform background replacement, and display a user interface containing the replaced image; in this user interface, the electronic device may display a background replacement training control for training the image segmentation lightweight model M1.
In some alternative embodiments, in a video conference scene, the first operation may also be interpreted as a sequence of operations of clicking the background replacement training control and then ending the video conference. For example, after detecting that the user acts on the background replacement training control, the electronic device may display the first gesture map after the video conference ends, and then start training of the image segmentation lightweight model.
Referring to fig. 3D, for example, fig. 3D illustrates a user interface 240 in which the electronic device displays the gesture map P1. As shown in fig. 3D, the user interface 240 includes: a recording guide frame 241, a prompt 242, a determination control 243, and a return control 244. The recording guide frame 241 is used to display the gesture map P1, the gesture map P1 is used to indicate the gesture G1, and the gesture G1 is a gesture of arranging headphones with both hands. The prompt 242 is used to prompt the target object to complete the action corresponding to the gesture G1; for example, the prompt 242 may be "Action requirement: arrange the headphones with both hands" and "Please complete the specified action in the white area of the recording guide frame". The return control 244 is used to exit the current user interface 240 and return to a previous-level user interface, such as the user interface 220. The determination control 243 is used to obtain a photograph or video taken by the electronic device. When the electronic device detects a touch operation on the determination control 243, the electronic device displays the user interface 250 in response to the touch operation.
It should be understood that the recording guide frame 241 and the prompt information 242 are merely examples, and the shape of the recording guide frame 241 and the prompt information 242 and the specific content in the recording guide frame 241 and the prompt information 242 are not limited in the embodiments of the present application.
As shown in fig. 3E, the user interface 250 includes: recording effect preview box 251, prompt 252, determination control 253, return control 254. The recording effect preview box 251 is used to display a photograph or video currently taken by the electronic device. The return control 254 is used to exit the current user interface 250, returning to a previous level user interface, such as the user interface 240. Determination control 253 is used to obtain a photograph or video taken by the electronic device. When the electronic device detects a touch operation of the determination control 253, in response to the touch operation, the electronic device acquires an image including a target object, where the image includes an area where the target object is located and an area where a background is located, and the gesture of the target object is the gesture displayed in the gesture map P1.
It is to be understood that the recording effect preview box 251 and the prompt message 252 are merely examples, and the shape of the recording effect preview box 251 and the prompt message 252 and the specific contents in the recording effect preview box 251 and the prompt message 252 are not limited in the embodiment of the present application.
In some alternative embodiments, the electronic device may display the recording guide box 241 and the recording effect preview box 251 in the same user interface. Illustratively, when the electronic device detects a touch operation on the determination control 214b, the electronic device displays the user interface 260 in response to the touch operation. As shown in fig. 3F, the user interface 260 includes: recording guide 261, recording effect preview 262, prompt 263, return control 264, determine control 265. Wherein:
for the relevant description of the recording guide frame 261, refer to the relevant description of the recording guide frame 241, which is not repeated here.
For the relevant description of the recording effect preview box 262, refer to the relevant description of the recording effect preview box 251; for the relevant description of the prompt 263, refer to the relevant description of the prompt 252; neither is repeated here. The return control 264 is used to return to the previous-level user interface, and the determination control 265 is used to obtain the photograph or video taken by the electronic device.
S202, the electronic device acquires an image T1, and the pose of the target object in the image T1 is the pose G1 indicated in the pose map P1.
Specifically, the image T1 is an image of the target object photographed in the pose G1 shown in the pose map P1, and the pose of the target object included in the image T1 is the pose G1 indicated in the pose map P1. For example, the image T1 may be the image in the recording effect preview box 251 in the embodiment of fig. 3E described above.
It may be understood that the image T1 may also be one frame of image in a video frame, the target object records a video according to the gesture G1 indicated in the gesture map P1, the electronic device acquires the video, and captures one frame of image in the video as the image T1.
In some alternative embodiments, the image T1 may be an image collection containing multiple images. For example, the electronic device may capture multiple images at a time, or capture multiple frames from the recorded video, and these multiple images may be used as the image T1.
S203, the electronic device inputs the image T1 into the image segmentation full model to obtain a segmentation result 1, and inputs the image T1 into the image segmentation lightweight model M1 to obtain a segmentation result 2.
The image segmentation full model is a pre-trained machine learning model with high image segmentation accuracy. That is, the image segmentation full model is a model obtained after training based on the initial image segmentation full model converges. The image segmentation lightweight model M1 is a model trained based on the initial image segmentation lightweight model. It will be appreciated that the image segmentation full model and the initial image segmentation lightweight model may be pre-trained on or pre-configured in the electronic device, which is not limited in this application.
In some embodiments, the initial image segmentation lightweight model is obtained by clipping and quantizing the initial image segmentation full model, where the number of model parameters in the initial image segmentation full model is greater than the number of model parameters in the initial image segmentation lightweight model. The image segmentation full model is used to guide the training of the image segmentation lightweight model to be trained to obtain the target image segmentation lightweight model, so that the performance of the target image segmentation lightweight model approaches that of the image segmentation full model; that is, the segmentation effect of the target image segmentation lightweight model is close to or consistent with that of the image segmentation full model.
Specifically, a segmentation result is used to indicate the region where the target object is located and the region where the background content is located in the segmented image, and the segmentation result may include pixel information of the pixel points in the segmented image. That is, the segmentation result 1 is used to indicate the region where the target object is located and the region where the background content is located in the image T1, these regions being segmented by the image segmentation full model, and the segmentation result 1 includes pixel information 1 of the pixel points in the image T1. The segmentation result 2 is used to indicate the region where the target object is located and the region where the background content is located in the image T1, these regions being segmented by the image segmentation lightweight model M1, and the segmentation result 2 includes pixel information 2 of the pixel points in the image T1.
It will be appreciated that different image segmentation models produce different segmentation results for the same image; that is, the pixel information 1 and the pixel information 2 are different, i.e., the pixel information of the pixel points in the image T1 included in the segmentation result 1 differs from that included in the segmentation result 2.
In some embodiments, the pixel information of a pixel point may be the probability value that the pixel point is a foreground pixel point. Specifically, the segmentation result may be a predicted foreground probability result of the image T1, where the predicted foreground probability result includes, for each pixel in the image T1, the probability value that the pixel is a foreground pixel, the probability value being a real number between 0 and 1. For example, in the segmentation result, the probability value of a pixel point in the foreground region may be 1 and the probability value of a pixel point in the background region may be 0.
In other embodiments, the pixel information of a pixel point may also be the pixel value of the pixel point, for example, an RGB value, a gray value, and the like. The segmentation result may be a binarized image corresponding to the image T1, which is used to distinguish the foreground region from the background region, where the pixel value of a pixel point in the foreground region is 255 and the pixel value of a pixel point in the background region is 0. Alternatively, the pixel value of a pixel point in the foreground region may be 0 and the pixel value of a pixel point in the background region may be 255.
As shown in fig. 4, in the illustrated segmentation result, the black part is the background region, containing pixel points with a value of 0, and the white part is the foreground region, that is, the region where the target object is located, containing pixel points with a value of 255.
In other embodiments, the pixel information of a pixel point may also be a foreground label of the pixel point, and the foreground label may be a value, for example, 1 or 0. For example, when an image is input into the image segmentation lightweight model, if a pixel is predicted to be a foreground pixel, the pixel is labeled 1; if the pixel is predicted to be a background pixel, the pixel is labeled 0.
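By way of illustration only, the following Python sketch shows the three representations of pixel information described above (foreground probability, binarized pixel value, and foreground label); the probability map, its size, and the 0.5 cut-off are assumptions introduced for this example.

```python
import numpy as np

# Hypothetical predicted foreground probabilities for a small image (real numbers in [0, 1]).
prob_map = np.random.rand(4, 4)

# Representation 1: the probability value itself is the pixel information.
# Representation 2: a binarized image, foreground pixel value 255, background pixel value 0.
binary_mask = np.where(prob_map > 0.5, 255, 0).astype(np.uint8)

# Representation 3: a foreground label per pixel, 1 for foreground, 0 for background.
labels = (prob_map > 0.5).astype(np.uint8)
```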
S204, the electronic device trains the image segmentation lightweight model M1 based on the segmentation result 1 and the segmentation result 2 to obtain an image segmentation lightweight model M2.
Specifically, the electronic device calculates an error value between the segmentation result 1 and the segmentation result 2, trains the image segmentation lightweight model M1 using the error value, and adjusts the model parameters of the image segmentation lightweight model M1 to obtain the image segmentation lightweight model M2.
Specifically, fig. 5 illustrates a training process of the image segmentation lightweight model. As shown in fig. 5, the image T1 is input into the image segmentation full model and the image segmentation lightweight model M1 respectively to obtain the segmentation result 1 and the segmentation result 2. The error between the segmentation result 1 and the segmentation result 2 is then determined, and the model parameters of the image segmentation lightweight model M1 are corrected using the error to obtain the trained and corrected lightweight model, that is, the image segmentation lightweight model M2.
Illustratively, the image segmentation full model and the image segmentation lightweight model may be deep neural network models, convolutional neural network models, or the like, which is not limited by the embodiments of the present application. For example, the image segmentation full model may be a deep neural network model A1, and the image segmentation lightweight model may be a deep neural network model A2 obtained by clipping the deep neural network model A1. The deep neural network model A2 has fewer model parameters than the deep neural network model A1.
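By way of illustration only, one training round of steps S203-S204 can be sketched in Python with PyTorch as follows: the output of the full model is treated as the label, and the error between the two segmentation results is used to adjust the parameters of the lightweight model. The two toy convolutional networks, the loss function, and the learning rate are assumptions introduced for this example and are not the models of this application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the image segmentation full model and the lightweight model M1.
full_model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 1))
light_model = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(), nn.Conv2d(4, 1, 1))
optimizer = torch.optim.Adam(light_model.parameters(), lr=1e-3)

image_t1 = torch.rand(1, 3, 256, 256)  # stand-in for the image T1

# Segmentation result 1: the full model's prediction, treated as the label.
with torch.no_grad():
    result_1 = torch.sigmoid(full_model(image_t1))

# One training round: segmentation result 2 from M1, error between result 1 and
# result 2, back-propagated to adjust the parameters of the lightweight model.
result_2 = torch.sigmoid(light_model(image_t1))
loss = F.binary_cross_entropy(result_2, result_1)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```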
S205, the electronic device inputs the image T1 into the image segmentation lightweight model M2, and outputs the segmentation result 3.
Specifically, after the first round of training, the electronic device obtains the image segmentation lightweight model M2, and the electronic device tests the image segmentation lightweight model M2. That is, the electronic device inputs the image T1 into the image segmentation lightweight model M2 to obtain a segmentation result 3. The segmentation result 3 is used to indicate the region where the target object is located and the region where the background content is located in the image T1; for the relevant description of these regions indicated by the segmentation result 3, refer to the relevant description of the segmentation result 1 and the segmentation result 2, which is not repeated herein.
S206, the electronic device determines whether the segmentation result 1 and the segmentation result 3 meet a preset condition D2; if not, step S207 is executed; if yes, step S209 is executed.
The preset condition D2 is that a target area, that is, a poorly segmented area, exists in the segmentation result 3. That is, taking the segmentation result 1 as the label, the electronic device first calculates the error between the segmentation result 1 and the segmentation result 3, and when the segmentation result 3 has a poorly segmented area relative to the segmentation result 1, it is determined that the difference between the segmentation result 1 and the segmentation result 3 satisfies the preset condition D2.
Specifically, the electronic device calculates the difference in pixel information between the segmentation result 1 and the segmentation result 3. That is, the electronic device calculates, for each pixel point, the difference between its pixel information in the segmentation result 1 and its pixel information in the segmentation result 3, and determines the pixel points whose difference is larger than a first threshold as first target pixel points. The electronic device then matches the first target pixel points with the limb areas of the target object in the image T1, and determines whether the number of first target pixel points in a limb area of the target object is larger than a second threshold. If the number of first target pixel points in a limb area of the target object is larger than the second threshold, that limb area is a target area, that is, a poorly segmented area, and the electronic device determines that the segmentation result 1 and the segmentation result 3 meet the preset condition D2. In embodiments of the present application, one or more of the limbs contained in the target area may be referred to as a third limb.
In some optional embodiments, the electronic device may capture a plurality of images or a video at a time, input the plurality of images into the image segmentation full model and the image segmentation lightweight model to obtain a plurality of segmentation results of the image segmentation full model and a plurality of segmentation results of the image segmentation lightweight model, and determine whether the plurality of differences between the two sets of segmentation results satisfy the preset condition D2. If the number of differences satisfying the preset condition D2 is greater than a preset number threshold, the segmentation results are considered to satisfy a preset condition D3.
It is understood that, in the embodiment of the present application, the above-mentioned preset condition D2 may also be referred to as a first preset condition.
By way of example, a specific process for determining the target region in the embodiment of the present application will be described below with reference to fig. 6, taking the probability value that each pixel point in the image T1 is a foreground pixel point as an example of the segmentation result.
S2061, the electronic device determines the first target pixel, which is the pixel with poor segmentation in the segmentation result 3.
Specifically, the probability value that the pixel point i is foreground in the segmentation result 1 is Y1i, the probability value that the pixel point i is foreground in the segmentation result 3 is Z1i, and the absolute value of the difference for the pixel point i between the segmentation result 1 and the segmentation result 3 is Hi:
Hi = |Y1i – Z1i|
When the absolute value of the difference for the pixel point i is greater than the first threshold, the pixel point i is a poorly segmented pixel point, that is, a first target pixel point.
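By way of illustration only, the determination of the first target pixel points in step S2061 can be sketched in Python as follows; the array sizes and the value of the first threshold are assumptions introduced for this example.

```python
import numpy as np

# Stand-ins for the foreground probability maps: segmentation result 1 (full model)
# and segmentation result 3 (lightweight model M2); sizes are assumed.
result_1 = np.random.rand(256, 256)
result_3 = np.random.rand(256, 256)
first_threshold = 0.3  # assumed value of the first threshold

# Hi = |Y1i - Z1i|; a pixel whose absolute difference exceeds the first threshold
# is a poorly segmented pixel, i.e. a first target pixel point.
diff = np.abs(result_1 - result_3)
first_target_mask = diff > first_threshold
print(int(first_target_mask.sum()), "first target pixel points")
```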
S2062, the electronic device inputs the image T1 into the limb area detection model to obtain a limb area diagram corresponding to the target object in the image T1, wherein the limb area diagram comprises areas where one or more limbs of the target object in the image T1 are located.
In some embodiments, the limb area detection model may also be referred to as a body area detection model. A limb area refers to the area in which a limb is located, and the limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right crotch, right thigh, right calf, right foot, left crotch, left thigh, left calf, left foot, and the like.
It will be appreciated that the above-described division of limbs is illustrated, and that in practical applications, other divisions are possible, and the application is not limited thereto.
And S2063, the electronic equipment matches the segmentation result 3 with the limb area diagram and determines the number of first target pixel points in the area where the limbs of the target object are located.
Specifically, the electronic device matches the pixel points in the segmentation result 3 with the areas where the limbs of the target object are located in the image T1, so as to obtain the number of corresponding first target pixel points in the areas where the limbs of the target object are located.
S2064, the electronic device determines the target region based on the number of first target pixel points in the regions where the limbs of the target object are located.
Specifically, when the number of first target pixel points in the region where one of the limbs is located is greater than the second threshold, the region where that limb is located is a target region. For example, the second threshold may be 1000; when the number of first target pixel points contained in the region where the left hand is located is greater than 1000, the region where the left hand is located is considered a target region, that is, a poorly segmented region.
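By way of illustration only, steps S2063-S2064 (counting first target pixel points per limb area and selecting the target region) can be sketched in Python as follows; the limb index map, the random mask, and the threshold value are assumptions introduced for this example.

```python
import numpy as np

# Hypothetical limb area map: each pixel holds a limb index (0 means no limb),
# plus a stand-in mask of first target pixel points from step S2061.
limb_map = np.random.randint(0, 5, size=(256, 256))
first_target_mask = np.random.rand(256, 256) > 0.9
second_threshold = 1000  # assumed value of the second threshold

# Count the first target pixel points falling inside the region of each limb; a limb
# whose count exceeds the second threshold yields a target region (poorly segmented).
target_limbs = []
for limb_id in np.unique(limb_map):
    if limb_id == 0:  # no limb at these pixels
        continue
    count = np.count_nonzero(first_target_mask & (limb_map == limb_id))
    if count > second_threshold:
        target_limbs.append(int(limb_id))
print(target_limbs)
```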
S207, the electronic device replaces the initial background content in the image T1 with the preset background content to obtain a replaced image T2.
Specifically, in the case where the segmentation result 1 and the segmentation result 3 do not meet the preset condition D2, the electronic device determines the image segmentation lightweight model M2 as the target image segmentation lightweight model, that is, the model used for subsequent image segmentation. The electronic device replaces the initial background content in the image T1 with the preset background content based on the segmentation result 3 to obtain a replaced image T2.
The preset background content may be background content preset by the target object; the target object may select image background content that it likes as the preset background content. The preset background content may also be default background content configured on the electronic device at the factory.
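By way of illustration only, the background replacement in step S207 can be sketched in Python as a simple composition of foreground and preset background guided by the segmentation result; the image contents, the array sizes, and the 0.5 cut-off are assumptions introduced for this example.

```python
import numpy as np

# Stand-ins for the image T1, the preset background content, and the foreground
# probabilities of segmentation result 3; shapes and contents are assumed.
image_t1 = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
preset_background = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
foreground_prob = np.random.rand(256, 256)

# Pixels judged as foreground keep the content of image T1; the remaining pixels
# take the preset background content, yielding the replaced image T2.
is_foreground = (foreground_prob > 0.5)[..., None]  # broadcast over the color channels
image_t2 = np.where(is_foreground, image_t1, preset_background)
```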
In the embodiment of the present application, the image T2 may be referred to as a seventh image. The initial background content in the image T1 may be referred to as a third background content.
S208, the electronic device displays the replaced image T2.
For example, as shown in fig. 7, the electronic device may display a user interface 270, the user interface 270 comprising: a replacement effect box 271, a prompt 272, a return control 273, and a determination control 274. The replacement effect box 271 is used to display the replaced image, the prompt 272 is used to ask whether the target object is satisfied with the segmentation effect of the image segmentation lightweight model M2, the return control 273 is used to return to the previous-level user interface, and the determination control 274 is used to determine the image segmentation lightweight model M2 as the target image segmentation lightweight model. When the target object is satisfied with the replacement effect, the target object may click the determination control 274; at this point, the electronic device ends training of the image segmentation lightweight model M2, and the image segmentation lightweight model M2 is the target image segmentation lightweight model, that is, the image segmentation lightweight model used for subsequent image segmentation. In this way, redundant training can be avoided. When the target object is not satisfied with the replacement effect, the target object may click the return control 273, and the electronic device continues the next round of training.
In some possible implementations, when the electronic device detects a touch operation on the return control 273, the electronic device modifies the first threshold to a third threshold in response to the touch operation, where the third threshold is less than the first threshold. Then, the electronic device determines whether the segmentation result 1 and the segmentation result 3 satisfy a second preset condition, and when the difference between the segmentation result 1 and the segmentation result 3 satisfies the second preset condition, determines a gesture map P based on the segmentation result 1 and the segmentation result 3 and continues to train the image segmentation lightweight model using the gesture map P. For the relevant description of how the electronic device determines the gesture map P based on the segmentation result 1 and the segmentation result 3, refer to the following embodiments, which are not described in detail herein.
The second preset condition is that the number of the target pixel points in the area where the third limb is located in the first image is larger than a second threshold value; the target pixel point is a pixel point of which the difference value between the pixel information of the pixel point in the first segmentation result and the pixel information of the pixel point in the second segmentation result is larger than a third threshold value.
Illustratively, the electronic device calculates the pixel information of the pixel points in the segmentation result 1 and the segmentation result 3, and determines the pixel points whose difference in pixel information between the segmentation result 1 and the segmentation result 3 is larger than the third threshold as poorly segmented pixel points. The electronic device matches the poorly segmented pixel points with the limb areas of the target object in the image T1, and determines whether the number of poorly segmented pixel points in a limb area of the target object is greater than the second threshold. If the number of poorly segmented pixel points in a limb area of the target object is greater than the second threshold, that limb area is a poorly segmented area, and the electronic device determines that the segmentation result 1 and the segmentation result 3 satisfy the second preset condition.
S209, the electronic device determines the gesture map P2 based on the segmentation result 1 and the segmentation result 3.
Specifically, the electronic device compares the segmentation result 1 and the segmentation result 3, so that a poorly segmented region in the segmentation result 3 can be determined, and the electronic device selects a gesture graph including limbs corresponding to the poorly segmented region from the representative gesture library according to the poorly segmented region in the segmentation result 3. The electronic device can configure a representative gesture library, wherein the representative gesture library comprises a plurality of gesture graphs. The representative gesture library may be pre-constructed by the electronic device, and the construction of the representative gesture library is specifically referred to in the following description of the embodiment of fig. 9, which is not repeated herein.
The process of determining the gesture map P2 by the electronic device based on the segmentation result 1 and the segmentation result 3 may specifically include:
1, the electronic device determines a target area based on the segmentation result 1 and the segmentation result 3.
The related operation of the electronic device for determining the target area based on the segmentation result 1 and the segmentation result 3 may specifically refer to the related operation of determining the target area based on the segmentation result 1 and the segmentation result 3 in step S106, which is not described herein.
2, the electronic device determines a gesture map P2 based on the target region.
Specifically, the target region corresponds to one or more limbs of the target object.
When the target area contains one limb of the target object, the electronic device first determines, from the representative gesture library, one or more gesture maps containing that limb. When there is only one gesture map containing the limb, that gesture map is the gesture map P2. When there are multiple gesture maps containing the limb, the electronic device randomly selects one of them as the gesture map P2, or the electronic device determines the gesture map in which the area of the region where the limb is located is largest as the gesture map P2. In some embodiments, the limb contained in the gesture map P2 may be referred to as a first limb.
When the target area contains multiple limbs of the target object, the electronic device first determines, from the representative gesture library, one or more gesture maps containing those limbs. Similarly, when there is only one gesture map containing the multiple limbs, that gesture map is the gesture map P2. When there are multiple gesture maps containing the multiple limbs, the electronic device randomly selects one of them as the gesture map P2, or the electronic device determines the gesture map in which the area of the regions where the multiple limbs are located is largest as the gesture map P2. Specifically, the electronic device calculates, for each candidate gesture map, the number of corresponding target pixel points in the regions where the multiple limbs are located, and the gesture map for which the sum of these numbers is largest is taken as the gesture map P2; a sketch of this selection rule is given after this paragraph. In some embodiments, the multiple limbs contained in the gesture map P2 may be referred to as a first limb.
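By way of illustration only, the selection rule described above (picking the gesture map whose counts, summed over the limbs of the target area, are largest) can be sketched in Python as follows; the candidate names and the counts are assumptions introduced for this example.

```python
# Hypothetical data: for each candidate gesture map, the number of corresponding
# target pixel points in the regions where the limbs of the target area are located.
candidate_counts = {
    "gesture_map_A": {"left_hand": 1200, "left_forearm": 300},
    "gesture_map_B": {"left_hand": 800, "left_forearm": 900},
}

# The gesture map with the largest sum over the limbs of the target area is
# selected as the gesture map P2.
gesture_map_p2 = max(candidate_counts, key=lambda g: sum(candidate_counts[g].values()))
print(gesture_map_p2)
```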
In the embodiment of the present application, the posture chart P2 may be referred to as a second posture chart.
S210, the electronic device displays a posture chart P2, the posture chart P2 indicating a posture G2.
Specifically, the electronic device displays the gesture map P2 on the display screen, and the gesture map P2 is used to instruct the user to make the gesture G2. The gesture map P2 may be the gesture map shown in fig. 3A described above. The gesture G2 may be, for example, the gesture of arranging headphones with both hands in the embodiment of fig. 2B. It should be understood that the gesture G2 above is merely illustrative; in practical applications, the gesture G2 may be another gesture, for example, raising a hand or holding the head, and the present application does not limit the specific gesture form.
In the embodiment of the present application, the posture G2 may also be referred to as a second posture.
S211, the electronic device acquires an image T3, and the pose of the target object in the image T3 is the pose G2 indicated in the pose map P2.
Specifically, the image T3 is an image taken by the target object in the posture indicated in the posture chart P2 or one frame of image in the recorded video. The image T3 contains the target object.
In an embodiment of the present application, the image T3 may be referred to as a fifth image.
S212, according to steps S203-S209, the electronic device trains the image segmentation lightweight model M2 based on the image T3 and the image segmentation full model until the end condition of model training is met, to obtain the target image segmentation lightweight model.
Specifically, the image segmentation lightweight model is subjected to multiple rounds of iterative training according to the above steps; in each round of iterative training, the model parameters of the image segmentation lightweight model of that round are adjusted so that the model gradually converges, thereby obtaining the target image segmentation lightweight model.
The end condition of the model training may be that the number of iterative training rounds of the image segmentation lightweight model reaches a preset iteration count, or that the image segmentation performance index of the image segmentation lightweight model after parameter adjustment reaches a preset index. For example, the preset index may be defined in terms of whether the segmentation result of the image segmentation lightweight model and the segmentation result of the image segmentation full model satisfy the preset condition D2.
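By way of illustration only, the overall iteration of step S212 can be sketched in Python as follows; every helper function here is a hypothetical placeholder for the corresponding step (training a round, segmenting, checking the preset condition D2, selecting a gesture map, and capturing a new image), not an implementation from this application.

```python
import numpy as np

# Hypothetical placeholders for the operations of steps S203-S211; real code would
# wrap the actual models, the representative gesture library, and the camera.
def train_one_round(light_model, full_model, image):      # S203-S204
    return light_model

def segment(model, image):                                 # returns a foreground probability map
    return np.random.rand(*image.shape[:2])

def poorly_segmented_region_exists(result_full, result_light):  # check of preset condition D2
    return False

def select_gesture_map(result_full, result_light):         # S209: pick a gesture map
    return "gesture_map_P2"

def capture_image_with_gesture(gesture_map):                # S210-S211: user poses, device captures
    return np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

def train_until_end_condition(light_model, full_model, image, max_rounds=10):
    """Iterate the rounds of S203-S209 until the end condition of model training is met."""
    for _ in range(max_rounds):  # preset iteration count as a fallback end condition
        light_model = train_one_round(light_model, full_model, image)
        result_full, result_light = segment(full_model, image), segment(light_model, image)
        if not poorly_segmented_region_exists(result_full, result_light):
            break  # segmentation performance reaches the preset index
        image = capture_image_with_gesture(select_gesture_map(result_full, result_light))
    return light_model  # the target image segmentation lightweight model

target_model = train_until_end_condition(
    "light_model_M2", "full_model", capture_image_with_gesture("gesture_map_P1"))
```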
In some implementations, the electronic device inputs the image T3 into the image segmentation full model and the image segmentation lightweight model M2 respectively to obtain a segmentation result 4 and a segmentation result 5, and the electronic device trains the image segmentation lightweight model M2 using the segmentation result 4 and the segmentation result 5 to obtain an image segmentation lightweight model M3. The electronic device then inputs the image T3 into the image segmentation lightweight model M3 to obtain a segmentation result 6, and determines whether the segmentation result 4 and the segmentation result 6 meet the preset condition D2; when the segmentation result 4 and the segmentation result 6 meet the preset condition D2, the image segmentation lightweight model M3 is the target image segmentation lightweight model. The electronic device may replace, based on the segmentation result 6, the initial background content in the region of the image T3 where the background content is located with the preset background content.
In some implementations, in a case where the segmentation result 4 and the segmentation result 6 do not meet the preset condition D2, the electronic device determines an area with poor segmentation based on the segmentation result 4 and the segmentation result 6, and then determines a gesture map P3 from a representative gesture library according to the area with poor segmentation, where the gesture map P3 is used to instruct the user to make the gesture G3. The relevant operation of the electronic device in determining the gesture map P3 based on the segmentation result 4 and the segmentation result 6 is referred to the relevant operation in step S109, and will not be described herein.
After determining the gesture map P3, the electronic device may display the gesture map P3. The relevant description of the electronic device display gesture map P3 may be referred to the relevant description in the embodiments of fig. 3D to 3F, which is not repeated herein.
The electronic device acquires an image T4, wherein the image T4 is an image or a frame of image in a video frame taken by a user according to a gesture G3 in a gesture map P3, and the gesture of the target object in the image T4 is a gesture G3 indicated in the gesture map P3.
The electronic device may train the image segmentation lightweight model M3 based on the image T4 to obtain an image segmentation lightweight model M4. The electronic device tests the segmentation effect of the image segmentation lightweight model M4 using the image T4; when the segmentation result of the image segmentation lightweight model M4 satisfies the preset condition D2, the image segmentation lightweight model M4 is the target image segmentation lightweight model. If the segmentation result of the image segmentation lightweight model M4 does not meet the preset condition D2, the electronic device re-determines a gesture map according to the segmentation result and acquires an image to train the image segmentation lightweight model M4, until the segmentation result of the image segmentation lightweight model M4 meets the preset condition D2.
In the embodiment of the present application, the segmentation result 4 may be referred to as a fourth segmentation result, the segmentation result 5 may be referred to as a fifth segmentation result, the image segmentation lightweight model M3 may be referred to as a third image segmentation lightweight model, and the segmentation result 6 may be referred to as a sixth segmentation result. The posture map P3 may be referred to as a third posture map, and the posture G3 may be referred to as a third posture. The image T4 may also be referred to as a sixth image.
S213, the electronic device acquires an original image T5, inputs the original image T5 into the target image segmentation lightweight model, and determines the region where the target object is located and the region where the original background content is located in the original image.
The original image may be an image of a frame in an image or video uploaded by the target object, or may be an image of a frame in an image or video including the target object, which is shot by the electronic device.
In the embodiment of the present application, the original image T5 may also be referred to as a first image, and the original background content may also be referred to as a first background content.
S214, the electronic device replaces the original background content in the original image T5 to obtain a replaced image T6.
Specifically, after dividing the region where the original background content is located and the region where the target object is located in the original image T5, the electronic device synthesizes the region where the target object is located and the preset background into a new image, that is, a replaced image T6. Wherein the background content in the replaced image is different from the original background content. The preset background may be set by the target object itself or may be set by default by the electronic device, for example, may be a landscape image or the like.
In an embodiment of the present application, the image T6 may also be referred to as a fourth image.
Illustratively, fig. 8 shows a schematic diagram in which the electronic device segments an original image using the image segmentation lightweight model M1 and the target image segmentation lightweight model respectively, and then performs background replacement on the segmented image.
As shown in fig. 8 (a), when the user clicks the background replacement control 112, the electronic device segments the foreground region and the background region in the video image frame 811. As shown in fig. 8 (b), the video image frame 811 is segmented by the image segmentation lightweight model M1 to obtain a segmentation result 7. For better illustration, the foreground region and the background region in the segmentation result 7 shown in fig. 8 (b) are distinguished by different colors: the white region indicates the foreground region segmented by the image segmentation lightweight model M1, and the black region indicates the background region segmented by the image segmentation lightweight model M1. It can be seen from the segmentation result shown in fig. 8 (b) that the image segmentation lightweight model M1 erroneously treats part of the edge region of the target object as the background region, that is, the protruding hair in the region 8111A and the region 8111B is regarded as background content. As shown in fig. 8 (c), after the background content in the video image frame 811 is replaced with the preset background content, a replaced video image frame 821 is obtained; the protruding hair in the region 8111A and the region 8111B is replaced as background content, so no protruding hair is present in the regions 8211A and 8211B of the replaced video image frame 821.
As shown in fig. 8 (d), when the user clicks the background replacement control 812, the electronic device segments the foreground region and the background region in the video image frame 811. As shown in fig. 8 (e), the video image frame 811 is segmented by the target image segmentation lightweight model to obtain a segmentation result 8. It can be seen that the target image segmentation lightweight model can well distinguish the protruding hair of the target object in the region 8111A and the region 8111B from the background region. Accordingly, as shown in fig. 8 (f), after the background of the video image frame 811 is replaced, a video image frame 821 is obtained, and the protruding hair in the region 8111A and the region 8111B remains in the video image frame 821.
It should be noted that, before step S101, the electronic device may also construct a representative gesture library.
Illustratively, as shown in FIG. 9, the electronic device building a representative gesture library may include the steps of:
S301, the electronic device acquires an image dataset.
The image data set may be a large number of images of the target object collected in advance, or may be an image frame included in video data of the target object collected in advance. The embodiment of the present application is not limited thereto.
In an alternative embodiment, the image dataset may be crawled from public websites or obtained from a large public image database.
The image dataset contains gesture features and contour features of users. The gesture features refer to the action behaviors of a user, such as turning around, turning the body, and sitting up. The contour features of a user refer to the lines that form the outer edge of the user.
S302, the electronic equipment inputs the image dataset into a human body posture estimation model to obtain a plurality of human body posture vectors corresponding to the image dataset.
Specifically, the human body posture estimation model can identify the skeletal key points of the human body in an image and the limb vectors formed by the skeletal key points. The skeletal key points of the human body are used to represent skeleton information of the human body, which can be used to describe the human body posture.
The number and the types of the skeleton key points are determined by the human body posture estimation model, and the number and the types of the skeleton key points output by different human body posture estimation models are different. In the embodiment of the present application, the explanation is given by taking the example of dividing the skeletal key points of the human body into 15 skeletal key points, and in practical application, the skeletal key points of the human body can be further divided into 9, 17, etc., which is not limited in this application. Wherein, 15 skeleton key points can be connected to form 14 limb vectors, and the limb vectors can be calculated by the coordinate positions of the 15 skeleton key points.
Illustratively, fig. 10 illustrates a set of bone key point data, with only a portion of the bone key points and a portion of the limb vectors illustrated. As shown in fig. 10, the circular points in the figure are bone key points, each bone key point is represented by coordinates (X, Y), and adjacent key points are connected to form limb vectors; the set of limb vectors of a target object in an image may be referred to as a pose vector. For example, the coordinates of bone key point 3 are (X3, Y3), the coordinates of bone key point 4 are (X4, Y4), and bone key point 3 and bone key point 4 may be connected to form a limb vector (X3-X4, Y3-Y4) representing a limb, which may be, for example, the left shoulder.
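By way of illustration only, computing limb vectors from key point coordinates and concatenating them into a pose vector can be sketched in Python as follows; the key point names, coordinates, and the two connections shown are assumptions introduced for this example.

```python
import numpy as np

# Hypothetical skeletal key point coordinates (X, Y); only a few of the 15 points
# are shown, and the names and values are assumed for illustration.
keypoints = {
    "point_3": (150.0, 85.0),   # e.g. bone key point 3 in FIG. 10
    "point_4": (165.0, 130.0),  # e.g. bone key point 4 in FIG. 10
    "point_5": (170.0, 175.0),
}

# A limb vector is the coordinate difference of two adjacent key points, e.g.
# (X3 - X4, Y3 - Y4); concatenating all limb vectors gives the pose vector.
def limb_vector(a, b):
    (xa, ya), (xb, yb) = keypoints[a], keypoints[b]
    return (xa - xb, ya - yb)

pose_vector = np.array([limb_vector("point_3", "point_4"),
                        limb_vector("point_4", "point_5")]).flatten()
print(pose_vector)
```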
S303, the electronic equipment inputs a plurality of human body posture vectors corresponding to the image data set into a clustering model to obtain one or more representative posture vectors.
Specifically, one image yields a plurality of limb vectors, and the plurality of limb vectors of the image form one pose vector; since the image dataset contains a plurality of images, a plurality of pose vectors are obtained. The electronic device maps the plurality of pose vectors into a vector space, where each pose vector is one point of the vector space, then calculates the similarity between these points, gathers the pose vectors with high similarity into clusters, and selects the vector at the center of each cluster (that is, the cluster center) as a representative pose vector.
Illustratively, FIG. 11 illustrates a schematic diagram of the process by which the electronic device clusters to obtain representative pose vectors. Fig. 11 (a) exemplarily shows 4 clusters, where each circular point in a cluster represents one pose vector, that is, one human body pose, for example, a pose of holding headphones with both hands or holding headphones with one hand. The black five-pointed star in each cluster represents the cluster center point of that cluster, that is, the cluster center, and the cluster center vector of each cluster is selected as a representative pose vector. As shown in fig. 11 (b), the cluster center vector of cluster 1 represents a pose of holding headphones with one hand; as shown in fig. 11 (c), the cluster center vector of cluster 2 represents a pose of holding headphones with both hands.
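By way of illustration only, the clustering of step S303 can be sketched in Python with k-means, whose cluster centers play the role of the representative pose vectors; the number of pose vectors, their dimension (14 limb vectors × 2 coordinates), and the number of clusters are assumptions introduced for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical pose vectors: one row per image, each row being the flattened limb
# vectors (14 limb vectors x 2 coordinates is assumed here).
pose_vectors = np.random.rand(200, 28)

# Pose vectors with high similarity are gathered into clusters; the cluster center
# of each cluster is taken as a representative pose vector.
kmeans = KMeans(n_clusters=4, n_init=10).fit(pose_vectors)
representative_pose_vectors = kmeans.cluster_centers_
print(representative_pose_vectors.shape)  # (4, 28)
```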
S304, the electronic equipment inputs one or more representative gesture vectors into the human body contour detection model to obtain contour diagrams corresponding to the one or more representative gesture vectors.
For example, fig. 12 shows exemplary schematic diagrams of contour maps derived from representative pose vectors. Fig. 12 (a) and (c) show two representative pose vectors, which represent the pose of holding headphones with one hand and the pose of holding headphones with both hands, respectively. Fig. 12 (b) and (d) show the contour maps obtained based on the two representative pose vectors.
S305, the electronic device inputs the image dataset into the limb area detection model to obtain one or more limb area diagrams corresponding to the image dataset.
Illustratively, as shown in fig. 13, fig. 13 illustrates a schematic view of a limb area. As shown in fig. 13, different color areas in the drawing represent different limb areas, for example, dark gray represents an area where the head is located, light gray represents an area where the left hand is located, and the like.
The limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right crotch, right thigh, right calf, right foot, left crotch, left thigh, left calf, and left foot.
And S306, the electronic equipment matches one or more limb area diagrams corresponding to the image dataset with contour diagrams corresponding to one or more representative gesture vectors to obtain one or more gesture diagrams.
Wherein the representative pose library may comprise one or more pose graphs corresponding to the image dataset.
It should be noted that, for simplicity of description, the above method embodiments are all described as a series of combinations of actions, but it should be understood by those skilled in the art that the present invention is not limited by the order of actions described, and further, those skilled in the art should also understand that the embodiments described in the specification belong to preferred embodiments, and the actions involved are not necessarily required for the present invention.
It should be noted that, the electronic device referred to in the above embodiments may be referred to as an electronic device 100, and the electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook, a cellular phone, a personal digital assistant (personal digital assistant, PDA), an augmented reality (augmented reality, AR) device, a Virtual Reality (VR) device, an artificial intelligence (artificial intelligence, AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device. The embodiment of the present application is not particularly limited as to the specific type of the electronic device 100.
Next, a software architecture of the electronic device 100 in the embodiment of the present application will be described.
Fig. 14 shows a software structure of the electronic device 100 according to an embodiment of the present application.
As shown in fig. 14, the software structure of the electronic device 100 may include: a background replacement engine, a background replacement client, an acquisition module, an interactive display module, and a representative gesture library. Wherein:
And the interactive display module is used for receiving user operation, displaying the gesture graph and displaying the image before background replacement and the image after the background replacement. For example, the interactive display module may receive the first operation and display the gesture map P1 based on the first operation, e.g., may display the user interfaces in fig. 3A to 3F described above, etc. Specifically, the interactive display module is used for displaying the gesture map P1. For another example, the interactive display module may display the first image, display the second image with background replacement based on the first image segmentation lightweight model, or display the fourth image with background replacement based on the target image segmentation lightweight model.
And the acquisition module is used for acquiring an image or video of the target object. For example, the image T1 in step S202 may be acquired, where the image T1 is an image of the target object photographed in the gesture G1 in the gesture map P1, or one frame of the video of the target object recorded in the gesture G1 in the gesture map P1.
And the background replacement client is used for acquiring a gesture image P1 from the representative gesture library according to preset configuration after receiving the first operation, and sending the gesture image P1 to the interactive display module for display.
The image processing module is used for receiving the image including the target object sent by the acquisition module and sending the image to the background replacement engine. For example, the image including the target object may be the image T1 in step S202, the image T3 in step S211, or the like described above.
This module is also used for receiving the segmentation result of the image segmentation full model and the segmentation result of the image segmentation lightweight model sent by the background replacement engine, analyzing the segmentation results, and obtaining the gesture map information required for the next training round. It then sends the gesture map information to the representative gesture library to obtain the corresponding gesture map, and sends the corresponding gesture map to the interactive display module for display.
The background replacement engine is used to acquire the image of the target object from the background replacement client and train the image segmentation lightweight model using the image of the target object and the preset image segmentation full model. Specifically, the background replacement engine first receives the image T1 of the target object sent by the background replacement client, and trains the image segmentation lightweight model M1 using the image T1 and the preset image segmentation full model to obtain the image segmentation lightweight model M2. Then, the image T1 of the target object is segmented using the preset image segmentation full model and the image segmentation lightweight model M2 to obtain the segmentation result 1 and the segmentation result 3, and it is determined whether the difference between the segmentation result 1 and the segmentation result 3 meets the preset condition D2. If the preset condition D2 is met, the image segmentation lightweight model M2 is stored for subsequent image segmentation; if the preset condition D2 is not met, the segmentation result 1 and the segmentation result 3 are sent to the background replacement client, so that the background replacement client can determine the gesture to be displayed on the user interface in the next training round.
And the representative gesture library is used for storing gesture graphs, receiving a request instruction of the background replacement client and sending the gesture graph corresponding to the request instruction to the background replacement client.
Next, taking the embodiment of fig. 14 as an example, the cooperative relationship of the modules in the electronic device 100 in the embodiment of the present application is described in detail with reference to fig. 15, which schematically illustrates this cooperative relationship. As shown in fig. 15, the electronic device 100 includes: a background replacement engine, a background replacement client, an acquisition module, an interactive display module, and a representative gesture library. In the embodiment of fig. 15, the description takes as an example two rounds of training of the image segmentation lightweight model to obtain the target image segmentation lightweight model, as follows:
1. the interactive display module detects a first operation. The first operation may be a touch operation applied to the determination control 214b in fig. 3A.
2. And the interactive display module sends a background replacement instruction to the background replacement client.
3. And the background replacement client responds to the background replacement instruction sent by the interactive display module and sends an instruction for requesting to acquire the gesture graph to the gesture library.
4. The representative gesture library responds to the instruction of the background replacement client for acquiring the gesture map, and the representative gesture library sends the gesture map P1 to the background replacement client.
5. The background replacement client receives the gesture map P1 sent by the gesture library, and sends the gesture map P1 to the interactive display module.
6. The interactive display module receives the gesture map P1 and displays the gesture map P1. Wherein the posture map P1 indicates the posture G1.
7. The acquisition module acquires an image T1 and sends the image T1 to the background replacement client.
8. The background replacement client receives the image T1 and sends it to the background replacement engine.
9-11. The background replacement engine trains the image segmentation lightweight model M1 using the image T1 and the image segmentation full model to obtain the image segmentation lightweight model M2, inputs the image T1 into the image segmentation full model and the image segmentation lightweight model M2 respectively to obtain the segmentation result 1 and the segmentation result 3, and then sends the segmentation result 1 and the segmentation result 3 to the background replacement client.
12. The background replacement client judges whether the segmentation result 1 and the segmentation result 3 meet a preset condition D2, and determines a gesture map P2 based on the segmentation result 1 and the segmentation result 3 under the condition that the preset condition D2 is not met. The gesture map P2 is used to indicate the gesture G2.
13. The background replacement client sends an instruction requesting to acquire the gesture map P2 to the representative gesture library.
14. In response to the instruction requesting to acquire the gesture map P2, the representative gesture library sends the gesture map P2 to the background replacement client.
15. The background replacement client receives the gesture map P2 and sends the gesture map P2 to the interactive display module.
16. The interactive display module receives the gesture map P2 and displays the gesture map P2.
17. The acquisition module acquires an image T3 and sends the image T3 to the background replacement client. The gesture of the target object in the image T3 is the gesture G2 indicated by the gesture map P2.
18. The background replacement client receives the image T3 and sends it to the background replacement engine.
19-21. The background replacement engine trains the image segmentation lightweight model M2 by using the image T3 and the image segmentation full model to obtain an image segmentation lightweight model M3. The image T3 is respectively input into the image segmentation lightweight model M3 and the image segmentation full model to obtain a segmentation result 4 and a segmentation result 6. The segmentation result 4 and the segmentation result 6 are then sent to the background replacement client.
22. The background replacement client receives the segmentation result 4 and the segmentation result 6, judges whether the segmentation result 4 and the segmentation result 6 meet the preset condition D2, and determines the image segmentation lightweight model M3 as the target image segmentation lightweight model when the preset condition D2 is met.
23. The acquisition module acquires an original image T5 and sends the original image T5 to the background replacement client.
24. After receiving the original image T5, the background replacement client sends the original image T5 to the background replacement engine.
25-27. The background replacement engine receives the original image T5, and segments the original image T5 by using the target image segmentation lightweight model to obtain a foreground region and a background region. Then, the background region in the original image T5 is replaced with a preset background to obtain a replaced image T6. The background replacement engine then sends the replaced image T6 to the background replacement client.
28. The background replacement client receives the replaced image T6 and sends the image T6 to the interactive display module for display.
29. The interactive display module receives the replaced image T6 and displays the replaced image T6.
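Viewed end to end, steps 1 to 29 above form a simple round-based loop. The sketch below is a hedged illustration of that loop only; the module objects and method names (gesture_library.get_gesture, engine.train_and_segment, and so on) are assumptions introduced for readability and are not interfaces defined by this application.

# Hedged sketch of the round-based cooperation between the modules shown in fig. 15;
# every object and method name here is an illustrative assumption.
def run_background_replacement_training(display, camera, client, engine, gesture_library,
                                         max_rounds=5):
    gesture_map = gesture_library.get_gesture("P1")                    # steps 3-5
    for _ in range(max_rounds):
        display.show(gesture_map)                                      # steps 6 / 16
        image = camera.capture()                                       # steps 7 / 17
        light_result, full_result = engine.train_and_segment(image)    # steps 9-11 / 19-21
        if client.meets_condition_d2(light_result, full_result):       # steps 12 / 22
            return engine.current_lightweight_model                    # target lightweight model
        gesture_map = client.next_gesture_map(light_result, full_result,
                                              gesture_library)         # steps 12-15
    return engine.current_lightweight_model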
It should be noted that the background replacement client, the representative gesture library, and the background replacement engine may be disposed on the same electronic device, or may be disposed on different electronic devices. For example, the background replacement client may be disposed on one electronic device, and the representative gesture library and the background replacement engine may be disposed on another electronic device, which is not limited in this application.
A background replacement system provided in an embodiment of the present application is described next.
Fig. 16 shows a schematic diagram of a background replacement system provided by an embodiment of the present application. As shown in fig. 16, the background replacement system includes an electronic device 200 and a server 300. A communication connection may exist between the electronic device 200 and the server 300, enabling data communication therebetween. Wherein,
The electronic device 200 is configured to acquire a first image and send the first image to the server;
the server 300 is configured to receive the first image, input the first image into a first image segmentation lightweight model, and determine an area where a target object in the first image is located and an area where first background content is located; replace the first background content in the first image with second background content to obtain a second image, and send the second image to the electronic device;
the electronic device 200 is configured to acquire the second image and display the second image; and, in response to a background replacement training operation of a user, display a first gesture graph, where the first gesture graph is used for instructing the user to make a first gesture;
the electronic device 200 is configured to acquire a third image, and send the third image to the server, where a gesture of a user in the third image is the first gesture;
the server 300 is configured to acquire the third image, and train the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model; input the first image into the target image segmentation lightweight model, and determine the area where the target object in the first image is located and the area where the first background content is located; replace the first background content in the first image with the second background content to obtain a fourth image, and send the fourth image to the electronic device;
The electronic device 200 is configured to receive the fourth image and display the fourth image.
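The replacement step itself, whether it runs on the electronic device 200 or on the server 300, amounts to compositing the original frame with the preset background under the foreground mask produced by the segmentation model. The following is a minimal, hedged sketch of that compositing only; the array shapes and the soft-mask blending are assumptions rather than a specification from this application.

import numpy as np

# Hedged sketch of replacing the first background content with the second background content.
def replace_background(frame, foreground_mask, new_background):
    """frame, new_background: HxWx3 uint8 images; foreground_mask: HxW values in [0, 1],
    where 1 marks the area where the target object is located."""
    mask = foreground_mask.astype(np.float32)[..., None]              # HxWx1 for broadcasting
    blended = (frame.astype(np.float32) * mask
               + new_background.astype(np.float32) * (1.0 - mask))
    return blended.astype(np.uint8)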
Optionally, in a possible implementation manner, the electronic device 200 may also be configured to perform the background replacement method that may be implemented in any one of the above steps S201, S202, S206, S208, S209, S210, and S211, which is not described herein again.
In one possible implementation manner, the server 300 may also be configured to perform the background replacement method that may be implemented in any one of the steps S203, S205, S207, S213, and S214, which is not described herein again.
In some possible implementations, the electronic device 200 may also acquire the image T3 and send the image T3 to the server 300; the server 300 receives the image T3, and trains the image segmentation lightweight model M2 by using the image T3 and the image segmentation full model until an end condition of the model training is satisfied, to obtain the target image segmentation lightweight model.
The electronic device 200 may include an interactive display module, an acquisition module, and a background replacement client; the server 300 may include a background replacement engine and a representative gesture library.
Optionally, in a possible implementation manner, the interactive display module 301 may also be configured to perform any one of steps 1-2, step 6, step 16, and step 29 of the background replacement method in the embodiment of fig. 15. The acquisition module may also be configured to perform any one of step 7, step 17, and step 23 in the embodiment of fig. 15, which are not described herein again.
The background replacement client may also be configured to perform any one of step 3, step 5, step 8, step 12, step 13, step 15, step 18, step 22, step 24, and step 28 in the embodiment of fig. 15, which are not described herein again.
In a possible implementation manner, the background replacement engine 304 may also be configured to perform any one of steps 9-11 and steps 19-21 in the embodiment of fig. 15, which are not described herein again.
The representative gesture library may also be configured to perform any one of step 4 and step 14 in the embodiment of fig. 15, which is not described herein again.
The method for replacing the background provided by the embodiment of the application is described in detail above with reference to fig. 2A to 15, and the background replacing device and the electronic device provided by the embodiment of the application will be described below with reference to fig. 17A, 17B and 18.
Fig. 17A is a schematic diagram of a background replacement apparatus according to an embodiment of the present application. The background replacement apparatus 400 includes a display unit 401 and an acquisition unit 402. Wherein,
a display unit 401, configured to display a second image obtained by performing a first background replacement on the first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image comprises an area where the target object is located and an area where the first background content is located; the second image is obtained by replacing the first background content in the first image with the second background content; the second background content is different from the first background content;
The display unit 401 is further configured to display a first gesture graph in response to a background replacement training operation of the user, where the first gesture graph is used to instruct the user to make a first gesture;
the acquisition unit 402 is configured to acquire a third image, where a gesture of the user in the third image is the first gesture;
the display unit 401 is further configured to display a fourth image obtained by performing the second background replacement on the first image; the second background replacement is performed based on a target image segmentation lightweight model, wherein the target image segmentation lightweight model is obtained based on third image training; the fourth image is obtained by replacing the first background content in the first image with the second background content.
It should be appreciated that the background replacement apparatus 400 in the embodiments of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (programmable logic device, PLD), where the PLD may be a complex programmable logic device (complex programmable logical device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), generic array logic (generic array logic, GAL), or any combination thereof. When the background replacement method shown in fig. 2A to 13 is implemented by software, the background replacement apparatus 400 and its respective modules may also be software modules.
In a possible implementation manner, as shown in fig. 17B, the background replacement apparatus further includes an image segmentation unit 403, a determining unit 404, a model training unit 405, and a background replacement unit 406. Wherein,
an image segmentation unit 403, configured to input a third image into the image segmentation full model to obtain a first segmentation result, and input the third image into the first image segmentation lightweight model to obtain a second segmentation result; the number of model parameters in the image segmentation full model is greater than the number of model parameters in the first image segmentation lightweight model; the first segmentation result and the second segmentation result are used for indicating the region where the target object is located and the region where the third background content is located in the third image.
A model training unit 405, configured to train the first image segmentation lightweight model based on the first segmentation result and the second segmentation result, to obtain a second image segmentation lightweight model;
the image segmentation unit 403 is further configured to input a third image to the second image segmentation lightweight model, to obtain a third segmentation result;
a determining unit 404, configured to determine, when the first segmentation result and the third segmentation result are different, a second gesture graph based on the first segmentation result and the third segmentation result, where the second gesture graph is used to instruct the user to make a second gesture, the second gesture graph includes a first limb, and the area where the first limb is located in the first segmentation result is different from the area where the first limb is located in the third segmentation result (a hedged code sketch of this selection logic is given after the unit descriptions below).
The model training unit 405 is configured to train the second image segmentation lightweight model based on the fifth image, to obtain a target image segmentation lightweight model, where the pose of the target object in the fifth image is the second pose.
A background replacement unit 406, configured to replace, based on the third segmentation result, the third background content in the area where the background content in the third image is located with preset background content, to obtain a seventh image.
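The selection logic of the determining unit 404 can be pictured as follows. This is a hedged sketch under assumed data structures (per-limb boolean masks and gesture graphs tagged with the limbs they exercise) and illustrative thresholds; it is not code from this application.

import numpy as np

# Hedged sketch of choosing the next gesture graph from the disagreement between two
# segmentation results; limb masks, thresholds, and gesture records are assumptions.
def choose_next_gesture(first_result, third_result, limb_regions, gesture_graphs,
                        pixel_threshold=0.5, count_threshold=200):
    """first_result / third_result: HxW arrays of per-pixel foreground probability.
    limb_regions: dict limb_name -> HxW boolean mask of where that limb lies in the image.
    gesture_graphs: list of dicts such as {"id": "P2", "limbs": {"left_arm", ...}}."""
    target_pixels = np.abs(first_result - third_result) > pixel_threshold  # first target pixels
    worst_limb, worst_count = None, 0
    for limb, region in limb_regions.items():
        count = int(np.count_nonzero(target_pixels & region))
        if count > count_threshold and count > worst_count:
            worst_limb, worst_count = limb, count
    if worst_limb is None:
        return None                       # the two results are close enough; no new gesture
    candidates = [g for g in gesture_graphs if worst_limb in g["limbs"]]
    return candidates[0] if candidates else None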
Specifically, for the operations performed by the background replacement apparatus 400 to implement background replacement, reference may be made to the related operations of the electronic device in the foregoing method embodiments, which are not described in detail herein.
In some optional implementations, the display unit 401, the acquiring unit 402, the image segmentation unit 403, the determining unit 404, the model training unit 405, and the background replacing unit 406 may correspond to the electronic device 100, and may perform operations performed by the electronic device 100 in the above method embodiments, which are not described herein.
In some alternative implementations, the display unit 401, the acquisition unit 402, and the determining unit 404 may correspond to the electronic device 200, and the image segmentation unit 403, the model training unit 405, and the background replacement unit 406 may correspond to the server 300.
Fig. 18 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device 10 includes: the processor 11, the communication interface 12 and the memory 13 are connected to each other by a bus 14, wherein the processor 11 is configured to execute instructions stored in the memory 13. The memory 13 stores program code, and the processor 11 may call the program code stored in the memory 13 to:
displaying a second image obtained by performing first background replacement on the first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image comprises an area where the target object is located and an area where the first background content is located; the second image is obtained by replacing the first background content in the first image with the second background content; the second background content is different from the first background content;
responding to a background replacement training operation of a user, and displaying a first gesture graph, wherein the first gesture graph is used for indicating the user to make a first gesture;
acquiring a third image, wherein the gesture of a user in the third image is a first gesture;
displaying a fourth image obtained by performing second background replacement on the first image; the second background replacement is performed based on a target image segmentation lightweight model, wherein the target image segmentation lightweight model is obtained based on third image training; the fourth image is obtained by replacing the first background content in the first image with the second background content.
In the embodiment of the present application, the processor 11 may have various specific implementation forms. For example, the processor 11 may be any one or a combination of multiple processors such as a CPU, a GPU, a TPU, or an NPU, and the processor 11 may also be a single-core processor or a multi-core processor. The processor 11 may also be formed by a combination of a CPU (or a GPU, TPU, or NPU) and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (programmable logic device, PLD), or a combination thereof. The PLD may be a complex programmable logic device (complex programmable logical device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), generic array logic (generic array logic, GAL), or any combination thereof. The processor 11 may also be implemented solely by a logic device with built-in processing logic, such as an FPGA or a digital signal processor (digital signal processor, DSP).
The communication interface 12 may be a wired interface, which may be an ethernet interface, a controller area network (controller area network, CAN) interface, or a local interconnect network (local interconnect network, LIN) interface, or a wireless interface, which may be a cellular network interface, or use a wireless lan interface, etc., for communicating with other modules or devices.
The memory 13 may be a nonvolatile memory such as a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The memory 13 may also be a volatile memory, which may be a random access memory (random access memory, RAM) that acts as an external cache.
Memory 13 may also be used to store instructions and data. In addition, the electronic device 10 may contain more or fewer components than illustrated in FIG. 18, or may have a different arrangement of components.
The bus 14 may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 18, but this does not mean that there is only one bus or only one type of bus.
Optionally, the electronic device 10 may further include an input/output interface 15, where the input/output interface 15 is connected to an input/output device, for receiving input information and outputting an operation result.
In some possible implementations, the electronic device 10 in the embodiment of the present application may correspond to the background replacing apparatus 400 in the above embodiment, and may perform the operations performed by the electronic device 100 in the above method embodiment, which are not described herein.
In some possible implementations, the electronic device 10 may be the electronic device 100 described above or the electronic device 200 described above.
Fig. 19 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device 20 includes: the processor 21, the communication interface 22 and the memory 23 are connected to each other by a bus 24, wherein the processor 21 is configured to execute instructions stored in the memory 23. The memory 23 stores program code, and the processor 21 may call the program code stored in the memory 23 to:
receiving a first image, inputting the first image into a first image segmentation lightweight model, and determining an area where a target object in the first image is located and an area where first background content is located; replacing the first background content in the first image with the second background content to obtain a second image,
acquiring a third image, and training the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model;
inputting the first image into a target image segmentation lightweight model, and determining an area where a target object in the first image is located and an area where first background content is located;
And replacing the first background content in the first image with the second background content to obtain a fourth image.
In the embodiment of the present application, the processor 21 may have various specific implementation forms. For example, the processor 21 may be any one or a combination of multiple processors such as a CPU, a GPU, a TPU, or an NPU, and the processor 21 may also be a single-core processor or a multi-core processor. The processor 21 may also be a combination of a CPU (or a GPU, TPU, or NPU) and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (programmable logic device, PLD), or a combination thereof. The PLD may be a complex programmable logic device (complex programmable logical device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), generic array logic (generic array logic, GAL), or any combination thereof. The processor 21 may also be implemented solely by a logic device with built-in processing logic, such as an FPGA or a digital signal processor (digital signal processor, DSP).
The communication interface 22 may be a wired interface or a wireless interface for communicating with other modules or devices. The wired interface may be an ethernet interface, a controller area network (controller area network, CAN) interface or a local interconnect network (local interconnect network, LIN) interface, and the wireless interface may be a cellular network interface or use a wireless local area network interface, etc.
The memory 23 may be a nonvolatile memory such as a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The memory 23 may also be a volatile memory, which may be a random access memory (random access memory, RAM) that acts as an external cache.
Memory 23 may also be used to store instructions and data. Furthermore, the electronic device 20 may contain more or fewer components than shown in FIG. 19, or may have a different arrangement of components.
The bus 24 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 19, but this does not mean that there is only one bus or only one type of bus.
Optionally, the electronic device 20 may further include an input/output interface 25, where the input/output interface 25 is connected to an input/output device, for receiving input information and outputting an operation result.
In some possible implementations, the electronic device 20 may be the electronic device 100 described above or the server 300 described above.
The embodiment of the present application further provides a non-transitory computer-readable storage medium in which a computer program is stored. When the computer program runs on a processor, the method steps executed by the electronic device in the foregoing method embodiments may be implemented. For the specific implementation of the processor in executing the foregoing method steps, reference may be made to the specific operations of the electronic device in the foregoing method embodiments, which are not repeated herein.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded or executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk (solid state drive, SSD).
The steps in the method of the embodiment of the application can be sequentially adjusted, combined or deleted according to actual needs; the modules in the device of the embodiment of the application can be divided, combined or deleted according to actual needs.
The foregoing has outlined rather broadly the more detailed description of embodiments of the application, wherein the principles and embodiments of the application are explained in detail using specific examples, the above examples being provided solely to facilitate the understanding of the method and core concepts of the application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.

Claims (26)

1. A method of context replacement, the method comprising:
the electronic equipment displays a second image obtained by carrying out first background replacement on the first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image comprises an area where the target object is located and an area where the first background content is located; the second image is obtained by replacing the first background content in the first image with second background content; the second background content is different from the first background content;
Responding to a background replacement training operation of a user, and displaying a first gesture graph by the electronic equipment, wherein the first gesture graph is used for indicating the user to make a first gesture;
the electronic equipment acquires a third image, wherein the gesture of a user in the third image is the first gesture;
the electronic equipment displays a fourth image obtained by carrying out second background replacement on the first image; the second background replacement is performed based on the target image segmentation lightweight model, and the target image segmentation lightweight model is obtained based on the third image training; the fourth image is obtained by replacing the first background content in the first image with the second background content.
2. The method of claim 1, wherein before the electronic device displays a fourth image resulting from a second background replacement of the first image, the method further comprises:
the electronic equipment inputs the third image into an image segmentation full model to obtain a first segmentation result;
the electronic equipment inputs the third image into a first image segmentation lightweight model to obtain a second segmentation result; the number of model parameters in the image segmentation full model is greater than the number of model parameters in the first image segmentation light-weight model; the first segmentation result and the second segmentation result are used for indicating an area where the target object is located and an area where third background content is located in the third image;
The electronic equipment trains the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain a second image segmentation lightweight model;
the electronic equipment inputs the third image into the second image segmentation lightweight model to obtain a third segmentation result;
when the first segmentation result and the third segmentation result are different, the electronic device determines a second gesture graph based on the first segmentation result and the third segmentation result, wherein the second gesture graph is used for indicating a user to make a second gesture, the second gesture graph comprises a first limb, and the area of the first limb in the first segmentation result is different from the area of the first limb in the third segmentation result;
the electronic device trains the second image segmentation lightweight model based on a fifth image to obtain the target image segmentation lightweight model, and the gesture of the target object in the fifth image is the second gesture.
3. The method of claim 2, wherein the electronic device trains the second image segmentation lightweight model based on a fifth image, resulting in the target image segmentation lightweight model, comprising:
The electronic equipment acquires the fifth image;
the electronic equipment inputs the fifth image to the image segmentation full model to obtain a fourth segmentation result;
the electronic equipment inputs the fifth image into the second image segmentation lightweight model to obtain a fifth segmentation result; the fourth segmentation result and the fifth segmentation result are used for indicating an area where the target object is located and an area where fourth background content is located in the fifth image;
the electronic equipment trains the first image segmentation lightweight model based on the fourth segmentation result and the fifth segmentation result to obtain a third image segmentation lightweight model;
the electronic equipment inputs the fifth image into the third image segmentation lightweight model to obtain a sixth segmentation result;
and the third image segmentation lightweight model is the target image segmentation lightweight model under the condition that the fourth segmentation result and the sixth segmentation result do not meet a first preset condition.
4. The method according to claim 3, wherein after the electronic device inputs the fifth image into the third image segmentation lightweight model to obtain the sixth segmentation result, the method further comprises:
In the case that the fourth segmentation result and the sixth segmentation result meet a first preset condition, the electronic device determines a third gesture graph based on the fourth segmentation result and the sixth segmentation result, wherein the third gesture graph is used for indicating a user to make a third gesture, the third gesture graph comprises a second limb, and the second limb in the fourth segmentation result is different from the second limb in the sixth segmentation result;
the electronic device trains the third image segmentation lightweight model based on a sixth image to obtain the target image segmentation lightweight model, and the gesture of the target object in the sixth image is the third gesture.
5. The method of any one of claims 2-4, wherein, in the case that the first segmentation result and the third segmentation result are different, the electronic device determining a second gesture graph based on the first segmentation result and the third segmentation result comprises:
in the case that the difference between the first segmentation result and the third segmentation result meets the first preset condition, the electronic equipment determines a target area of the first image based on the difference between the first segmentation result and the third segmentation result;
The electronic device determines the second gesture graph based on the target region.
6. The method of claim 5, wherein the first segmentation result and the third segmentation result comprise pixel information of a pixel point in the third image;
the electronic device determining, based on the difference between the first segmentation result and the third segmentation result, a target region of the first image if the difference between the first segmentation result and the third segmentation result satisfies a first preset condition, including:
the electronic equipment determines a first target pixel point in the third image based on the difference value of the pixel information of the pixel point in the first segmentation result and the pixel information of the pixel point in the third segmentation result, wherein the first target pixel point is the pixel point of which the difference value of the pixel information in the first segmentation result and the third segmentation result is larger than a first threshold value;
the electronic device determines an area where one or more limbs of the target object in the third image are located, wherein the one or more limbs comprise a third limb;
and under the condition that the number of the first target pixel points in the area where the third limb is located in the third image is larger than a second threshold value, the electronic equipment determines the area where the third limb is located in the third image as the target area.
7. The method of claim 6, wherein the electronic device determining a second gesture map based on the target region comprises:
the electronic equipment determines the third limb of the target object contained in the target area;
the electronic device determines a second gesture map that includes the third limb.
8. The method of claim 7, wherein the electronic device determining a second gesture map comprising the third limb comprises:
the electronic device determining a plurality of gesture graphs comprising the third limb;
the electronic equipment determines the second gesture image from the gesture images, wherein the second gesture image is the gesture image with the largest area containing the first target pixel points in the gesture images.
9. The method of claim 2, wherein the method further comprises:
under the condition that the first segmentation result and the third segmentation result do not meet a first preset condition, the electronic equipment replaces third background content in an area where the background content in the third image is located with preset background content based on the third segmentation result to obtain a seventh image;
The electronic device displays the seventh image, a first control and first prompt information, wherein the first prompt information is used for prompting training of the second image segmentation lightweight model.
10. The method of claim 9, wherein after the electronic device displays the seventh image, first control, and first reminder information, the method further comprises:
the electronic device detects an operation acting on the first control, and determines a third threshold value which is smaller than the first threshold value;
in the case that the difference value between the first segmentation result and the third segmentation result meets a second preset condition, the electronic equipment determines a fourth gesture graph based on the first segmentation result and the third segmentation result, wherein the fourth gesture graph is used for indicating a user to make a fourth gesture; the second preset condition is: the number of the second target pixel points in the region where the fourth limb is located in the first image is larger than a second threshold value; the second target pixel point is a pixel point of which the difference value between the pixel information of the pixel point in the first segmentation result and the pixel information of the pixel point in the second segmentation result is larger than the third threshold value;
The electronic device trains the second image segmentation lightweight model based on an eighth image to obtain the target image segmentation lightweight model, and the gesture of a target object in the eighth image is the fourth gesture.
11. The method of claim 1, wherein before the electronic device displays the first gesture graph in response to the background replacement training operation of the user, the method further comprises:
when the electronic equipment detects that the using time length of the first image segmentation lightweight model is longer than a first time length, the electronic equipment displays second prompt information and a second control, wherein the second prompt information is used for prompting training of the first image segmentation lightweight model; the background replacement training operation is an operation acting on the second control.
12. A background replacement apparatus, comprising:
the display unit is used for displaying a second image obtained by carrying out first background replacement on the first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image comprises an area where the target object is located and an area where the first background content is located; the second image is obtained by replacing the first background content in the first image with second background content; the second background content is different from the first background content;
The display unit is further used for responding to the background replacement training operation of the user and displaying a first gesture graph, wherein the first gesture graph is used for indicating the user to make a first gesture;
the acquisition unit is used for acquiring a third image, wherein the gesture of a user in the third image is the first gesture;
the display unit is further used for displaying a fourth image obtained by performing second background replacement on the first image; the second background replacement is performed based on the target image segmentation lightweight model, and the target image segmentation lightweight model is obtained based on third image training; the fourth image is obtained by replacing the first background content in the first image with the second background content.
13. The apparatus of claim 12, wherein the apparatus further comprises:
the image segmentation unit is used for inputting the third image into the image segmentation total model to obtain a first segmentation result;
the image segmentation unit is further used for inputting the third image into the first image segmentation lightweight model to obtain a second segmentation result; the number of model parameters in the image segmentation full model is greater than the number of model parameters in the first image segmentation light-weight model; the first segmentation result and the second segmentation result are used for indicating an area where the target object is located and an area where third background content is located in the third image;
The model training unit is used for training the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain a second image segmentation lightweight model;
the image segmentation unit is further used for inputting the third image into the second image segmentation lightweight model to obtain a third segmentation result;
the determining unit is used for determining a second gesture graph based on the first segmentation result and the third segmentation result when the first segmentation result and the third segmentation result are different, wherein the second gesture graph is used for indicating a user to make a second gesture, the second gesture graph comprises a first limb, and the area of the first limb in the first segmentation result is different from the area of the first limb in the third segmentation result;
the model training unit is further configured to train the second image segmentation lightweight model based on a fifth image, so as to obtain the target image segmentation lightweight model, where a pose of the target object in the fifth image is the second pose.
14. The apparatus of claim 13, wherein,
the acquisition unit is further used for acquiring the fifth image;
The image segmentation unit is further used for inputting the fifth image into the image segmentation full model to obtain a fourth segmentation result;
the image segmentation unit is further used for inputting the fifth image into the second image segmentation lightweight model to obtain a fifth segmentation result; the fourth segmentation result and the fifth segmentation result are used for indicating an area where the target object is located and an area where fourth background content is located in the fifth image;
the model training unit is further configured to train the first image segmentation lightweight model based on the fourth segmentation result and the fifth segmentation result to obtain a third image segmentation lightweight model;
the image segmentation unit is further used for inputting the fifth image into the third image segmentation lightweight model to obtain a sixth segmentation result;
the determining unit is further configured to, when the fourth segmentation result and the sixth segmentation result do not satisfy a first preset condition, determine that the third image segmentation lightweight model is the target image segmentation lightweight model.
15. The apparatus of claim 14, wherein,
the determining unit is further configured to determine, based on the fourth segmentation result and the sixth segmentation result, a third gesture graph, where the third gesture graph is used to instruct a user to make a third gesture, the third gesture graph includes a second limb, and the second limb in the fourth segmentation result is different from the second limb in the sixth segmentation result;
The model training unit is further configured to train the third image segmentation lightweight model based on a sixth image to obtain the target image segmentation lightweight model, where a pose of the target object in the sixth image is the third pose.
16. The apparatus according to any one of claims 13-15, wherein,
the determining unit is further configured to: determining a target area of the first image based on the difference between the first segmentation result and the third segmentation result in the case that the difference between the first segmentation result and the third segmentation result meets the first preset condition;
and determine the second gesture graph based on the target area.
17. The apparatus of claim 16, wherein the first segmentation result and the third segmentation result comprise pixel information of a pixel point in the third image;
the determining unit is specifically configured to: determining a first target pixel point in the third image based on the difference value of the pixel information of the pixel point in the first segmentation result and the pixel information of the pixel point in the third segmentation result, wherein the first target pixel point is a pixel point of which the difference value of the pixel information in the first segmentation result and the third segmentation result is larger than a first threshold value;
Determining an area of the third image where one or more limbs of the target object are located, the one or more limbs including the third limb;
and determining the area where the third limb is located in the third image as the target area under the condition that the number of the first target pixel points in the area where the third limb is located in the third image is larger than a second threshold value.
18. The apparatus of claim 17, wherein,
the determining unit is specifically configured to: determining the third limb of the target object contained in the target area;
and determining a second gesture graph including the third limb.
19. The apparatus of claim 18, wherein,
the determining unit is specifically configured to: determining a plurality of gesture graphs comprising the third limb;
and determining the second gesture graph from the plurality of gesture graphs, wherein the second gesture graph is the gesture graph, among the plurality of gesture graphs, with the largest number of first target pixel points in the region where the third limb is located.
20. The apparatus of claim 13, further comprising a replacement unit,
the replacing unit is configured to replace, when the first segmentation result and the third segmentation result do not meet a first preset condition, third background content in an area where the background content in the third image is located with preset background content based on the third segmentation result, so as to obtain a seventh image;
The display unit is further configured to: and displaying the seventh image, a first control and first prompt information, wherein the first prompt information is used for prompting training of the second image segmentation lightweight model.
21. The apparatus of claim 20, wherein,
the determining unit is further configured to detect an operation acting on the first control, and determine a third threshold value, wherein the third threshold value is smaller than the first threshold value;
the determining unit is further configured to determine, based on the first segmentation result and the third segmentation result, a fourth gesture graph, where the fourth gesture graph is used to instruct a user to make a fourth gesture, where a difference between the first segmentation result and the third segmentation result meets a second preset condition; the second preset condition is: the number of the second target pixel points in the region where the fourth limb is located in the first image is larger than a second threshold value; the second target pixel point is a pixel point of which the difference value between the pixel information of the pixel point in the first segmentation result and the pixel information of the pixel point in the second segmentation result is larger than the third threshold value;
the model training unit is further configured to train the second image segmentation lightweight model based on an eighth image to obtain the target image segmentation lightweight model, where a pose of a target object in the eighth image is the fourth pose.
22. The apparatus of claim 12, wherein,
the display unit is further configured to display second prompt information and a second control when the electronic device detects that the usage time of the first image segmentation lightweight model is longer than a first time, where the second prompt information is used for prompting training of the first image segmentation lightweight model; the background replacement training operation is an operation acting on the second control.
23. A background replacement system, the system comprising an electronic device and a server; wherein,
the electronic equipment is used for acquiring a first image and sending the first image to the server;
the server is used for receiving the first image, inputting the first image into a first image segmentation lightweight model, and determining an area where a target object in the first image is located and an area where first background content is located;
the server replaces the first background content in the first image with second background content to obtain a second image, and the second image is sent to the electronic equipment;
the electronic equipment is used for acquiring the second image and displaying the second image;
The electronic equipment is used for responding to the background replacement training operation of the user, and displaying a first gesture graph which is used for indicating the user to make a first gesture;
the electronic equipment is used for acquiring a third image and sending the third image to the server, and the gesture of a user in the third image is the first gesture;
the server is used for acquiring the third image, training the first image segmentation lightweight model based on the third image, and obtaining a target image segmentation lightweight model;
the server is used for inputting the first image into a target image segmentation lightweight model and determining an area where a target object in the first image is located and an area where first background content is located;
the server is used for replacing the first background content in the first image with the second background content to obtain a fourth image, and sending the fourth image to the electronic equipment;
the electronic device is configured to receive the fourth image and display the fourth image.
24. An electronic device comprising one or more processors and one or more memories; wherein the one or more memories are coupled to the one or more processors, the one or more memories for storing computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-11.
25. A chip system for application to an electronic device, the chip system comprising one or more processors to invoke computer instructions to cause the electronic device to perform the method of any of claims 1-11.
26. A computer readable storage medium comprising instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-11.
CN202210271308.4A 2022-03-18 2022-03-18 Background replacement method and electronic equipment Pending CN116823869A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210271308.4A CN116823869A (en) 2022-03-18 2022-03-18 Background replacement method and electronic equipment
PCT/CN2023/079248 WO2023174063A1 (en) 2022-03-18 2023-03-02 Background replacement method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210271308.4A CN116823869A (en) 2022-03-18 2022-03-18 Background replacement method and electronic equipment

Publications (1)

Publication Number Publication Date
CN116823869A true CN116823869A (en) 2023-09-29

Family

ID=88022343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210271308.4A Pending CN116823869A (en) 2022-03-18 2022-03-18 Background replacement method and electronic equipment

Country Status (2)

Country Link
CN (1) CN116823869A (en)
WO (1) WO2023174063A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746305B (en) * 2024-02-21 2024-04-19 四川大学华西医院 Medical care operation training method and system based on automatic evaluation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11176679B2 (en) * 2017-10-24 2021-11-16 Hewlett-Packard Development Company, L.P. Person segmentations for background replacements
CN112150499A (en) * 2019-06-28 2020-12-29 华为技术有限公司 Image processing method and related device
CN112529073A (en) * 2020-12-07 2021-03-19 北京百度网讯科技有限公司 Model training method, attitude estimation method and apparatus, and electronic device
CN113160231A (en) * 2021-03-29 2021-07-23 深圳市优必选科技股份有限公司 Sample generation method, sample generation device and electronic equipment
CN113194254A (en) * 2021-04-28 2021-07-30 上海商汤智能科技有限公司 Image shooting method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2023174063A1 (en) 2023-09-21

Similar Documents

Publication Publication Date Title
US11195283B2 (en) Video background substraction using depth
CN111368893B (en) Image recognition method, device, electronic equipment and storage medium
CN108921782B (en) Image processing method, device and storage medium
KR102385463B1 (en) Facial feature extraction model training method, facial feature extraction method, apparatus, device and storage medium
CN108681743B (en) Image object recognition method and device and storage medium
EP4137991A1 (en) Pedestrian re-identification method and device
US20220222796A1 (en) Image processing method and apparatus, server, and storage medium
CN111507333B (en) Image correction method and device, electronic equipment and storage medium
CN112001274A (en) Crowd density determination method, device, storage medium and processor
CN111292262B (en) Image processing method, device, electronic equipment and storage medium
CN110956691A (en) Three-dimensional face reconstruction method, device, equipment and storage medium
CN114092678A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112614110B (en) Method and device for evaluating image quality and terminal equipment
WO2022194079A1 (en) Sky region segmentation method and apparatus, computer device, and storage medium
CN111080746A (en) Image processing method, image processing device, electronic equipment and storage medium
WO2023174063A1 (en) Background replacement method and electronic device
CN110689478B (en) Image stylization processing method and device, electronic equipment and readable medium
CN116229188B (en) Image processing display method, classification model generation method and equipment thereof
CN113221695A (en) Method for training skin color recognition model, method for recognizing skin color and related device
WO2024041108A1 (en) Image correction model training method and apparatus, image correction method and apparatus, and computer device
US11232616B2 (en) Methods and systems for performing editing operations on media
CN112036307A (en) Image processing method and device, electronic equipment and storage medium
CN111652152A (en) Crowd density detection method and device, computer equipment and storage medium
CN110766631A (en) Face image modification method and device, electronic equipment and computer readable medium
CN112601029B (en) Video segmentation method, terminal and storage medium with known background prior information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication