WO2023174063A1 - Background replacement method and electronic device - Google Patents

Background replacement method and electronic device

Info

Publication number
WO2023174063A1
WO2023174063A1 (application PCT/CN2023/079248)
Authority
WO
WIPO (PCT)
Prior art keywords
image
segmentation
segmentation result
electronic device
posture
Prior art date
Application number
PCT/CN2023/079248
Other languages
French (fr)
Chinese (zh)
Inventor
李炜 (Li Wei)
黄睿 (Huang Rui)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023174063A1 publication Critical patent/WO2023174063A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Definitions

  • the present application relates to the field of terminals, and in particular to background replacement methods and electronic devices.
  • background replacement is a technology that replaces the content of the background area contained in a video or image with specified background content.
  • the core step is image segmentation, that is, the input image is segmented into a target area and a background area through the image segmentation model.
  • training an image segmentation model requires collecting a large amount of historical image or video data of the target object in advance to build an image data set, and then using that data set to train the model to obtain an image segmentation model optimized for the target object. The accuracy of image segmentation therefore depends on the quality of the image data set used during training, and a pre-collected image data set cannot cover all individual features. When the image data set is of low quality, that is, when it does not contain some individual features of the target object, the areas of the target region containing these individual features will be treated as the background area. This results in poor image segmentation accuracy, which in turn causes replacement errors when the image background is replaced, degrading the user experience.
  • This application provides a background replacement method and electronic device. Implementing this method can improve the image segmentation accuracy of the background segmentation model, thereby improving the accuracy of background replacement and improving user experience.
  • embodiments of the present application provide a method for background replacement.
  • the method includes: an electronic device displays a second image obtained by performing a first background replacement on a first image; the first background replacement is performed based on a first image segmentation lightweight model; the first image includes the area where the target object is located and the area where the first background content is located; the second image is obtained by replacing the first background content in the first image with second background content, and the second background content is different from the first background content; in response to the user's background replacement training operation, the electronic device displays a first posture map, and the first posture map is used to instruct the user to make a first posture; the electronic device acquires a third image in which the user's posture is the first posture; the electronic device displays a fourth image obtained by performing a second background replacement on the first image; the second background replacement is performed based on a target image segmentation lightweight model, and the target image segmentation lightweight model is obtained by training on the third image; the fourth image is obtained by replacing the first background content in the first image with the second background content.
  • the electronic device displays the first posture map to instruct the user to make the first posture, and then obtains a third image containing the user, in which the user's posture is the first posture.
  • the electronic device uses the third image as training data for the first image segmentation lightweight model, and can obtain the latest individual characteristics of the user.
  • the target image segmentation lightweight model trained using the third image can segment the image more accurately.
  • the target image segmentation lightweight model can more accurately segment the target object and the background within the image.
  • the size of the area where the target object is located in the fourth image is closer to the size of the area where the target object is located in the first image than in the second image. That is, the accuracy of background replacement of the fourth image is higher than the accuracy of background replacement of the second image.
  • the segmentation accuracy of the target image segmentation lightweight model is higher than the segmentation accuracy of the first image segmentation lightweight model.
  • the electronic device uses an image segmentation lightweight model to segment the first image, and determines the area where the target object is located and the area where the background content is located in the first image.
  • users can train the image segmentation lightweight model through the background replacement training operation until a background replacement effect that satisfies the user is achieved.
  • users can start the training of the image segmentation lightweight model through user operations to improve its segmentation accuracy, thereby improving the accuracy of background replacement and the user experience.
  • the above method further includes: the electronic device inputs the third image into the full image segmentation model to obtain a first segmentation result; the electronic device inputs the third image into the first image segmentation lightweight model to obtain a second segmentation result; the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first image segmentation lightweight model; the first segmentation result and the second segmentation result are used to indicate the area where the target object is located and the area where the third background content is located in the third image; the electronic device trains the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain a second image segmentation lightweight model; the electronic device inputs the third image into the second image segmentation lightweight model to obtain a third segmentation result;
  • the electronic device determines a second posture map based on the first segmentation result and the third segmentation result.
  • the second posture map is used to instruct the user to make the second posture.
  • the second posture map includes a first limb, and the area where the first limb is located in the first segmentation result is different from the area where the first limb is located in the third segmentation result;
  • the electronic device trains a second image segmentation lightweight model based on the fifth image to obtain a target image segmentation lightweight model, and the posture of the target object in the fifth image is the second posture.
  • the image segmentation lightweight model is obtained from the full image segmentation model by pruning and quantization.
  • the number of model parameters in the image segmentation full model is greater than the number of model parameters in the image segmentation lightweight model.
  • the segmentation results of the full image segmentation model are more accurate than the segmentation results of the lightweight image segmentation model, but the calculation amount of the full image segmentation model is relatively large.
  • an image segmentation lightweight model with fewer model parameters is generally deployed on electronic devices for image segmentation.
  • the segmentation effect of the image segmentation lightweight model is not as good as the segmentation results of the image segmentation full model.
  • the full image segmentation model is used to guide the training of the lightweight model, so that the performance of the image segmentation lightweight model approaches that of the full model, that is, its segmentation effect is close to the full model's. In other words, the electronic device uses the full image segmentation model and the image segmentation lightweight model to segment the third image separately, obtaining the first segmentation result and the second segmentation result. Based on the error between the first segmentation result and the second segmentation result, the model parameters of the lightweight model are adjusted so that its segmentation effect moves closer to that of the full model. In this way, the computational cost on the electronic device can be reduced while preserving the image segmentation quality.
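The guided training described above is essentially teacher–student distillation. A minimal sketch, assuming toy single-layer linear "models" over flattened pixels (the patent's actual network architectures are not specified):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical tiny "full model" (teacher) and "lightweight model" (student):
# each maps a flattened image to per-pixel foreground scores through a single
# linear layer. Real segmentation networks are far larger; this only
# illustrates the guided-training loop, not the patent's actual models.
rng = np.random.default_rng(0)
n_pixels = 64
teacher_w = rng.normal(size=(n_pixels, n_pixels))
student_w = np.zeros((n_pixels, n_pixels))

def distill_step(w, image, lr=0.1):
    """One step: adjust the student's parameters to shrink the error
    between its segmentation result and the teacher's."""
    target = sigmoid(teacher_w @ image)   # full model's segmentation result
    pred = sigmoid(w @ image)             # lightweight model's result
    # Gradient of the MSE loss w.r.t. the student weights (chain rule
    # through the sigmoid); update the weights in place.
    w -= lr * np.outer((pred - target) * pred * (1 - pred), image)
    return float(np.mean((pred - target) ** 2))

image = rng.normal(size=n_pixels)
losses = [distill_step(student_w, image) for _ in range(200)]
assert losses[-1] < losses[0]  # the student's output moves toward the teacher's
```

The student keeps far fewer effective parameters in a real deployment; only the error between the two models' outputs drives the update, which is the mechanism the paragraph describes.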
  • after the electronic device trains the lightweight model using the first segmentation result and the second segmentation result to obtain the second image segmentation lightweight model, it uses the third image to verify the trained second image segmentation lightweight model.
  • the electronic device determines the posture map based on the segmentation results, and instructs the user to make the posture shown in the posture map as training data to guide the next round of training.
  • the training data is actively screened and the quality of the training data is improved.
  • using the filtered training data to train the model improves the training effect of the model, thereby improving the model's segmentation accuracy.
  • the electronic device trains the second image segmentation lightweight model based on the fifth image to obtain the target image segmentation lightweight model, which specifically includes:
  • the electronic device acquires the fifth image; the electronic device inputs the fifth image into the full image segmentation model to obtain a fourth segmentation result; the electronic device inputs the fifth image into the second image segmentation lightweight model to obtain the fifth segmentation result;
  • the fourth segmentation result and the fifth segmentation result are used to indicate the area where the target object is located and the area where the fourth background content is located in the fifth image; the electronic device trains the second image segmentation lightweight model based on the fourth segmentation result and the fifth segmentation result to obtain a third image segmentation lightweight model; the electronic device inputs the fifth image into the third image segmentation lightweight model to obtain a sixth segmentation result;
  • the third image segmentation lightweight model is the target image segmentation lightweight model.
  • the electronic device uses the fifth image to verify the second image segmentation lightweight model, and determines whether the error between the segmentation result of the second image segmentation lightweight model and that of the full image segmentation model satisfies the first preset condition. If the first preset condition is not met, the electronic device determines that the second image segmentation lightweight model is the target image segmentation lightweight model and stops training. That is to say, after each round of training, the electronic device determines whether the segmentation results of the lightweight model meet the requirements; if the requirements are met, training stops, thus avoiding unnecessary training.
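The stop check above can be sketched as a simple comparison between the two models' masks. The patent does not define the "first preset condition" concretely; a pixel-disagreement ratio is assumed here purely for illustration:

```python
import numpy as np

def disagreement(full_mask, light_mask):
    """Fraction of pixels on which the two segmentation results differ."""
    return float(np.mean(full_mask != light_mask))

def should_stop(full_mask, light_mask, max_disagreement=0.02):
    # The concrete form of the "first preset condition" is not given in the
    # text; a 2% pixel-disagreement tolerance is an assumed stand-in.
    return disagreement(full_mask, light_mask) <= max_disagreement

full = np.ones((8, 8), dtype=bool)     # full model: everything is foreground
light = full.copy()
light[0, 0] = False                    # 1 of 64 pixels differs (~1.6%)
assert should_stop(full, light)        # close enough: stop training
light[0, 1:4] = False                  # now 4 of 64 pixels differ (~6.3%)
assert not should_stop(full, light)    # too far apart: run another round
```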
  • the electronic device inputs the fifth image into the third image segmentation lightweight model, and after obtaining the sixth segmentation result, the method further includes: when the error between the fourth segmentation result and the sixth segmentation result meets the first preset condition, the electronic device determines a third posture map based on the fourth segmentation result and the sixth segmentation result.
  • the third posture map is used to instruct the user to make the third posture.
  • the third posture map includes a second limb, and the area where the second limb is located in the fourth segmentation result is different from the area where the second limb is located in the sixth segmentation result;
  • the electronic device trains a third image segmentation lightweight model based on the sixth image to obtain a target image segmentation lightweight model, and the posture of the target object in the sixth image is the third posture.
  • the electronic device determines the third posture map, which is used to instruct the user to make the third posture.
  • the electronic device continuously filters the training data through each round of training until the segmentation results of the model meet the requirements. In this way, the quality of the training data can be improved, the training effect of the model can be improved, and wasted training time caused by invalid training data can be avoided.
  • the electronic device determines the target area of the first image based on the difference between the first segmentation result and the third segmentation result; the electronic device then determines the second posture map based on the target area.
  • the first segmentation result and the third segmentation result include pixel information of the pixels in the third image.
  • the electronic device determines the target area of the first image based on the difference between the first segmentation result and the third segmentation result, specifically including: the electronic device determines first target pixels in the third image based on the difference between the pixel information of each pixel in the first segmentation result and in the third segmentation result; a first target pixel is a pixel whose difference in pixel information between the first segmentation result and the third segmentation result is greater than a first threshold. The electronic device determines the areas where one or more limbs of the target object are located in the third image, the one or more limbs including a third limb;
  • the electronic device determines that the area where the third limb is located in the third image is the target area.
  • the limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, and left foot, among others.
  • the electronic device determines the second posture map based on the target area, specifically including: the electronic device determines the third limb of the target object contained in the target area; the electronic device determines a second posture map containing the third limb.
  • the electronic device can thus select posture maps containing poorly segmented limbs and, in the next round of training, focus on the poorly segmented areas, which improves the training effect of the model.
  • the electronic device determines a second posture map including the third limb, specifically including: the electronic device determines multiple posture maps including the third limb; the electronic device then selects the second posture map from the multiple posture maps, the second posture map being the posture map in which the area where the third limb is located contains the most first target pixels.
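The selection above — threshold per-pixel differences, then pick the posture map whose limb region covers the most of those pixels — can be sketched as follows. The mask and map names are illustrative stand-ins, not from the patent:

```python
import numpy as np

def first_target_pixels(result_a, result_b, first_threshold=0.5):
    """Pixels whose difference between the two segmentation results
    (per-pixel foreground probabilities here) exceeds the first threshold."""
    return np.abs(result_a - result_b) > first_threshold

def pick_posture_map(target, limb_masks, posture_maps):
    """Pick the posture map whose limb region covers the most target pixels.
    `limb_masks` maps limb name -> boolean region mask; `posture_maps` maps
    limb name -> a posture map emphasising that limb. Both are illustrative
    stand-ins for the patent's limb region maps and posture library."""
    counts = {limb: int(np.sum(target & mask)) for limb, mask in limb_masks.items()}
    worst_limb = max(counts, key=counts.get)
    return posture_maps[worst_limb]

a = np.zeros((4, 4))
b = np.zeros((4, 4))
b[0, :] = 1.0                                  # top row segmented differently
limb_masks = {"head": np.zeros((4, 4), bool), "torso": np.zeros((4, 4), bool)}
limb_masks["head"][0, :] = True                # the head occupies the top row
limb_masks["torso"][2:, :] = True
posture_maps = {"head": "pose_head", "torso": "pose_torso"}
chosen = pick_posture_map(first_target_pixels(a, b), limb_masks, posture_maps)
assert chosen == "pose_head"                   # head is the worst-segmented limb
```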
  • the posture maps include contour maps and human body region maps. Before the electronic device detects the first user operation, the method further includes: the electronic device obtains a training data set containing multiple images; the electronic device inputs the image data set into a human body posture estimation model to obtain multiple human body posture vectors corresponding to the image data set; the electronic device inputs these human body posture vectors into a clustering model to obtain one or more representative posture vectors; the electronic device inputs the representative posture vectors into a human body contour detection model to obtain the contour maps corresponding to the representative posture vectors; the electronic device inputs the image data set into a human body region detection model to obtain one or more limb region maps corresponding to the image data set.
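The clustering step above reduces many posture vectors to a few representatives. A minimal k-means sketch (the patent does not name a specific clustering algorithm; k-means is an assumed choice, and the 2-D "posture vectors" below are toy data):

```python
import numpy as np

def kmeans(vectors, k, iters=20):
    """Minimal k-means: the returned centroids play the role of the
    representative posture vectors."""
    # Deterministic initialisation: spread initial centroids across the data.
    centroids = vectors[np.linspace(0, len(vectors) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign every posture vector to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = vectors[labels == j].mean(axis=0)
    return centroids

# Two tight groups of 2-D "posture vectors" (real pose vectors would hold
# joint coordinates); each cluster centre is one representative posture.
rng = np.random.default_rng(1)
poses = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(10, 0.1, (20, 2))])
reps = kmeans(poses, k=2)
assert sorted(round(float(c[0])) for c in reps) == [0, 10]
```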
  • based on the third segmentation result, the electronic device segments the area where the background content is located in the third image, and replaces the third background content with the preset background content to obtain a seventh image;
  • the electronic device displays the seventh image, a first control, and first prompt information, and the first prompt information is used to prompt training of the second image segmentation lightweight model.
  • the method further includes: the electronic device detects an operation acting on the first control, and the electronic device determines a third threshold, the third threshold being smaller than the first threshold;
  • the electronic device determines a fourth posture map based on the first segmentation result and the third segmentation result, and the fourth posture map is used to instruct the user to make a fourth posture; the second preset condition is: the number of second target pixels in the area where a fourth limb is located in the first image is greater than a second threshold; a second target pixel is a pixel whose difference in pixel information between the first segmentation result and the second segmentation result is greater than the third threshold; the electronic device trains the second image segmentation lightweight model based on an eighth image to obtain the target image segmentation lightweight model, and the posture of the target object in the eighth image is the fourth posture.
  • in response to the user's background replacement training operation, before the electronic device displays the first posture map, the method further includes: when the electronic device detects that the usage time of the first image segmentation lightweight model is longer than a first duration, the electronic device displays second prompt information and a second control; the second prompt information is used to prompt training of the first image segmentation lightweight model; the background replacement training operation is an operation acting on the second control.
  • embodiments of the present application provide a background replacement device, including various units for performing the background replacement method in the first aspect or any possible implementation of the first aspect.
  • embodiments of the present application provide an electronic device, which includes one or more processors and one or more memories; the one or more memories are coupled to the one or more processors and are used to store computer program code. The computer program code includes computer instructions; when the one or more processors execute the computer instructions, the electronic device performs the method described in the first aspect and any possible implementation of the first aspect.
  • embodiments of the present application provide a chip system applied to an electronic device. The chip system includes one or more processors, which are used to call computer instructions to cause the electronic device to execute the method described in the first aspect and any possible implementation of the first aspect.
  • embodiments of the present application provide a computer-readable storage medium that includes instructions. When the instructions are run on an electronic device, they cause the electronic device to execute the method described in the first aspect and any possible implementation of the first aspect.
  • the background replacement device provided by the second aspect, the electronic device provided by the third aspect, the chip system provided by the fourth aspect, and the computer storage medium provided by the fifth aspect are all used to execute the methods provided by the embodiments of the present application. The beneficial effects they can achieve are the same as those of the corresponding methods and are not described again here.
  • Figures 1A-1B are schematic diagrams of the user interface for video conferencing on electronic devices provided by embodiments of the present application.
  • Figure 2A is a flow chart of a background replacement method provided by an embodiment of the present application.
  • Figure 2B is a flow chart of the background replacement method provided by the embodiment of the present application.
  • Figures 3A-3F are schematic diagrams of some user interfaces provided by embodiments of the present application.
  • Figure 5 is a schematic diagram of the training process of the image segmentation lightweight model provided by the embodiment of the present application.
  • Figure 9 is a flow chart for an electronic device to construct a representative posture library provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of the process of clustering electronic devices to obtain representative posture vectors according to an embodiment of the present application
  • Figure 13 is a schematic diagram of the limb region provided by the embodiment of the present application.
  • Figure 14 is a schematic diagram of the software structure of the electronic device 100 provided by the embodiment of the present application.
  • Figure 15 shows the cooperation relationship between the various modules in the electronic device provided by the embodiment of the present application.
  • Figure 16 is a schematic diagram of the background replacement system provided by the embodiment of the present application.
  • Figure 18 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Figure 19 is a schematic structural diagram of another electronic device provided by an embodiment of the present application.
  • the terms "first" and "second" are used for descriptive purposes only and shall not be understood as implying relative importance or implicitly indicating the number of technical features referred to. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of this application, unless otherwise specified, "plurality" means two or more.
  • the electronic device obtains the video containing the user and displays the video image frame in real time.
  • the video image frame includes the user image and the surrounding environment background image, which may lead to leakage of user privacy during a video conference.
  • the user can choose to replace the environmental background image in the video image frame with a preset background image.
  • area 1111 is the background area
  • area 1112 is the foreground area, that is, the area where the target object is located.
  • the user can click on the background replacement control 112,
  • the electronic device may replace the background content in the background area and display the user interface 120 .
  • the electronic device displays user interface 120.
  • the user interface 120 may include the replaced video image frame 121 and the background replacement control 122.
  • area 1211 included in the video image frame 121 is the replaced background area
  • area 1212 is the foreground area. It can be seen that the target object's hair bulges are missing from area 1211A and area 1211B in the video image frame 121. That is to say, when the electronic device performed background replacement, some limbs or parts of limbs of the target object were replaced as background, resulting in a replacement error.
  • background replacement first uses the image segmentation model to segment the foreground area and background area in the video or image, obtaining the target object and the initial background content, and then replaces the initial background content of the background area with the preset background content to obtain the replaced image or video.
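Once the segmentation mask is available, the replacement itself is a per-pixel composite. A minimal sketch with numpy (array shapes and the black preset background are illustrative assumptions):

```python
import numpy as np

def replace_background(frame, foreground_mask, new_background):
    """Keep pixels the segmentation marked as foreground (the target object);
    take every other pixel from the preset background image."""
    mask = foreground_mask[..., None]          # broadcast over RGB channels
    return np.where(mask, frame, new_background)

frame = np.full((2, 2, 3), 200, dtype=np.uint8)      # captured video frame
preset_bg = np.zeros((2, 2, 3), dtype=np.uint8)      # preset black background
fg_mask = np.array([[True, False], [False, False]])  # segmentation result
out = replace_background(frame, fg_mask, preset_bg)
assert out[0, 0].tolist() == [200, 200, 200]         # foreground pixel kept
assert out[1, 1].tolist() == [0, 0, 0]               # background pixel replaced
```

This also makes the failure mode in the passage concrete: any foreground pixel the mask wrongly marks `False` (e.g. a hair bulge) is silently overwritten by the preset background.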
  • the image segmentation model mistakenly regards some body parts, or parts of some body parts, of the target object as the initial background content, that is, it segments them into the background area, resulting in inaccurate image segmentation.
  • some body parts or parts of some body parts of the target object are replaced as background content.
  • when the quality of the image data set used during training is low, that is, when it does not contain some individual features of the target object, using the trained image segmentation model to segment images that contain these individual features will cause the areas containing them to be treated as background areas, leading to poor image segmentation accuracy and affecting the user experience.
  • for example, suppose the images collected to train the image segmentation model show the target object with short hair. As time goes by, the target object's hairstyle changes; when the original image segmentation model is then used, the area where the hairstyle has changed is easily mis-segmented into the background area, resulting in inaccurate segmentation and subsequent replacement errors, affecting the user experience.
  • embodiments of the present application provide a background replacement method.
  • the electronic device can first perform background replacement on the acquired first image based on the first image segmentation lightweight model, replacing the first background content in the first image with the second background content to obtain the second image.
  • the electronic device displays the second image after background replacement for the first image on the display screen.
  • the user can choose to retrain the first image segmentation lightweight model; that is, in response to the user's background replacement training operation, the electronic device displays a first posture map on the display screen. The first posture map indicates the first posture and is used to guide the user to record a video or image according to the first posture contained in it.
  • the electronic device can obtain a third image containing the user, and the electronic device can retrain the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model.
  • the electronic device uses the target image segmentation lightweight model to segment the first image, obtains the segmentation result, and replaces the first background content in the segmentation result with the second background content to obtain the fourth image. That is to say, the electronic device uses the image segmentation lightweight model to segment the first image and determine the area where the target object is located and the area where the background content is located.
  • the image segmentation lightweight model is trained until the background replacement effect that satisfies the user is achieved.
  • the electronic device determines whether the error between segmentation result 1 and segmentation result 3 satisfies the preset condition D2. If it is satisfied, the image segmentation lightweight model M2 is the target image segmentation lightweight model; that is, the image segmentation lightweight model M2 can be used for subsequent image segmentation. If it is not satisfied, the electronic device determines the poorly segmented area in segmentation result 3 based on segmentation result 1 and segmentation result 3; the poorly segmented area can be treated as the area to focus on in the next round of training, and the electronic device then determines a posture map P2 that covers the poorly segmented area. The electronic device displays the posture map P2 on the display screen, and the posture map P2 is used to guide the user to shoot a video or image according to the posture G2 contained in it.
  • the electronic device obtains image T2. Following the steps above, the electronic device uses image T2 to continue training the image segmentation lightweight model M2 until the segmentation results of the full image segmentation model and the lightweight model satisfy the preset condition D2.
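The round-based loop described above can be sketched as follows. The 2% tolerance and the toy "training round" that fixes 30 pixels per round are assumptions for illustration, not values from the patent:

```python
import numpy as np

def train_rounds(teacher_out, student_out, one_round, tol=0.02, max_rounds=10):
    """Round-based loop: after each round, compare the lightweight model's
    result with the full model's and stop once the disagreement satisfies
    the preset condition (a 2% pixel-disagreement tolerance is assumed)."""
    for r in range(max_rounds):
        if np.mean(teacher_out != student_out) <= tol:
            return r                       # number of rounds actually run
        student_out = one_round(student_out)
    return max_rounds

teacher = np.ones(100, dtype=bool)
student = np.zeros(100, dtype=bool)
progress = {"fixed": 0}

def one_round(s):
    # Toy stand-in for one real training round on a newly captured posture
    # image: each round brings 30 more pixels into agreement.
    progress["fixed"] += 30
    s = s.copy()
    s[: progress["fixed"]] = True
    return s

rounds_needed = train_rounds(teacher, student, one_round)
assert rounds_needed == 4
```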
  • the full image segmentation model is used to guide the training of the lightweight model, so that the performance of the image segmentation lightweight model approaches that of the full model, that is, its segmentation effect is close to the full model's. In this way, the computational cost on the electronic device can be reduced while ensuring the image segmentation quality.
• for the training of the image segmentation lightweight model, please refer to the following embodiments for details; it will not be described again here.
  • segmentation result 1 and segmentation result 3 are used to indicate the area where the target object is located and the area where the background content is located in the image T1.
  • the area where the background content indicated in segmentation result 1 is located is different from the area where the background content indicated in segmentation result 3 is located.
• the electronic device determines the poorly segmented area based on segmentation result 1 and segmentation result 3, that is, it determines the poorly segmented area based on the segmentation result of the previous round of the image segmentation lightweight model, then selects from the representative posture library a posture map containing the poorly segmented area and uses it as training data to guide the next round of training. In this way, the training data is actively screened and the quality of the training data is improved.
• user images containing these poorly segmented areas are obtained, and training focuses on these poorly segmented areas; that is, personalized training is conducted based on the individual characteristics of the target object, which improves the training effect of the model.
• the electronic device will determine whether the segmentation results of the image segmentation lightweight model meet the preset conditions, and stop training when the preset conditions are met, thus avoiding unnecessary training.
  • the above-mentioned posture image P1 can also be called the first posture image
  • the posture G1 can also be called the first posture
  • the image T1 can also be called the third image
• the image segmentation lightweight model 1 can also be called the first image segmentation lightweight model
  • segmentation result 1 can also be called the first segmentation result
  • segmentation result 2 can also be called the second segmentation result
  • segmentation result 3 can also be called the third segmentation result.
  • the image segmentation lightweight model 2 can be called the second image segmentation lightweight model.
  • Figure 2A shows a flow chart of a background replacement method according to an embodiment of the present application. As shown in Figure 2A, the method includes steps S101 to S104.
  • the image T1 may be uploaded by the user. For example, if the user needs to replace the background of a certain picture, the user may upload the picture whose background needs to be replaced on the electronic device. It may also be an image captured by an electronic device or an image in a video frame. For example, when a user is using video conferencing software for a video conference, or when the user is video chatting with other people, the electronic device can capture the user's image.
  • the first image includes an area where the target object is located and an area where the first background is located.
• the electronic device can determine the area where the target object is located and the area where the first background content is located in the first image based on the first image segmentation lightweight model, and the electronic device replaces the first background content in the area where the first background content is located in the first image with the second background content to obtain the second image.
  • the first background content and the second background content are different, and the second background content may be background content preset by the user, or may be default background content set by the electronic device at the factory.
  • the electronic device displays the first posture image.
  • the first gesture diagram is used to instruct the user to make the first gesture.
• the background replacement training operation may be an operation acting on the background replacement training control. For example, after the electronic device displays the second image, the background replacement training control is displayed on the display screen. When the user is dissatisfied with the replacement effect of the second image obtained by replacing the first background based on the first image segmentation lightweight model, the user can click the background replacement training control to start the training of the first image segmentation lightweight model.
  • the background replacement training operation can also be a voice command or a button pressing operation. This application does not limit this.
• in the video conference scenario, the background replacement training operation can also be interpreted as the series of operations of clicking the background replacement training control and then ending the video conference. For example, after the electronic device detects that the user has acted on the background replacement training control, the electronic device can display the first posture image after the video conference ends, and then start training the image segmentation lightweight model.
  • the electronic device acquires a third image, where the user's posture in the third image is the first posture.
  • the third image may be an image taken by the user according to the first posture indicated in the first posture diagram. It can be understood that the third image can also be an image in the video frame.
• the target object records the video according to the first posture indicated in the first posture diagram.
• the electronic device obtains the video and intercepts a frame of image in the video as the third image.
  • the third image may be an image set including multiple images.
  • the electronic device can capture multiple images at a time, intercept multiple frames of images from the recorded video, and one of the multiple frames of images can be used as a third image.
• the electronic device displays the first gesture image to instruct the user to make the first gesture, and then the electronic device obtains a third image including the user, and the user's gesture in the third image is the first gesture.
  • the electronic device uses the third image as training data for the first image segmentation lightweight model, and can obtain the latest individual characteristics of the user.
  • the target image segmentation lightweight model trained using the third image can segment the image more accurately.
  • the target image segmentation lightweight model can more accurately segment the target object and background content from the image.
• the size of the area where the target object is located in the fourth image is closer to the size of the area where the target object is located in the first image than the size of the area where the target object is located in the second image is. That is, the accuracy of background replacement of the fourth image is higher than the accuracy of background replacement of the second image.
  • the specific training process for the electronic device to obtain the lightweight model of the target image based on the third image training may refer to the description in the embodiment in FIG. 2B and will not be described again here.
• based on the first operation, the electronic device displays the posture map P1, and the posture map P1 indicates the posture G1.
• alternatively, the training may be triggered when the electronic device detects that the usage time of the image segmentation lightweight model M1 exceeds a preset duration, or when the electronic device periodically detects the segmentation effect of the image segmentation lightweight model M1 and detects that the segmentation effect of the image segmentation lightweight model M1 meets the preset condition D2.
  • a preset time period can be configured in the electronic device, and the preset time period can be one month, one year, etc.
  • the embodiments of this application do not limit the specific value of the preset duration. That is to say, when the electronic device detects that the image segmentation lightweight model M1 has been used for longer than one month, the electronic device can restart the training of the image segmentation lightweight model M1.
  • the user interface A may be user interface 210 as shown in FIG. 3A.
  • the user interface 210 may include: a status bar 211 , a calendar indicator 212 , a weather indicator 213 , and a prompt box 214 .
  • the user interface A may be user interface 220 as shown in Figure 3B. As shown in FIG. 3B , the user interface 220 includes a prompt box 221 .
  • the prompt box 221 please refer to the relevant description of the above-mentioned prompt box 214, which will not be described again here.
• in the video conference scenario, the operation can also be interpreted as the series of operations of clicking the background replacement training control and then ending the video conference. For example, after the electronic device detects that the user has acted on the background replacement training control, the electronic device can display the first posture image after the video conference ends, and then start training the image segmentation lightweight model.
  • FIG. 3D exemplarily shows a user interface 240 of the electronic device displaying the gesture map P1.
  • the user interface 240 includes: recording guidance box 241 , prompt information 242 , confirmation control 243 , and return control 244 .
  • the recording guidance frame 241 is used to display the posture picture P1
  • the posture picture P1 is used to indicate posture 1
  • posture 1 is the posture of arranging the earphones with both hands.
  • the prompt information 242 is used to prompt the target object to complete the action corresponding to gesture 1.
  • the prompt information 242 may be "action requirement: arrange the earphones with both hands" or "please complete the specified action in the white area in the recording guidance box".
  • the return control 244 is used to exit the current user interface 240 and return to the upper level user interface, such as the user interface 220.
• the confirmation control 243 is used to obtain photos or videos taken by the electronic device. When the electronic device detects a touch operation on the confirmation control 243, in response to the touch operation, the electronic device displays the user interface 250.
  • the electronic device can display the recording guidance box 241 and the recording effect preview box 251 on the same user interface.
  • the electronic device detects a touch operation acting on the determination control 214b, in response to the touch operation, the electronic device displays the user interface 260.
• the user interface 260 includes: a recording guidance box 261, a recording effect preview box 262, prompt information 263, a return control 264, and a confirmation control 265. The return control 264 is used to return to the upper-level user interface, and the confirmation control 265 is used to obtain photos or videos taken by the electronic device.
  • the electronic device acquires the image T1.
  • the posture of the target object in the image T1 is the posture G1 indicated in the posture map P1.
  • the image T1 is an image of the target object photographed according to the posture G1 in the posture diagram P1, and the posture of the target object included in the image T1 is the posture G1 indicated in the posture diagram P1.
  • the image T1 may be the image in the recording effect preview box 251 in the above-mentioned embodiment of FIG. 3F.
  • the image T1 can also be an image in a video frame.
  • the target object records a video according to the posture G1 indicated in the posture diagram P1.
  • the electronic device obtains the video and intercepts a frame of image in the video as the image T1.
  • the image T1 may be an image set including multiple images.
  • the electronic device can capture multiple images at one time, intercept multiple frames of images from the recorded video, and one frame of the multiple frames of images can be used as the image T1.
  • the electronic device inputs the image T1 into the full image segmentation model to obtain segmentation result 1, and inputs the image T1 into the image segmentation lightweight model M1 to obtain segmentation result 2.
• the image segmentation full model is a pre-trained machine learning model with high image segmentation accuracy.
  • the full image segmentation model is a model obtained after training and convergence based on the initial full image segmentation model.
  • the image segmentation lightweight model M1 is a model trained based on the initial image segmentation lightweight model. It can be understood that the full image segmentation model and the initial image segmentation lightweight model can be pre-trained on the electronic device, or can be pre-configured by the electronic device, which is not limited in this application.
  • the initial image segmentation lightweight model is obtained by cropping and quantizing the initial image segmentation full model, and the number of model parameters in the initial image segmentation full model is greater than the number of model parameters in the initial image segmentation lightweight model.
• the image segmentation full model is used to guide the training of the image segmentation lightweight model to be trained to obtain the target image segmentation lightweight model, so that the performance of the target image segmentation lightweight model is close to the performance of the image segmentation full model, that is, the segmentation effect of the target image segmentation lightweight model is close to or consistent with the segmentation effect of the image segmentation full model.
  • the segmentation result is used to indicate the area where the target object is located and the area where the background content is located in the segmented image
• the segmentation result may include pixel information of the pixels in the segmented image. That is to say, segmentation result 1 is used to indicate the area where the target object is located and the area where the background content is located in the image T1; the area where the target object is located and the area where the background content is located indicated in segmentation result 1 are obtained after segmentation by the image segmentation full model, and segmentation result 1 includes the pixel information 1 of the pixels in the image T1.
• segmentation result 2 is used to indicate the area where the target object is located and the area where the background content is located in the image T1; the area where the target object is located and the area where the background content is located indicated in segmentation result 2 are obtained after segmentation by the image segmentation lightweight model M1.
  • the segmentation result 2 includes pixel information 2 of the pixels in the image T1.
  • the pixel information of a pixel may be a probability value that the pixel is a foreground pixel.
  • the segmentation result may be a predicted foreground probability result of the image T1.
• the predicted foreground probability result includes a probability value for each pixel in the image T1 being a foreground pixel, where the probability value is a real number between 0 and 1. For example, the probability value of a pixel in the foreground area of the segmentation result being a foreground pixel is 1, and the probability value of a pixel in the background area being a foreground pixel is 0.
  • the pixel information of the pixel point may also be the pixel value of the pixel point, for example, RGB value, grayscale value, etc.
  • the segmentation result can be a binary image corresponding to the image T1, used to distinguish the foreground area and the background area, where the pixel value of the pixel in the foreground area is 255 and the pixel value of the pixel in the background area is 0.
  • the pixel value of the pixels in the foreground area may be 0, and the pixel value of the pixels in the background area may be 255.
• Figure 4 exemplarily shows a segmentation result, in which the black part is the background area, whose pixels have a pixel value of 255, and the white part is the foreground area, that is, the area where the target object is located, whose pixels have a pixel value of 0.
  • the pixel information of the pixel can also be the foreground label of the pixel, and the foreground label can be a numerical value, for example, a numerical value of 1 or a numerical value of 0.
• for example, when a pixel is a foreground pixel, a foreground label of 1 is added to the pixel; when the pixel is a background pixel, a label of 0 is added to the pixel; and so on.
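The pixel-information representations above can be related by a simple thresholding step: a predicted foreground-probability map is converted into the binary-image form, with foreground pixels set to 255 and background pixels set to 0. A minimal sketch follows; the 0.5 decision threshold is an illustrative assumption, as the embodiments do not fix a specific cutoff.

```python
def probability_to_binary_mask(prob_map, threshold=0.5):
    """Convert a 2-D map of per-pixel foreground probabilities (real numbers
    in [0, 1]) into a binary mask: 255 for foreground, 0 for background."""
    return [[255 if p >= threshold else 0 for p in row] for row in prob_map]

probs = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
]
mask = probability_to_binary_mask(probs)
# first row becomes [255, 255, 0]: probabilities at or above 0.5 are foreground
```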
  • the electronic device trains the image segmentation lightweight model M1 based on the segmentation result 1 and the segmentation result 2, and obtains the image segmentation lightweight model M2.
• the electronic device calculates the error value between segmentation result 1 and segmentation result 2, and uses the error value to train the image segmentation lightweight model M1, adjusting the model parameters of the image segmentation lightweight model M1 to obtain the image segmentation lightweight model M2.
  • Figure 5 exemplarily shows the training process of the image segmentation lightweight model.
• the image T1 is input into the image segmentation full model and the image segmentation lightweight model M1 respectively, and segmentation result 1 and segmentation result 2 are obtained. Then the error between segmentation result 1 and segmentation result 2 is determined, and the error is used to correct the model parameters of the image segmentation lightweight model M1 to obtain the trained and corrected lightweight model, that is, the image segmentation lightweight model M2.
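The error calculation in this guidance step can be sketched as follows, with the full model's segmentation result serving as the label for the lightweight model's result. Mean squared error over the per-pixel foreground probabilities is assumed for illustration only; the embodiments do not name a specific loss function.

```python
def guidance_error(full_model_probs, light_model_probs):
    """Mean squared per-pixel difference between the full model's and the
    lightweight model's foreground-probability maps. This scalar error is
    what would be used to correct the lightweight model's parameters."""
    total, count = 0.0, 0
    for full_row, light_row in zip(full_model_probs, light_model_probs):
        for f, l in zip(full_row, light_row):
            total += (f - l) ** 2
            count += 1
    return total / count

# e.g. full model says [1.0, 0.0], lightweight model says [0.8, 0.4]:
# error = (0.2**2 + 0.4**2) / 2 = 0.1
err = guidance_error([[1.0, 0.0]], [[0.8, 0.4]])
```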
  • the image segmentation full model and the image segmentation lightweight model may be a deep neural network model, a convolutional neural network model, etc., which are not limited in the embodiments of the present application.
  • the full image segmentation model can be a deep neural network model A1
  • the image segmentation lightweight model can be a deep neural network model A2 obtained by cropping the deep neural network model A1.
• the number of model parameters of the deep neural network model A2 is smaller than the number of model parameters of the deep neural network model A1.
  • the electronic device inputs the image T1 into the image segmentation lightweight model M2, and outputs the segmentation result 3.
• the electronic device tests the image segmentation lightweight model M2. That is, the electronic device inputs the image T1 into the image segmentation lightweight model M2 and obtains the segmentation result 3.
  • Segmentation result 3 is used to indicate the area where the target object is located and the area where the background content is located in the image T1.
• for segmentation result 3 indicating the area where the target object is located and the area where the background content is located in the image T1, please refer to the relevant descriptions of segmentation result 2, which will not be repeated here.
• in step S206, the electronic device determines whether segmentation result 1 and segmentation result 3 satisfy the preset condition D2. If not, step S207 is executed; if yes, step S209 is executed.
• the preset condition D2 is that there is a target area, that is, a poorly segmented area, in segmentation result 3. That is to say, with segmentation result 1 as the label, the electronic device first calculates the error between segmentation result 1 and segmentation result 3. When segmentation result 3 has a poorly segmented area relative to segmentation result 1, the electronic device determines that the difference between segmentation result 1 and segmentation result 3 meets the preset condition.
• the electronic device first calculates the difference in pixel information between segmentation result 1 and segmentation result 3, that is, the electronic device calculates the difference between the pixel information of the same pixel in segmentation result 1 and segmentation result 3, determines the pixels whose pixel information difference is greater than the first threshold as the first target pixels, then matches the first target pixels with the limb areas of the target object in the image T1, and determines whether the number of first target pixels in a limb area of the target object is greater than the second threshold.
• if the number is greater than the second threshold, the limb area is the target area, that is, a poorly segmented area, and the electronic device determines that segmentation result 1 and segmentation result 3 satisfy the preset condition D2.
  • one or more limbs included in the target area may be called a third limb.
• the electronic device can capture multiple images or videos at one time, input the multiple images into the image segmentation full model and the image segmentation lightweight model, and obtain multiple segmentation results of the image segmentation full model and multiple segmentation results of the image segmentation lightweight model. The electronic device determines whether the multiple differences between the multiple segmentation results of the image segmentation full model and the multiple segmentation results of the image segmentation lightweight model satisfy the preset condition D2; when the number of differences satisfying the preset condition D2 among the multiple differences is greater than a preset number threshold, the segmentation results are considered to satisfy the preset condition D3.
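The multi-image check above reduces to counting how many of the per-image comparisons flag a problem. A minimal sketch, in which the boolean flags and the preset number threshold are illustrative placeholders:

```python
def satisfies_condition_d3(per_image_meets_d2, preset_number_threshold):
    """per_image_meets_d2: one boolean per captured image, True when that
    image's pair of segmentation results satisfies preset condition D2.
    The overall condition D3 holds when the number of differences
    satisfying D2 exceeds the preset number threshold."""
    return sum(per_image_meets_d2) > preset_number_threshold

# e.g. 3 of 5 captured images show a poorly segmented difference,
# and the preset number threshold is 2, so condition D3 is satisfied
result = satisfies_condition_d3([True, False, True, True, False], 2)
```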
  • the above-mentioned preset condition D2 may also be called the first preset condition.
  • the electronic device determines the first target pixel, which is the poorly segmented pixel in segmentation result 3.
• the probability value that pixel point i in segmentation result 1 is the foreground is Y1i, and the probability value that pixel point i in segmentation result 3 is the foreground is Z1i. When the difference between Y1i and Z1i is greater than the first threshold, pixel i is a poorly segmented pixel, that is, the first target pixel.
  • the electronic device inputs the image T1 into the limb area detection model to obtain a limb area map corresponding to the target object in the image T1.
  • the limb area map includes the area where one or more limbs of the target object in the image T1 are located.
  • the limb area detection model may also be called the human body area detection model.
  • the limb area refers to the area where the limb is located.
• the limb may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, left foot and other limbs.
  • the electronic device matches the segmentation result 3 with the limb area map, and determines the number of first target pixels in the area where multiple limbs of the target object are located.
  • the electronic device matches the pixel points in the segmentation result 3 with the areas where the multiple limbs of the target object are located in the image T1, and obtains the number of corresponding first target pixel points in the area where the multiple limbs of the target object are located.
  • the electronic device determines the target area based on the number of first target pixels in the area where the multiple limbs of the target object are located.
• when the number of first target pixels in the area where a limb is located is greater than the preset value, the area where the limb is located is the target area.
  • the preset value may be 1000.
• when the number of first target pixels in the area where the left hand is located is greater than the preset value, the area where the left hand is located is considered to be the target area, that is, the area where the left hand is located is a poorly segmented area.
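The steps above, determining the first target pixels and then counting them per limb area, can be sketched as follows. The threshold values and limb labels are illustrative placeholders (the preset value, 1000 in the example above, is scaled down for this toy input):

```python
def find_target_areas(result1, result3, limb_map, first_threshold=0.5, preset_value=2):
    """Count, per limb area, the pixels whose foreground-probability
    difference between segmentation result 1 and segmentation result 3
    exceeds the first threshold (the 'first target pixels'), and return
    the limbs whose count exceeds the preset value, i.e. the poorly
    segmented target areas. limb_map labels each pixel with a limb name,
    or None for pixels outside any limb area."""
    counts = {}
    for row1, row3, limb_row in zip(result1, result3, limb_map):
        for p1, p3, limb in zip(row1, row3, limb_row):
            if limb is not None and abs(p1 - p3) > first_threshold:
                counts[limb] = counts.get(limb, 0) + 1
    return sorted(limb for limb, n in counts.items() if n > preset_value)

# Toy 2x2 example: the left-hand area disagrees in 3 pixels, the head in 0
result1 = [[1.0, 1.0], [1.0, 1.0]]
result3 = [[0.0, 1.0], [0.0, 0.0]]
limb_map = [["left_hand", "head"], ["left_hand", "left_hand"]]
targets = find_target_areas(result1, result3, limb_map)
```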
  • S207 The electronic device replaces the initial background content in the image T1 with the preset background content to obtain the replaced image T2.
• the electronic device determines that the image segmentation lightweight model M2 is the target image segmentation lightweight model, and the target image segmentation lightweight model is used for subsequent image segmentation. Based on the segmentation result 3, the electronic device replaces the initial background content in the image T1 with the preset background content to obtain the replaced image T2.
  • the preset background content can be the background content preset by the target object.
  • the target object can choose the image background content he likes as the preset background content.
  • the preset background content can also be the default background content set by the factory of the electronic device.
  • image T2 may be called the seventh image.
  • the initial background content in image T1 may be called third background content.
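Given a binary segmentation mask of the kind described earlier (foreground pixels 255, background pixels 0), the replacement of the initial background content with the preset background content can be sketched as a per-pixel composite. The hard 255/0 mask follows the binary-image example above; in practice a soft probability map could instead blend the two pixel values near edges.

```python
def replace_background(image, mask, preset_background):
    """Keep the target object's pixels (where mask == 255) and take every
    other pixel from the preset background content. All three inputs are
    2-D grids of the same shape."""
    return [
        [src if m == 255 else bg
         for src, m, bg in zip(src_row, m_row, bg_row)]
        for src_row, m_row, bg_row in zip(image, mask, preset_background)
    ]

# Toy 2x2 example with string "pixels" for readability
out = replace_background(
    [["f1", "f2"], ["f3", "f4"]],        # original image T1
    [[255, 0], [0, 255]],                # segmentation mask
    [["b1", "b2"], ["b3", "b4"]],        # preset background content
)
# out keeps f1 and f4 (foreground) and takes b2 and b3 from the background
```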
  • the electronic device can display a user interface 270 , which includes: a replacement effect box 271 , prompt information 272 , a return control 273 , and a confirmation control 274 .
  • the replacement effect box 271 is used to display the replaced image
  • the prompt information 272 is used to prompt the target object whether he is satisfied with the segmentation effect of the image segmentation lightweight model 2
• the return control 273 is used to trigger the next round of training when the target object is not satisfied with the segmentation effect
• the confirmation control 274 is used to determine that the image segmentation lightweight model M2 is the target image segmentation lightweight model.
• the electronic device ends the training of the image segmentation lightweight model M2, and the image segmentation lightweight model M2 is the target image segmentation lightweight model, that is, the target image segmentation lightweight model is used for subsequent image segmentation. In this way, redundant training can be avoided.
• if the target object is not satisfied with the segmentation effect, the target object can click the return control 273, and the electronic device continues the next round of training.
• when the electronic device detects a touch operation on the return control 273, in response to the touch operation, the electronic device modifies the first threshold to a third threshold, where the third threshold is smaller than the first threshold. Then, the electronic device determines whether segmentation result 1 and segmentation result 3 satisfy the second preset condition. If segmentation result 1 and segmentation result 3 satisfy the second preset condition, the electronic device determines the posture map P based on segmentation result 1 and segmentation result 3, and uses this posture map P to continue training the image segmentation lightweight model.
• the electronic device determines the posture map P based on the segmentation result 1 and the segmentation result 3
• the second preset condition is that the number of target pixels in the area where the third limb is located in the first image is greater than the second threshold, where a target pixel is a pixel whose pixel information in the first segmentation result differs from its pixel information in the third segmentation result by more than the third threshold.
• the electronic device calculates the pixel information of the pixels in segmentation result 1 and segmentation result 3, and determines the pixels whose pixel information in segmentation result 1 differs from their pixel information in segmentation result 3 by more than the third threshold; these pixels are poorly segmented pixels.
• the electronic device matches these poorly segmented pixels with the limb areas of the target object in the image T2, and determines whether the number of poorly segmented pixels in a limb area of the target object is greater than the second threshold. If the number is greater than the second threshold, the electronic device determines that segmentation result 1 and segmentation result 3 satisfy the preset condition D3.
  • the electronic device determines the posture map P2 based on the segmentation result 1 and the segmentation result 3.
  • the electronic device compares the segmentation result 1 and the segmentation result 3 to determine the poorly segmented area in the segmentation result 3.
• based on the poorly segmented area in segmentation result 3, the electronic device selects from the representative posture library a posture map containing the poorly segmented area
  • the electronic device can be configured with a representative posture library, and the representative posture library includes multiple posture images.
  • the representative posture library may be pre-constructed by the electronic device. For details on the construction of the representative posture library, please refer to the subsequent description in the embodiment of FIG. 9 , which will not be described again here.
  • the process of the electronic device determining the posture map P2 based on the segmentation result 1 and the segmentation result 3 may specifically include:
  • the electronic device determines the target area based on segmentation result 1 and segmentation result 3.
• for the relevant operations of the electronic device determining the target area based on segmentation result 1 and segmentation result 3, please refer to the related operations of determining the target area based on segmentation result 1 and segmentation result 3 in step S106, which will not be described again here.
  • the electronic device determines the posture map P2 based on the target area.
  • the target area corresponds to the limbs of the target object, and the target area may correspond to one or more limbs of the target object.
• when the target area contains one limb of the target object, the electronic device first determines one or more posture maps containing the limb from the representative posture library. When there is only one posture map containing the limb, that posture map is the posture map P2. When there are multiple posture maps containing the limb, the electronic device randomly selects one posture map from the multiple posture maps as the posture map P2, or determines the posture map in which the area where the limb is located is the largest among the multiple posture maps as the posture map P2. In some embodiments, the limb included in the posture map P2 may be called the first limb.
• when the target area contains multiple limbs of the target object, the electronic device first determines one or more posture maps containing the multiple limbs from the representative posture library. Similarly, when there is one posture map containing the multiple limbs, that posture map is the posture map P2. When there are multiple posture maps containing the multiple limbs, the electronic device randomly selects one posture map from the multiple posture maps as the posture map P2, or the electronic device determines the posture map in which the areas where the multiple limbs are located are the largest as the posture map P2. Specifically, the electronic device calculates, for the multiple posture maps, the number of target pixels corresponding to the areas where the multiple limbs are located, and takes the posture map whose areas for the multiple limbs contain the largest number of corresponding target pixels as the posture map P2. In some embodiments, the multiple limbs included in the posture map P2 may be called the first limbs.
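The selection rule above, preferring the posture map whose areas for the required limbs are largest, can be sketched with a toy representative posture library. The library structure, posture names, and area numbers below are hypothetical; the embodiments only require that each posture map records which limbs it contains and how large their areas are.

```python
def select_posture_map(posture_library, target_limbs):
    """posture_library: {posture name: {limb name: limb area in pixels}}.
    Among posture maps containing every target limb, pick the one whose
    combined area for the target limbs is largest; return None if no
    posture map contains all the target limbs."""
    candidates = [
        (sum(limb_areas[limb] for limb in target_limbs), name)
        for name, limb_areas in posture_library.items()
        if all(limb in limb_areas for limb in target_limbs)
    ]
    return max(candidates)[1] if candidates else None

# Hypothetical library: two hand poses and one standing pose
library = {
    "arms_raised": {"left_hand": 500, "right_hand": 500},
    "hands_on_head": {"left_hand": 800, "right_hand": 100},
    "standing": {"torso": 4000},
}
chosen = select_posture_map(library, ["left_hand"])
# "hands_on_head" wins for the left hand alone (800 > 500)
```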
  • the posture graph P2 may be called the second posture graph.
• the electronic device displays the posture map P2, and the posture map P2 indicates the posture G2.
  • the electronic device displays the gesture image P2 on the display screen, and the gesture image P2 is used to instruct the user to make the gesture G2.
  • the posture diagram P2 may be the posture diagram shown in FIG. 3A above.
  • the posture G2 may be the posture of arranging the earphones with both hands in the embodiment of FIG. 2B. It can be understood that the above posture G2 is only an example. In practical applications, the posture G2 can also be other postures, such as raising hands, holding the head, etc.
  • the specific posture form is not limited in this application.
  • posture G2 may also be called the second posture.
  • the electronic device acquires the image T3.
  • the posture of the target object in the image T3 is the posture G2 indicated in the posture diagram P2.
• The image T3 is an image of the target object captured while in the posture indicated in the posture diagram P2, or a frame of a video recorded in that posture.
  • Image T3 contains the target object.
  • image T3 may be called the fifth image.
  • the electronic device trains the image segmentation lightweight model M2 based on the image T3 and the full image segmentation model until the end condition of the model training is met and the target image segmentation lightweight model is obtained.
• In each round of iterative training, the electronic device adjusts the model parameters of the initial image segmentation lightweight model of that round, so that the model gradually converges and the target image segmentation lightweight model is obtained.
  • the end condition of model training can be that the number of iterative training of the image segmentation lightweight model reaches the preset number of iterations, or it can be that the image segmentation processing performance index of the image segmentation lightweight model after adjusting parameters reaches the preset index.
  • the preset index may be that the segmentation result of the image segmentation lightweight model and the segmentation result of the image segmentation full model satisfy the preset condition D2.
• The electronic device inputs the image T3 into the full image segmentation model and the image segmentation lightweight model M2 respectively to obtain segmentation result 4 and segmentation result 5, and uses segmentation result 4 and segmentation result 5 to train the image segmentation lightweight model M2 to obtain the image segmentation lightweight model M3. Then the electronic device inputs the image T3 into the image segmentation lightweight model M3 to obtain segmentation result 6. The electronic device determines whether segmentation result 4 and segmentation result 6 satisfy the preset condition D2. When segmentation result 4 and segmentation result 6 satisfy the preset condition D2, the image segmentation lightweight model M3 is the target image segmentation lightweight model. The electronic device may replace the initial background content in the area where the background content is located in the image T3 with the preset background content based on segmentation result 6.
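The flow above resembles knowledge distillation: the full model's segmentation of the image supervises the lightweight model, and training stops once the two models' results agree. A minimal sketch follows, assuming (as an illustration only) that the preset condition D2 is an IoU threshold between two binary masks; the threshold value, the callable-model interface, and the function names are all assumptions, not taken from the original.

```python
import numpy as np

def masks_agree(mask_a, mask_b, iou_threshold=0.95):
    """Preset condition D2, sketched as an IoU test between two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    iou = inter / union if union else 1.0
    return iou >= iou_threshold

def distill_until_agreement(full_model, light_model, train_step, image, max_rounds=10):
    """Train the lightweight model against the full model's output on `image`
    until condition D2 holds or the iteration budget is exhausted.

    full_model / light_model: callables mapping an image to a boolean mask.
    train_step: callable performing one round of parameter adjustment and
                returning the updated lightweight model.
    """
    teacher_mask = full_model(image)       # segmentation result from the full model
    for _ in range(max_rounds):
        student_mask = light_model(image)  # current lightweight segmentation
        if masks_agree(teacher_mask, student_mask):
            return light_model             # target image segmentation lightweight model
        light_model = train_step(light_model, image, teacher_mask)
    return light_model
```

The `max_rounds` cap corresponds to the alternative end condition in the text, i.e. a preset number of training iterations.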
• The electronic device determines the poorly segmented area based on segmentation result 4 and segmentation result 6, and then determines the posture map P3 from the representative posture library based on the poorly segmented area; the posture map P3 is used to instruct the user to make the posture G3.
  • the relevant operations for the electronic device to determine the posture map P3 based on the segmentation result 4 and the segmentation result 6 refer to the relevant operations in the above-mentioned step S109, which will not be described again here.
• After determining the posture map P3, the electronic device can display the posture map P3.
• For the manner in which the electronic device displays the posture map P3, please refer to the relevant description in the above-mentioned embodiments of FIGS. 3D to 3F, which will not be described again here.
  • the electronic device acquires an image T4, where the image T4 is an image taken by the user according to the posture G3 in the posture diagram P3 or an image in a video frame, and the posture of the target object in the image T4 is the posture G3 indicated in the posture diagram P3.
  • the electronic device can train the image segmentation lightweight model M3 based on the image T4 to obtain the image segmentation lightweight model M4.
  • the electronic device uses image T4 to test the segmentation effect of the image segmentation lightweight model M4.
• If the segmentation result of the image segmentation lightweight model M4 meets the preset condition D2, the image segmentation lightweight model M4 is the target image segmentation lightweight model. If the segmentation result of the image segmentation lightweight model M4 does not meet the preset condition D2, the electronic device re-determines a pose map based on the segmentation result and obtains an image to train the image segmentation lightweight model M4, until the segmentation result of the image segmentation lightweight model M4 meets the preset condition D2.
  • segmentation result 4 may be called the fourth segmentation result
  • segmentation result 5 may also be called the fifth segmentation result
  • the image segmentation lightweight model M3 may be called the third image segmentation lightweight model.
  • Segmentation result 6 may also be called the sixth segmentation result.
  • the pose graph P3 may be called the third pose graph
  • the pose G3 may be called the third pose.
  • Image T4 may also be called the sixth image.
  • the electronic device obtains the original image T5, inputs the original image T5 into the target image segmentation lightweight model, and determines the area where the target object is located and the original background content in the original image.
  • the original image may be an image or a frame of a video uploaded by the target object, or it may be a frame of an image or video captured by the electronic device including the target object.
  • the original image T5 may also be called the first image
  • the original background content may also be called the first background content
• After the electronic device separates the area where the original background content is located and the area where the target object is located in the original image T5, it synthesizes the area where the target object is located and the preset background into a new image, that is, the replaced image T6.
  • the background content in the replaced image is different from the original background content.
  • the preset background can be set by the target object itself, or it can be the default setting of the electronic device, for example, it can be a landscape image, etc.
  • image T6 may also be called the fourth image.
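The compositing step (keep the area where the target object is located, fill everything else from the preset background) can be sketched with a binary foreground mask. The array shapes and function name are illustrative assumptions:

```python
import numpy as np

def replace_background(image, foreground_mask, new_background):
    """Compose the segmented foreground onto a preset background.

    image, new_background: HxWx3 arrays of the same shape.
    foreground_mask: HxW boolean array where True marks the area
    where the target object is located.
    """
    mask3 = foreground_mask[..., None]          # broadcast mask over color channels
    return np.where(mask3, image, new_background)
```

Pixels inside the mask keep their original values; pixels outside it are taken from the preset background, yielding the replaced image.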
• FIG. 8 shows a schematic diagram in which an electronic device uses the image segmentation lightweight model M1 and the target image segmentation lightweight model to segment the original image, and then performs replacement to obtain a replaced image.
• The electronic device segments the video image frame 811 into a foreground area and a background area, and can separate the two.
  • the video image frame 811 is segmented through the image segmentation lightweight model M1, and the segmentation result 7 can be obtained.
• As shown in (b) of FIG. 8, in the segmentation result 7 the foreground area and the background area are distinguished by different colors.
  • the white area represents the foreground area segmented by the image segmentation lightweight model M1
  • the black area represents the background area segmented by the image segmentation lightweight model M1.
• The image segmentation lightweight model M1 mistakenly regards part of the edge area of the target object, that is, the hair bulge portions in area 8111A and area 8111B in the figure, as the background content in the video image frame 811.
• After background replacement, the replaced video image frame 821 is obtained, in which the hair bulge portions in area 8111A and area 8111B are replaced as background content, so there are no hair bulge portions in area 8211A and area 8211B of the replaced video image frame 821.
• Before step S101, the electronic device can also build a representative posture library.
  • the electronic device may construct a representative gesture library including the following steps:
  • the electronic device obtains the image data set.
  • the image data set may be a large number of pre-collected images of the target object, or may be image frames contained in pre-collected video data of the target object.
  • the embodiments of the present application do not limit this.
• The image data set may also be crawled from a public website or obtained from a large public image database.
  • the image data set contains the user's posture features and contour features.
  • Posture features refer to the user's action behaviors, such as turning the head, turning the body, standing up and sitting down, etc.
  • the outline feature of a user refers to the lines that make up the outer edge of the user.
  • the electronic device inputs the image data set into the human posture estimation model and obtains multiple human posture vectors corresponding to the image data set.
• The human posture estimation model can identify the skeletal key points of the human body in the image, as well as the limb vectors formed by connecting those skeletal key points.
  • the skeleton of the human body is mainly used to represent the skeletal information of the human body and can be used to describe the posture of the human body.
  • the number and type of skeletal key points are determined by the human posture estimation model, and different human posture estimation models output different numbers and types of skeletal key points.
• Here, dividing the human body into 15 skeletal key points is taken as an example for illustrative explanation. In practical applications, the skeletal key points of the human body can also be divided into 9, 17, etc.; this application places no restrictions on this.
  • 15 skeletal key points can be connected to form 14 limb vectors, and the limb vectors can be calculated from the coordinate positions of the above 15 skeletal key points.
  • FIG. 10 illustrates a set of skeletal key point data, and only some skeletal key points and some limb vectors are shown in FIG. 10 .
  • the circular point in the figure is a bone key point.
  • Each bone key point is represented by coordinates (X, Y).
• Adjacent skeletal key points are connected to form the limb vectors of a target object in the image.
  • a limb vector can be called a posture vector.
  • the coordinates of bone key point 3 are (X3, Y3)
  • the coordinates of bone key point 4 are (X4, Y4).
• Bone key point 3 and bone key point 4 can be connected to form a limb vector (X3-X4, Y3-Y4); this limb vector represents a limb, which can be called the left shoulder.
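The limb-vector computation described above is simple coordinate arithmetic. A short sketch follows; the flat layout of the posture vector and the function names are illustrative assumptions:

```python
def limb_vector(p_from, p_to):
    """Limb vector between two skeletal key points, e.g. (X3-X4, Y3-Y4)."""
    return (p_from[0] - p_to[0], p_from[1] - p_to[1])

def posture_vector(keypoints, limb_pairs):
    """Concatenate the limb vectors of one person into a flat posture vector.

    keypoints:  list of (X, Y) coordinates of skeletal key points.
    limb_pairs: index pairs of adjacent key points;
                e.g. 15 key points yield 14 limb vectors.
    """
    vec = []
    for i, j in limb_pairs:
        vec.extend(limb_vector(keypoints[i], keypoints[j]))
    return vec
```

With 15 key points and 14 limb pairs, each image thus yields a 28-dimensional posture vector suitable for the clustering step below.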
  • the electronic device inputs multiple human posture vectors corresponding to the image data set into the clustering model to obtain one or more representative posture vectors.
  • multiple limb vectors can be obtained from an image, and multiple limb vectors of the image can form a posture vector. If the image data set includes multiple images, multiple posture vectors can be obtained.
• The electronic device maps the multiple posture vectors to a vector space, where each posture vector is a point in the vector space, and then calculates the similarity between each pair of points.
• Posture vectors with high similarity are gathered together to form a cluster, and the vector at the center of the cluster, i.e., the cluster center, serves as a representative posture vector.
• FIG. 11 illustrates a schematic diagram of the process by which the electronic device performs clustering to obtain representative posture vectors.
  • 4 clusters are exemplarily shown.
  • a circular point in each cluster represents a posture vector, that is, it represents the posture of a human body.
  • it can be a posture such as holding headphones with both hands or holding the headset with one hand.
  • the black five-pointed star in the cluster represents the cluster center point of the cluster, that is, the cluster center.
  • the cluster center vector of each cluster is selected as the representative attitude vector.
• The posture represented by the cluster center vector of cluster 1 is the posture of holding the earphones with one hand; as shown in (c) of FIG. 11, the posture represented by the cluster center vector of cluster 2 is the posture of holding the earphones with both hands.
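The clustering step (map posture vectors into a vector space, gather similar ones into clusters, and take each cluster center as a representative posture vector) can be sketched with a plain k-means loop. The cluster count, Euclidean distance metric, and iteration budget are assumptions; the document does not specify the clustering model.

```python
import numpy as np

def kmeans_centers(pose_vectors, k, iters=50, seed=0):
    """Cluster posture vectors; the returned cluster centers serve as
    the representative posture vectors."""
    rng = np.random.default_rng(seed)
    X = np.asarray(pose_vectors, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each posture vector to its nearest center (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its cluster (keep empty clusters fixed).
        new_centers = np.array([X[labels == c].mean(axis=0) if (labels == c).any()
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Each returned center is then fed to the human body contour detection model to obtain the corresponding contour map, as described next.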
  • the electronic device inputs one or more representative posture vectors into the human body contour detection model, and obtains a contour map corresponding to one or more representative posture vectors.
  • FIG. 12 exemplarily shows a schematic diagram of obtaining a contour diagram representing a posture vector.
  • Figure 12 (a) and (c) show two representative posture vectors, which represent the posture of holding the earphones with one hand and the posture of holding the earphones with both hands respectively.
  • Figure 12 (b) and (d) show the contour images obtained based on two representative posture vectors.
• The electronic device inputs the image data set into the limb region detection model to obtain one or more limb region maps corresponding to the image data set.
  • FIG. 13 exemplarily shows a schematic diagram of a limb region.
  • different color areas in the figure represent different limb areas. For example, dark gray represents the area where the head is located, light gray represents the area where the left hand is located, etc.
• The limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, and left foot.
• The electronic device matches the one or more limb region maps corresponding to the image data set with the contour maps corresponding to the one or more representative posture vectors, to obtain one or more posture maps.
  • the representative posture library may include one or more posture images corresponding to the image data set.
• The electronic device involved in the above embodiments may be called an electronic device 100. The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device.
  • FIG. 14 shows a schematic software structure diagram of the electronic device 100 provided by the embodiment of the present application.
• The software structure of the electronic device 100 may include: a background replacement engine, a background replacement client, a collection module, a display module, and a representative posture library, where:
  • the interactive display module is used to receive user operations, display posture images, and display images before and after background replacement.
  • the interactive display module may receive the first operation and display the gesture diagram P1 based on the first operation.
  • the user interface in the above-mentioned FIGS. 3A-3F may be displayed, etc.
  • the interactive display module is used to display the posture map P1.
  • the interactive display module may display a first image, a second image with background replacement based on the first image segmentation lightweight model, or a fourth image with background replacement based on the target image segmentation lightweight model.
  • the image T1 in step S201 can be obtained, where the image T1 is an image taken by the target object according to the posture G1 in the posture diagram P1, or a frame image in the video taken by the target object according to the posture G1 in the posture diagram P1.
  • the background replacement client is used to obtain the posture image P1 from the representative posture library according to the preset configuration after receiving the first operation, and send it to the interactive display module for display.
• The background replacement client is also used to obtain the image including the target object collected by the acquisition module and send it to the background replacement engine.
  • the image including the target object may be the image T1 in step S202 described above, the image T2 in step S211, etc.
• Posture map information is sent to the representative posture library to obtain the corresponding posture map, which is sent to the display module for display.
• The background replacement engine is used to obtain the image of the target object from the background replacement client, and use the image of the target object and the preset full image segmentation model to train the image segmentation lightweight model. Specifically, first, the background replacement engine receives the image T1 of the target object sent by the background replacement client. The background replacement engine uses the image T1 and the preset full image segmentation model to train the image segmentation lightweight model M1 to obtain the image segmentation lightweight model M2. Then, it uses the preset full image segmentation model and the image segmentation lightweight model M2 to segment the image T1 of the target object, obtaining segmentation result 1 and segmentation result 3, and then determines whether the difference between segmentation result 1 and segmentation result 3 meets the preset condition D2.
• If the preset condition D2 is met, the image segmentation lightweight model M2 is stored for subsequent image segmentation; if the preset condition D2 is not met, segmentation result 1 and segmentation result 3 are sent to the background replacement client to determine the posture map to be displayed in the user interface for the next round of training.
  • the electronic device 100 includes: a background replacement engine, a background replacement client, a collection module, a display module, and a representative posture library.
• Taking two rounds of image segmentation lightweight model training to obtain the target image segmentation lightweight model as an example, the interaction between these modules is described in detail below:
  • the interactive display module detects the first operation.
  • the first operation may be a touch operation acting on the determination control 214b in FIG. 3A.
  • the interactive display module sends background replacement instructions to the background replacement client.
  • the background replacement client responds to the background replacement instruction sent by the interactive display module and sends an instruction requesting to obtain the posture image to the representative posture library.
• The representative posture library responds to the background replacement client's request instruction for the posture map, and sends the posture map P1 to the background replacement client.
• The background replacement client receives the posture map P1 sent by the representative posture library, and sends the posture map P1 to the interactive display module.
  • the interactive display module receives the posture map P1 and displays the posture map P1. Among them, the posture graph P1 indicates the posture G1.
  • the acquisition module obtains the image T1 and sends the image T1 to the background replacement client.
  • the background replacement client receives the image T1 and sends it to the background replacement engine.
• The background replacement client determines whether segmentation result 1 and segmentation result 3 meet the preset condition D2. If the preset condition D2 is not met, the posture map P2 is determined based on segmentation result 1 and segmentation result 3. Among them, the posture map P2 is used to indicate the posture G2.
• The background replacement client sends a request to the representative posture library to obtain the posture map P2.
  • the representative posture library responds to the instruction to obtain the posture map P2 and sends the posture map P2 to the background replacement client.
  • the background replacement client receives the posture image P2 and sends it to the interactive display module.
  • the interactive display module receives the posture map P2 and displays the posture map P2.
  • the acquisition module obtains image T3 and sends image T3 to the background replacement client.
• The posture of the target object in the image T3 is the posture G2 in the posture diagram P2.
  • the background replacement client receives image T3 and sends it to the background replacement engine.
• The background replacement client receives segmentation result 4 and segmentation result 6, determines whether segmentation result 4 and segmentation result 6 satisfy the preset condition D2, and when the preset condition D2 is met, determines the image segmentation lightweight model M3 as the target image segmentation lightweight model.
  • the acquisition module obtains the original image T5 and sends the original image T5 to the background replacement client.
• After receiving the original image T5, the background replacement client sends the original image T5 to the background replacement engine.
  • the background replacement engine receives the original image T5, uses the target image segmentation lightweight model to segment the original image T5, and obtains the foreground area and background area. Then, the background area in the original image T5 is replaced with the preset background to obtain the replaced image T6. And send the replaced image T6 to the background replacement client.
  • the background replacement client receives the replaced image T6 and sends it to the interactive display module for display.
  • the interactive display module receives the replaced image T6 and displays the replaced image T6.
  • the above-mentioned background replacement client, representative gesture library and background replacement engine can be deployed on the same electronic device, or on different electronic devices.
  • the background replacement client can be deployed on one electronic device
  • the representative posture library and background replacement engine can be deployed on another electronic device, etc., and this application does not limit this.
  • Figure 16 shows a schematic diagram of a background replacement system provided by an embodiment of the present application.
  • the background replacement system includes an electronic device 200 and a server 300 .
• A communication connection may exist between the electronic device 200 and the server 300, enabling data communication between the two, where:
  • the electronic device 200 is used to obtain the first image and send the first image to the server;
  • the server 300 is configured to receive the first image, input the first image into the first image segmentation lightweight model, and determine the area where the target object is located and the area where the first background content is located in the first image; Replace the first background content with the second background content, obtain the second image, and send the second image to the electronic device;
  • the electronic device 200 is used to acquire a second image and display the second image; in response to the user's background replacement training operation, display a first posture map, and the first posture map is used to instruct the user to make the first posture;
  • the electronic device 200 is configured to obtain a third image, and send the third image to the server, where the user's posture in the third image is the first posture;
• The server 300 is used to obtain the third image, and train the first image segmentation lightweight model based on the third image to obtain the target image segmentation lightweight model; input the first image into the target image segmentation lightweight model to determine the area where the target object is located and the area where the first background content is located in the first image; replace the first background content with the second background content to obtain the fourth image, and send the fourth image to the electronic device.
  • the electronic device 200 is configured to receive the fourth image and display the fourth image.
• The electronic device 200 can also be used to perform the background replacement method that may be implemented in any one of the above steps S201, S202, S206, S208, S209, S210, and S211, which will not be described again here.
  • the server 300 may also be used to perform the background replacement method that may be implemented in any one of the above steps S203, S205, S207, S213, and S214, which will not be described again here.
• The electronic device 200 can also obtain the image T3 and send the image T3 to the server 300; the server 300 receives the image T3, and uses the image T3 and the full image segmentation model to train the image segmentation lightweight model M2 until the end condition of model training is met, to obtain the target image segmentation lightweight model.
  • the electronic device 200 may include an interactive display module, a collection module, and a background replacement client; the server 300 may include a background replacement engine and a representative gesture library.
  • the interactive display module 301 can also be used to perform any of the possible implementations of steps 1 to 2, step 6, step 16, and step 29 in the embodiment of FIG. 15 .
  • the collection module can also be used to perform any of the possible background replacement methods in steps 7, 17, and 23 in the embodiment of FIG. 15, which will not be described again here.
• The background replacement client can also be used to perform any possible background replacement method in steps 3, 5, 8, 12, 13, 15, 18, 22, 24, and 28, which will not be described again here.
• The background replacement engine 304 can also be used to perform any possible background replacement method in steps 9 to 11 and steps 19 to 21 in the above embodiment of FIG. 15, which will not be described again here.
  • the representative posture library can also be used to perform any possible background replacement method in steps 4 and 14 in the embodiment of FIG. 15 , which will not be described again here.
  • the background replacement method provided by the embodiment of the present application is described in detail above with reference to FIGS. 2A to 15 .
  • the background replacement device and electronic equipment provided by the embodiment of the present application will be described with reference to FIGS. 17A, 17B and 18 .
  • FIG 17A is a schematic diagram of a background replacement device provided by an embodiment of the present application.
  • the background replacement device 400 includes a display unit 401 and an acquisition unit 402, where,
  • the display unit 401 is used to display the second image obtained by performing the first background replacement on the first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image includes the area where the target object is located and the first background. The area where the content is located; the second image is obtained by replacing the first background content in the first image with the second background content; the second background content is different from the first background content;
  • the display unit 401 is also configured to, in response to the user's background replacement training operation, the electronic device display a first posture map, and the first posture map is used to instruct the user to make the first posture;
  • the display unit 401 is also used by the electronic device to obtain a third image, where the user's posture in the third image is the first posture;
  • the display unit 401 is also used for the electronic device to display the fourth image obtained by performing the second background replacement on the first image; the second background replacement is performed based on the target image segmentation lightweight model, and the target image segmentation lightweight model is trained based on the third image. Obtained; the fourth image is obtained by replacing the first background content in the first image with the second background content.
  • the background replacement device 400 in the embodiment of the present application can be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
• The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
• The background replacement device also includes an image segmentation unit 403, a determination unit 404, a model training unit 405, and a background replacement unit 406, where:
• The image segmentation unit 403 is used to input the third image into the full image segmentation model to obtain the first segmentation result, and is also used to input the third image into the first image segmentation lightweight model to obtain the second segmentation result. The number of model parameters in the full image segmentation model is greater than the number of model parameters in the first image segmentation lightweight model; the first segmentation result and the second segmentation result are used to indicate the area where the target object is located and the area where the background content is located in the first image.
  • the model training unit 405 is used to train the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain the second image segmentation lightweight model;
  • the image segmentation unit 403 is also used to input the third image to the second image segmentation lightweight model to obtain the third segmentation result;
• The determining unit 404 is configured to, when the first segmentation result and the third segmentation result are different, determine a second posture map based on the first segmentation result and the third segmentation result. The second posture map is used to instruct the user to make a second posture; the second posture map includes the first limb, and the area where the first limb is located in the first segmentation result is different from the area where the first limb is located in the third segmentation result.
  • the model training unit 405 is configured to train the second image segmentation lightweight model based on the fifth image to obtain the target image segmentation lightweight model, and the posture of the target object in the fifth image is the second posture.
  • the background replacement unit 406 is configured to: replace the initial background content in the area where the background content is located in the third image with the preset background content in the second segmentation result to obtain the seventh image;
  • the operation of the background replacement device 400 to implement background replacement may refer to the related operations of the electronic device in the above method embodiment, and will not be described in detail here.
• The above display unit 401, acquisition unit 402, image segmentation unit 403, determination unit 404, model training unit 405, and background replacement unit 406 may correspond to the above electronic device 100, and may perform the operations performed by the electronic device 100 in the above method embodiments, which will not be described again here.
  • the above-mentioned display unit 401, acquisition unit 402, and determination unit 404 may correspond to the above-mentioned electronic device 200, and the above-mentioned image segmentation unit 403, model training unit 405, and background replacement unit 406 may correspond to the above-mentioned server. 300.
  • FIG. 18 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 10 includes: a processor 11, a communication interface 12 and a memory 13.
  • the processor 11, the communication interface 12 and the memory 13 are connected to each other through a bus 14.
  • the processor 11 is used to execute instructions stored in the memory 13 .
  • the memory 13 stores program codes, and the processor 11 can call the program codes stored in the memory 13 to perform the following operations:
  • display the second image obtained by performing the first background replacement on the first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image includes the area where the target object is located and the area where the first background content is located; the second image is obtained by replacing the first background content in the first image with the second background content; the second background content is different from the first background content;
  • display the fourth image obtained by performing the second background replacement on the first image; the second background replacement is performed based on the target image segmentation lightweight model, and the target image segmentation lightweight model is obtained through training based on the third image; the fourth image is obtained by replacing the first background content in the first image with the second background content.
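Both replacement operations above come down to compositing the camera frame with a preset background under the segmentation result. A minimal sketch of that compositing step, assuming a binary mask (1 = target object) and a nested-list pixel representation (both illustrative assumptions, not the patent's data format):

```python
def replace_background(image, mask, new_background):
    """Composite: keep foreground pixels where mask == 1, take the
    preset background pixels elsewhere. All inputs are H x W nested
    lists; 'image' and 'new_background' hold pixel values, 'mask'
    holds 0/1 from the segmentation model."""
    return [
        [fg if m == 1 else bg
         for fg, m, bg in zip(img_row, mask_row, bg_row)]
        for img_row, mask_row, bg_row in zip(image, mask, new_background)
    ]

# Hypothetical 2x3 frame: 'P' marks person pixels, '.' the original
# background, '#' the preset (second) background content.
image = [['P', 'P', '.'], ['P', '.', '.']]
mask = [[1, 1, 0], [1, 0, 0]]
background = [['#', '#', '#'], ['#', '#', '#']]
print(replace_background(image, mask, background))
# [['P', 'P', '#'], ['P', '#', '#']]
```

A mask error (a person pixel marked 0) is exactly what produces the visible replacement errors the method aims to reduce: that pixel would be overwritten with background content.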
  • the processor 11 can have a variety of specific implementation forms.
  • the processor 11 can be any one or a combination of CPU, GPU, TPU or NPU.
  • the processor 11 can also be a single-core processor or a multi-core processor.
  • the processor 11 may be a combination of a CPU (or a GPU, TPU, or NPU) and a hardware chip.
  • the above-mentioned hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the processor 11 can also be implemented solely using a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
  • the communication interface 12 can be a wired interface or a wireless interface for communicating with other modules or devices.
  • the wired interface can be an Ethernet interface, a controller area network (CAN) interface, or a local interconnect network (LIN) interface; the wireless interface can be a cellular network interface or a wireless LAN interface, etc.
  • the memory 13 may be a non-volatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.
  • the memory 13 may also be a volatile memory, and the volatile memory may be a random access memory (RAM), which is used as an external cache.
  • Memory 13 may also be used to store instructions and data. Additionally, the electronic device 10 may include more or fewer components than shown in FIG. 18 , or may have components configured differently.
  • the bus 14 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in FIG. 18, but this does not mean that there is only one bus or one type of bus.
  • the electronic device 10 may also include an input/output interface 15 connected with an input/output device for receiving input information and outputting operation results.
  • the electronic device 10 in the embodiment of the present application may correspond to the background replacement device 400 in the above embodiment, and may perform the operations performed by the electronic device 100 in the above method embodiment, which will not be described again.
  • the electronic device 10 may be the above-mentioned electronic device 100 or the above-mentioned electronic device 200.
  • FIG. 19 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 20 includes: a processor 21, a communication interface 22, and a memory 23.
  • the processor 21, the communication interface 22, and the memory 23 are connected to each other through a bus 24.
  • the processor 21 is used to execute instructions stored in the memory 23 .
  • the memory 23 stores program codes, and the processor 21 can call the program codes stored in the memory 23 to perform the following operations:
  • receive the first image; input the first image into the first image segmentation lightweight model, and determine the area where the target object in the first image is located and the area where the first background content is located;
  • replace the first background content with the second background content to obtain the second image;
  • the fourth image is obtained by replacing the first background content in the first image with the second background content.
  • the processor 21 can have a variety of specific implementation forms.
  • the processor 21 can be any one or a combination of CPU, GPU, TPU or NPU.
  • the processor 21 can also be a single-core processor or a multi-core processor.
  • the processor 21 may be a combination of a CPU (or a GPU, TPU, or NPU) and a hardware chip.
  • the above-mentioned hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the processor 21 can also be implemented solely by using a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
  • the communication interface 22 may be a wired interface or a wireless interface, used for communicating with other modules or devices.
  • the wired interface can be an Ethernet interface, a controller area network (CAN) interface or a local interconnect network (LIN) interface
  • the wireless interface can be a cellular network interface or use a wireless LAN interface, etc.
  • the memory 23 may be a non-volatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.
  • the memory 23 may also be a volatile memory, and the volatile memory may be a random access memory (RAM), which is used as an external cache.
  • Memory 23 may also be used to store instructions and data.
  • the electronic device 20 may include more or fewer components than shown in FIG. 19 , or may have different component configurations.
  • the bus 24 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in FIG. 19, but this does not mean that there is only one bus or one type of bus.
  • the electronic device 20 may also include an input/output interface 25.
  • the input/output interface 25 is connected with an input/output device and is used for receiving input information and outputting operation results.
  • the electronic device 20 may be the above-mentioned electronic device 100 or the above-mentioned server 300 .
  • Embodiments of the present application also provide a non-transitory computer-readable storage medium.
  • a computer program is stored in the computer-readable storage medium.
  • when the computer program is run on a processor, the operations performed by the electronic device in the above method embodiment can be realized.
  • for the specific implementation of the above method steps performed by a processor based on the computer storage medium, reference may be made to the specific operations of the electronic device in the above method embodiment, which will not be described again here.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center that contains one or more sets of available media.
  • the usable media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided in the present application is a background replacement method. The method comprises: segmenting a first image on the basis of an image segmentation lightweight model, and determining an area where a target object in the first image is located and an area where the background content is located; and, when a user is not satisfied with the background replacement effect, the user training the image segmentation lightweight model by means of a background replacement training operation until a background replacement effect satisfactory to the user is achieved. Therefore, when the user is not satisfied with the replacement effect of background replacement performed on the basis of the current image segmentation lightweight model, training of the image segmentation lightweight model can be started by a user operation, so as to raise the segmentation accuracy of the image segmentation lightweight model, thereby raising the accuracy of background replacement, reducing the probability of replacement errors, and improving the user experience when background replacement is performed.

Description

Background Replacement Method and Electronic Device
This application claims priority to the Chinese patent application No. 202210271308.4, filed with the China Patent Office on March 18, 2022 and entitled "Background Replacement Method and Electronic Device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminals, and in particular, to a background replacement method and an electronic device.
Background
To reduce restrictions on the environments in which applications can be used and to protect user privacy, background replacement has become a necessity in many video applications, such as video calls, video conferencing, and video customer service. Background replacement is a technology that replaces the content of the background area contained in a video or image with specified background content. The core step in background replacement is image segmentation, that is, segmenting the input image into a target area and a background area through an image segmentation model.
Currently, an image segmentation model is trained by collecting a large amount of historical image or video data of the target object in advance to build an image data set, and then using that data set to train the model, obtaining an image segmentation model optimized for the target object. The accuracy of image segmentation therefore depends on the quality of the image data set used during training, and a pre-collected data set cannot cover all individual features. When the data set is of low quality, that is, when it does not contain certain individual features of the target object, and the trained image segmentation model is used to segment an image containing those individual features, the parts of the image's target area that contain those features will be treated as background area. This results in poor segmentation accuracy, which in turn causes replacement errors during background replacement and degrades the user experience.
Summary
This application provides a background replacement method and an electronic device. Implementing the method can improve the image segmentation accuracy of the segmentation model, thereby improving the accuracy of background replacement and the user experience.
According to a first aspect, embodiments of this application provide a background replacement method. The method includes: an electronic device displays a second image obtained by performing a first background replacement on a first image, where the first background replacement is performed based on a first image segmentation lightweight model, the first image includes an area where a target object is located and an area where first background content is located, the second image is obtained by replacing the first background content in the first image with second background content, and the second background content is different from the first background content; in response to a background replacement training operation of a user, the electronic device displays a first posture map, where the first posture map is used to instruct the user to make a first posture; the electronic device acquires a third image, where the posture of the user in the third image is the first posture; and the electronic device displays a fourth image obtained by performing a second background replacement on the first image, where the second background replacement is performed based on a target image segmentation lightweight model, the target image segmentation lightweight model is obtained through training based on the third image, and the fourth image is obtained by replacing the first background content in the first image with the second background content.
It can be understood that, after detecting the background replacement training operation, the electronic device displays the first posture map to instruct the user to make the first posture, and then acquires a third image including the user, in which the user's posture is the first posture. By using the third image as training data for the first image segmentation lightweight model, the electronic device can capture the user's latest individual features. In this way, the target image segmentation lightweight model trained with the third image segments images more accurately; that is, it can more accurately separate the target object from the background content in an image. Therefore, comparing the second image, obtained by performing the first background replacement on the first image based on the first image segmentation lightweight model, with the fourth image, obtained by performing the second background replacement on the first image based on the target image segmentation lightweight model, the size of the area where the target object is located in the fourth image is closer to the size of the area where the target object is located in the first image than it is in the second image. That is, the accuracy of the background replacement in the fourth image is higher than that in the second image, and the segmentation accuracy of the target image segmentation lightweight model is higher than that of the first image segmentation lightweight model.
By implementing the method of the first aspect, the electronic device uses an image segmentation lightweight model to segment the first image and determine the area where the target object is located and the area where the background content is located in the first image. When the user is not satisfied with the background replacement effect, the user can train the image segmentation lightweight model through the background replacement training operation until a satisfactory background replacement effect is achieved. In this way, when the user is dissatisfied with the effect of background replacement based on the current image segmentation lightweight model, training of the model can be started through a user operation to improve its segmentation accuracy, thereby improving the accuracy of background replacement and the user experience.
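The interaction described in this aspect amounts to a user-driven retraining loop. The sketch below summarizes that control flow; all callables are placeholders standing in for the operations described above (capture, training, replacement, posture selection), not APIs from the patent:

```python
def train_until_satisfied(model, capture, train, replace_background,
                          choose_posture, is_satisfied, max_rounds=10):
    """Hypothetical control flow for the first aspect: keep retraining
    the lightweight segmentation model until the user accepts the
    background replacement effect (or a round limit is reached)."""
    for _ in range(max_rounds):
        preview = replace_background(model)   # display replaced image
        if is_satisfied(preview):
            return model   # this is the target lightweight model
        posture = choose_posture()   # display a posture map to the user
        sample = capture(posture)    # user makes the indicated posture
        model = train(model, sample) # one training round on the new image
    return model
```

As a toy simulation, a "model" can be an integer that improves by one per round, with the user satisfied once it reaches 2:

```python
result = train_until_satisfied(
    0,
    capture=lambda posture: posture,
    train=lambda m, s: m + 1,
    replace_background=lambda m: m,
    choose_posture=lambda: "pose",
    is_satisfied=lambda preview: preview >= 2,
)
print(result)  # 2
```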
With reference to the first aspect, in some implementations, before the electronic device displays the fourth image obtained by performing the second background replacement on the first image, the method further includes: the electronic device inputs the third image into a full image segmentation model to obtain a first segmentation result; the electronic device inputs the third image into the first image segmentation lightweight model to obtain a second segmentation result, where the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first image segmentation lightweight model, and the first segmentation result and the second segmentation result are used to indicate the area where the target object is located and the area where third background content is located in the third image; the electronic device trains the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain a second image segmentation lightweight model; and the electronic device inputs the third image into the second image segmentation lightweight model to obtain a third segmentation result;
when the first segmentation result and the third segmentation result are different, the electronic device determines a second posture map based on the first segmentation result and the third segmentation result, where the second posture map is used to instruct the user to make a second posture, the second posture map includes a first limb, and the area where the first limb is located in the first segmentation result is different from the area where the first limb is located in the third segmentation result;
the electronic device trains the second image segmentation lightweight model based on a fifth image to obtain the target image segmentation lightweight model, where the posture of the target object in the fifth image is the second posture.
The image segmentation lightweight model is obtained by pruning and quantizing the full image segmentation model, and the number of model parameters in the full model is greater than that in the lightweight model. Generally, the segmentation results of the full model are more accurate than those of the lightweight model, but the full model requires much more computation. To reduce the amount of computation, a lightweight model with fewer parameters is usually deployed on the electronic device for image segmentation, but its segmentation effect is inferior to that of the full model. Therefore, during training of the lightweight model, the full model is used to guide the training, so that the performance of the lightweight model approaches that of the full model, that is, their segmentation effects become close. In other words, the electronic device uses the full image segmentation model and the image segmentation lightweight model to segment the third image separately, obtaining the first segmentation result and the second segmentation result. Based on the error between the two results, the model parameters of the lightweight model are adjusted so that its segmentation effect comes closer to that of the full model. In this way, the amount of computation on the electronic device is reduced while the segmentation effect is preserved.
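Guiding the lightweight (student) model with the full (teacher) model is a knowledge-distillation setup: the student's parameters are adjusted to shrink the discrepancy between the two models' outputs on the same image. A minimal sketch of one such discrepancy measure, assuming per-pixel foreground probabilities in nested lists and a mean-squared-error loss (both illustrative choices; the patent does not specify the loss):

```python
def distillation_loss(teacher_probs, student_probs):
    """Mean squared error between the full model's (teacher) and the
    lightweight model's (student) per-pixel foreground probabilities,
    given as equal-sized H x W nested lists of floats in [0, 1]."""
    total = count = 0
    for t_row, s_row in zip(teacher_probs, student_probs):
        for t, s in zip(t_row, s_row):
            total += (t - s) ** 2
            count += 1
    return total / count

teacher = [[1.0, 0.0], [1.0, 0.0]]   # full model: left column is foreground
student = [[0.8, 0.2], [1.0, 0.0]]   # lightweight model disagrees slightly
print(distillation_loss(teacher, student))  # ≈ 0.02
```

An optimizer would then nudge the student's parameters to reduce this loss, which is what "adjusting the model parameters based on the error between the first and second segmentation results" describes.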
In this way, after the electronic device trains the second image segmentation lightweight model using the first segmentation result and the second segmentation result, it verifies the trained model using the third image. When the third segmentation result differs from the first segmentation result, the electronic device determines a posture map based on the segmentation results and instructs the user to make the posture shown in that map, so that the resulting image serves as training data guiding the next round of training. The training data is thus actively screened, which improves its quality; training the model with the screened data improves the training effect and, in turn, the segmentation accuracy of the model.
With reference to the first aspect, in some implementations, the electronic device training the second image segmentation lightweight model based on the fifth image to obtain the target image segmentation lightweight model specifically includes:
the electronic device acquires the fifth image; the electronic device inputs the fifth image into the full image segmentation model to obtain a fourth segmentation result; the electronic device inputs the fifth image into the second image segmentation lightweight model to obtain a fifth segmentation result, where the fourth segmentation result and the fifth segmentation result are used to indicate the area where the target object is located and the area where fourth background content is located in the fifth image; the electronic device trains the second image segmentation lightweight model based on the fourth segmentation result and the fifth segmentation result to obtain a third image segmentation lightweight model; and the electronic device inputs the fifth image into the third image segmentation lightweight model to obtain a sixth segmentation result;
when the fourth segmentation result and the sixth segmentation result do not satisfy a first preset condition, the third image segmentation lightweight model is the target image segmentation lightweight model.
In this way, after training the second image segmentation lightweight model, the electronic device verifies it using the fifth image and judges whether the segmentation result of the lightweight model and that of the full image segmentation model satisfy the first preset condition. When the first preset condition is not satisfied, the electronic device determines the resulting lightweight model as the target image segmentation lightweight model and stops training. That is, after each round of training, the electronic device judges whether the segmentation result of the lightweight model meets the requirements and, if so, stops training, thereby avoiding unnecessary training.
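The stop/continue decision after each round can be sketched as a disagreement test between the full model's mask and the lightweight model's mask. The fraction-of-differing-pixels criterion and the threshold below are illustrative assumptions; the patent only requires some first preset condition:

```python
def should_stop_training(full_mask, light_mask, max_disagreement=0.01):
    """Return True when the two binary masks (H x W nested lists of
    0/1) disagree on at most a max_disagreement fraction of pixels,
    i.e. the lightweight model is already close enough to the full
    image segmentation model to stop training."""
    differing = total = 0
    for f_row, l_row in zip(full_mask, light_mask):
        for f, l in zip(f_row, l_row):
            differing += (f != l)
            total += 1
    return differing / total <= max_disagreement

print(should_stop_training([[1, 0], [1, 0]], [[1, 0], [1, 0]]))  # True
print(should_stop_training([[1, 0], [1, 0]], [[1, 1], [1, 0]]))  # False
```

When this returns False, the disagreeing pixels are exactly the input to the posture-map selection in the next implementation: training continues with images of the postures that cover the disagreement.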
With reference to the first aspect, in some implementations, after the electronic device inputs the fifth image into the third image segmentation lightweight model and obtains the sixth segmentation result, the method further includes: when the fourth segmentation result and the sixth segmentation result satisfy the first preset condition, the electronic device determines a third posture map based on the fourth segmentation result and the sixth segmentation result, where the third posture map is used to instruct the user to make a third posture, the third posture map includes a second limb, and the second limb in the fourth segmentation result is different from the second limb in the sixth segmentation result;
the electronic device trains the third image segmentation lightweight model based on a sixth image to obtain the target image segmentation lightweight model, where the posture of the target object in the sixth image is the third posture.
In this way, when the fourth segmentation result and the sixth segmentation result do not meet the requirements, that is, when the segmentation result of the third image segmentation lightweight model and that of the full image segmentation model do not meet the requirements, the electronic device determines the third posture map based on those segmentation results, to instruct the user to make the third posture. Through each round of training, the electronic device continuously screens the training data until the segmentation result of the model meets the requirements. This improves the quality of the training data and the training effect of the model, avoids wasting training data, and prevents excessively long training caused by invalid training data.
With reference to the first aspect, in some implementations, when the first segmentation result and the third segmentation result are different, the electronic device determining the second posture map based on the first segmentation result and the third segmentation result includes:
when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, the electronic device determines a target region of the first image based on the difference between the first segmentation result and the third segmentation result, and determines the second posture map based on the target region.
The target area is a poorly segmented area. The electronic device determines, based on the first segmentation result and the third segmentation result, whether a poorly segmented area exists in the third segmentation result. If a poorly segmented area exists in the third segmentation result, the electronic device determines, from a representative posture library, a posture map containing the poorly segmented area, which is used to instruct the user to assume the posture contained in that posture map, that is, the posture contained in the poorly segmented area. In this way, in the next round of training of the image segmentation lightweight model, user images containing these poorly segmented areas are acquired and training focuses on these poorly segmented areas; that is, personalized training is performed for the individual characteristics of the target object, which improves the training effect of the model and thus the segmentation accuracy of the model.
With reference to the first aspect, in some implementations, the first segmentation result and the third segmentation result include pixel information of pixels in the third image;
when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, the electronic device determining the target area of the first image based on the difference between the first segmentation result and the third segmentation result specifically includes: the electronic device determines first target pixels in the third image based on the difference between the pixel information of pixels in the first segmentation result and the pixel information of pixels in the third segmentation result, where a first target pixel is a pixel whose pixel-information difference between the first segmentation result and the third segmentation result is greater than a first threshold; the electronic device determines the areas in the third image where one or more limbs of the target object are located, the one or more limbs including a third limb; and when the number of first target pixels in the area of the third image where the third limb is located is greater than a second threshold, the electronic device determines that the area of the third image where the third limb is located is the target area.
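The per-limb comparison described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the representation of segmentation results as per-pixel score arrays, and the limb masks are all assumptions made for the example.

```python
import numpy as np

def find_target_areas(seg_a, seg_b, limb_masks, pixel_thresh, count_thresh):
    """Compare two segmentation results of the same image and return the
    limbs whose areas contain more than `count_thresh` "first target"
    pixels, i.e. pixels whose pixel-information difference between the two
    results exceeds `pixel_thresh` (hypothetical helper for illustration).

    seg_a, seg_b: per-pixel foreground scores of shape (H, W).
    limb_masks:   dict mapping a limb name to a boolean (H, W) mask of the
                  area where that limb is located.
    """
    # Pixels where the two segmentation results disagree beyond the first threshold.
    target_pixels = np.abs(seg_a.astype(float) - seg_b.astype(float)) > pixel_thresh

    poorly_segmented = []
    for limb, mask in limb_masks.items():
        # Count disagreeing pixels that fall inside this limb's area.
        if np.count_nonzero(target_pixels & mask) > count_thresh:
            poorly_segmented.append(limb)
    return poorly_segmented
```

A limb returned by this sketch corresponds to the "target area" of the claim, and would drive the selection of the next posture map.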
In some implementations, the limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, left foot, and other body parts.
With reference to the first aspect, in some implementations, the electronic device determining the second posture map based on the target area specifically includes: the electronic device determines the third limb of the target object contained in the target area; and the electronic device determines a second posture map containing the third limb.
In this way, the electronic device can filter out posture maps containing the poorly segmented limb and, in the next round of training, focus training on the poorly segmented area, which can improve the training effect of the model.
With reference to the first aspect, in some implementations, the electronic device determining the second posture map containing the third limb includes: the electronic device determines multiple posture maps containing the third limb; and the electronic device determines the second posture map from the multiple posture maps, where the second posture map is the posture map, among the multiple posture maps, whose area where the third limb is located contains the most first target pixels.
In some implementations, the posture map includes a contour map and a human body region map. Before the electronic device detects the first user operation, the method further includes: the electronic device acquires a training data set, where the training data set includes multiple images; the electronic device inputs the image data set into a human body posture estimation model to obtain multiple human body posture vectors corresponding to the image data set; the electronic device inputs the multiple human body posture vectors corresponding to the image data set into a clustering model to obtain one or more representative posture vectors; the electronic device inputs the one or more representative posture vectors into a human body contour detection model to obtain contour maps corresponding to the one or more representative posture vectors; and the electronic device inputs the image data set into a human body region detection model to obtain one or more limb region maps corresponding to the image data set.
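The clustering step that turns many human body posture vectors into a few representative posture vectors can be sketched with a minimal k-means loop. The source does not specify the clustering algorithm, so this is an assumption for illustration; the function name and the choice of the nearest actual member as each cluster's representative are hypothetical.

```python
import numpy as np

def representative_poses(pose_vectors, k, iters=20, seed=0):
    """Cluster pose vectors and return the index of one representative
    vector per non-empty cluster: the member closest to the cluster centre.
    Minimal hand-rolled k-means; a real system might use a library instead."""
    rng = np.random.default_rng(seed)
    x = np.asarray(pose_vectors, dtype=float)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pose vector to its nearest centre.
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres; keep the old centre for an empty cluster.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean(axis=0)
    reps = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members):
            best = members[np.argmin(np.linalg.norm(x[members] - centres[j], axis=1))]
            reps.append(int(best))
    return reps  # indices into pose_vectors
```

Each returned index identifies a representative posture vector, which would then be fed to the human body contour detection model to build the representative posture library.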
With reference to the first aspect, in some implementations, when the first segmentation result and the third segmentation result do not satisfy the first preset condition, the electronic device replaces, based on the third segmentation result, the third background content in the area of the third image where the background content is located with preset background content, to obtain a seventh image;
the electronic device displays the seventh image, a first control, and first prompt information, where the first prompt information is used to prompt training of the second image segmentation lightweight model.
In this way, when the segmentation result of the image segmentation lightweight model meets the requirements, the electronic device can replace the background content in the segmentation result with the preset background content to obtain a replaced image, and display the replaced image on the display screen. A retraining control can also be displayed; when the user is not satisfied with the effect of the background replacement, the user can use this control to retrain the image segmentation lightweight model. This can improve the user experience.
With reference to the first aspect, in some implementations, after the electronic device displays the seventh image, the first control, and the first prompt information, the method further includes: the electronic device detects an operation acting on the first control, and the electronic device determines a third threshold, where the third threshold is smaller than the first threshold;
when the difference between the first segmentation result and the third segmentation result satisfies a second preset condition, the electronic device determines a fourth posture map based on the first segmentation result and the third segmentation result, where the fourth posture map is used to indicate a fourth posture; the second preset condition is that the number of second target pixels in the area of the first image where a fourth limb is located is greater than the second threshold, where a second target pixel is a pixel whose difference between its pixel information in the first segmentation result and its pixel information in the second segmentation result is greater than the third threshold; and the electronic device trains the second image segmentation lightweight model based on an eighth image to obtain a target image segmentation lightweight model, where the posture of the target object in the eighth image is the fourth posture.
With reference to the first aspect, in some implementations, before the electronic device displays the first posture map in response to the user's background replacement training operation, the method further includes:
when the electronic device detects that the usage duration of the first image segmentation lightweight model is longer than a first duration, the electronic device displays second prompt information and a second control, where the second prompt information is used to prompt training of the first image segmentation lightweight model; the background replacement training operation is an operation acting on the second control.
In a second aspect, embodiments of this application provide a background replacement apparatus, including units for performing the background replacement method in the first aspect or any possible implementation of the first aspect.
In a third aspect, embodiments of this application provide an electronic device, including one or more processors and one or more memories, where the one or more memories are coupled to the one or more processors and are used to store computer program code, the computer program code includes computer instructions, and when the one or more processors execute the computer instructions, the electronic device is caused to perform the method described in the first aspect and any possible implementation of the first aspect.
In a fourth aspect, embodiments of this application provide a chip system applied to an electronic device, where the chip system includes one or more processors used to invoke computer instructions so that the electronic device performs the method described in the first aspect and any possible implementation of the first aspect.
In a fifth aspect, embodiments of this application provide a computer-readable storage medium including instructions that, when run on an electronic device, cause the electronic device to perform the method described in the first aspect and any possible implementation of the first aspect.
It can be understood that the background replacement apparatus provided in the second aspect, the electronic device provided in the third aspect, the chip system provided in the fourth aspect, and the computer storage medium provided in the fifth aspect are all used to perform the method provided in the embodiments of this application. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
Description of the Drawings
FIG. 1A and FIG. 1B are schematic diagrams of a user interface of a video conference on an electronic device provided by an embodiment of this application;
FIG. 2A is a flowchart of a background replacement method provided by an embodiment of this application;
FIG. 2B is a flowchart of a background replacement method provided by an embodiment of this application;
FIG. 3A to FIG. 3F are schematic diagrams of some user interfaces provided by embodiments of this application;
FIG. 4 is a schematic diagram of a segmentation result provided by an embodiment of this application;
FIG. 5 is a schematic diagram of the training process of an image segmentation lightweight model provided by an embodiment of this application;
FIG. 6 is a flowchart of a process in which an electronic device determines a target area, provided by an embodiment of this application;
FIG. 7 is a schematic diagram of another user interface provided by an embodiment of this application;
FIG. 8 is a schematic diagram in which an electronic device uses image segmentation lightweight model 1 and a target image segmentation lightweight model to segment an original image and then perform replacement to obtain a replaced image, provided by an embodiment of this application;
FIG. 9 is a flowchart in which an electronic device constructs a representative posture library, provided by an embodiment of this application;
FIG. 10 is a schematic diagram of a set of skeletal key point data provided by an embodiment of this application;
FIG. 11 is a schematic diagram of a process in which an electronic device obtains representative posture vectors through clustering, provided by an embodiment of this application;
FIG. 12 is a schematic diagram of obtaining a contour map from a representative posture vector, provided by an embodiment of this application;
FIG. 13 is a schematic diagram of limb regions provided by an embodiment of this application;
FIG. 14 is a schematic diagram of the software structure of the electronic device 100 provided by an embodiment of this application;
FIG. 15 shows the cooperative relationships among the modules in the electronic device provided by an embodiment of this application;
FIG. 16 is a schematic diagram of a background replacement system provided by an embodiment of this application;
FIG. 17A is a schematic diagram of a background replacement apparatus provided by an embodiment of this application;
FIG. 17B is a schematic diagram of another background replacement apparatus provided by an embodiment of this application;
FIG. 18 is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 19 is a schematic structural diagram of another electronic device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and in detail below with reference to the accompanying drawings. In the description of the embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" in the text merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, in the description of the embodiments of this application, "multiple" means two or more.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of this application, unless otherwise specified, "multiple" means two or more.
Currently, when a user conducts a video conference through an electronic device, the electronic device captures a video containing the user and displays video image frames in real time. Each video image frame includes an image of the user and an image of the environmental background around the user, which may lead to leakage of the user's privacy during the video conference. To protect the user's privacy, the user can choose to replace the environmental background image in the video image frames with a preset background image.
By way of example, FIG. 1A shows a user interface 110 for a video conference on an electronic device in an embodiment of this application. As shown in FIG. 1A, the user interface 110 of the video conference may include a video image frame 111 displayed in real time and a background replacement control 112. Each video image frame may include an area where the background is located and an area where the target object is located. The area where the target object is located may be called the foreground area, and the area where the background content is located may be called the background area. As shown in diagram (a) of FIG. 1A, the video image frame 111 includes an area 1111 and an area 1112. As shown in diagram (b) of FIG. 1A, the area 1111 is the background area, and the area 1112 is the foreground area, that is, the area where the target object is located. The user can tap the background replacement control 112; in response to this user operation, the electronic device can replace the background content in the background area and display a user interface 120.
As shown in FIG. 1B, the electronic device displays the user interface 120. The user interface 120 may include a replaced video image frame 121 and a background replacement control 122. The video image frame 121 includes an area 1211, which is the replaced background area, and an area 1212, which is the foreground area. It can be seen that the raised parts of the target object's hair are missing in areas 1211A and 1211B of the video image frame 121; that is, when the electronic device performed background replacement, some body parts of the target object, or portions of some body parts, were treated as background area and replaced, resulting in a replacement error. In background replacement, an image segmentation model is first used to separate the foreground area and the background area in the video or image to obtain the target object and the initial background content; the initial background content of the background area in the video or image is then replaced with the preset background content to obtain the replaced image or video. In the process of separating the foreground area from the background area, the image segmentation model may mistake some body parts of the target object, or portions of some body parts, for initial background content, that is, segment them into the background area, making the image segmentation inaccurate. Furthermore, during the background replacement process, some body parts of the target object, or portions of some body parts, are then replaced as background content. As a result, the body of the target object in the replaced image is incomplete; that is, some body parts of the target object, or portions of some body parts, are missing, for example, the raised parts of the hair in areas 1211A and 1211B in FIG. 1B.
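The segment-then-composite step described above can be sketched as a per-pixel selection between the original frame and the preset background. This is a simplified illustration, not the claimed implementation; the function name and the representation of the segmentation result as a boolean foreground mask are assumptions.

```python
import numpy as np

def replace_background(frame, fg_mask, new_background):
    """Composite a preset background behind the segmented target object.

    frame, new_background: (H, W, 3) images of the same size.
    fg_mask: (H, W) booleans, True where the segmentation model placed
             the foreground (the target object).
    Any foreground pixel the model wrongly labels as background, such as a
    raised tuft of hair, is overwritten by the new background, which is
    exactly the replacement error described in the text.
    """
    return np.where(fg_mask[..., None], frame, new_background)
```

This makes clear why segmentation accuracy bounds replacement accuracy: the compositing step itself is trivial, so every visible error comes from the mask.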
In background replacement technology, the accuracy of image segmentation affects the effect of background replacement. In general, the more accurate the image segmentation, the smaller the probability that the electronic device will mistakenly replace content in the target object's area with the preset background content during background replacement.
Currently, image segmentation is performed by pre-training an image segmentation model and then using that model to segment video image frames. Specifically, the electronic device first needs to collect a large number of images or videos containing the target object to build an image data set. The electronic device then uses this image data set to train an initial image segmentation model to obtain an optimized image segmentation model, which can segment the target object from a target image more accurately. The accuracy of image segmentation depends on the quality of the image data set used during training of the image segmentation model. Generally speaking, a pre-collected image data set cannot cover all individual characteristics, for example, the target object's hairstyle, headwear, and outward appearance, which can change over time.
When the image data set used for training is of low quality, that is, when it does not contain certain individual characteristics of the target object, then when the trained image segmentation model is used to segment an image containing those individual characteristics, the portions of the target object's area containing those characteristics will be treated as background area. This leads to poor image segmentation accuracy and affects the user experience. For example, suppose the images of the target object collected when the image segmentation model was trained show short hair; over time the target object's hairstyle changes, and when the original image segmentation model is then used for image segmentation, the areas where the hairstyle has changed are easily mis-segmented into the background area, resulting in inaccurate segmentation and subsequent replacement errors, which affects the user experience.
Therefore, embodiments of this application provide a background replacement method. In this method, the electronic device first replaces, based on a first image segmentation lightweight model, the first background content in an acquired first image with second background content to obtain a second image, and displays on the display screen the second image obtained by performing background replacement on the first image. When the user considers the background replacement accuracy of the second image to be low, the user can choose to retrain the first image segmentation lightweight model; that is, in response to the user's background replacement training operation, the electronic device displays a first posture map on the display screen. The first posture map is used to indicate a first posture and guides the user to record a video or image according to the first posture contained in the first posture map. The electronic device can then acquire a third image containing the user and retrain the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model. Finally, the electronic device uses the target image segmentation lightweight model to segment the first image to obtain a segmentation result, and replaces the first background content in the segmentation result with the second background content to obtain a fourth image. In other words, the electronic device uses the image segmentation lightweight model to segment the first image and determine the area where the target object is located and the area where the background content is located; when the user is not satisfied with the effect of the background replacement, the user can train the image segmentation lightweight model through the background replacement training operation until a background replacement effect that satisfies the user is achieved. In this way, when the user is not satisfied with the effect of background replacement based on the current image segmentation lightweight model, the user can start training of the image segmentation lightweight model through a user operation to improve its segmentation accuracy, thereby improving the accuracy of background replacement and enhancing the user experience.
In some implementations, the electronic device may train the image segmentation lightweight model as follows. The electronic device displays a posture map P1 on the display screen, where the posture map P1 guides the target object to shoot a video or image according to a posture G1 contained in the posture map P1. The electronic device then acquires an image T1 containing the target object and inputs the image T1 into the image segmentation full model and an image segmentation lightweight model M1, respectively, to obtain segmentation result 1 and segmentation result 2. The electronic device trains the image segmentation lightweight model M1 based on segmentation result 1 and segmentation result 2 to obtain an image segmentation lightweight model M2, and then inputs the image T1 into the image segmentation lightweight model M2 to obtain segmentation result 3. Next, the electronic device determines whether the error between segmentation result 1 and segmentation result 3 satisfies a preset condition D2. If it is satisfied, the image segmentation lightweight model M2 is the target image segmentation lightweight model; that is, the image segmentation lightweight model M2 can be used for subsequent image segmentation. If it is not satisfied, the electronic device determines, based on segmentation result 1 and segmentation result 3, the poorly segmented area in segmentation result 3; this poorly segmented area can serve as the area on which the next round of training needs to focus. The electronic device then determines a posture map P2 containing the poorly segmented area and displays the posture map P2 on the display screen, where the posture map P2 guides the user to shoot a video or image according to a posture G2 contained in the posture map P2. The electronic device acquires an image T2 and, following the above steps, uses the image T2 to continue training the image segmentation lightweight model M2, and so on, until the segmentation result of the image segmentation full model and the segmentation result of the image segmentation lightweight model satisfy the preset condition D2.
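The round-based control flow above can be sketched as follows. The model-specific operations are injected as callables because the source does not specify them; every name here (the function, its parameters, and the callables) is hypothetical and serves only to show the loop structure: capture, distill, compare, and either stop or pick the next posture map.

```python
def train_lightweight(image_for, segment_full, segment_lite, train_step,
                      error, pick_pose, first_pose, d2, max_rounds=10):
    """Iterative training loop guided by the full model's segmentation.

    image_for(pose):      capture a user image shot in the given posture.
    segment_full(img):    segmentation result of the full model (reference).
    segment_lite(img):    segmentation result of the lightweight model.
    train_step(img, ref): one distillation step toward the reference.
    error(ref, out):      scalar disagreement between two results.
    pick_pose(ref, out):  posture map covering the poorly segmented area.
    Returns True if the lightweight model met condition d2 within
    max_rounds rounds, False otherwise.
    """
    pose = first_pose
    for _ in range(max_rounds):
        img = image_for(pose)          # image T_n shot in posture G_n
        ref = segment_full(img)        # reference segmentation result
        train_step(img, ref)           # distill M_n into M_{n+1}
        out = segment_lite(img)        # lightweight model's new result
        if error(ref, out) <= d2:      # preset condition D2 satisfied
            return True                # this is the target lightweight model
        pose = pick_pose(ref, out)     # next round focuses on the weak area
    return False
```

The early return mirrors the point made later in the text: checking the condition after each round avoids unnecessary training once the lightweight model is good enough.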
其中,图像分割轻量模型是基于图像分割全量模型经过裁剪、量化得到的,图像分割全量模型中模型参数的数量大于图像分割轻量模型中模型参数的数量。一般来说,图像分割全量模型的分割结果相对于图像分割轻量模型的分割结果来说更准确,但是图像分割全量模型的计算量比较大。为了减少模型计算量,一般在电子设备上部署模型参数更少的图像分割轻量模型用来进行图像分割,但是图像分割轻量模型的分割效果不如图像分割全量模型的分割结果,因此,在图像分割轻量模型的训练过程中,利用图像分割全量模型对待训练的图像轻量模型进行指导训练,使得图像分割轻量模型的性能接近于图像分割全量模型的性能,即图像分割轻量模型的分割效果与图像分割全量模型的分割效果接近。这样,可以在保证图像分割的效果的情况下,又减少电子设备的计算量。关于图像分割轻量模型的训练具体参见下述实施例,此处不再赘述。Among them, the image segmentation lightweight model is obtained by cropping and quantifying based on the image segmentation full model. The number of model parameters in the image segmentation full model is greater than the number of model parameters in the image segmentation lightweight model. Generally speaking, the segmentation results of the full image segmentation model are more accurate than the segmentation results of the lightweight image segmentation model, but the calculation amount of the full image segmentation model is relatively large. In order to reduce the amount of model calculations, an image segmentation lightweight model with fewer model parameters is generally deployed on electronic devices for image segmentation. However, the segmentation effect of the image segmentation lightweight model is not as good as the segmentation results of the image segmentation full model. Therefore, in the image During the training process of the segmentation lightweight model, the image segmentation full model is used to guide the training of the image lightweight model to be trained, so that the performance of the image segmentation lightweight model is close to the performance of the image segmentation full model, that is, the segmentation of the image segmentation lightweight model The effect is close to that of the full image segmentation model. In this way, the calculation amount of the electronic device can be reduced while ensuring the effect of image segmentation. 
For the specific training of the image segmentation lightweight model, refer to the embodiments below; the details are not repeated here.
Segmentation result 1 and segmentation result 3 each indicate the region where the target object is located and the region where the background content is located in the image T1. The background-content region indicated in segmentation result 1 differs from the background-content region indicated in segmentation result 3.
The electronic device determines the poorly segmented regions based on segmentation result 1 and segmentation result 3, that is, based on the segmentation result of the previous round's lightweight model. It then filters the representative posture library for posture maps containing those poorly segmented regions and uses them as training data to guide the next round of training; the training data are thus actively screened, which improves their quality. In the next round of lightweight-model training, user images containing these poorly segmented regions are obtained and training focuses on those regions. In other words, personalized training is performed on the individual characteristics of the target object, which improves the training effect and hence the model's segmentation accuracy, reducing the probability of errors when the background is replaced. In addition, after each round of training, the electronic device checks whether the lightweight model's segmentation result meets the preset condition and stops training once it does, avoiding unnecessary training.
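The screening described above, which locates regions where the previous round's lightweight model disagrees with the full model, can be sketched as a block-wise comparison of the two masks. The patent does not fix a concrete criterion, so the block size and disagreement threshold below are hypothetical parameters.

```python
import numpy as np

def poorly_segmented_regions(full_mask, light_mask, block=8, threshold=0.2):
    """Return the top-left corners of blocks where the lightweight model's
    mask disagrees with the full model's mask more than `threshold`."""
    disagreement = (full_mask != light_mask).astype(np.float32)
    bad_blocks = []
    h, w = full_mask.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            if disagreement[y:y + block, x:x + block].mean() > threshold:
                bad_blocks.append((y, x))
    return bad_blocks
```

Posture maps whose postures cover the flagged blocks would then be selected from the representative posture library as the guidance data for the next round.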
In the embodiments of this application, the posture map P1 may also be called the first posture map, the posture G1 the first posture, the image T1 the third image, the image segmentation lightweight model 1 the first image segmentation lightweight model, segmentation result 1 the first segmentation result, segmentation result 2 the second segmentation result, segmentation result 3 the third segmentation result, and the image segmentation lightweight model 2 the second image segmentation lightweight model.
The following describes, by way of example and with reference to Figure 2A to Figure 13, a background replacement method provided by an embodiment of this application.
Figure 2A shows a flowchart of a background replacement method according to an embodiment of this application. As shown in Figure 2A, the method includes steps S101 to S104.
S101: The electronic device displays a second image obtained by performing first background replacement on a first image.
The image T1 may be uploaded by the user. For example, if the user needs to replace the background of a picture, the user can upload that picture on the electronic device. The image may also be one captured by the electronic device, or a frame of a video. For example, the electronic device can capture images of the user during a video conference held with video conferencing software or during a video call with other people.
Specifically, the first image includes the region where the target object is located and the region where the first background is located. Based on the first image segmentation lightweight model, the electronic device can determine the region of the target object and the region of the first background content in the first image; it then replaces the first background content in that region with second background content to obtain the second image. The first background content and the second background content are different; the second background content may be background content preset by the user or default background content set on the electronic device at the factory.
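The replacement step just described (keep the pixels of the target-object region and take the rest from the second background content) can be sketched as a mask-based composite. `replace_background` and its arguments are hypothetical names for illustration; the mask is assumed to be the per-pixel foreground map produced by the segmentation model.

```python
import numpy as np

def replace_background(image, foreground_mask, new_background):
    """Where the mask marks the target object, keep the original pixel;
    elsewhere take the pixel from the replacement background."""
    mask = foreground_mask.astype(bool)[..., None]  # H x W x 1, broadcast over RGB
    return np.where(mask, image, new_background)
```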
S102: In response to a background replacement training operation by the user, the electronic device displays a first posture map.
The first posture map instructs the user to assume a first posture. The background replacement training operation may be an operation on a background replacement training control. For example, after displaying the second image, the electronic device displays the background replacement training control on the display screen. If the user is dissatisfied with the replacement effect of the second image obtained by performing the first background replacement based on the first image segmentation lightweight model, the user can tap the training control to start training the first image segmentation lightweight model. The background replacement training operation may also be a voice command or a button press; this application does not limit it.
In some embodiments, in a video conference scenario, the user's background replacement training operation may also be interpreted as the sequence of tapping the background replacement training control and then ending the video conference. For example, after detecting that the user has acted on the control, the electronic device may display the first posture map once the video conference ends and then start training the image segmentation lightweight model.
S103: The electronic device obtains a third image, in which the user's posture is the first posture.
The third image may be an image shot by the user in the first posture indicated by the first posture map. It may also be a frame of a video: the target object records a video in the first posture indicated by the first posture map, and the electronic device obtains the video and extracts one frame as the third image.
In some optional embodiments, the third image may be a set of multiple images. For example, the electronic device can shoot multiple images at a time or extract multiple frames from a recorded video, and one of those frames can serve as the third image.
S104: The electronic device displays a fourth image obtained by performing second background replacement on the first image.
Specifically, the electronic device can train the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model. Inputting the first image into the target model determines the region where the target object is located and the region where the first background content is located in the first image. The electronic device then replaces the first background in that region with the second background content, obtains the fourth image, and displays it.
It may be understood that after detecting the background replacement training operation, the electronic device displays the first posture map to instruct the user to assume the first posture, and then obtains a third image of the user in that posture. Using the third image as training data for the first image segmentation lightweight model lets the electronic device capture the user's latest individual characteristics, so the target image segmentation lightweight model trained on it segments images more accurately; that is, it separates the target object from the background content more precisely. Therefore, comparing the second image, obtained by performing the first background replacement on the first image with the first image segmentation lightweight model, against the fourth image, obtained by performing the first background replacement with the target image segmentation lightweight model, the size of the target object's region in the fourth image is closer to its size in the first image than that in the second image is. In other words, the background replacement of the fourth image is more accurate than that of the second image.
For the specific process by which the electronic device trains the target image lightweight model based on the third image, refer to the description of the embodiment in Figure 2B; it is not repeated here.
In one possible implementation, Figure 2B shows, by way of example, how the electronic device trains the target image segmentation lightweight model. As shown in Figure 2B, the process is as follows:
S201: Based on a first operation, the electronic device displays a posture map P1, which indicates a posture G1.
In a possible implementation, before step S101, when the electronic device meets a preset condition D1, it displays a user interface A. A prompt box may be displayed in the user interface A to ask the target object whether to retrain the image segmentation lightweight model M1, with which the electronic device is preconfigured. The preset condition D1 may include the following:
the electronic device detects that the image segmentation lightweight model M1 has been in use for longer than a preset duration; or the electronic device periodically checks the segmentation quality of the image segmentation lightweight model M1 and detects that it meets a preset condition D2.
A preset duration may be configured on the electronic device, for example one month or one year; the embodiments of this application do not limit its specific value. That is, when the electronic device detects that the image segmentation lightweight model M1 has been in use for more than one month, it can restart training of the model M1.
For the preset condition D2, refer to the description in step S106 below; it is not repeated here.
For example, the user interface A may be the user interface 210 shown in Figure 3A. As shown in Figure 3A, the user interface 210 may include a status bar 211, a calendar indicator 212, a weather indicator 213, and a prompt box 214.
The status bar 211 may include one or more signal strength indicators for mobile communication signals, one or more signal strength indicators for wireless fidelity (Wi-Fi) signals, a battery status indicator, and a time indicator. The calendar indicator 212 may be used to indicate the current time. The weather indicator 213 may be used to indicate the weather type.
The prompt box 214 includes prompt information 214a, a confirm control 214b, and a cancel control 214c. The prompt information 214a asks the target object whether to retrain the image segmentation lightweight model M1; as shown in Figure 3A, it may read "Start the process of rebuilding the background replacement engine?". The confirm control 214b is used to confirm retraining the image segmentation lightweight model M1, and the cancel control 214c is used to cancel retraining. It may be understood that the prompt box 214 can be displayed as an overlay on the user interface 210.
It may be understood that the embodiments of this application do not limit the shape of the prompt box 214 or the specific content in it.
In some optional embodiments, the user interface A may be the user interface 220 shown in Figure 3B. As shown in Figure 3B, the user interface 220 includes a prompt box 221; for its description, see that of the prompt box 214 above, which is not repeated here.
In some optional embodiments, the user interface A may be the user interface 230 shown in Figure 3C. As shown in Figure 3C, the user interface 230 includes a calendar indicator 232 and a prompt box 231. For their descriptions, see those of the calendar indicator 212 and the prompt box 214 above; they are not repeated here.
The first operation may be a touch operation on the confirm control 214b in Figure 3A; in response to that touch operation, the electronic device displays the posture map P1. The posture map P1 indicates a posture G1, which may be, for example, adjusting the earphones with both hands. This posture is only an example; in practice, the posture G1 may be another posture such as raising a hand or holding the head, and this application does not limit the specific posture. In the embodiments of this application, the first operation may be called a background replacement training operation. In an optional implementation, the first operation may also be an operation on a background replacement training control. For example, when the electronic device segments an image with the image segmentation lightweight model M1, replaces the background, and displays a user interface containing the replaced image, it may display in that interface a background replacement training control used to train the model M1.
In some optional embodiments, in a video conference scenario, the first operation may also be interpreted as the sequence of tapping the background replacement training control and then ending the video conference. For example, after detecting that the user has acted on the control, the electronic device may display the first posture map once the video conference ends and then start training the image segmentation lightweight model.
For example, Figure 3D shows a user interface 240 on which the electronic device displays the posture map P1. As shown in Figure 3D, the user interface 240 includes a recording guidance box 241, prompt information 242, a confirm control 243, and a return control 244. The recording guidance box 241 displays the posture map P1, which indicates posture 1, the posture of adjusting the earphones with both hands. The prompt information 242 prompts the target object to perform the action of posture 1; for example, it may read "Action required: adjust the earphones with both hands" or "Please complete the specified action within the white area of the recording guidance box". The return control 244 exits the current user interface 240 and returns to the previous user interface, for example the user interface 220. The confirm control 243 is used to obtain a photo or video shot by the electronic device. When the electronic device detects a touch operation on the confirm control 243, it displays a user interface 250 in response.
It may be understood that the recording guidance box 241 and the prompt information 242 above are only examples; the embodiments of this application do not limit their shapes or specific contents.
As shown in Figure 3E, the user interface 250 includes a recording effect preview box 251, prompt information 252, a confirm control 253, and a return control 254. The recording effect preview box 251 displays the photo or video the electronic device is currently shooting. The return control 254 exits the current user interface 250 and returns to the previous user interface, for example the user interface 240. The confirm control 253 is used to obtain a photo or video shot by the electronic device. When the electronic device detects a touch operation on the confirm control 253, it obtains, in response, an image containing the target object; the image includes the region where the target object is located and the region where the background is located, and the target object's posture is the one shown in the posture map P1.
It may be understood that the recording effect preview box 251 and the prompt information 252 above are only examples; the embodiments of this application do not limit their shapes or specific contents.
In some optional embodiments, the electronic device may display the recording guidance box 241 and the recording effect preview box 251 in the same user interface. For example, when the electronic device detects a touch operation on the confirm control 214b, it displays a user interface 260 in response. As shown in Figure 3F, the user interface 260 includes a recording guidance box 261, a recording effect preview box 262, prompt information 263, a return control 264, and a confirm control 265, where:
For the description of the recording guidance box 261, see that of the recording guidance box 241 above; it is not repeated here.
For the description of the recording effect preview box 262, see that of the recording effect preview box 251 above, and for the description of the prompt information 263, see that of the prompt information 252 above; neither is repeated here. The return control 264 is used to return to the previous user interface, and the confirm control 265 is used to obtain a photo or video shot by the electronic device.
S202: The electronic device obtains an image T1, in which the posture of the target object is the posture G1 indicated in the posture map P1.
Specifically, the image T1 is an image shot of the target object in the posture G1 of the posture map P1; the target object in the image T1 is in the posture G1 indicated in the posture map P1. For example, the image T1 may be the image in the recording effect preview box 251 in the embodiment of Figure 3F above.
It may be understood that the image T1 may also be a frame of a video: the target object records a video in the posture G1 indicated in the posture map P1, and the electronic device obtains the video and extracts one frame as the image T1.
In some optional embodiments, the image T1 may be a set of multiple images. For example, the electronic device can shoot multiple images at a time or extract multiple frames from a recorded video, and one of those frames can serve as the image T1.
S203: The electronic device inputs the image T1 into the full image segmentation model to obtain a segmentation result 1, and inputs the image T1 into the image segmentation lightweight model M1 to obtain a segmentation result 2.
The full image segmentation model is a pre-trained machine learning model with high segmentation accuracy; that is, it is the model obtained after training an initial full image segmentation model to convergence. The image segmentation lightweight model M1 is a model trained from an initial image segmentation lightweight model. It may be understood that the full model and the initial lightweight model can be pre-trained on the electronic device or preconfigured on it; this application does not limit this.
In some embodiments, the initial image segmentation lightweight model is obtained by pruning and quantizing the initial full image segmentation model, and the initial full model contains more model parameters than the initial lightweight model. The full image segmentation model guides the training of the lightweight model being trained to obtain the target image segmentation lightweight model, so that the target model's performance approaches that of the full model; that is, their segmentation results are close or identical.
Specifically, a segmentation result indicates the region where the target object is located and the region where the background content is located in the segmented image, and may include pixel information for the pixels of that image. That is, segmentation result 1 indicates the target-object region and the background-content region of the image T1 as segmented by the full image segmentation model, and includes pixel information 1 for the pixels of the image T1. Segmentation result 2 indicates the target-object region and the background-content region of the image T1 as segmented by the image segmentation lightweight model 1, and includes pixel information 2 for the pixels of the image T1.
It may be understood that different image segmentation models produce different segmentation results for the same image; that is, pixel information 1 and pixel information 2 differ, so the pixel information for the image T1 included in segmentation result 1 differs from that included in segmentation result 2.
In some embodiments, the pixel information of a pixel may be the probability that the pixel is a foreground pixel. Specifically, the segmentation result may be a predicted foreground probability result for the image T1, which includes, for each pixel in the image T1, the probability that it is a foreground pixel, expressed as a real number between 0 and 1. For example, in the segmentation result, a pixel in the foreground region may have a foreground probability of 1 and a pixel in the background region a foreground probability of 0.
In other embodiments, the pixel information of a pixel may instead be its pixel value, for example an RGB value or a grayscale value. The segmentation result may be a binarized image corresponding to the image T1, used to distinguish the foreground region from the background region, in which pixels in the foreground region have a pixel value of 255 and pixels in the background region have a pixel value of 0. Alternatively, foreground pixels may have a pixel value of 0 and background pixels a pixel value of 255.
For example, Figure 4 shows a segmentation result in which the black part is the background region, whose pixels have a pixel value of 255, and the white part is the foreground region, that is, the region where the target object is located, whose pixels have a pixel value of 0.
In still other embodiments, the pixel information of a pixel may be a foreground label, which may be a numerical value such as 1 or 0. For example, after the image is input into the image segmentation lightweight model, a pixel predicted to be a foreground pixel is given the foreground label 1, and a pixel predicted to be a background pixel is given the label 0.
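A small sketch of how the three representations above relate, under the assumption of a 0.5 decision threshold (the patent does not fix one): a predicted foreground probability map can be reduced to 0/1 foreground labels, and those labels to the 255/0 binarized image.

```python
import numpy as np

def prob_to_labels(prob_map, threshold=0.5):
    """Foreground label 1 where the foreground probability clears the
    threshold, label 0 elsewhere (threshold is an assumed value)."""
    return (prob_map >= threshold).astype(np.uint8)

def labels_to_binary_image(labels, fg_value=255, bg_value=0):
    """Binarized image: foreground pixels fg_value, background bg_value."""
    return np.where(labels == 1, fg_value, bg_value).astype(np.uint8)
```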
S204: The electronic device trains the lightweight image segmentation model M1 based on segmentation result 1 and segmentation result 2 to obtain a lightweight image segmentation model M2.
Specifically, the electronic device computes an error value between segmentation result 1 and segmentation result 2 and uses this error value to train the lightweight image segmentation model M1, adjusting its model parameters to obtain the lightweight image segmentation model M2.
Specifically, Figure 5 shows the training process of the lightweight image segmentation model. As shown in Figure 5, image T1 is input into both the full image segmentation model and the lightweight image segmentation model M1, yielding segmentation result 1 and segmentation result 2. The error between segmentation result 1 and segmentation result 2 is then determined and used to correct the model parameters of the lightweight image segmentation model M1, yielding the trained, corrected lightweight model, that is, the lightweight image segmentation model M2.
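The error computation and parameter adjustment described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the mean-squared-error loss, the per-pixel logit parameterisation of the student model, and the learning rate are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Teacher (full model) foreground probabilities for image T1: segmentation result 1.
rng = np.random.default_rng(0)
teacher = rng.random((4, 4))          # probabilities in [0, 1]

# Student (lightweight model M1) output, parameterised here by per-pixel logits.
logits = np.zeros((4, 4))             # student initially predicts 0.5 everywhere
student = sigmoid(logits)             # segmentation result 2

# Error value between segmentation result 1 and segmentation result 2 (MSE).
loss_before = np.mean((student - teacher) ** 2)

# One gradient-descent step on the logits to reduce the error,
# i.e. the "parameter adjustment" that turns M1 into M2.
grad = 2.0 * (student - teacher) * student * (1.0 - student) / student.size
logits -= 10.0 * grad                 # learning rate chosen for a visible change
loss_after = np.mean((sigmoid(logits) - teacher) ** 2)

assert loss_after < loss_before       # the adjusted model fits the teacher better
```

In a real system the student would be a pruned neural network trained with autograd; the sketch only shows the supervision signal, namely that the full model's output serves as the label for the lightweight model.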
Illustratively, the full image segmentation model and the lightweight image segmentation model may each be a deep neural network model, a convolutional neural network model, or the like; the embodiments of this application do not limit this. For example, the full image segmentation model may be a deep neural network model A1, and the lightweight image segmentation model may be a deep neural network model A2 obtained by pruning A1, where A2 has fewer model parameters than A1.
S205: The electronic device inputs image T1 into the lightweight image segmentation model M2 and outputs a segmentation result 3.
Specifically, after obtaining the lightweight image segmentation model M2 in the first round of training, the electronic device tests M2: it inputs image T1 into M2 and obtains segmentation result 3. Segmentation result 3 is used to indicate the region where the target object is located and the region where the background content is located in image T1; for the related description, see the descriptions of segmentation result 1 and segmentation result 2 above, which are not repeated here.
S206: The electronic device determines whether segmentation result 1 and segmentation result 3 satisfy a preset condition D2; if not, step S207 is performed; if so, step S209 is performed.
The preset condition D2 is that a target region, that is, a poorly segmented region, exists in segmentation result 3. In other words, using segmentation result 1 as the label, the electronic device first computes the error between segmentation result 1 and segmentation result 3; when segmentation result 3 contains a region that is poorly segmented relative to segmentation result 1, it determines that the difference between segmentation result 1 and segmentation result 3 satisfies the preset condition.
Specifically, the electronic device first computes the difference in pixel information between segmentation result 1 and segmentation result 3: for each pixel, it computes the difference between the pixel information of that same pixel in segmentation result 1 and in segmentation result 3, and determines the pixels whose difference is greater than a first threshold as first target pixels. The electronic device then matches the first target pixels against the limb regions of the target object in image T1 and determines whether the number of first target pixels in a limb region is greater than a second threshold. If the number of first target pixels in one of the target object's limb regions is greater than the second threshold, that limb region is the target region, that is, a poorly segmented region, and the electronic device determines that segmentation result 1 and segmentation result 3 satisfy the preset condition D2. In the embodiments of this application, the one or more limbs contained in the target region may be referred to as the third limb.
In some optional embodiments, the electronic device may capture multiple images, or a video, at one time and input the multiple images into the full image segmentation model and the lightweight image segmentation model to obtain multiple segmentation results from the full model and multiple segmentation results from the lightweight model. The electronic device determines whether the multiple differences between the full model's segmentation results and the lightweight model's segmentation results satisfy the preset condition D2; when, among the multiple differences, the number of differences satisfying the preset condition D2 is greater than a preset count threshold, the segmentation results are considered to satisfy the preset condition D3.
It can be understood that, in the embodiments of this application, the above preset condition D2 may also be referred to as the first preset condition.
Illustratively, taking as an example a segmentation result consisting of the probability that each pixel in image T1 is a foreground pixel, the specific process of determining the target region in the embodiments of this application is described below with reference to Figure 6.
S2061: The electronic device determines the first target pixels, where a first target pixel is a poorly segmented pixel in segmentation result 3.
Specifically, let Y1i be the probability that pixel i is foreground in segmentation result 1, and Z1i the probability that pixel i is foreground in segmentation result 3. The absolute value Hi of the difference for pixel i between segmentation result 1 and segmentation result 3 is computed as:
Hi = |Y1i − Z1i|
When the absolute value Hi of the difference for pixel i is greater than the first threshold, pixel i is a poorly segmented pixel, that is, a first target pixel.
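The computation of Hi and the selection of the first target pixels can be sketched as follows; the probability values and the first threshold of 0.5 are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Foreground probabilities for each pixel: segmentation result 1 (full model)
# and segmentation result 3 (lightweight model M2). Values are illustrative.
Y1 = np.array([[1.0, 0.9, 0.1],
               [0.8, 0.2, 0.0]])
Z1 = np.array([[0.9, 0.3, 0.1],
               [0.1, 0.2, 0.4]])

first_threshold = 0.5                 # the "first threshold" (assumed value)

H = np.abs(Y1 - Z1)                   # Hi = |Y1i - Z1i| for every pixel i
first_target_pixels = H > first_threshold

# Pixels (0, 1) and (1, 0) differ by 0.6 and 0.7 respectively, exceed the
# threshold, and are therefore flagged as poorly segmented.
```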
S2062: The electronic device inputs image T1 into a limb region detection model to obtain a limb region map corresponding to the target object in image T1; the limb region map includes the regions where one or more limbs of the target object in image T1 are located.
In some embodiments, the limb region detection model may also be called a human body region detection model. A limb region is the region where a limb is located; the limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, left foot, and so on.
It can be understood that the above division of limbs is merely an example; in practical applications, other divisions are possible, and this application is not limited in this respect.
S2063: The electronic device matches segmentation result 3 against the limb region map and determines the number of first target pixels in the regions where the limbs of the target object are located.
Specifically, the electronic device matches the pixels in segmentation result 3 against the regions where the limbs of the target object are located in image T1 and obtains the number of first target pixels in each of those regions.
S2064: The electronic device determines the target region based on the number of first target pixels in the regions where the limbs of the target object are located.
Specifically, when the number of first target pixels in the region where one of the limbs is located is greater than the second threshold, the region where that limb is located is the target region. For example, the second threshold may be 1000; when the region where the left hand is located contains more than 1000 first target pixels, the region of the left hand is considered the target region, that is, a poorly segmented region.
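Steps S2063 and S2064 can be sketched as follows. The small arrays, the limb ids (1 for the left hand, 2 for the torso), and the second threshold of 2 (1000 in the text) are illustrative assumptions.

```python
import numpy as np

# Boolean mask of first target pixels (from step S2061), and a limb region map
# (from step S2062) in which each pixel carries a limb id: 0 = no limb,
# 1 = left hand, 2 = torso.
target_mask = np.array([[1, 1, 0],
                        [1, 0, 0],
                        [0, 0, 1]], dtype=bool)
limb_map = np.array([[1, 1, 0],
                     [1, 2, 0],
                     [0, 2, 2]])

second_threshold = 2                  # the "second threshold" (assumed; 1000 in the text)

# Count the first target pixels falling inside each limb region (S2063) and
# keep the limbs whose count exceeds the threshold as the target region (S2064).
target_limbs = [limb for limb in (1, 2)
                if np.count_nonzero(target_mask & (limb_map == limb)) > second_threshold]

assert target_limbs == [1]            # only the left-hand region is poorly segmented
```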
S207: The electronic device replaces the initial background content in image T1 with preset background content to obtain a replaced image T2.
Specifically, when segmentation result 1 and segmentation result 3 do not satisfy the preset condition D2, the electronic device determines that the lightweight image segmentation model M2 is the target lightweight image segmentation model, that is, the model used for subsequent image segmentation. Based on segmentation result 3, the electronic device replaces the initial background content in image T1 with the preset background content to obtain the replaced image T2.
The preset background content may be background content set in advance by the target object, who may choose a preferred image background as the preset background content; the preset background content may also be the default background content of the electronic device's factory settings.
In the embodiments of this application, image T2 may be referred to as the seventh image, and the initial background content in image T1 may be referred to as the third background content.
S208: The electronic device displays the replaced image T2.
Illustratively, as shown in Figure 7, the electronic device may display a user interface 270 that includes a replacement-effect box 271, prompt information 272, a return control 273, and a confirm control 274. The replacement-effect box 271 is used to display the replaced image; the prompt information 272 is used to ask the target object whether the segmentation effect of the lightweight image segmentation model M2 is satisfactory; and the confirm control 274 is used to confirm the lightweight image segmentation model M2 as the target lightweight image segmentation model. When the target object is satisfied with the replacement effect, the target object can tap the confirm control 274; the electronic device then ends the training of the lightweight image segmentation model M2, and M2 is the target lightweight image segmentation model, that is, the lightweight model used for subsequent image segmentation. In this way, unnecessary training can be avoided. When the target object is not satisfied with the replacement effect, the target object can tap the return control 273, and the electronic device continues with the next round of training.
In some possible implementations, when the electronic device detects a touch operation acting on the return control 273, in response to that operation the electronic device changes the first threshold to a third threshold, where the third threshold is smaller than the first threshold. The electronic device then determines whether segmentation result 1 and segmentation result 3 satisfy a second preset condition; when the difference between segmentation result 1 and segmentation result 3 does not satisfy the second preset condition, the electronic device determines a pose map P based on segmentation result 1 and segmentation result 3 and uses pose map P to continue training the lightweight image segmentation model. For the related description of determining pose map P based on segmentation result 1 and segmentation result 3, see the related descriptions in the following embodiments; it is not described here.
The second preset condition is that the number of target pixels in the region where the third limb is located in the first image is greater than the second threshold, where a target pixel is a pixel for which the difference between its pixel information in the first segmentation result and its pixel information in the second segmentation result is greater than the third threshold.
Illustratively, the electronic device computes the pixel information of the pixels in segmentation result 1 and segmentation result 3 and determines the pixels for which the difference between the pixel information in segmentation result 1 and that in segmentation result 3 is greater than the third threshold; these are the poorly segmented pixels. The electronic device matches these poorly segmented pixels against the limb regions of the target object in image T1 and determines whether the number of poorly segmented pixels in a limb region is greater than the second threshold. If the number of poorly segmented pixels in one of the target object's limb regions is greater than the second threshold, that limb region is a poorly segmented region, and the electronic device determines that segmentation result 1 and segmentation result 3 satisfy the preset condition D3.
S209: The electronic device determines a pose map P2 based on segmentation result 1 and segmentation result 3.
Specifically, by comparing segmentation result 1 and segmentation result 3, the electronic device can determine the poorly segmented region in segmentation result 3; based on that poorly segmented region, the electronic device selects, from a representative pose library, a pose map containing the limb corresponding to the poorly segmented region. The electronic device may be configured with a representative pose library including multiple pose maps. The representative pose library may be constructed in advance by the electronic device; for the details of its construction, see the description of the embodiment of Figure 9 below, which is not repeated here.
The process by which the electronic device determines pose map P2 based on segmentation result 1 and segmentation result 3 may specifically include:
1. The electronic device determines the target region based on segmentation result 1 and segmentation result 3.
For the related operations of determining the target region based on segmentation result 1 and segmentation result 3, see the related operations in step S206 above; they are not repeated here.
2. The electronic device determines pose map P2 based on the target region.
Specifically, the target region corresponds to the limbs of the target object; the target region may correspond to one or more limbs of the target object.
When the target region contains one limb of the target object, the electronic device first determines, from the representative pose library, one or more pose maps containing that limb. When there is only one pose map containing the limb, that pose map is pose map P2. When there are multiple pose maps containing the limb, the electronic device randomly selects one of them as pose map P2, or the electronic device determines, among the multiple pose maps, the one in which the area of the region where the limb is located is largest as pose map P2. In some embodiments, the limb contained in pose map P2 may be called the first limb.
When the target region contains multiple limbs of the target object, the electronic device first determines, from the representative pose library, one or more pose maps containing those limbs. Likewise, when there is only one pose map containing the multiple limbs, that pose map is pose map P2. When there are multiple pose maps containing the multiple limbs, the electronic device randomly selects one of them as pose map P2, or the electronic device determines, among the multiple pose maps, the one in which the area of the regions where the multiple limbs are located is largest as pose map P2. Specifically, the electronic device counts, for each of the multiple pose maps, the number of target pixels corresponding to the regions where the multiple limbs are located, and the pose map with the largest total count is taken as pose map P2. In some embodiments, the multiple limbs contained in pose map P2 may be called the first limbs.
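The selection of pose map P2 by counting target pixels covered by the candidate pose maps' limb regions can be sketched as follows; the pose map names, the masks, and the tiny array sizes are illustrative assumptions.

```python
import numpy as np

# Mask of target pixels from the poorly segmented region (illustrative).
target_mask = np.array([[1, 1],
                        [0, 1]], dtype=bool)

# For each candidate pose map in the representative pose library, a boolean
# mask of the regions where the problem limbs are located (illustrative).
pose_library = {
    "pose_A": np.array([[1, 0], [0, 0]], dtype=bool),
    "pose_B": np.array([[1, 1], [0, 1]], dtype=bool),
}

# Pose map P2 is the candidate whose limb regions cover the most target pixels.
p2 = max(pose_library,
         key=lambda name: np.count_nonzero(pose_library[name] & target_mask))

assert p2 == "pose_B"                 # pose_B covers 3 target pixels, pose_A only 1
```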
In the embodiments of this application, pose map P2 may be referred to as the second pose map.
S210: The electronic device displays pose map P2, and pose map P2 indicates a posture G2.
Specifically, the electronic device displays pose map P2 on the display screen; pose map P2 is used to instruct the user to assume posture G2. Pose map P2 may be the pose map shown in Figure 3A above. Posture G2 may be the posture of adjusting earphones with both hands in the embodiment of Figure 2B. It can be understood that posture G2 is merely an example; in practical applications, posture G2 may also be another posture, for example raising the hands or holding the head, and this application does not limit the specific form of the posture.
In the embodiments of this application, posture G2 may also be referred to as the second posture.
S211: The electronic device acquires an image T3, in which the posture of the target object is the posture G2 indicated in pose map P2.
Specifically, image T3 is an image captured of the target object in the posture indicated in pose map P2, or a frame of a video so recorded. Image T3 contains the target object.
In the embodiments of this application, image T3 may be referred to as the fifth image.
S212: Following steps S203 to S209 above, the electronic device trains the lightweight image segmentation model M2 based on image T3 and the full image segmentation model until the end condition of model training is satisfied, obtaining the target lightweight image segmentation model.
Specifically, multiple rounds of iterative training are performed on the lightweight image segmentation model according to the above steps. In each round of iterative training, the model parameters of that round's initial lightweight image segmentation model are adjusted, and the model gradually converges, so as to obtain the target lightweight image segmentation model.
The end condition of model training may be that the number of training iterations of the lightweight image segmentation model reaches a preset number of iterations, or that an image segmentation performance indicator of the lightweight model after parameter adjustment reaches a preset indicator. For example, the preset indicator may be that the segmentation result of the lightweight image segmentation model and the segmentation result of the full image segmentation model satisfy the preset condition D2.
In some implementations, the electronic device inputs image T3 into the full image segmentation model and the lightweight image segmentation model M2 respectively to obtain a segmentation result 4 and a segmentation result 5, and uses segmentation result 4 and segmentation result 5 to train the lightweight image segmentation model M2, obtaining a lightweight image segmentation model M3. The electronic device then inputs image T3 into the lightweight image segmentation model M3 to obtain a segmentation result 6 and determines whether segmentation result 4 and segmentation result 6 satisfy the preset condition D2. When segmentation result 4 and segmentation result 6 satisfy the preset condition D2, the lightweight image segmentation model M3 is the target lightweight image segmentation model. The electronic device may, based on segmentation result 6, replace the initial background content in the region where the background content is located in image T3 with the preset background content.
In some implementations, when segmentation result 4 and segmentation result 6 do not satisfy the preset condition D2, the electronic device determines the poorly segmented region based on segmentation result 4 and segmentation result 6 and then, based on that poorly segmented region, determines a pose map P3 from the representative pose library; pose map P3 is used to instruct the user to assume a posture G3. For the related operations of determining pose map P3 based on segmentation result 4 and segmentation result 6, see the related operations in step S209 above; they are not repeated here.
After determining pose map P3, the electronic device may display pose map P3. For the related description of the electronic device displaying pose map P3, see the related descriptions in the embodiments of Figures 3D to 3F above; it is not repeated here.
The electronic device acquires an image T4, where image T4 is an image, or a frame of a video, captured while the user assumes the posture G3 shown in pose map P3; the posture of the target object in image T4 is the posture G3 indicated in pose map P3.
The electronic device may train the lightweight image segmentation model M3 based on image T4 to obtain a lightweight image segmentation model M4, and uses image T4 to test the segmentation effect of the lightweight image segmentation model M4. When the segmentation result of the lightweight image segmentation model M4 satisfies the preset condition D2, the lightweight image segmentation model M4 is the target lightweight image segmentation model. If the segmentation result of the lightweight image segmentation model M4 does not satisfy the preset condition D2, the electronic device re-determines a pose map based on the segmentation result and acquires images to train the lightweight image segmentation model M4, until the segmentation result of the lightweight image segmentation model M4 satisfies the preset condition D2.
In the embodiments of this application, segmentation result 4 may be referred to as the fourth segmentation result, segmentation result 5 as the fifth segmentation result, the lightweight image segmentation model M3 as the third lightweight image segmentation model, and segmentation result 6 as the sixth segmentation result. Pose map P3 may be referred to as the third pose map, posture G3 as the third posture, and image T4 may also be referred to as the sixth image.
S213: The electronic device acquires an original image T5, inputs the original image T5 into the target lightweight image segmentation model, and determines the region where the target object is located and the original background content in the original image.
The original image may be an image uploaded by the target object or a frame of an uploaded video, or it may be an image, or a frame of a video, captured by the electronic device that includes the target object.
In the embodiments of this application, the original image T5 may also be referred to as the first image, and the original background content may also be referred to as the first background content.
S214: The electronic device replaces the original background content in the original background content region of the original image T5 to obtain a replaced image T6.
Specifically, after separating the region where the original background content is located from the region where the target object is located in the original image T5, the electronic device composites the region where the target object is located with a preset background into a new image, that is, the replaced image T6, in which the background content differs from the original background content. The preset background may be set by the target object or may be a default of the electronic device, for example a landscape image.
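Using the foreground probability mask produced by the target lightweight image segmentation model, the compositing step can be sketched as a per-pixel linear blend; the array sizes, the colors, and the soft-edge blending at values between 0 and 1 are illustrative assumptions.

```python
import numpy as np

# Per-pixel foreground probability from the segmentation model (values in [0, 1]),
# plus the original image T5 and the preset background, all illustrative 2x2 RGB data.
alpha = np.array([[1.0, 0.0],
                  [0.8, 0.2]])[..., None]              # broadcast over RGB channels
original = np.full((2, 2, 3), 200, dtype=np.float64)   # image T5 (target object)
preset_bg = np.full((2, 2, 3), 50, dtype=np.float64)   # preset background content

# Composite: keep the target object where alpha is high, use the preset
# background elsewhere; soft edges such as protruding hair are blended.
replaced = alpha * original + (1.0 - alpha) * preset_bg   # image T6

assert replaced[0, 0, 0] == 200.0     # pure foreground keeps the target object
assert replaced[0, 1, 0] == 50.0      # pure background is replaced
assert np.isclose(replaced[1, 0, 0], 170.0)   # soft edge: 0.8*200 + 0.2*50
```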
In the embodiments of this application, image T6 may also be referred to as the fourth image.
Illustratively, Figure 8 is a schematic diagram of segmenting the original image with the lightweight image segmentation model M1 and with the target lightweight image segmentation model, and then performing replacement to obtain the replaced images.
As shown in (a) of Figure 8, when the user taps the background replacement control 112, the electronic device segments the foreground region and the background region in video image frame 811 and can separate the two. As shown in (b) of Figure 8, segmenting video image frame 811 with the lightweight image segmentation model M1 yields a segmentation result 7. For better illustration, the foreground region and the background region in segmentation result 7 shown in (b) of Figure 8 are distinguished by different colors: the white region represents the foreground region after segmentation by the lightweight image segmentation model M1, and the black region represents the background region after that segmentation. In the segmentation result shown in (b) of Figure 8, it can be seen that the lightweight image segmentation model M1 mistakes the edge region of the target object for the background region, that is, it treats the protruding hair in regions 8111A and 8111B as background content of frame 811. As shown in (c) of Figure 8, after the background content in video image frame 811 is replaced with the preset background content, the replaced video image frame 821 is obtained; the protruding hair in regions 8111A and 8111B has been replaced as background content, and no protruding hair is present in regions 8211A and 8211B of the replaced video image frame 821.
As shown in (d) of Figure 8, when the user taps the background replacement control 812, the electronic device segments the foreground region and the background region in video image frame 811 and can separate the two. As shown in (e) of Figure 8, segmenting video image frame 811 with the target lightweight image segmentation model yields a segmentation result 8. It can be seen that the target lightweight image segmentation model distinguishes the protruding hair of the target object in regions 8111A and 8111B from the background region well. Therefore, as shown in (f) of Figure 8, after video image frame 811 is replaced, video image frame 821 is obtained, in which the protruding hair in regions 8111A and 8111B is retained.
It should be noted that, before step S101, the electronic device may further build a representative posture library.
For example, as shown in FIG. 9, building the representative posture library by the electronic device may include the following steps:
S301: The electronic device obtains an image data set.
The image data set may be a large number of pre-collected images of the target object, or image frames contained in pre-collected video data of the target object. This is not limited in the embodiments of this application.
In an optional embodiment, the image data set may be crawled from public websites or obtained from a large public image database.
The image data set contains posture features and contour features of users. Posture features refer to a user's actions, such as turning the head, turning the body, or standing up and sitting down. Contour features refer to the lines that form the outer edge of the user.
S302: The electronic device inputs the image data set into a human posture estimation model to obtain a plurality of human posture vectors corresponding to the image data set.
Specifically, the human posture estimation model can identify the skeletal key points of a human body in an image, as well as the limb vectors formed by those skeletal key points. The skeletal key points represent the skeletal information of the human body and can be used to describe human posture.
The number and types of skeletal key points are determined by the human posture estimation model; different human posture estimation models output different numbers and types of skeletal key points. In the embodiments of this application, dividing the human body into 15 skeletal key points is used as an example for illustration; in practical applications, the human body may also be divided into 9, 17, or another number of skeletal key points, which is not limited in this application. The 15 skeletal key points can be connected to form 14 limb vectors, and a limb vector can be computed from the coordinate positions of the skeletal key points.
For example, FIG. 10 shows a set of skeletal key point data; only some of the skeletal key points and limb vectors are shown. As shown in FIG. 10, each circular point in the figure is a skeletal key point represented by coordinates (X, Y); adjacent key points are connected to form a limb vector, and the set of limb vectors of the target object in one image may be called a posture vector. For example, skeletal key point 3 has coordinates (X3, Y3) and skeletal key point 4 has coordinates (X4, Y4); connecting skeletal key point 3 and skeletal key point 4 yields the limb vector (X3-X4, Y3-Y4), which represents one limb and may be called the left shoulder.
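The limb-vector computation described above can be sketched as follows. The key-point indices, names, and coordinates below are illustrative assumptions, not values taken from the patent figures.

```python
# Sketch of computing a limb vector from two adjacent skeletal key points.
# Indices, names, and coordinates are illustrative assumptions.
KEYPOINTS = {
    3: (120.0, 80.0),   # skeletal key point 3, e.g. near the neck
    4: (95.0, 85.0),    # skeletal key point 4, e.g. the left shoulder joint
}

def limb_vector(kp_a, kp_b):
    """A limb vector is the coordinate difference (Xa-Xb, Ya-Yb)
    of two adjacent skeletal key points."""
    (xa, ya), (xb, yb) = kp_a, kp_b
    return (xa - xb, ya - yb)

left_shoulder = limb_vector(KEYPOINTS[3], KEYPOINTS[4])
print(left_shoulder)  # (25.0, -5.0)
```

The full posture vector of one image would simply concatenate such limb vectors over all adjacent key-point pairs.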
S303: The electronic device inputs the plurality of human posture vectors corresponding to the image data set into a clustering model to obtain one or more representative posture vectors.
Specifically, a plurality of limb vectors can be obtained from one image, and the limb vectors of that image form one posture vector; since the image data set includes a plurality of images, a plurality of posture vectors can be obtained. The electronic device maps the posture vectors into a vector space, where each posture vector is one point, and then computes the similarity between the points. Posture vectors with high similarity gather into a cluster, and the vector at the center of each cluster (that is, the cluster center) is selected as a representative posture vector.
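The clustering step can be sketched as below, assuming plain k-means as the clustering model (the embodiment does not name a specific algorithm). The toy two-dimensional posture vectors and the deterministic initialization are illustrative assumptions.

```python
import numpy as np

def representative_postures(posture_vectors, k, iters=20):
    """Cluster posture vectors with plain k-means and return the
    cluster centers as the representative posture vectors."""
    pts = np.asarray(posture_vectors, dtype=float)
    centers = pts[:k].copy()              # simple deterministic initialization
    for _ in range(iters):
        # distance of every posture vector to every cluster center
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)         # nearest-center assignment
        for j in range(k):
            if (labels == j).any():
                centers[j] = pts[labels == j].mean(axis=0)
    return centers

# Toy 2-D "posture vectors" forming two obvious groups
pts = [(0, 0), (5, 5), (0.1, 0.2), (5.2, 4.9)]
centers = representative_postures(pts, k=2)
```

In the embodiment, each point would be a high-dimensional posture vector rather than a 2-D pair, but the cluster-center selection is the same.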
For example, FIG. 11 is a schematic diagram of the process in which the electronic device obtains representative posture vectors by clustering. (a) of FIG. 11 shows four clusters as an example; each circular point in a cluster represents one posture vector, that is, one human posture, for example, holding a headset with both hands or holding a headset with one hand. The black five-pointed star in a cluster is the cluster center, and the cluster center vector of each cluster is selected as a representative posture vector. As shown in (b) of FIG. 11, the posture represented by the cluster center vector of cluster 1 is holding the headset with one hand; as shown in (c) of FIG. 11, the posture represented by the cluster center vector of cluster 2 is holding the headset with both hands.
S304: The electronic device inputs the one or more representative posture vectors into a human body contour detection model to obtain contour maps corresponding to the one or more representative posture vectors.
For example, FIG. 12 is a schematic diagram of obtaining contour maps from representative posture vectors. (a) and (c) of FIG. 12 show two representative posture vectors, which represent the posture of holding the headset with one hand and the posture of holding the headset with both hands, respectively. (b) and (d) of FIG. 12 show the contour maps obtained from the two representative posture vectors.
S305: The electronic device inputs the image data set into a limb region detection model to obtain one or more limb region maps corresponding to the image data set.
For example, FIG. 13 is a schematic diagram of limb regions. As shown in FIG. 13, different color regions in the figure represent different limb regions; for example, dark gray represents the region where the head is located, and light gray represents the region where the left hand is located.
The limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, and left foot.
S306: The electronic device matches the one or more limb region maps corresponding to the image data set with the contour maps corresponding to the one or more representative posture vectors to obtain one or more posture maps.
The representative posture library may include the one or more posture maps corresponding to the image data set.
It should be noted that, for ease of description, the foregoing method embodiments are expressed as a series of action combinations. However, persons skilled in the art should understand that the present invention is not limited by the described order of actions. In addition, persons skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the present invention.
It should also be noted that the electronic device involved in the foregoing embodiments may be referred to as electronic device 100. The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device. The specific type of the electronic device 100 is not specially limited in the embodiments of this application.
The software architecture of the electronic device 100 in the embodiments of this application is described next.
FIG. 14 is a schematic diagram of the software structure of the electronic device 100 provided in an embodiment of this application.
As shown in FIG. 14, the software structure of the electronic device 100 may include a background replacement engine, a background replacement client, a collection module, an interactive display module, and a representative posture library. Specifically:
The interactive display module is configured to receive user operations, display posture maps, and display the images before and after background replacement. For example, the interactive display module may receive a first operation and display a posture map P1 based on the first operation, for example, the user interfaces in FIG. 3A to FIG. 3F described above. Specifically, the interactive display module is configured to display the posture map P1. For another example, the interactive display module may display a first image, a second image obtained by background replacement based on the first image segmentation lightweight model, or a fourth image obtained by background replacement based on the target image segmentation lightweight model.
The collection module is configured to obtain images or videos of the target object. For example, it may obtain the image T1 in step S201, where the image T1 is an image of the target object taken in the posture G1 of the posture map P1, or a frame of a video of the target object taken in the posture G1 of the posture map P1.
The background replacement client is configured to, after receiving the first operation, obtain the posture map P1 from the representative posture library according to a preset configuration and send it to the interactive display module for display.
It is further configured to receive the image including the target object sent by the collection module and forward it to the background replacement engine. For example, the image including the target object may be the image T1 in step S202 or the image T2 in step S211 described above.
It is further configured to receive the segmentation results of the image segmentation full model and of the image segmentation lightweight model sent by the background replacement engine, parse the segmentation results, and obtain, based on them, the posture map information required for the next round of training. The posture map information is then sent to the representative posture library to obtain the corresponding posture map, which is sent to the interactive display module for display.
The background replacement engine is configured to obtain the image of the target object from the background replacement client and use that image together with the preset image segmentation full model to train the image segmentation lightweight model. Specifically, the background replacement engine first receives the image T1 of the target object sent by the background replacement client, and trains the image segmentation lightweight model M1 with the image T1 and the preset image segmentation full model to obtain the image segmentation lightweight model M2. Then it segments the image T1 with the preset image segmentation full model and the image segmentation lightweight model M2 to obtain segmentation result 1 and segmentation result 3, and determines whether the difference between segmentation result 1 and segmentation result 3 satisfies the preset condition D2. If the preset condition D2 is satisfied, the image segmentation lightweight model M2 is stored for subsequent image segmentation; if not, segmentation result 1 and segmentation result 3 are sent to the background replacement client, so that the background replacement client can determine the posture to be displayed in the user interface for the next round of training.
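The engine's train-then-check loop might look like the following sketch. The model objects and their `distill`/`segment` methods are hypothetical placeholders, and measuring the difference as 1 - IoU of the two foreground masks against a numeric threshold is only one plausible way to realize the preset condition D2.

```python
def mask_difference(mask_a, mask_b):
    """1 - IoU between two binary foreground masks (flat lists of 0/1)."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    union = sum(a | b for a, b in zip(mask_a, mask_b))
    return 1.0 - (inter / union if union else 1.0)

def train_until_condition(light_model, full_model, next_image,
                          d2=0.05, max_rounds=10):
    """One round per user image: distill the light model against the full
    model, then compare their segmentations; stop when they agree closely
    enough (a stand-in for the preset condition D2)."""
    for _ in range(max_rounds):
        image = next_image()                    # image of user in the requested posture
        light_model.distill(full_model, image)  # hypothetical training step
        diff = mask_difference(light_model.segment(image),
                               full_model.segment(image))
        if diff <= d2:                          # condition D2 satisfied
            return light_model                  # store as target lightweight model
    return light_model

# Example of the difference measure on toy masks:
diff = mask_difference([1, 1, 0, 0], [1, 0, 0, 0])
print(diff)  # 0.5
```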
The representative posture library is configured to store posture maps, receive request instructions from the background replacement client, and send the posture map corresponding to a request instruction to the background replacement client.
The following uses the embodiment in FIG. 14 as an example to describe in detail the cooperation among the modules of the electronic device 100 in the embodiments of this application. Referring to FIG. 15, FIG. 15 shows an example of the cooperation among the modules of the electronic device 100. As shown in FIG. 15, the electronic device 100 includes a background replacement engine, a background replacement client, a collection module, an interactive display module, and a representative posture library. In the embodiment of FIG. 15, obtaining the target image segmentation lightweight model through two rounds of training is used as an example, described as follows:
1. The interactive display module detects a first operation. The first operation may be a touch operation on the confirm control 214b in FIG. 3A.
2. The interactive display module sends a background replacement instruction to the background replacement client.
3. In response to the background replacement instruction sent by the interactive display module, the background replacement client sends an instruction requesting a posture map to the representative posture library.
4. In response to the background replacement client's request for a posture map, the representative posture library sends the posture map P1 to the background replacement client.
5. The background replacement client receives the posture map P1 sent by the representative posture library and sends it to the interactive display module.
6. The interactive display module receives and displays the posture map P1, where the posture map P1 indicates a posture G1.
7. The collection module obtains the image T1 and sends it to the background replacement client.
8. The background replacement client receives the image T1 and sends it to the background replacement engine.
9-11. The image segmentation lightweight model M1 is trained with the image T1 and the image segmentation full model to obtain the image segmentation lightweight model M2. The image T1 is then input into the image segmentation lightweight model M2 and the image segmentation full model, respectively, to obtain segmentation result 1 and segmentation result 3, which are sent to the background replacement client.
12. The background replacement client determines whether segmentation result 1 and segmentation result 3 satisfy the preset condition D2. If the preset condition D2 is not satisfied, the client determines a posture map P2 based on segmentation result 1 and segmentation result 3, where the posture map P2 indicates a posture G2.
13. The background replacement client sends an instruction requesting the posture map P2 to the representative posture library.
14. In response to the instruction requesting the posture map P2, the representative posture library sends the posture map P2 to the background replacement client.
15. The background replacement client receives the posture map P2 and sends it to the interactive display module.
16. The interactive display module receives and displays the posture map P2.
17. The collection module obtains the image T3 and sends it to the background replacement client, where the posture of the target object in the image T3 is the posture G2 indicated by the posture map P2.
18. The background replacement client receives the image T3 and sends it to the background replacement engine.
19-21. The image segmentation lightweight model M2 is trained with the image T3 and the image segmentation full model to obtain the image segmentation lightweight model M3. The image T3 is then input into the image segmentation lightweight model M3 and the image segmentation full model, respectively, to obtain segmentation result 4 and segmentation result 6, which are sent to the background replacement client.
22. The background replacement client receives segmentation result 4 and segmentation result 6 and determines whether they satisfy the preset condition D2. If the preset condition D2 is satisfied, the client determines that the image segmentation lightweight model M3 is the target image segmentation lightweight model.
23. The collection module obtains the original image T5 and sends it to the background replacement client.
24. After receiving the original image T5, the background replacement client sends it to the background replacement engine.
25-27. The background replacement engine receives the original image T5 and segments it with the target image segmentation lightweight model to obtain a foreground region and a background region. The background region in the original image T5 is then replaced with the preset background to obtain a replaced image T6, which is sent to the background replacement client.
28. The background replacement client receives the replaced image T6 and sends it to the interactive display module for display.
29. The interactive display module receives and displays the replaced image T6.
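Steps 25 to 27 above (segment with the target lightweight model, then composite the preset background) can be sketched as follows. Representing images as flat pixel lists and compositing with a binary foreground mask are illustrative simplifications, not the embodiment's actual image representation.

```python
def replace_background(image, mask, new_background):
    """Keep foreground pixels where mask == 1 and take pixels from the
    preset background elsewhere (all inputs are flat pixel lists)."""
    return [fg if m == 1 else bg
            for fg, m, bg in zip(image, mask, new_background)]

frame  = ["hair", "face", "wall", "lamp"]   # original image T5 (toy pixels)
mask   = [1, 1, 0, 0]                       # foreground/background from the model
preset = ["sky", "sky", "sky", "sky"]       # preset background
replaced = replace_background(frame, mask, preset)
print(replaced)  # ['hair', 'face', 'sky', 'sky']
```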
It is worth noting that the background replacement client, the representative posture library, and the background replacement engine may be deployed on the same electronic device or on different electronic devices. For example, the background replacement client may be deployed on one electronic device, while the representative posture library and the background replacement engine are deployed on another; this is not limited in this application.
A background replacement system provided in an embodiment of this application is described next.
FIG. 16 is a schematic diagram of a background replacement system provided in an embodiment of this application. As shown in FIG. 16, the background replacement system includes an electronic device 200 and a server 300. A communication connection may exist between the electronic device 200 and the server 300, enabling data communication between the two. Specifically:
The electronic device 200 is configured to obtain a first image and send the first image to the server.
The server 300 is configured to receive the first image and input it into a first image segmentation lightweight model to determine the region of the first image where the target object is located and the region where first background content is located; to replace the first background content in the first image with second background content to obtain a second image; and to send the second image to the electronic device.
The electronic device 200 is configured to obtain the second image and display it, and, in response to a background replacement training operation of the user, to display a first posture map, where the first posture map instructs the user to make a first posture.
The electronic device 200 is configured to obtain a third image and send it to the server, where the posture of the user in the third image is the first posture.
The server 300 is configured to obtain the third image and train the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model; to input the first image into the target image segmentation lightweight model to determine the region of the first image where the target object is located and the region where the first background content is located; to replace the first background content in the first image with the second background content to obtain a fourth image; and to send the fourth image to the electronic device.
The electronic device 200 is configured to receive the fourth image and display it.
Optionally, in a possible implementation, the electronic device 200 may be further configured to perform the background replacement method of any one of the foregoing steps S201, S202, S206, S208, S209, S210, and S211, which is not described again here.
In a possible implementation, the server 300 may be further configured to perform the background replacement method of any one of the foregoing steps S203, S205, S207, S213, and S214, which is not described again here.
In some possible implementations, the electronic device 200 may also obtain the image T3 and send it to the server 300; the server 300 receives the image T3 and trains the image segmentation lightweight model M2 with the image T3 and the image segmentation full model until the end condition of model training is satisfied, obtaining the target image segmentation lightweight model.
The electronic device 200 may include the interactive display module, the collection module, and the background replacement client; the server 300 may include the background replacement engine and the representative posture library.
Optionally, in a possible implementation, the interactive display module 301 may be further configured to perform any one of step 1, step 2, step 6, step 16, and step 29 of the embodiment in FIG. 15. The collection module may be further configured to perform any one of step 7, step 17, and step 23 of the embodiment in FIG. 15, which is not described again here.
The background replacement client may be further configured to perform any one of step 3, step 5, step 8, step 12, step 13, step 15, step 18, step 22, step 24, and step 28 of the embodiment in FIG. 15, which is not described again here.
In a possible implementation, the background replacement engine 304 may be further configured to perform any one of steps 9 to 11 and steps 19 to 21 of the embodiment in FIG. 15, which is not described again here.
The representative posture library may be further configured to perform step 4 or step 14 of the embodiment in FIG. 15, which is not described again here.
The background replacement method provided in the embodiments of this application is described in detail above with reference to FIG. 2A to FIG. 15. The background replacement apparatus and electronic device provided in the embodiments of this application are described below with reference to FIG. 17A, FIG. 17B, and FIG. 18.
FIG. 17A is a schematic diagram of a background replacement apparatus provided in an embodiment of this application. The background replacement apparatus 400 includes a display unit 401 and an obtaining unit 402. Specifically:
The display unit 401 is configured to display a second image obtained by performing first background replacement on a first image. The first background replacement is performed based on a first image segmentation lightweight model. The first image includes a region where a target object is located and a region where first background content is located. The second image is obtained by replacing the first background content in the first image with second background content, where the second background content is different from the first background content.
The display unit 401 is further configured to display, in response to a background replacement training operation of the user, a first posture map, where the first posture map instructs the user to make a first posture.
The obtaining unit 402 is configured to obtain a third image, where the posture of the user in the third image is the first posture.
The display unit 401 is further configured to display a fourth image obtained by performing second background replacement on the first image. The second background replacement is performed based on a target image segmentation lightweight model, which is trained based on the third image. The fourth image is obtained by replacing the first background content in the first image with the second background content.
It should be understood that the background replacement apparatus 400 of the embodiments of this application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the background replacement method shown in FIG. 2A to FIG. 13 is implemented by software, the background replacement apparatus 400 and its modules may also be software modules.
In a possible implementation, as shown in FIG. 17B, the background replacement apparatus further includes an image segmentation unit 403, a determination unit 404, a model training unit 405, and a background replacement unit 406. Specifically:
The image segmentation unit 403 is configured to input the third image into a full image segmentation model to obtain a first segmentation result, and to input the third image into a first lightweight image segmentation model to obtain a second segmentation result, where the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first lightweight image segmentation model, and the first segmentation result and the second segmentation result are used to indicate the region in which the target object is located and the region in which the background content is located in the third image.
The model training unit 405 is configured to train the first lightweight image segmentation model based on the first segmentation result and the second segmentation result to obtain a second lightweight image segmentation model.
The image segmentation unit 403 is further configured to input the third image into the second lightweight image segmentation model to obtain a third segmentation result.
The determination unit 404 is configured to: when the first segmentation result and the third segmentation result are different, determine a second posture image based on the first segmentation result and the third segmentation result, where the second posture image is used to instruct the user to make a second posture, the second posture image includes a first limb, and the region in which the first limb is located in the first segmentation result is different from the region in which the first limb is located in the third segmentation result.
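The comparison performed by the determination unit — counting, per limb region, the pixels on which the two segmentation results disagree — can be sketched as follows. This is an illustrative reconstruction under assumptions: `limb_masks` stands in for per-limb regions that would come from a pose estimator, and both thresholds are arbitrary example values:

```python
import numpy as np

def find_disputed_limb_regions(seg_full, seg_light, limb_masks,
                               pixel_threshold=0.5, count_threshold=10):
    """Return names of limbs whose regions the two models disagree on.

    seg_full, seg_light: H x W per-pixel foreground scores from the full
        and lightweight models (the first / third segmentation results).
    limb_masks: dict mapping limb name -> H x W boolean region mask
        (hypothetical helper input, e.g. from a pose estimator).
    """
    # "First target pixels": per-pixel difference above the first threshold.
    disputed = np.abs(seg_full - seg_light) > pixel_threshold
    # A limb region qualifies as a target region if it contains more
    # disputed pixels than the second threshold.
    return [name for name, region in limb_masks.items()
            if np.count_nonzero(disputed & region) > count_threshold]

# Toy example: only the "left_arm" block disagrees between the two results.
h, w = 8, 8
full = np.zeros((h, w))
light = np.zeros((h, w))
light[:4, :4] = 1.0                      # lightweight model mislabels a block
masks = {"left_arm": np.zeros((h, w), bool), "torso": np.zeros((h, w), bool)}
masks["left_arm"][:4, :4] = True
masks["torso"][4:, 4:] = True
bad = find_disputed_limb_regions(full, light, masks)
```

A posture image containing the disputed limb would then be selected to prompt the user for new training images.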
The model training unit 405 is further configured to train the second lightweight image segmentation model based on a fifth image to obtain the target lightweight image segmentation model, where the posture of the target object in the fifth image is the second posture.
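Training a lightweight model against the output of a full model, as the model training unit does, is essentially knowledge distillation: the full model's segmentation result serves as a soft label for the lightweight model. The sketch below distills a per-pixel teacher probability into a tiny logistic "student"; the real models are neural segmentation networks, so everything here (features, optimizer, sizes) is a simplified stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: the "full model" is represented only by the
# per-pixel foreground probabilities it outputs (the first segmentation
# result); the "lightweight model" is a single logistic unit over
# simple per-pixel features.
features = rng.normal(size=(256, 3))               # one feature row per pixel
teacher_prob = 1 / (1 + np.exp(-(features @ np.array([2.0, -1.0, 0.5]))))

w = np.zeros(3)                                     # lightweight model weights
for _ in range(500):                                # distillation loop
    student_prob = 1 / (1 + np.exp(-(features @ w)))
    # Gradient of per-pixel cross-entropy against the teacher's soft labels.
    grad = features.T @ (student_prob - teacher_prob) / len(features)
    w -= 0.5 * grad                                 # plain gradient step

student_prob = 1 / (1 + np.exp(-(features @ w)))
mean_gap = np.abs(student_prob - teacher_prob).mean()
```

After training, the student's segmentation closely tracks the teacher's on these pixels (small `mean_gap`), which is the property the method relies on when it later compares the two models' results.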
The background replacement unit 406 is configured to replace, based on the second segmentation result, the initial background content in the region in which the background content is located in the third image with preset background content to obtain a seventh image.
Specifically, for the operations performed by the background replacement apparatus 400 to implement background replacement, refer to the related operations of the electronic device in the foregoing method embodiments; details are not described again here.
In some optional implementations, the display unit 401, the acquisition unit 402, the image segmentation unit 403, the determination unit 404, the model training unit 405, and the background replacement unit 406 may correspond to the electronic device 100 described above and may perform the operations performed by the electronic device 100 in the foregoing method embodiments; details are not described again here.
In some optional implementations, the display unit 401, the acquisition unit 402, and the determination unit 404 may correspond to the electronic device 200 described above, and the image segmentation unit 403, the model training unit 405, and the background replacement unit 406 may correspond to the server 300 described above.
FIG. 18 is a schematic structural diagram of an electronic device according to an embodiment of this application. The electronic device 10 includes a processor 11, a communication interface 12, and a memory 13, which are connected to one another through a bus 14. The processor 11 is configured to execute instructions stored in the memory 13. The memory 13 stores program code, and the processor 11 may invoke the program code stored in the memory 13 to perform the following operations:
displaying a second image obtained by performing first background replacement on a first image, where the first background replacement is performed based on a first lightweight image segmentation model, the first image includes a region in which a target object is located and a region in which first background content is located, the second image is obtained by replacing the first background content in the first image with second background content, and the second background content is different from the first background content;
displaying, in response to a user's background replacement training operation, a first posture image, where the first posture image is used to instruct the user to make a first posture;
acquiring a third image, where the user's posture in the third image is the first posture; and
displaying a fourth image obtained by performing second background replacement on the first image, where the second background replacement is performed based on a target lightweight image segmentation model, the target lightweight image segmentation model is trained based on the third image, and the fourth image is obtained by replacing the first background content in the first image with the second background content.
In this embodiment of this application, the processor 11 may take many specific forms. For example, the processor 11 may be any one or a combination of processors such as a CPU, a GPU, a TPU, or an NPU, and may be a single-core or a multi-core processor. The processor 11 may also be a combination of a CPU (or GPU, TPU, or NPU) and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 11 may also be implemented solely by a logic device with built-in processing logic, for example, an FPGA or a digital signal processor (DSP).
The communication interface 12 may be a wired interface or a wireless interface and is used to communicate with other modules or devices. A wired interface may be an Ethernet interface, a controller area network (CAN) interface, or a local interconnect network (LIN) interface; a wireless interface may be a cellular network interface, a wireless local area network interface, or the like.
The memory 13 may be a non-volatile memory, for example, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 13 may also be a volatile memory; a volatile memory may be a random access memory (RAM), which is used as an external cache.
The memory 13 may also be used to store instructions and data. In addition, the electronic device 10 may include more or fewer components than those shown in FIG. 18, or may have a different arrangement of components.
The bus 14 may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one bold line is used in FIG. 18, but this does not mean that there is only one bus or only one type of bus.
Optionally, the electronic device 10 may further include an input/output interface 15. The input/output interface 15 is connected to an input/output device and is used to receive input information and output operation results.
In some possible implementations, the electronic device 10 in this embodiment of this application may correspond to the background replacement apparatus 400 in the foregoing embodiments and may perform the operations performed by the electronic device 100 in the foregoing method embodiments; details are not described again here.
In some possible implementations, the electronic device 10 may be the electronic device 100 described above, or may be the electronic device 200 described above.
FIG. 19 is a schematic structural diagram of an electronic device according to an embodiment of this application. The electronic device 20 includes a processor 21, a communication interface 22, and a memory 23, which are connected to one another through a bus 24. The processor 21 is configured to execute instructions stored in the memory 23. The memory 23 stores program code, and the processor 21 may invoke the program code stored in the memory 23 to perform the following operations:
receiving a first image, inputting the first image into a first lightweight image segmentation model, determining the region in which a target object is located and the region in which first background content is located in the first image, and replacing the first background content in the first image with second background content to obtain a second image;
acquiring a third image, and training the first lightweight image segmentation model based on the third image to obtain a target lightweight image segmentation model;
inputting the first image into the target lightweight image segmentation model, and determining the region in which the target object is located and the region in which the first background content is located in the first image; and
replacing the first background content in the first image with the second background content to obtain a fourth image.
In this embodiment of this application, the processor 21 may take many specific forms. For example, the processor 21 may be any one or a combination of processors such as a CPU, a GPU, a TPU, or an NPU, and may be a single-core or a multi-core processor. The processor 21 may also be a combination of a CPU (or GPU, TPU, or NPU) and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 21 may also be implemented solely by a logic device with built-in processing logic, for example, an FPGA or a digital signal processor (DSP).
The communication interface 22 may be a wired interface or a wireless interface and is used to communicate with other modules or devices. A wired interface may be an Ethernet interface, a controller area network (CAN) interface, or a local interconnect network (LIN) interface; a wireless interface may be a cellular network interface, a wireless local area network interface, or the like.
The memory 23 may be a non-volatile memory, for example, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 23 may also be a volatile memory; a volatile memory may be a random access memory (RAM), which is used as an external cache.
The memory 23 may also be used to store instructions and data. In addition, the electronic device 20 may include more or fewer components than those shown in FIG. 19, or may have a different arrangement of components.
The bus 24 may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one bold line is used in FIG. 19, but this does not mean that there is only one bus or only one type of bus.
Optionally, the electronic device 20 may further include an input/output interface 25. The input/output interface 25 is connected to an input/output device and is used to receive input information and output operation results.
In some possible implementations, the electronic device 20 may be the electronic device 100 described above, or may be the server 300 described above.
An embodiment of this application further provides a non-transitory computer-readable storage medium. The computer-readable storage medium stores a computer program; when the computer program is run on a processor, the method steps performed by the electronic device in the foregoing method embodiments can be implemented. For the specific implementation of these method steps by the processor, refer to the specific operations of the electronic device in the foregoing method embodiments; details are not described again here.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, refer to the related descriptions in other embodiments.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, the foregoing embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
The steps in the methods of the embodiments of this application may be reordered, combined, or deleted according to actual needs; the modules in the apparatuses of the embodiments of this application may be divided, combined, or deleted according to actual needs.
The embodiments of this application have been described in detail above. The principles and implementations of this application are described herein through specific examples; the descriptions of the foregoing embodiments are merely intended to help understand the method and core idea of this application. In addition, a person of ordinary skill in the art may make changes to the specific implementations and application scope based on the idea of this application. In conclusion, the content of this specification shall not be construed as a limitation on this application.

Claims (26)

  1. A background replacement method, wherein the method comprises:
    displaying, by an electronic device, a second image obtained by performing first background replacement on a first image, wherein the first background replacement is performed based on a first lightweight image segmentation model, the first image comprises a region in which a target object is located and a region in which first background content is located, the second image is obtained by replacing the first background content in the first image with second background content, and the second background content is different from the first background content;
    displaying, by the electronic device in response to a user's background replacement training operation, a first posture image, wherein the first posture image is used to instruct the user to make a first posture;
    acquiring, by the electronic device, a third image, wherein the user's posture in the third image is the first posture; and
    displaying, by the electronic device, a fourth image obtained by performing second background replacement on the first image, wherein the second background replacement is performed based on a target lightweight image segmentation model, the target lightweight image segmentation model is trained based on the third image, and the fourth image is obtained by replacing the first background content in the first image with the second background content.
  2. The method according to claim 1, wherein before the electronic device displays the fourth image obtained by performing the second background replacement on the first image, the method further comprises:
    inputting, by the electronic device, the third image into a full image segmentation model to obtain a first segmentation result;
    inputting, by the electronic device, the third image into the first lightweight image segmentation model to obtain a second segmentation result, wherein the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first lightweight image segmentation model, and the first segmentation result and the second segmentation result are used to indicate the region in which the target object is located and the region in which third background content is located in the third image;
    training, by the electronic device, the first lightweight image segmentation model based on the first segmentation result and the second segmentation result to obtain a second lightweight image segmentation model;
    inputting, by the electronic device, the third image into the second lightweight image segmentation model to obtain a third segmentation result;
    when the first segmentation result and the third segmentation result are different, determining, by the electronic device, a second posture image based on the first segmentation result and the third segmentation result, wherein the second posture image is used to instruct the user to make a second posture, the second posture image comprises a first limb, and the region in which the first limb is located in the first segmentation result is different from the region in which the first limb is located in the third segmentation result; and
    training, by the electronic device, the second lightweight image segmentation model based on a fifth image to obtain the target lightweight image segmentation model, wherein the posture of the target object in the fifth image is the second posture.
  3. The method according to claim 2, wherein the training, by the electronic device, of the second lightweight image segmentation model based on the fifth image to obtain the target lightweight image segmentation model comprises:
    acquiring, by the electronic device, the fifth image;
    inputting, by the electronic device, the fifth image into the full image segmentation model to obtain a fourth segmentation result;
    inputting, by the electronic device, the fifth image into the second lightweight image segmentation model to obtain a fifth segmentation result, wherein the fourth segmentation result and the fifth segmentation result are used to indicate the region in which the target object is located and the region in which fourth background content is located in the fifth image;
    training, by the electronic device, the first lightweight image segmentation model based on the fourth segmentation result and the fifth segmentation result to obtain a third lightweight image segmentation model;
    inputting, by the electronic device, the fifth image into the third lightweight image segmentation model to obtain a sixth segmentation result; and
    when the fourth segmentation result and the sixth segmentation result do not satisfy a first preset condition, using the third lightweight image segmentation model as the target lightweight image segmentation model.
  4. The method according to claim 3, wherein after the electronic device inputs the fifth image into the third lightweight image segmentation model to obtain the sixth segmentation result, the method further comprises:
    when the fourth segmentation result and the sixth segmentation result satisfy the first preset condition, determining, by the electronic device, a third posture image based on the fourth segmentation result and the sixth segmentation result, wherein the third posture image is used to instruct the user to make a third posture, the third posture image comprises a second limb, and the second limb in the fourth segmentation result is different from the second limb in the sixth segmentation result; and
    training, by the electronic device, the third lightweight image segmentation model based on a sixth image to obtain the target lightweight image segmentation model, wherein the posture of the target object in the sixth image is the third posture.
  5. The method according to any one of claims 2 to 4, wherein, when the first segmentation result and the third segmentation result are different, the determining, by the electronic device, of the second posture image based on the first segmentation result and the third segmentation result comprises:
    when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, determining, by the electronic device, a target region of the first image based on the difference between the first segmentation result and the third segmentation result; and
    determining, by the electronic device, the second posture image based on the target region.
  6. The method according to claim 5, wherein the first segmentation result and the third segmentation result comprise pixel information of pixels in the third image; and
    the determining, by the electronic device when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, of the target region of the first image based on the difference between the first segmentation result and the third segmentation result comprises:
    determining, by the electronic device, first target pixels in the third image based on the difference between the pixel information of pixels in the first segmentation result and the pixel information of pixels in the third segmentation result, wherein a first target pixel is a pixel for which the difference in pixel information between the first segmentation result and the third segmentation result is greater than a first threshold;
    determining, by the electronic device, the region in which one or more limbs of the target object are located in the third image, wherein the one or more limbs comprise a third limb; and
    when the number of first target pixels in the region in which the third limb is located in the third image is greater than a second threshold, determining, by the electronic device, that the region in which the third limb is located in the third image is the target region.
  7. The method according to claim 6, wherein the determining, by the electronic device, of the second posture image based on the target region comprises:
    determining, by the electronic device, the third limb of the target object contained in the target region; and
    determining, by the electronic device, a second posture image containing the third limb.
  8. The method according to claim 7, wherein the determining, by the electronic device, of the second posture image containing the third limb comprises:
    determining, by the electronic device, a plurality of posture images containing the third limb; and
    determining, by the electronic device, the second posture image from the plurality of posture images, wherein the second posture image is the posture image, among the plurality of posture images, in which the region where the third limb is located contains the most first target pixels.
  9. The method according to claim 2, wherein:
    when the first segmentation result and the third segmentation result do not satisfy a first preset condition, the electronic device replaces, based on the third segmentation result, the third background content in the region in which the background content is located in the third image with preset background content to obtain a seventh image; and
    the electronic device displays the seventh image, a first control, and first prompt information, wherein the first prompt information is used to prompt training of the second lightweight image segmentation model.
  10. The method according to claim 9, wherein after the electronic device displays the seventh image, the first control, and the first prompt information, the method further comprises:
    the electronic device detects an operation acting on the first control, and the electronic device determines a third threshold, the third threshold being smaller than the first threshold;
    when the difference between the first segmentation result and the third segmentation result satisfies a second preset condition, the electronic device determines a fourth posture map based on the first segmentation result and the third segmentation result, the fourth posture map being used to instruct the user to make a fourth posture; the second preset condition is that the number of second target pixels in the region of the first image where a fourth limb is located is greater than the second threshold; the second target pixels are pixels for which the difference between the pixel information in the first segmentation result and the pixel information in the second segmentation result is greater than the third threshold;
    the electronic device trains the second lightweight image segmentation model based on an eighth image to obtain the target lightweight image segmentation model, the posture of the target object in the eighth image being the fourth posture.
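The "second preset condition" above (count the pixels inside a limb region that disagree by more than the lowered third threshold, then compare that count against the second threshold) can be sketched as follows; the grayscale-mask representation and all names are assumptions, not the patent's implementation:

```python
import numpy as np

def meets_second_preset_condition(result_a, result_b, limb_region,
                                  pixel_threshold, count_threshold):
    # "Second target pixels": pixels inside the limb region whose values
    # differ between the two segmentation results by more than the
    # (lowered) third threshold, pixel_threshold.
    diff = np.abs(result_a.astype(np.int32) - result_b.astype(np.int32))
    mismatched = int(np.count_nonzero((diff > pixel_threshold) & limb_region))
    # The condition holds when the mismatch count exceeds the second
    # threshold, count_threshold, triggering another training round.
    return mismatched > count_threshold
```

Lowering the pixel threshold makes the check stricter, so residual disagreements that passed the first round can still trigger further training.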
  11. The method according to claim 1, wherein before the electronic device displays the first posture map in response to the user's background replacement training operation, the method further comprises:
    when the electronic device detects that the usage duration of the first lightweight image segmentation model is greater than a first duration, the electronic device displays second prompt information and a second control, the second prompt information being used to prompt training of the first lightweight image segmentation model; the background replacement training operation is an operation acting on the second control.
  12. A background replacement apparatus, comprising:
    a display unit, configured to display a second image obtained by performing first background replacement on a first image, wherein the first background replacement is performed based on a first lightweight image segmentation model; the first image includes a region where a target object is located and a region where first background content is located; the second image is obtained by replacing the first background content in the first image with second background content; and the second background content is different from the first background content;
    the display unit being further configured to display, in response to a user's background replacement training operation, a first posture map, the first posture map being used to instruct the user to make a first posture;
    an acquisition unit, configured to acquire a third image, the user's posture in the third image being the first posture;
    the display unit being further configured to display a fourth image obtained by performing second background replacement on the first image, wherein the second background replacement is performed based on a target lightweight image segmentation model, the target lightweight image segmentation model being trained based on the third image; and the fourth image is obtained by replacing the first background content in the first image with the second background content.
  13. The apparatus according to claim 12, wherein the apparatus further comprises:
    an image segmentation unit, configured to input the third image into a full image segmentation model to obtain a first segmentation result;
    the image segmentation unit being further configured to input the third image into the first lightweight image segmentation model to obtain a second segmentation result, wherein the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first lightweight image segmentation model, and the first segmentation result and the second segmentation result are used to indicate the region where the target object is located and the region where third background content is located in the third image;
    a model training unit, configured to train the first lightweight image segmentation model based on the first segmentation result and the second segmentation result to obtain a second lightweight image segmentation model;
    the image segmentation unit being further configured to input the third image into the second lightweight image segmentation model to obtain the third segmentation result;
    a determining unit, configured to determine, when the first segmentation result and the third segmentation result are different, a second posture map based on the first segmentation result and the third segmentation result, the second posture map being used to instruct the user to make a second posture, the second posture map including a first limb, and the region where the first limb is located in the first segmentation result being different from the region where the first limb is located in the third segmentation result;
    the model training unit being further configured to train the second lightweight image segmentation model based on a fifth image to obtain the target lightweight image segmentation model, the posture of the target object in the fifth image being the second posture.
  14. The apparatus according to claim 13, wherein:
    the acquisition unit is further configured to acquire the fifth image;
    the image segmentation unit is further configured to input the fifth image into the full image segmentation model to obtain a fourth segmentation result;
    the image segmentation unit is further configured to input the fifth image into the second lightweight image segmentation model to obtain a fifth segmentation result, the fourth segmentation result and the fifth segmentation result being used to indicate the region where the target object is located and the region where fourth background content is located in the fifth image;
    the model training unit is further configured to train the first lightweight image segmentation model based on the fourth segmentation result and the fifth segmentation result to obtain a third lightweight image segmentation model;
    the image segmentation unit is further configured to input the fifth image into the third lightweight image segmentation model to obtain a sixth segmentation result;
    the determining unit is further configured to determine, when the fourth segmentation result and the sixth segmentation result do not satisfy the first preset condition, that the third lightweight image segmentation model is the target lightweight image segmentation model.
  15. The apparatus according to claim 14, wherein:
    the determining unit is further configured to determine, when the fourth segmentation result and the sixth segmentation result satisfy the first preset condition, a third posture map based on the fourth segmentation result and the sixth segmentation result, the third posture map being used to instruct the user to make a third posture, the third posture map including a second limb, and the second limb in the fourth segmentation result being different from the second limb in the sixth segmentation result;
    the model training unit is further configured to train the third lightweight image segmentation model based on a sixth image to obtain the target lightweight image segmentation model, the posture of the target object in the sixth image being the third posture.
  16. The apparatus according to any one of claims 13-15, wherein:
    the determining unit is further configured to: when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, determine a target region of the first image based on the difference between the first segmentation result and the third segmentation result;
    the electronic device determines the second posture map based on the target region.
  17. The apparatus according to claim 16, wherein the first segmentation result and the third segmentation result include pixel information of pixels in the third image;
    the determining unit is specifically configured to: determine first target pixels in the third image based on the difference between the pixel information of pixels in the first segmentation result and the pixel information of pixels in the third segmentation result, the first target pixels being pixels for which the difference in pixel information between the first segmentation result and the third segmentation result is greater than a first threshold;
    determine a region in the third image where one or more limbs of the target object are located, the one or more limbs including a third limb;
    when the number of first target pixels in the region of the third image where the third limb is located is greater than a second threshold, determine that the region of the third image where the third limb is located is the target region.
  18. The apparatus according to claim 17, wherein:
    the determining unit is specifically configured to: determine the third limb of the target object contained in the target region;
    determine a second posture map comprising the third limb.
  19. The apparatus according to claim 18, wherein:
    the determining unit is specifically configured to: determine a plurality of posture maps comprising the third limb;
    determine the second posture map from the plurality of posture maps, the second posture map being the posture map, among the plurality of posture maps, in which the region where the third limb is located contains the most first target pixels.
  20. The apparatus according to claim 13, wherein the apparatus further comprises a replacement unit,
    the replacement unit being configured to replace, when the first segmentation result and the third segmentation result do not satisfy the first preset condition, the third background content in the region of the third image where the background content is located with preset background content based on the third segmentation result, to obtain a seventh image;
    the display unit being further configured to display the seventh image, a first control, and first prompt information, the first prompt information being used to prompt training of the second lightweight image segmentation model.
  21. The apparatus according to claim 20, wherein:
    the determining unit is further configured to detect an operation acting on the first control, and the electronic device determines a third threshold, the third threshold being smaller than the first threshold;
    the determining unit is further configured to determine, when the difference between the first segmentation result and the third segmentation result satisfies a second preset condition, a fourth posture map based on the first segmentation result and the third segmentation result, the fourth posture map being used to instruct the user to make a fourth posture; the second preset condition is that the number of second target pixels in the region of the first image where a fourth limb is located is greater than the second threshold; the second target pixels are pixels for which the difference between the pixel information in the first segmentation result and the pixel information in the second segmentation result is greater than the third threshold;
    the model training unit is further configured to train the second lightweight image segmentation model based on an eighth image to obtain the target lightweight image segmentation model, the posture of the target object in the eighth image being the fourth posture.
  22. The apparatus according to claim 12, wherein:
    the display unit is further configured to display, when the electronic device detects that the usage duration of the first lightweight image segmentation model is greater than a first duration, second prompt information and a second control, the second prompt information being used to prompt training of the first lightweight image segmentation model; the background replacement training operation is an operation acting on the second control.
  23. A background replacement system, the system comprising an electronic device and a server, wherein:
    the electronic device is configured to acquire a first image and send the first image to the server;
    the server is configured to receive the first image, input the first image into a first lightweight image segmentation model, and determine the region where a target object is located and the region where first background content is located in the first image;
    the server replaces the first background content in the first image with second background content to obtain a second image, and sends the second image to the electronic device;
    the electronic device is configured to acquire the second image and display the second image;
    the electronic device is configured to display, in response to a user's background replacement training operation, a first posture map, the first posture map being used to instruct the user to make a first posture;
    the electronic device is configured to acquire a third image and send the third image to the server, the user's posture in the third image being the first posture;
    the server is configured to acquire the third image and train the first lightweight image segmentation model based on the third image to obtain a target lightweight image segmentation model;
    the server is configured to input the first image into the target lightweight image segmentation model and determine the region where the target object is located and the region where the first background content is located in the first image;
    the server is configured to replace the first background content in the first image with the second background content to obtain a fourth image, and send the fourth image to the electronic device;
    the electronic device receives the fourth image and displays the fourth image.
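The server-side steps of the system claim above (segment the image, then swap the background) can be sketched as follows. This is a minimal stand-in, not the patent's method: the thresholding "model" and mask-based compositing are illustrative assumptions replacing the neural segmentation models the claim names:

```python
import numpy as np

def lightweight_segment(image):
    # Hypothetical stand-in for the lightweight segmentation model:
    # treat pixels brighter than mid-gray as the target object.
    return image > 127

def replace_background(image, new_background):
    # Segment, then keep target-object pixels and take everything
    # else from the new background content.
    foreground = lightweight_segment(image)
    return np.where(foreground, image, new_background)
```

In the claimed system this composition runs on the server, which then returns the composited image to the electronic device for display.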
  24. An electronic device, comprising one or more processors and one or more memories, wherein the one or more memories are coupled to the one or more processors and are configured to store computer program code, the computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the method according to any one of claims 1-11.
  25. A chip system, applied to an electronic device, the chip system comprising one or more processors, the processors being configured to invoke computer instructions to cause the electronic device to perform the method according to any one of claims 1-11.
  26. A computer-readable storage medium, comprising instructions which, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 1-11.
PCT/CN2023/079248 2022-03-18 2023-03-02 Background replacement method and electronic device WO2023174063A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210271308.4 2022-03-18
CN202210271308.4A CN116823869A (en) 2022-03-18 2022-03-18 Background replacement method and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023174063A1 2023-09-21

Family

ID=88022343

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079248 WO2023174063A1 (en) 2022-03-18 2023-03-02 Background replacement method and electronic device

Country Status (2)

Country Link
CN (1) CN116823869A (en)
WO (1) WO2023174063A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200258236A1 (en) * 2017-10-24 2020-08-13 Hewlett-Packard Development Company, L.P. Person segmentations for background replacements
CN112150499A (en) * 2019-06-28 2020-12-29 华为技术有限公司 Image processing method and related device
CN112529073A (en) * 2020-12-07 2021-03-19 北京百度网讯科技有限公司 Model training method, attitude estimation method and apparatus, and electronic device
CN113160231A (en) * 2021-03-29 2021-07-23 深圳市优必选科技股份有限公司 Sample generation method, sample generation device and electronic equipment
CN113194254A (en) * 2021-04-28 2021-07-30 上海商汤智能科技有限公司 Image shooting method and device, electronic equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746305A (en) * 2024-02-21 2024-03-22 四川大学华西医院 Medical care operation training method and system based on automatic evaluation
CN117746305B (en) * 2024-02-21 2024-04-19 四川大学华西医院 Medical care operation training method and system based on automatic evaluation

Also Published As

Publication number Publication date
CN116823869A (en) 2023-09-29

Similar Documents

Publication Publication Date Title
US11195283B2 (en) Video background substraction using depth
JP7058373B2 (en) Lesion detection and positioning methods, devices, devices, and storage media for medical images
US11798132B2 (en) Image inpainting method and apparatus, computer device, and storage medium
US11282207B2 (en) Image processing method and apparatus, and storage medium
WO2019128508A1 (en) Method and apparatus for processing image, storage medium, and electronic device
US11481869B2 (en) Cross-domain image translation
CN108205655B (en) Key point prediction method and device, electronic equipment and storage medium
CN108765278B (en) Image processing method, mobile terminal and computer readable storage medium
WO2020103647A1 (en) Object key point positioning method and apparatus, image processing method and apparatus, and storage medium
JP2020522285A (en) System and method for whole body measurement extraction
US11563885B2 (en) Adaptive system for autonomous machine learning and control in wearable augmented reality and virtual reality visual aids
JP2020523711A (en) Method and apparatus for generating medical report
KR20200145827A (en) Facial feature extraction model learning method, facial feature extraction method, apparatus, device, and storage medium
CN111862124A (en) Image processing method, device, equipment and computer readable storage medium
WO2023174063A1 (en) Background replacement method and electronic device
CN112200041A (en) Video motion recognition method and device, storage medium and electronic equipment
CN111080746A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111814749A (en) Human body feature point screening method and device, electronic equipment and storage medium
US20210279928A1 (en) Method and apparatus for image processing
CN113221695A (en) Method for training skin color recognition model, method for recognizing skin color and related device
WO2024041108A1 (en) Image correction model training method and apparatus, image correction method and apparatus, and computer device
US11232616B2 (en) Methods and systems for performing editing operations on media
CN112036307A (en) Image processing method and device, electronic equipment and storage medium
US20230071291A1 (en) System and method for a precise semantic segmentation
US20240161382A1 (en) Texture completion

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23769569

Country of ref document: EP

Kind code of ref document: A1