WO2023174063A1 - Background replacement method and electronic device - Google Patents

Background replacement method and electronic device

Info

Publication number
WO2023174063A1
WO2023174063A1 (application PCT/CN2023/079248)
Authority
WO
WIPO (PCT)
Prior art keywords
image
segmentation
segmentation result
electronic device
posture
Prior art date
Application number
PCT/CN2023/079248
Other languages
French (fr)
Chinese (zh)
Inventor
李炜 (Li Wei)
黄睿 (Huang Rui)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023174063A1 publication Critical patent/WO2023174063A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Definitions

  • the present application relates to the field of terminals, and in particular to background replacement methods and electronic devices.
  • background replacement is a technology that replaces the content of the background area contained in a video or image with specified background content.
  • the core step is image segmentation, that is, the input image is segmented into a target area and a background area through the image segmentation model.
  • training an image segmentation model requires collecting a large amount of historical image or video data of the target object in advance to build an image data set, and then using that data set to train the model to obtain an image segmentation model optimized for the target object. The accuracy of image segmentation therefore depends on the quality of the image data set used during training, and a pre-collected image data set cannot cover all individual features. When the image data set is of low quality, that is, when it does not contain some individual features of the target object, the areas of the target region containing these individual features will be treated as the background area. This results in poor image segmentation accuracy, which in turn causes replacement errors when the image background is replaced, degrading the user experience.
  • This application provides a background replacement method and electronic device. Implementing this method can improve the image segmentation accuracy of the background segmentation model, thereby improving the accuracy of background replacement and improving user experience.
  • embodiments of the present application provide a method for background replacement.
  • the method includes: an electronic device displays a second image obtained by performing a first background replacement on a first image; the first background replacement is performed based on a first image segmentation lightweight model; the first image includes the area where the target object is located and the area where the first background content is located; the second image is obtained by replacing the first background content in the first image with second background content, and the second background content is different from the first background content; in response to the user's background replacement training operation, the electronic device displays a first posture map, and the first posture map is used to instruct the user to make a first posture; the electronic device acquires a third image in which the user's posture is the first posture; the electronic device displays a fourth image obtained by performing a second background replacement on the first image; the second background replacement is performed based on a target image segmentation lightweight model, and the target image segmentation lightweight model is obtained by training on the third image; the fourth image is obtained by replacing the first background content in the first image with the second background content.
  • the electronic device displays the first posture map to instruct the user to make the first posture, and then obtains a third image containing the user, in which the user's posture is the first posture.
  • the electronic device uses the third image as training data for the first image segmentation lightweight model, and can obtain the latest individual characteristics of the user.
  • the target image segmentation lightweight model trained using the third image can segment the image more accurately.
  • the target image segmentation lightweight model can more accurately segment the target object and the background within the image.
  • the size of the area where the target object is located in the fourth image is closer to the size of the area where the target object is located in the first image than in the second image. That is, the accuracy of background replacement of the fourth image is higher than the accuracy of background replacement of the second image.
  • the segmentation accuracy of the target image segmentation lightweight model is higher than the segmentation accuracy of the first image segmentation lightweight model.
  • the electronic device uses an image segmentation lightweight model to segment the first image, and determines the area where the target object is located and the area where the background content is located in the first image.
  • users can train the image segmentation lightweight model through the background replacement training operation until a background replacement effect that satisfies the user is achieved.
  • users can start the training of the image segmentation lightweight model through user operations to improve its segmentation accuracy, thereby improving the accuracy of background replacement and the user experience.
  • the above method further includes: the electronic device inputs the third image into the full image segmentation model to obtain a first segmentation result; the electronic device inputs the third image into the first image segmentation lightweight model to obtain a second segmentation result; the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first image segmentation lightweight model; the first segmentation result and the second segmentation result are used to indicate the area where the target object is located and the area where the third background content is located in the third image; the electronic device trains the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain a second image segmentation lightweight model; the electronic device inputs the third image into the second image segmentation lightweight model to obtain a third segmentation result;
  • the electronic device determines a second posture map based on the first segmentation result and the third segmentation result.
  • the second posture map is used to instruct the user to make the second posture.
  • the second posture map includes a first limb, and the area where the first limb is located in the first segmentation result is different from the area where the first limb is located in the third segmentation result;
  • the electronic device trains a second image segmentation lightweight model based on the fifth image to obtain a target image segmentation lightweight model, and the posture of the target object in the fifth image is the second posture.
  • the image segmentation lightweight model is obtained from the full image segmentation model by pruning and quantization.
  • the number of model parameters in the image segmentation full model is greater than the number of model parameters in the image segmentation lightweight model.
  • the segmentation results of the full image segmentation model are more accurate than the segmentation results of the lightweight image segmentation model, but the calculation amount of the full image segmentation model is relatively large.
  • an image segmentation lightweight model with fewer model parameters is generally deployed on electronic devices for image segmentation.
  • the segmentation effect of the image segmentation lightweight model is not as good as the segmentation results of the image segmentation full model.
  • the full image segmentation model is used to guide the training of the lightweight model, so that the performance of the image segmentation lightweight model approaches that of the full model, that is, its segmentation effect is close to the full model's. In other words, the electronic device uses the full image segmentation model and the image segmentation lightweight model to segment the third image separately, obtaining the first segmentation result and the second segmentation result. Based on the error between the first segmentation result and the second segmentation result, the model parameters of the lightweight model are adjusted so that its segmentation effect moves closer to that of the full model. In this way, the computational cost on the electronic device can be reduced while preserving the image segmentation quality.
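The guided training described above is essentially teacher–student distillation. A minimal sketch, assuming toy single-layer linear "models" over flattened pixels (the patent's actual network architectures are not specified):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical tiny "full model" (teacher) and "lightweight model" (student):
# each maps a flattened image to per-pixel foreground scores through a single
# linear layer. Real segmentation networks are far larger; this only
# illustrates the guided-training loop, not the patent's actual models.
rng = np.random.default_rng(0)
n_pixels = 64
teacher_w = rng.normal(size=(n_pixels, n_pixels))
student_w = np.zeros((n_pixels, n_pixels))

def distill_step(w, image, lr=0.1):
    """One step: adjust the student's parameters to shrink the error
    between its segmentation result and the teacher's."""
    target = sigmoid(teacher_w @ image)   # full model's segmentation result
    pred = sigmoid(w @ image)             # lightweight model's result
    # Gradient of the MSE loss w.r.t. the student weights (chain rule
    # through the sigmoid); update the weights in place.
    w -= lr * np.outer((pred - target) * pred * (1 - pred), image)
    return float(np.mean((pred - target) ** 2))

image = rng.normal(size=n_pixels)
losses = [distill_step(student_w, image) for _ in range(200)]
assert losses[-1] < losses[0]  # the student's output moves toward the teacher's
```

The student keeps far fewer effective parameters in a real deployment; only the error between the two models' outputs drives the update, which is the mechanism the paragraph describes.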
  • after the electronic device trains the lightweight model using the first segmentation result and the second segmentation result to obtain the second image segmentation lightweight model, it uses the third image to verify the trained second image segmentation lightweight model.
  • the electronic device determines the posture map based on the segmentation results, and instructs the user to make the posture shown in the posture map as training data to guide the next round of training.
  • the training data is actively screened and the quality of the training data is improved.
  • using the filtered training data to train the model improves the training effect of the model, thereby improving the model's segmentation accuracy.
  • the electronic device trains the second image segmentation lightweight model based on the fifth image to obtain the target image segmentation lightweight model, which specifically includes:
  • the electronic device acquires the fifth image; the electronic device inputs the fifth image into the full image segmentation model to obtain a fourth segmentation result; the electronic device inputs the fifth image into the second image segmentation lightweight model to obtain the fifth segmentation result;
  • the fourth segmentation result and the fifth segmentation result are used to indicate the area where the target object is located and the area where the fourth background content is located in the fifth image; the electronic device trains the second image segmentation lightweight model based on the fourth segmentation result and the fifth segmentation result to obtain a third image segmentation lightweight model; the electronic device inputs the fifth image into the third image segmentation lightweight model to obtain a sixth segmentation result;
  • the third image segmentation lightweight model is the target image segmentation lightweight model.
  • the electronic device uses the fifth image to verify the second image segmentation lightweight model, and determines whether the error between the segmentation result of the second image segmentation lightweight model and that of the full image segmentation model satisfies the first preset condition. If the first preset condition is not met, the electronic device determines that the second image segmentation lightweight model is the target image segmentation lightweight model and stops training. That is to say, after each round of training, the electronic device determines whether the segmentation results of the lightweight model meet the requirements; if the requirements are met, training stops, thus avoiding unnecessary training.
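The stop check above can be sketched as a simple comparison between the two models' masks. The patent does not define the "first preset condition" concretely; a pixel-disagreement ratio is assumed here purely for illustration:

```python
import numpy as np

def disagreement(full_mask, light_mask):
    """Fraction of pixels on which the two segmentation results differ."""
    return float(np.mean(full_mask != light_mask))

def should_stop(full_mask, light_mask, max_disagreement=0.02):
    # The concrete form of the "first preset condition" is not given in the
    # text; a 2% pixel-disagreement tolerance is an assumed stand-in.
    return disagreement(full_mask, light_mask) <= max_disagreement

full = np.ones((8, 8), dtype=bool)     # full model: everything is foreground
light = full.copy()
light[0, 0] = False                    # 1 of 64 pixels differs (~1.6%)
assert should_stop(full, light)        # close enough: stop training
light[0, 1:4] = False                  # now 4 of 64 pixels differ (~6.3%)
assert not should_stop(full, light)    # too far apart: run another round
```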
  • the electronic device inputs the fifth image into the third image segmentation lightweight model, and after obtaining the sixth segmentation result, the method further includes: when the error between the fourth segmentation result and the sixth segmentation result meets the first preset condition, the electronic device determines a third posture map based on the fourth segmentation result and the sixth segmentation result.
  • the third posture map is used to instruct the user to make the third posture.
  • the third posture map includes a second limb, and the area where the second limb is located in the fourth segmentation result is different from the area where the second limb is located in the sixth segmentation result;
  • the electronic device trains a third image segmentation lightweight model based on the sixth image to obtain a target image segmentation lightweight model, and the posture of the target object in the sixth image is the third posture.
  • the electronic device determines the third posture map, which is used to instruct the user to make the third posture.
  • the electronic device continuously filters the training data through each round of training until the segmentation results of the model meet the requirements. In this way, the quality of the training data can be improved, the training effect of the model can be improved, and wasted training time caused by invalid training data can be avoided.
  • the electronic device determines the target area of the first image based on the difference between the first segmentation result and the third segmentation result; the electronic device then determines the second posture map based on the target area.
  • the first segmentation result and the third segmentation result include pixel information of the pixels in the third image.
  • the electronic device determines the target area of the first image based on the difference between the first segmentation result and the third segmentation result, specifically including: the electronic device determines first target pixels in the third image based on the difference between the pixel information of each pixel in the first segmentation result and in the third segmentation result; a first target pixel is a pixel whose difference in pixel information between the first segmentation result and the third segmentation result is greater than a first threshold. The electronic device determines the areas where one or more limbs of the target object are located in the third image, the one or more limbs including a third limb;
  • the electronic device determines that the area where the third limb is located in the third image is the target area.
  • the limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, and left foot, among others.
  • the electronic device determines the second posture map based on the target area, specifically including: the electronic device determines the third limb of the target object contained in the target area; the electronic device determines a second posture map containing the third limb.
  • the electronic device can thus select posture maps containing poorly segmented limbs and, in the next round of training, focus on the poorly segmented areas, which improves the training effect of the model.
  • the electronic device determines a second posture map including the third limb, specifically including: the electronic device determines multiple posture maps including the third limb; the electronic device then selects the second posture map from the multiple posture maps, the second posture map being the posture map in which the area where the third limb is located contains the most first target pixels.
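The selection above — threshold per-pixel differences, then pick the posture map whose limb region covers the most of those pixels — can be sketched as follows. The mask and map names are illustrative stand-ins, not from the patent:

```python
import numpy as np

def first_target_pixels(result_a, result_b, first_threshold=0.5):
    """Pixels whose difference between the two segmentation results
    (per-pixel foreground probabilities here) exceeds the first threshold."""
    return np.abs(result_a - result_b) > first_threshold

def pick_posture_map(target, limb_masks, posture_maps):
    """Pick the posture map whose limb region covers the most target pixels.
    `limb_masks` maps limb name -> boolean region mask; `posture_maps` maps
    limb name -> a posture map emphasising that limb. Both are illustrative
    stand-ins for the patent's limb region maps and posture library."""
    counts = {limb: int(np.sum(target & mask)) for limb, mask in limb_masks.items()}
    worst_limb = max(counts, key=counts.get)
    return posture_maps[worst_limb]

a = np.zeros((4, 4))
b = np.zeros((4, 4))
b[0, :] = 1.0                                  # top row segmented differently
limb_masks = {"head": np.zeros((4, 4), bool), "torso": np.zeros((4, 4), bool)}
limb_masks["head"][0, :] = True                # the head occupies the top row
limb_masks["torso"][2:, :] = True
posture_maps = {"head": "pose_head", "torso": "pose_torso"}
chosen = pick_posture_map(first_target_pixels(a, b), limb_masks, posture_maps)
assert chosen == "pose_head"                   # head is the worst-segmented limb
```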
  • the posture maps include contour maps and human body region maps. Before the electronic device detects the first user operation, the method further includes: the electronic device obtains a training data set containing multiple images; the electronic device inputs the image data set into a human body posture estimation model to obtain multiple human body posture vectors corresponding to the image data set; the electronic device inputs these human body posture vectors into a clustering model to obtain one or more representative posture vectors; the electronic device inputs the representative posture vectors into a human body contour detection model to obtain the contour maps corresponding to the representative posture vectors; the electronic device inputs the image data set into a human body region detection model to obtain one or more limb region maps corresponding to the image data set.
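The clustering step above reduces many posture vectors to a few representatives. A minimal k-means sketch (the patent does not name a specific clustering algorithm; k-means is an assumed choice, and the 2-D "posture vectors" below are toy data):

```python
import numpy as np

def kmeans(vectors, k, iters=20):
    """Minimal k-means: the returned centroids play the role of the
    representative posture vectors."""
    # Deterministic initialisation: spread initial centroids across the data.
    centroids = vectors[np.linspace(0, len(vectors) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign every posture vector to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned vectors.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = vectors[labels == j].mean(axis=0)
    return centroids

# Two tight groups of 2-D "posture vectors" (real pose vectors would hold
# joint coordinates); each cluster centre is one representative posture.
rng = np.random.default_rng(1)
poses = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(10, 0.1, (20, 2))])
reps = kmeans(poses, k=2)
assert sorted(round(float(c[0])) for c in reps) == [0, 10]
```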
  • based on the third segmentation result, the electronic device segments the area where the background content is located in the third image, and replaces the third background content with the preset background content to obtain a seventh image;
  • the electronic device displays the seventh image, a first control, and first prompt information, and the first prompt information is used to prompt training of the second image segmentation lightweight model.
  • the method further includes: the electronic device detects an operation acting on the first control, and the electronic device determines a third threshold, the third threshold being smaller than the first threshold;
  • the electronic device determines a fourth posture map based on the first segmentation result and the third segmentation result, and the fourth posture map is used to instruct the user to make a fourth posture; the second preset condition is: the number of second target pixels in the area where a fourth limb is located in the first image is greater than a second threshold; a second target pixel is a pixel whose difference in pixel information between the first segmentation result and the second segmentation result is greater than the third threshold; the electronic device trains the second image segmentation lightweight model based on an eighth image to obtain the target image segmentation lightweight model, and the posture of the target object in the eighth image is the fourth posture.
  • in response to the user's background replacement training operation, before the electronic device displays the first posture map, the method further includes: when the electronic device detects that the usage time of the first image segmentation lightweight model is longer than a first duration, the electronic device displays second prompt information and a second control; the second prompt information is used to prompt training of the first image segmentation lightweight model; the background replacement training operation is an operation acting on the second control.
  • embodiments of the present application provide a background replacement device, including various units for performing the background replacement method in the first aspect or any possible implementation of the first aspect.
  • embodiments of the present application provide an electronic device, which includes one or more processors and one or more memories; the one or more memories are coupled to the one or more processors and are used to store computer program code. The computer program code includes computer instructions; when the one or more processors execute the computer instructions, the electronic device performs the method described in the first aspect and any possible implementation of the first aspect.
  • embodiments of the present application provide a chip system applied to an electronic device. The chip system includes one or more processors, which are used to call computer instructions to cause the electronic device to execute the method described in the first aspect and any possible implementation of the first aspect.
  • embodiments of the present application provide a computer-readable storage medium that includes instructions. When the instructions are run on an electronic device, they cause the electronic device to execute the method described in the first aspect and any possible implementation of the first aspect.
  • the background replacement device provided by the second aspect, the electronic device provided by the third aspect, the chip system provided by the fourth aspect, and the computer storage medium provided by the fifth aspect are all used to execute the methods provided by the embodiments of the present application. The beneficial effects they can achieve are the same as those of the corresponding methods and are not described again here.
  • Figures 1A-1B are schematic diagrams of the user interface for video conferencing on electronic devices provided by embodiments of the present application.
  • Figure 2A is a flow chart of a background replacement method provided by an embodiment of the present application.
  • Figure 2B is a flow chart of the background replacement method provided by the embodiment of the present application.
  • Figures 3A-3F are schematic diagrams of some user interfaces provided by embodiments of the present application.
  • Figure 5 is a schematic diagram of the training process of the image segmentation lightweight model provided by the embodiment of the present application.
  • Figure 9 is a flow chart for an electronic device to construct a representative posture library provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of the process of clustering electronic devices to obtain representative posture vectors according to an embodiment of the present application
  • Figure 13 is a schematic diagram of the limb region provided by the embodiment of the present application.
  • Figure 14 is a schematic diagram of the software structure of the electronic device 100 provided by the embodiment of the present application.
  • Figure 15 shows the cooperation relationship between the various modules in the electronic device provided by the embodiment of the present application.
  • Figure 16 is a schematic diagram of the background replacement system provided by the embodiment of the present application.
  • Figure 18 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Figure 19 is a schematic structural diagram of another electronic device provided by an embodiment of the present application.
  • the terms "first" and "second" are used for descriptive purposes only and shall not be understood as implying relative importance or implicitly indicating the number of technical features referred to. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of this application, unless otherwise specified, "plurality" means two or more.
  • the electronic device obtains the video containing the user and displays the video image frame in real time.
  • the video image frame includes the user image and the surrounding environment background image, which may lead to leakage of user privacy during a video conference.
  • the user can choose to replace the environmental background image in the video image frame with a preset background image.
  • area 1111 is the background area
  • area 1112 is the foreground area, that is, the area where the target object is located.
  • the user can click on the background replacement control 112,
  • the electronic device may replace the background content in the background area and display the user interface 120 .
  • the electronic device displays user interface 120.
  • the user interface 120 may include the replaced video image frame 121 and the background replacement control 122.
  • area 1211 included in the video image frame 121 is the replaced background area
  • area 1212 is the foreground area. It can be seen that the target object's hair bulges are missing from area 1211A and area 1211B in the video image frame 121. That is to say, when the electronic device performed background replacement, some limbs or parts of limbs of the target object were replaced as background, resulting in a replacement error.
  • background replacement first uses the image segmentation model to segment the foreground area and background area in the video or image, obtaining the target object and the initial background content, and then replaces the initial background content of the background area with the preset background content to obtain the replaced image or video.
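Once the segmentation mask is available, the replacement itself is a per-pixel composite. A minimal sketch with numpy (array shapes and the black preset background are illustrative assumptions):

```python
import numpy as np

def replace_background(frame, foreground_mask, new_background):
    """Keep pixels the segmentation marked as foreground (the target object);
    take every other pixel from the preset background image."""
    mask = foreground_mask[..., None]          # broadcast over RGB channels
    return np.where(mask, frame, new_background)

frame = np.full((2, 2, 3), 200, dtype=np.uint8)      # captured video frame
preset_bg = np.zeros((2, 2, 3), dtype=np.uint8)      # preset black background
fg_mask = np.array([[True, False], [False, False]])  # segmentation result
out = replace_background(frame, fg_mask, preset_bg)
assert out[0, 0].tolist() == [200, 200, 200]         # foreground pixel kept
assert out[1, 1].tolist() == [0, 0, 0]               # background pixel replaced
```

This also makes the failure mode in the passage concrete: any foreground pixel the mask wrongly marks `False` (e.g. a hair bulge) is silently overwritten by the preset background.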
  • the image segmentation model mistakenly regards some body parts, or parts of some body parts, of the target object as the initial background content, that is, it segments them into the background area, resulting in inaccurate image segmentation.
  • some body parts or parts of some body parts of the target object are replaced as background content.
  • when the quality of the image data set used during training is low, that is, when it does not contain some individual features of the target object, using the trained image segmentation model to segment images that contain these individual features will cause the areas containing them to be treated as background areas, leading to poor image segmentation accuracy and affecting the user experience.
  • for example, suppose the images collected to train the image segmentation model show the target object with short hair. As time goes by, the target object's hairstyle changes; when the original image segmentation model is then used, the area where the hairstyle has changed is easily mis-segmented into the background area, resulting in inaccurate segmentation and subsequent replacement errors, affecting the user experience.
  • embodiments of the present application provide a background replacement method.
  • the electronic device can first perform background replacement on the acquired first image based on the first image segmentation lightweight model, replacing the first background content in the first image with the second background content to obtain the second image.
  • the electronic device displays the second image after background replacement for the first image on the display screen.
  • the user can choose to retrain the first image segmentation lightweight model; that is, in response to the user's background replacement training operation, the electronic device displays a first posture map on the display screen. The first posture map indicates the first posture and is used to guide the user to record a video or image according to the first posture contained in it.
  • the electronic device can obtain a third image containing the user, and the electronic device can retrain the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model.
  • the electronic device uses the target image segmentation lightweight model to segment the first image, obtains the segmentation result, and replaces the first background content in the segmentation result with the second background content to obtain the fourth image. That is to say, the electronic device uses the image segmentation lightweight model to segment the first image and determine the area where the target object is located and the area where the background content is located.
  • the image segmentation lightweight model is trained until the background replacement effect that satisfies the user is achieved.
  • the electronic device determines whether the error between segmentation result 1 and segmentation result 3 satisfies the preset condition D2. If it is satisfied, the image segmentation lightweight model M2 is the target image segmentation lightweight model; that is, the image segmentation lightweight model M2 can be used for subsequent image segmentation. If it is not satisfied, the electronic device determines the poorly segmented area in segmentation result 3 based on segmentation result 1 and segmentation result 3; the poorly segmented area can be treated as the area to focus on in the next round of training, and the electronic device then determines a posture map P2 that covers the poorly segmented area. The electronic device displays the posture map P2 on the display screen, and the posture map P2 is used to guide the user to shoot a video or image according to the posture G2 contained in it.
  • the electronic device obtains image T2. Following the steps above, the electronic device uses image T2 to continue training the image segmentation lightweight model M2 until the segmentation results of the full image segmentation model and the lightweight model satisfy the preset condition D2.
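The round-based loop described above can be sketched as follows. The 2% tolerance and the toy "training round" that fixes 30 pixels per round are assumptions for illustration, not values from the patent:

```python
import numpy as np

def train_rounds(teacher_out, student_out, one_round, tol=0.02, max_rounds=10):
    """Round-based loop: after each round, compare the lightweight model's
    result with the full model's and stop once the disagreement satisfies
    the preset condition (a 2% pixel-disagreement tolerance is assumed)."""
    for r in range(max_rounds):
        if np.mean(teacher_out != student_out) <= tol:
            return r                       # number of rounds actually run
        student_out = one_round(student_out)
    return max_rounds

teacher = np.ones(100, dtype=bool)
student = np.zeros(100, dtype=bool)
progress = {"fixed": 0}

def one_round(s):
    # Toy stand-in for one real training round on a newly captured posture
    # image: each round brings 30 more pixels into agreement.
    progress["fixed"] += 30
    s = s.copy()
    s[: progress["fixed"]] = True
    return s

rounds_needed = train_rounds(teacher, student, one_round)
assert rounds_needed == 4
```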
  • the full image segmentation model is used to guide the training of the lightweight model, so that the performance of the image segmentation lightweight model approaches that of the full model, that is, its segmentation effect is close to the full model's. In this way, the computational cost on the electronic device can be reduced while ensuring the image segmentation quality.
• for the training of the image segmentation lightweight model, please refer to the following embodiments for details; it will not be described again here.
  • segmentation result 1 and segmentation result 3 are used to indicate the area where the target object is located and the area where the background content is located in the image T1.
  • the area where the background content indicated in segmentation result 1 is located is different from the area where the background content indicated in segmentation result 3 is located.
• the electronic device determines the poorly segmented area based on segmentation result 1 and segmentation result 3, that is, it determines the poorly segmented area based on the segmentation result of the previous round of the image segmentation lightweight model, then selects from the representative posture library a posture map containing the poorly segmented area and uses it as training data to guide the next round of training. In this way, the training data is actively screened and the quality of the training data is improved.
• user images containing these poorly segmented areas are obtained, and training focuses on these poorly segmented areas; that is, personalized training is conducted based on the individual characteristics of the target object, which improves the training effect of the model.
• the electronic device will determine whether the segmentation results of the image segmentation lightweight model meet the preset conditions, and stop training when the preset conditions are met, thus avoiding unnecessary training.
  • the above-mentioned posture image P1 can also be called the first posture image
  • the posture G1 can also be called the first posture
  • the image T1 can also be called the third image
• the image segmentation lightweight model 1 can also be called the first image segmentation lightweight model
  • segmentation result 1 can also be called the first segmentation result
  • segmentation result 2 can also be called the second segmentation result
  • segmentation result 3 can also be called the third segmentation result.
  • the image segmentation lightweight model 2 can be called the second image segmentation lightweight model.
  • Figure 2A shows a flow chart of a background replacement method according to an embodiment of the present application. As shown in Figure 2A, the method includes steps S101 to S104.
  • the image T1 may be uploaded by the user. For example, if the user needs to replace the background of a certain picture, the user may upload the picture whose background needs to be replaced on the electronic device. It may also be an image captured by an electronic device or an image in a video frame. For example, when a user is using video conferencing software for a video conference, or when the user is video chatting with other people, the electronic device can capture the user's image.
  • the first image includes an area where the target object is located and an area where the first background is located.
• the electronic device can determine the area where the target object is located and the area where the first background content is located in the first image based on the first image segmentation lightweight model, and the electronic device replaces the first background content in the area where the first background content is located in the first image with the second background content to obtain the second image.
  • the first background content and the second background content are different, and the second background content may be background content preset by the user, or may be default background content set by the electronic device at the factory.
  • the electronic device displays the first posture image.
  • the first gesture diagram is used to instruct the user to make the first gesture.
• the background replacement training operation may be an operation acting on the background replacement training control. For example, after the electronic device displays the second image, the background replacement training control is displayed on the display screen. When the user is dissatisfied with the replacement effect of the second image obtained by replacing the first background based on the first image segmentation lightweight model, the user can click the background replacement training control to start the training of the first image segmentation lightweight model.
  • the background replacement training operation can also be a voice command or a button pressing operation. This application does not limit this.
• in the video conference scenario, the background replacement training operation can also be interpreted as the series of operations of clicking the background replacement training control and then ending the video conference. For example, after the electronic device detects that the user has acted on the background replacement training control, the electronic device can display the first posture image after the video conference ends, and then start training the image segmentation lightweight model.
  • the electronic device acquires a third image, where the user's posture in the third image is the first posture.
  • the third image may be an image taken by the user according to the first posture indicated in the first posture diagram. It can be understood that the third image can also be an image in the video frame.
• the target object records the video according to the first posture indicated in the first posture diagram.
• the electronic device obtains the video and intercepts a frame of image in the video as the third image.
  • the third image may be an image set including multiple images.
  • the electronic device can capture multiple images at a time, intercept multiple frames of images from the recorded video, and one of the multiple frames of images can be used as a third image.
• the electronic device displays the first gesture image to instruct the user to make the first gesture, and then the electronic device obtains a third image including the user, and the user's gesture in the third image is the first gesture.
  • the electronic device uses the third image as training data for the first image segmentation lightweight model, and can obtain the latest individual characteristics of the user.
  • the target image segmentation lightweight model trained using the third image can segment the image more accurately.
  • the target image segmentation lightweight model can more accurately segment the target object and background content from the image.
• the size of the area where the target object is located in the fourth image is closer to the size of the area where the target object is located in the first image than the size of the area where the target object is located in the second image is. That is, the accuracy of background replacement of the fourth image is higher than the accuracy of background replacement of the second image.
  • the specific training process for the electronic device to obtain the lightweight model of the target image based on the third image training may refer to the description in the embodiment in FIG. 2B and will not be described again here.
• based on the first operation, the electronic device displays the posture map P1, and the posture map P1 indicates the posture G1.
• alternatively, the training may be triggered when the electronic device detects that the usage time of the image segmentation lightweight model M1 exceeds a preset duration, or when the electronic device periodically detects the segmentation effect of the image segmentation lightweight model M1 and detects that the segmentation effect of the image segmentation lightweight model M1 meets the preset condition D2.
  • a preset time period can be configured in the electronic device, and the preset time period can be one month, one year, etc.
  • the embodiments of this application do not limit the specific value of the preset duration. That is to say, when the electronic device detects that the image segmentation lightweight model M1 has been used for longer than one month, the electronic device can restart the training of the image segmentation lightweight model M1.
  • the user interface A may be user interface 210 as shown in FIG. 3A.
  • the user interface 210 may include: a status bar 211 , a calendar indicator 212 , a weather indicator 213 , and a prompt box 214 .
  • the user interface A may be user interface 220 as shown in Figure 3B. As shown in FIG. 3B , the user interface 220 includes a prompt box 221 .
  • the prompt box 221 please refer to the relevant description of the above-mentioned prompt box 214, which will not be described again here.
• in the video conference scenario, the operation can also be interpreted as the series of operations of clicking the background replacement training control and then ending the video conference. For example, after the electronic device detects that the user has acted on the background replacement training control, the electronic device can display the first posture image after the video conference ends, and then start training the image segmentation lightweight model.
  • FIG. 3D exemplarily shows a user interface 240 of the electronic device displaying the gesture map P1.
  • the user interface 240 includes: recording guidance box 241 , prompt information 242 , confirmation control 243 , and return control 244 .
  • the recording guidance frame 241 is used to display the posture picture P1
  • the posture picture P1 is used to indicate posture 1
  • posture 1 is the posture of arranging the earphones with both hands.
  • the prompt information 242 is used to prompt the target object to complete the action corresponding to gesture 1.
  • the prompt information 242 may be "action requirement: arrange the earphones with both hands" or "please complete the specified action in the white area in the recording guidance box".
  • the return control 244 is used to exit the current user interface 240 and return to the upper level user interface, such as the user interface 220.
• the confirmation control 243 is used to obtain photos or videos taken by the electronic device. When the electronic device detects a touch operation on the confirmation control 243, in response to the touch operation, the electronic device displays the user interface 250.
  • the electronic device can display the recording guidance box 241 and the recording effect preview box 251 on the same user interface.
  • the electronic device detects a touch operation acting on the determination control 214b, in response to the touch operation, the electronic device displays the user interface 260.
• the user interface 260 includes: a recording guidance box 261, a recording effect preview box 262, prompt information 263, a return control 264, and a confirmation control 265. The return control 264 is used to return to the upper-level user interface, and the confirmation control 265 is used to obtain photos or videos taken by the electronic device.
  • the electronic device acquires the image T1.
  • the posture of the target object in the image T1 is the posture G1 indicated in the posture map P1.
  • the image T1 is an image of the target object photographed according to the posture G1 in the posture diagram P1, and the posture of the target object included in the image T1 is the posture G1 indicated in the posture diagram P1.
  • the image T1 may be the image in the recording effect preview box 251 in the above-mentioned embodiment of FIG. 3F.
  • the image T1 can also be an image in a video frame.
  • the target object records a video according to the posture G1 indicated in the posture diagram P1.
  • the electronic device obtains the video and intercepts a frame of image in the video as the image T1.
  • the image T1 may be an image set including multiple images.
  • the electronic device can capture multiple images at one time, intercept multiple frames of images from the recorded video, and one frame of the multiple frames of images can be used as the image T1.
  • the electronic device inputs the image T1 into the full image segmentation model to obtain segmentation result 1, and inputs the image T1 into the image segmentation lightweight model M1 to obtain segmentation result 2.
• the image segmentation full model is a pre-trained machine learning model with high image segmentation accuracy.
  • the full image segmentation model is a model obtained after training and convergence based on the initial full image segmentation model.
  • the image segmentation lightweight model M1 is a model trained based on the initial image segmentation lightweight model. It can be understood that the full image segmentation model and the initial image segmentation lightweight model can be pre-trained on the electronic device, or can be pre-configured by the electronic device, which is not limited in this application.
  • the initial image segmentation lightweight model is obtained by cropping and quantizing the initial image segmentation full model, and the number of model parameters in the initial image segmentation full model is greater than the number of model parameters in the initial image segmentation lightweight model.
• the image segmentation full model is used to guide the training of the image segmentation lightweight model to be trained to obtain the target image segmentation lightweight model, so that the performance of the target image segmentation lightweight model is close to the performance of the image segmentation full model, that is, the segmentation effect of the target image segmentation lightweight model is close to or consistent with the segmentation effect of the image segmentation full model.
  • the segmentation result is used to indicate the area where the target object is located and the area where the background content is located in the segmented image
• the segmentation result may include pixel information of the pixels in the segmented image. That is to say, segmentation result 1 is used to indicate the area where the target object is located and the area where the background content is located in the image T1; the area where the target object is located and the area where the background content is located indicated in segmentation result 1 are obtained after segmentation by the image segmentation full model, and segmentation result 1 includes the pixel information 1 of the pixels in the image T1.
• segmentation result 2 is used to indicate the area where the target object is located and the area where the background content is located in the image T1; the area where the target object is located and the area where the background content is located indicated in segmentation result 2 are obtained after segmentation by the image segmentation lightweight model M1.
  • the segmentation result 2 includes pixel information 2 of the pixels in the image T1.
  • the pixel information of a pixel may be a probability value that the pixel is a foreground pixel.
  • the segmentation result may be a predicted foreground probability result of the image T1.
• the predicted foreground probability result includes a probability value for each pixel in the image T1 being a foreground pixel, where the probability value is a real number between 0 and 1. For example, the probability value of a pixel in the foreground area of the segmentation result being a foreground pixel is 1, and the probability value of a pixel in the background area being a foreground pixel is 0.
  • the pixel information of the pixel point may also be the pixel value of the pixel point, for example, RGB value, grayscale value, etc.
  • the segmentation result can be a binary image corresponding to the image T1, used to distinguish the foreground area and the background area, where the pixel value of the pixel in the foreground area is 255 and the pixel value of the pixel in the background area is 0.
  • the pixel value of the pixels in the foreground area may be 0, and the pixel value of the pixels in the background area may be 255.
• Figure 4 exemplarily shows a segmentation result, in which the black part is the background area, whose pixels have a pixel value of 255, and the white part is the foreground area, that is, the area where the target object is located, whose pixels have a pixel value of 0.
  • the pixel information of the pixel can also be the foreground label of the pixel, and the foreground label can be a numerical value, for example, a numerical value of 1 or a numerical value of 0.
• for example, when a pixel is a foreground pixel, a foreground label of 1 is added to the pixel; when the pixel is a background pixel, a label of 0 is added to the pixel; and so on.
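The pixel-information representations above can be related by a simple thresholding step: a predicted foreground-probability map is converted into the binary-image form, with foreground pixels set to 255 and background pixels set to 0. A minimal sketch follows; the 0.5 decision threshold is an illustrative assumption, as the embodiments do not fix a specific cutoff.

```python
def probability_to_binary_mask(prob_map, threshold=0.5):
    """Convert a 2-D map of per-pixel foreground probabilities (real numbers
    in [0, 1]) into a binary mask: 255 for foreground, 0 for background."""
    return [[255 if p >= threshold else 0 for p in row] for row in prob_map]

probs = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
]
mask = probability_to_binary_mask(probs)
# first row becomes [255, 255, 0]: probabilities at or above 0.5 are foreground
```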
  • the electronic device trains the image segmentation lightweight model M1 based on the segmentation result 1 and the segmentation result 2, and obtains the image segmentation lightweight model M2.
• the electronic device calculates the error value between segmentation result 1 and segmentation result 2, and uses the error value to train the image segmentation lightweight model M1, adjusting the model parameters of the image segmentation lightweight model M1 to obtain the image segmentation lightweight model M2.
  • Figure 5 exemplarily shows the training process of the image segmentation lightweight model.
• the image T1 is input into the image segmentation full model and the image segmentation lightweight model M1 respectively, and segmentation result 1 and segmentation result 2 are obtained. Then the error between segmentation result 1 and segmentation result 2 is determined, and the error is used to correct the model parameters of the image segmentation lightweight model M1 to obtain the trained and corrected lightweight model, that is, the image segmentation lightweight model M2.
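The error calculation in this guidance step can be sketched as follows, with the full model's segmentation result serving as the label for the lightweight model's result. Mean squared error over the per-pixel foreground probabilities is assumed for illustration only; the embodiments do not name a specific loss function.

```python
def guidance_error(full_model_probs, light_model_probs):
    """Mean squared per-pixel difference between the full model's and the
    lightweight model's foreground-probability maps. This scalar error is
    what would be used to correct the lightweight model's parameters."""
    total, count = 0.0, 0
    for full_row, light_row in zip(full_model_probs, light_model_probs):
        for f, l in zip(full_row, light_row):
            total += (f - l) ** 2
            count += 1
    return total / count

# e.g. full model says [1.0, 0.0], lightweight model says [0.8, 0.4]:
# error = (0.2**2 + 0.4**2) / 2 = 0.1
err = guidance_error([[1.0, 0.0]], [[0.8, 0.4]])
```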
  • the image segmentation full model and the image segmentation lightweight model may be a deep neural network model, a convolutional neural network model, etc., which are not limited in the embodiments of the present application.
  • the full image segmentation model can be a deep neural network model A1
  • the image segmentation lightweight model can be a deep neural network model A2 obtained by cropping the deep neural network model A1.
• the number of model parameters of the deep neural network model A2 is smaller than the number of model parameters of the deep neural network model A1.
  • the electronic device inputs the image T1 into the image segmentation lightweight model M2, and outputs the segmentation result 3.
• the electronic device tests the image segmentation lightweight model M2. That is, the electronic device inputs the image T1 into the image segmentation lightweight model M2 and obtains the segmentation result 3.
  • Segmentation result 3 is used to indicate the area where the target object is located and the area where the background content is located in the image T1.
• for segmentation result 3 indicating the area where the target object is located and the area where the background content is located in the image T1, please refer to the relevant descriptions of segmentation result 2, which will not be repeated here.
• in step S206, the electronic device determines whether segmentation result 1 and segmentation result 3 satisfy the preset condition D2. If not, step S207 is executed; if yes, step S209 is executed.
• the preset condition D2 is that there is a target area, that is, a poorly segmented area, in segmentation result 3. That is to say, with segmentation result 1 as the label, the electronic device first calculates the error between segmentation result 1 and segmentation result 3. When segmentation result 3 has a poorly segmented area relative to segmentation result 1, the electronic device determines that the difference between segmentation result 1 and segmentation result 3 meets the preset condition.
• the electronic device first calculates the difference in pixel information between segmentation result 1 and segmentation result 3, that is, the electronic device calculates the difference between the pixel information of the same pixel in segmentation result 1 and segmentation result 3, determines the pixels whose pixel information difference is greater than the first threshold as the first target pixels, then matches the first target pixels with the limb areas of the target object in the image T1, and determines whether the number of first target pixels in a limb area of the target object is greater than the second threshold.
• if the number is greater than the second threshold, the limb area is the target area, that is, a poorly segmented area, and the electronic device determines that segmentation result 1 and segmentation result 3 satisfy the preset condition D2.
  • one or more limbs included in the target area may be called a third limb.
• the electronic device can capture multiple images or videos at one time, input the multiple images into the image segmentation full model and the image segmentation lightweight model, and obtain multiple segmentation results of the image segmentation full model and multiple segmentation results of the image segmentation lightweight model. The electronic device determines whether the multiple differences between the multiple segmentation results of the image segmentation full model and the multiple segmentation results of the image segmentation lightweight model satisfy the preset condition D2; when the number of differences satisfying the preset condition D2 among the multiple differences is greater than a preset number threshold, the segmentation results are considered to satisfy the preset condition D3.
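The multi-image check above reduces to counting how many of the per-image comparisons flag a problem. A minimal sketch, in which the boolean flags and the preset number threshold are illustrative placeholders:

```python
def satisfies_condition_d3(per_image_meets_d2, preset_number_threshold):
    """per_image_meets_d2: one boolean per captured image, True when that
    image's pair of segmentation results satisfies preset condition D2.
    The overall condition D3 holds when the number of differences
    satisfying D2 exceeds the preset number threshold."""
    return sum(per_image_meets_d2) > preset_number_threshold

# e.g. 3 of 5 captured images show a poorly segmented difference,
# and the preset number threshold is 2, so condition D3 is satisfied
result = satisfies_condition_d3([True, False, True, True, False], 2)
```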
  • the above-mentioned preset condition D2 may also be called the first preset condition.
  • the electronic device determines the first target pixel, which is the poorly segmented pixel in segmentation result 3.
• the probability value that pixel point i in segmentation result 1 is the foreground is Y1i, and the probability value that pixel point i in segmentation result 3 is the foreground is Z1i. When the difference between Y1i and Z1i is greater than the first threshold, pixel i is a poorly segmented pixel, that is, the first target pixel.
  • the electronic device inputs the image T1 into the limb area detection model to obtain a limb area map corresponding to the target object in the image T1.
  • the limb area map includes the area where one or more limbs of the target object in the image T1 are located.
  • the limb area detection model may also be called the human body area detection model.
  • the limb area refers to the area where the limb is located.
• the limb may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, left foot and other limbs.
  • the electronic device matches the segmentation result 3 with the limb area map, and determines the number of first target pixels in the area where multiple limbs of the target object are located.
  • the electronic device matches the pixel points in the segmentation result 3 with the areas where the multiple limbs of the target object are located in the image T1, and obtains the number of corresponding first target pixel points in the area where the multiple limbs of the target object are located.
  • the electronic device determines the target area based on the number of first target pixels in the area where the multiple limbs of the target object are located.
• when the number of first target pixels in the area where a limb is located is greater than the preset value, the area where the limb is located is the target area.
  • the preset value may be 1000.
• when the number of first target pixels in the area where the left hand is located is greater than the preset value, the area where the left hand is located is considered to be the target area, that is, the area where the left hand is located is a poorly segmented area.
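The steps above, determining the first target pixels and then counting them per limb area, can be sketched as follows. The threshold values and limb labels are illustrative placeholders (the preset value, 1000 in the example above, is scaled down for this toy input):

```python
def find_target_areas(result1, result3, limb_map, first_threshold=0.5, preset_value=2):
    """Count, per limb area, the pixels whose foreground-probability
    difference between segmentation result 1 and segmentation result 3
    exceeds the first threshold (the 'first target pixels'), and return
    the limbs whose count exceeds the preset value, i.e. the poorly
    segmented target areas. limb_map labels each pixel with a limb name,
    or None for pixels outside any limb area."""
    counts = {}
    for row1, row3, limb_row in zip(result1, result3, limb_map):
        for p1, p3, limb in zip(row1, row3, limb_row):
            if limb is not None and abs(p1 - p3) > first_threshold:
                counts[limb] = counts.get(limb, 0) + 1
    return sorted(limb for limb, n in counts.items() if n > preset_value)

# Toy 2x2 example: the left-hand area disagrees in 3 pixels, the head in 0
result1 = [[1.0, 1.0], [1.0, 1.0]]
result3 = [[0.0, 1.0], [0.0, 0.0]]
limb_map = [["left_hand", "head"], ["left_hand", "left_hand"]]
targets = find_target_areas(result1, result3, limb_map)
```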
  • S207 The electronic device replaces the initial background content in the image T1 with the preset background content to obtain the replaced image T2.
• the electronic device determines that the image segmentation lightweight model M2 is the target image segmentation lightweight model, and the target image segmentation lightweight model is used for subsequent image segmentation. Based on the segmentation result 3, the electronic device replaces the initial background content in the image T1 with the preset background content to obtain the replaced image T2.
  • the preset background content can be the background content preset by the target object.
  • the target object can choose the image background content he likes as the preset background content.
  • the preset background content can also be the default background content set by the factory of the electronic device.
  • image T2 may be called the seventh image.
  • the initial background content in image T1 may be called third background content.
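Given a binary segmentation mask of the kind described earlier (foreground pixels 255, background pixels 0), the replacement of the initial background content with the preset background content can be sketched as a per-pixel composite. The hard 255/0 mask follows the binary-image example above; in practice a soft probability map could instead blend the two pixel values near edges.

```python
def replace_background(image, mask, preset_background):
    """Keep the target object's pixels (where mask == 255) and take every
    other pixel from the preset background content. All three inputs are
    2-D grids of the same shape."""
    return [
        [src if m == 255 else bg
         for src, m, bg in zip(src_row, m_row, bg_row)]
        for src_row, m_row, bg_row in zip(image, mask, preset_background)
    ]

# Toy 2x2 example with string "pixels" for readability
out = replace_background(
    [["f1", "f2"], ["f3", "f4"]],        # original image T1
    [[255, 0], [0, 255]],                # segmentation mask
    [["b1", "b2"], ["b3", "b4"]],        # preset background content
)
# out keeps f1 and f4 (foreground) and takes b2 and b3 from the background
```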
  • the electronic device can display a user interface 270 , which includes: a replacement effect box 271 , prompt information 272 , a return control 273 , and a confirmation control 274 .
  • the replacement effect box 271 is used to display the replaced image
  • the prompt information 272 is used to prompt the target object whether he is satisfied with the segmentation effect of the image segmentation lightweight model 2
• the return control 273 is used to trigger the next round of training when the target object is not satisfied with the segmentation effect
• the confirmation control 274 is used to determine that the image segmentation lightweight model M2 is the target image segmentation lightweight model.
• the electronic device ends the training of the image segmentation lightweight model M2, and the image segmentation lightweight model M2 is the target image segmentation lightweight model, that is, the target image segmentation lightweight model is used for subsequent image segmentation. In this way, redundant training can be avoided.
• if the target object is not satisfied with the segmentation effect, the target object can click the return control 273, and the electronic device continues the next round of training.
• when the electronic device detects a touch operation on the return control 273, in response to the touch operation, the electronic device modifies the first threshold to a third threshold, where the third threshold is smaller than the first threshold. Then, the electronic device determines whether segmentation result 1 and segmentation result 3 satisfy the second preset condition. If segmentation result 1 and segmentation result 3 satisfy the second preset condition, the electronic device determines the posture map P based on segmentation result 1 and segmentation result 3, and uses this posture map P to continue training the image segmentation lightweight model.
• the electronic device determines the posture map P based on the segmentation result 1 and the segmentation result 3
• the second preset condition is that the number of target pixels in the area where the third limb is located in the first image is greater than the second threshold, where a target pixel is a pixel whose pixel information in the first segmentation result differs from its pixel information in the third segmentation result by more than the third threshold.
• the electronic device calculates the pixel information of the pixels in segmentation result 1 and segmentation result 3, and determines the pixels whose pixel information in segmentation result 1 differs from their pixel information in segmentation result 3 by more than the third threshold; these pixels are poorly segmented pixels.
• the electronic device matches these poorly segmented pixels with the limb areas of the target object in the image T2, and determines whether the number of poorly segmented pixels in a limb area of the target object is greater than the second threshold. If the number is greater than the second threshold, the electronic device determines that segmentation result 1 and segmentation result 3 satisfy the preset condition D3.
  • the electronic device determines the posture map P2 based on the segmentation result 1 and the segmentation result 3.
  • the electronic device compares the segmentation result 1 and the segmentation result 3 to determine the poorly segmented area in the segmentation result 3.
• based on the poorly segmented area in segmentation result 3, the electronic device selects from the representative posture library a posture map containing the poorly segmented area
  • the electronic device can be configured with a representative posture library, and the representative posture library includes multiple posture images.
  • the representative posture library may be pre-constructed by the electronic device. For details on the construction of the representative posture library, please refer to the subsequent description in the embodiment of FIG. 9 , which will not be described again here.
  • the process of the electronic device determining the posture map P2 based on the segmentation result 1 and the segmentation result 3 may specifically include:
  • the electronic device determines the target area based on segmentation result 1 and segmentation result 3.
• for the relevant operations of the electronic device determining the target area based on segmentation result 1 and segmentation result 3, please refer to the related operations of determining the target area based on segmentation result 1 and segmentation result 3 in step S106, which will not be described again here.
  • the electronic device determines the posture map P2 based on the target area.
  • the target area corresponds to the limbs of the target object, and the target area may correspond to one or more limbs of the target object.
• when the target area contains one limb of the target object, the electronic device first determines one or more posture maps containing the limb from the representative posture library. When there is only one posture map containing the limb, that posture map is the posture map P2. When there are multiple posture maps containing the limb, the electronic device randomly selects one posture map from the multiple posture maps as the posture map P2, or determines the posture map in which the area where the limb is located is the largest among the multiple posture maps as the posture map P2. In some embodiments, the limb included in the posture map P2 may be called the first limb.
• when the target area contains multiple limbs of the target object, the electronic device first determines one or more posture maps containing the multiple limbs from the representative posture library. Similarly, when there is one posture map containing the multiple limbs, that posture map is the posture map P2. When there are multiple posture maps containing the multiple limbs, the electronic device randomly selects one posture map from the multiple posture maps as the posture map P2, or the electronic device determines the posture map in which the areas where the multiple limbs are located are the largest as the posture map P2. Specifically, the electronic device calculates, for the multiple posture maps, the number of target pixels corresponding to the areas where the multiple limbs are located, and takes the posture map whose areas for the multiple limbs contain the largest number of corresponding target pixels as the posture map P2. In some embodiments, the multiple limbs included in the posture map P2 may be called the first limbs.
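The selection rule above, preferring the posture map whose areas for the required limbs are largest, can be sketched with a toy representative posture library. The library structure, posture names, and area numbers below are hypothetical; the embodiments only require that each posture map records which limbs it contains and how large their areas are.

```python
def select_posture_map(posture_library, target_limbs):
    """posture_library: {posture name: {limb name: limb area in pixels}}.
    Among posture maps containing every target limb, pick the one whose
    combined area for the target limbs is largest; return None if no
    posture map contains all the target limbs."""
    candidates = [
        (sum(limb_areas[limb] for limb in target_limbs), name)
        for name, limb_areas in posture_library.items()
        if all(limb in limb_areas for limb in target_limbs)
    ]
    return max(candidates)[1] if candidates else None

# Hypothetical library: two hand poses and one standing pose
library = {
    "arms_raised": {"left_hand": 500, "right_hand": 500},
    "hands_on_head": {"left_hand": 800, "right_hand": 100},
    "standing": {"torso": 4000},
}
chosen = select_posture_map(library, ["left_hand"])
# "hands_on_head" wins for the left hand alone (800 > 500)
```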
  • the posture graph P2 may be called the second posture graph.
• the electronic device displays the posture map P2, and the posture map P2 indicates the posture G2.
  • the electronic device displays the gesture image P2 on the display screen, and the gesture image P2 is used to instruct the user to make the gesture G2.
  • the posture diagram P2 may be the posture diagram shown in FIG. 3A above.
  • the posture G2 may be the posture of arranging the earphones with both hands in the embodiment of FIG. 2B. It can be understood that the above posture G2 is only an example. In practical applications, the posture G2 can also be other postures, such as raising hands, holding the head, etc.
  • the specific posture form is not limited in this application.
  • posture G2 may also be called the second posture.
  • the electronic device acquires the image T3.
  • the posture of the target object in the image T3 is the posture G2 indicated in the posture diagram P2.
• The image T3 is an image of the target object captured while in the posture indicated in the posture diagram P2, or a frame of a video recorded in that posture.
  • Image T3 contains the target object.
  • image T3 may be called the fifth image.
  • the electronic device trains the image segmentation lightweight model M2 based on the image T3 and the full image segmentation model until the end condition of the model training is met and the target image segmentation lightweight model is obtained.
• In each round of iterative training, the electronic device adjusts the model parameters of the initial image segmentation lightweight model of that round, so that the model gradually converges and the target image segmentation lightweight model is obtained.
  • the end condition of model training can be that the number of iterative training of the image segmentation lightweight model reaches the preset number of iterations, or it can be that the image segmentation processing performance index of the image segmentation lightweight model after adjusting parameters reaches the preset index.
  • the preset index may be that the segmentation result of the image segmentation lightweight model and the segmentation result of the image segmentation full model satisfy the preset condition D2.
• The electronic device inputs the image T3 into the full image segmentation model and the image segmentation lightweight model M2 respectively to obtain segmentation result 4 and segmentation result 5, and uses segmentation result 4 and segmentation result 5 to train the image segmentation lightweight model M2 to obtain the image segmentation lightweight model M3. Then the electronic device inputs the image T3 into the image segmentation lightweight model M3 to obtain segmentation result 6. The electronic device determines whether segmentation result 4 and segmentation result 6 satisfy the preset condition D2. When segmentation result 4 and segmentation result 6 satisfy the preset condition D2, the image segmentation lightweight model M3 is the target image segmentation lightweight model. The electronic device may replace the initial background content in the area where the background content is located in the image T3 with the preset background content based on segmentation result 6.
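The flow above resembles knowledge distillation: the full model's segmentation of the image supervises the lightweight model, and training stops once the two models' results agree. A minimal sketch follows, assuming (as an illustration only) that the preset condition D2 is an IoU threshold between two binary masks; the threshold value, the callable-model interface, and the function names are all assumptions, not taken from the original.

```python
import numpy as np

def masks_agree(mask_a, mask_b, iou_threshold=0.95):
    """Preset condition D2, sketched as an IoU test between two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    iou = inter / union if union else 1.0
    return iou >= iou_threshold

def distill_until_agreement(full_model, light_model, train_step, image, max_rounds=10):
    """Train the lightweight model against the full model's output on `image`
    until condition D2 holds or the iteration budget is exhausted.

    full_model / light_model: callables mapping an image to a boolean mask.
    train_step: callable performing one round of parameter adjustment and
                returning the updated lightweight model.
    """
    teacher_mask = full_model(image)       # segmentation result from the full model
    for _ in range(max_rounds):
        student_mask = light_model(image)  # current lightweight segmentation
        if masks_agree(teacher_mask, student_mask):
            return light_model             # target image segmentation lightweight model
        light_model = train_step(light_model, image, teacher_mask)
    return light_model
```

The `max_rounds` cap corresponds to the alternative end condition in the text, i.e. a preset number of training iterations.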
• The electronic device determines the poorly segmented area based on segmentation result 4 and segmentation result 6, and then determines the posture map P3 from the representative posture library based on the poorly segmented area; the posture map P3 is used to instruct the user to make the posture G3.
  • the relevant operations for the electronic device to determine the posture map P3 based on the segmentation result 4 and the segmentation result 6 refer to the relevant operations in the above-mentioned step S109, which will not be described again here.
• After determining the posture map P3, the electronic device can display the posture map P3.
• For the manner in which the electronic device displays the posture map P3, please refer to the relevant description in the above-mentioned embodiments of FIGS. 3D to 3F, which will not be described again here.
  • the electronic device acquires an image T4, where the image T4 is an image taken by the user according to the posture G3 in the posture diagram P3 or an image in a video frame, and the posture of the target object in the image T4 is the posture G3 indicated in the posture diagram P3.
  • the electronic device can train the image segmentation lightweight model M3 based on the image T4 to obtain the image segmentation lightweight model M4.
  • the electronic device uses image T4 to test the segmentation effect of the image segmentation lightweight model M4.
• If the segmentation result of the image segmentation lightweight model M4 meets the preset condition D2, the image segmentation lightweight model M4 is the target image segmentation lightweight model. If the segmentation result of the image segmentation lightweight model M4 does not meet the preset condition D2, the electronic device re-determines a pose map based on the segmentation result and obtains an image to train the image segmentation lightweight model M4, until the segmentation result of the image segmentation lightweight model M4 meets the preset condition D2.
  • segmentation result 4 may be called the fourth segmentation result
  • segmentation result 5 may also be called the fifth segmentation result
  • the image segmentation lightweight model M3 may be called the third image segmentation lightweight model.
  • Segmentation result 6 may also be called the sixth segmentation result.
  • the pose graph P3 may be called the third pose graph
  • the pose G3 may be called the third pose.
  • Image T4 may also be called the sixth image.
  • the electronic device obtains the original image T5, inputs the original image T5 into the target image segmentation lightweight model, and determines the area where the target object is located and the original background content in the original image.
  • the original image may be an image or a frame of a video uploaded by the target object, or it may be a frame of an image or video captured by the electronic device including the target object.
  • the original image T5 may also be called the first image
  • the original background content may also be called the first background content
• After the electronic device separates the area where the original background content is located and the area where the target object is located in the original image T5, it synthesizes the area where the target object is located and the preset background into a new image, that is, the replaced image T6.
  • the background content in the replaced image is different from the original background content.
  • the preset background can be set by the target object itself, or it can be the default setting of the electronic device, for example, it can be a landscape image, etc.
  • image T6 may also be called the fourth image.
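The compositing step (keep the area where the target object is located, fill everything else from the preset background) can be sketched with a binary foreground mask. The array shapes and function name are illustrative assumptions:

```python
import numpy as np

def replace_background(image, foreground_mask, new_background):
    """Compose the segmented foreground onto a preset background.

    image, new_background: HxWx3 arrays of the same shape.
    foreground_mask: HxW boolean array where True marks the area
    where the target object is located.
    """
    mask3 = foreground_mask[..., None]          # broadcast mask over color channels
    return np.where(mask3, image, new_background)
```

Pixels inside the mask keep their original values; pixels outside it are taken from the preset background, yielding the replaced image.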
• FIG. 8 shows a schematic diagram in which an electronic device uses the image segmentation lightweight model M1 and the target image segmentation lightweight model to segment the original image, and then performs replacement to obtain a replaced image.
• The electronic device segments the video image frame 811 into a foreground area and a background area, and can separate the two.
  • the video image frame 811 is segmented through the image segmentation lightweight model M1, and the segmentation result 7 can be obtained.
• As shown in (b) of FIG. 8, in the segmentation result 7 the foreground area and the background area are distinguished by different colors.
  • the white area represents the foreground area segmented by the image segmentation lightweight model M1
  • the black area represents the background area segmented by the image segmentation lightweight model M1.
• The image segmentation lightweight model M1 mistakenly regards part of the edge area of the target object, that is, the hair bulge portions in area 8111A and area 8111B in the figure, as the background content in the video image frame 811.
• After background replacement, the replaced video image frame 821 is obtained, in which the hair bulge portions in area 8111A and area 8111B are replaced as background content, so there are no hair bulge portions in area 8211A and area 8211B of the replaced video image frame 821.
• Before step S101, the electronic device can also build a representative posture library.
  • the electronic device may construct a representative gesture library including the following steps:
  • the electronic device obtains the image data set.
  • the image data set may be a large number of pre-collected images of the target object, or may be image frames contained in pre-collected video data of the target object.
  • the embodiments of the present application do not limit this.
• The image data set may also be crawled from a public website or obtained from a large public image database.
  • the image data set contains the user's posture features and contour features.
  • Posture features refer to the user's action behaviors, such as turning the head, turning the body, standing up and sitting down, etc.
  • the outline feature of a user refers to the lines that make up the outer edge of the user.
  • the electronic device inputs the image data set into the human posture estimation model and obtains multiple human posture vectors corresponding to the image data set.
• The human posture estimation model can identify the skeletal key points of the human body in the image, as well as the limb vectors formed by connecting those skeletal key points.
  • the skeleton of the human body is mainly used to represent the skeletal information of the human body and can be used to describe the posture of the human body.
  • the number and type of skeletal key points are determined by the human posture estimation model, and different human posture estimation models output different numbers and types of skeletal key points.
• Here, dividing the human body into 15 skeletal key points is taken as an example for illustrative explanation. In practical applications, the skeletal key points of the human body can also be divided into 9, 17, etc.; this application places no restrictions on this.
  • 15 skeletal key points can be connected to form 14 limb vectors, and the limb vectors can be calculated from the coordinate positions of the above 15 skeletal key points.
  • FIG. 10 illustrates a set of skeletal key point data, and only some skeletal key points and some limb vectors are shown in FIG. 10 .
  • the circular point in the figure is a bone key point.
  • Each bone key point is represented by coordinates (X, Y).
• Adjacent skeletal key points are connected to form the limb vectors of a target object in the image.
  • a limb vector can be called a posture vector.
  • the coordinates of bone key point 3 are (X3, Y3)
  • the coordinates of bone key point 4 are (X4, Y4).
• Bone key point 3 and bone key point 4 can be connected to form a limb vector (X3-X4, Y3-Y4); this limb vector represents a limb, which can be called the left shoulder.
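The limb-vector computation described above is simple coordinate arithmetic. A short sketch follows; the flat layout of the posture vector and the function names are illustrative assumptions:

```python
def limb_vector(p_from, p_to):
    """Limb vector between two skeletal key points, e.g. (X3-X4, Y3-Y4)."""
    return (p_from[0] - p_to[0], p_from[1] - p_to[1])

def posture_vector(keypoints, limb_pairs):
    """Concatenate the limb vectors of one person into a flat posture vector.

    keypoints:  list of (X, Y) coordinates of skeletal key points.
    limb_pairs: index pairs of adjacent key points;
                e.g. 15 key points yield 14 limb vectors.
    """
    vec = []
    for i, j in limb_pairs:
        vec.extend(limb_vector(keypoints[i], keypoints[j]))
    return vec
```

With 15 key points and 14 limb pairs, each image thus yields a 28-dimensional posture vector suitable for the clustering step below.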
  • the electronic device inputs multiple human posture vectors corresponding to the image data set into the clustering model to obtain one or more representative posture vectors.
  • multiple limb vectors can be obtained from an image, and multiple limb vectors of the image can form a posture vector. If the image data set includes multiple images, multiple posture vectors can be obtained.
• The electronic device maps the multiple posture vectors to a vector space, where each posture vector is a point in the vector space, and then calculates the similarity between each pair of points.
• Posture vectors with high similarity are gathered together to form a cluster, and the vector at the center of the cluster, i.e., the cluster center, serves as a representative posture vector.
• FIG. 11 illustrates a schematic diagram of the process by which the electronic device performs clustering to obtain representative posture vectors.
  • 4 clusters are exemplarily shown.
  • a circular point in each cluster represents a posture vector, that is, it represents the posture of a human body.
  • it can be a posture such as holding headphones with both hands or holding the headset with one hand.
  • the black five-pointed star in the cluster represents the cluster center point of the cluster, that is, the cluster center.
  • the cluster center vector of each cluster is selected as the representative attitude vector.
• The posture represented by the cluster center vector of cluster 1 is the posture of holding the earphones with one hand; as shown in (c) of FIG. 11, the posture represented by the cluster center vector of cluster 2 is the posture of holding the earphones with both hands.
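The clustering step (map posture vectors into a vector space, gather similar ones into clusters, and take each cluster center as a representative posture vector) can be sketched with a plain k-means loop. The cluster count, Euclidean distance metric, and iteration budget are assumptions; the document does not specify the clustering model.

```python
import numpy as np

def kmeans_centers(pose_vectors, k, iters=50, seed=0):
    """Cluster posture vectors; the returned cluster centers serve as
    the representative posture vectors."""
    rng = np.random.default_rng(seed)
    X = np.asarray(pose_vectors, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each posture vector to its nearest center (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its cluster (keep empty clusters fixed).
        new_centers = np.array([X[labels == c].mean(axis=0) if (labels == c).any()
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Each returned center is then fed to the human body contour detection model to obtain the corresponding contour map, as described next.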
  • the electronic device inputs one or more representative posture vectors into the human body contour detection model, and obtains a contour map corresponding to one or more representative posture vectors.
  • FIG. 12 exemplarily shows a schematic diagram of obtaining a contour diagram representing a posture vector.
  • Figure 12 (a) and (c) show two representative posture vectors, which represent the posture of holding the earphones with one hand and the posture of holding the earphones with both hands respectively.
  • Figure 12 (b) and (d) show the contour images obtained based on two representative posture vectors.
• The electronic device inputs the image data set into the limb region detection model to obtain one or more limb region maps corresponding to the image data set.
  • FIG. 13 exemplarily shows a schematic diagram of a limb region.
  • different color areas in the figure represent different limb areas. For example, dark gray represents the area where the head is located, light gray represents the area where the left hand is located, etc.
• The limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, and left foot.
• The electronic device matches the one or more limb region maps corresponding to the image data set with the contour maps corresponding to the one or more representative posture vectors, to obtain one or more posture maps.
  • the representative posture library may include one or more posture images corresponding to the image data set.
• The electronic device involved in the above embodiments may be called an electronic device 100. The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device.
  • FIG. 14 shows a schematic software structure diagram of the electronic device 100 provided by the embodiment of the present application.
• The software structure of the electronic device 100 may include: a background replacement engine, a background replacement client, a collection module, a display module, and a representative posture library, where:
  • the interactive display module is used to receive user operations, display posture images, and display images before and after background replacement.
  • the interactive display module may receive the first operation and display the gesture diagram P1 based on the first operation.
  • the user interface in the above-mentioned FIGS. 3A-3F may be displayed, etc.
  • the interactive display module is used to display the posture map P1.
  • the interactive display module may display a first image, a second image with background replacement based on the first image segmentation lightweight model, or a fourth image with background replacement based on the target image segmentation lightweight model.
  • the image T1 in step S201 can be obtained, where the image T1 is an image taken by the target object according to the posture G1 in the posture diagram P1, or a frame image in the video taken by the target object according to the posture G1 in the posture diagram P1.
  • the background replacement client is used to obtain the posture image P1 from the representative posture library according to the preset configuration after receiving the first operation, and send it to the interactive display module for display.
• The background replacement client is also used to obtain the image including the target object collected by the acquisition module and send it to the background replacement engine.
  • the image including the target object may be the image T1 in step S202 described above, the image T2 in step S211, etc.
• Posture map information is sent to the representative posture library to obtain the corresponding posture map, which is sent to the display module for display.
• The background replacement engine is used to obtain the image of the target object from the background replacement client, and use the image of the target object and the preset full image segmentation model to train the image segmentation lightweight model. Specifically, first, the background replacement engine receives the image T1 of the target object sent by the background replacement client. The background replacement engine uses the image T1 and the preset full image segmentation model to train the image segmentation lightweight model M1 to obtain the image segmentation lightweight model M2. Then, it uses the preset full image segmentation model and the image segmentation lightweight model M2 to segment the image T1 of the target object, obtaining segmentation result 1 and segmentation result 3, and then determines whether the difference between segmentation result 1 and segmentation result 3 meets the preset condition D2.
• If the preset condition D2 is met, the image segmentation lightweight model M2 is stored for subsequent image segmentation; if the preset condition D2 is not met, segmentation result 1 and segmentation result 3 are sent to the background replacement client to determine the posture map to be displayed in the user interface for the next round of training.
  • the electronic device 100 includes: a background replacement engine, a background replacement client, a collection module, a display module, and a representative posture library.
• Taking two rounds of image segmentation lightweight model training to obtain the target image segmentation lightweight model as an example, the interaction between these modules is described in detail below:
  • the interactive display module detects the first operation.
  • the first operation may be a touch operation acting on the determination control 214b in FIG. 3A.
  • the interactive display module sends background replacement instructions to the background replacement client.
  • the background replacement client responds to the background replacement instruction sent by the interactive display module and sends an instruction requesting to obtain the posture image to the representative posture library.
• The representative posture library responds to the background replacement client's request instruction for the posture map, and sends the posture map P1 to the background replacement client.
• The background replacement client receives the posture map P1 sent by the representative posture library, and sends the posture map P1 to the interactive display module.
  • the interactive display module receives the posture map P1 and displays the posture map P1. Among them, the posture graph P1 indicates the posture G1.
  • the acquisition module obtains the image T1 and sends the image T1 to the background replacement client.
  • the background replacement client receives the image T1 and sends it to the background replacement engine.
• The background replacement client determines whether segmentation result 1 and segmentation result 3 meet the preset condition D2. If the preset condition D2 is not met, the posture map P2 is determined based on segmentation result 1 and segmentation result 3. Among them, the posture map P2 is used to indicate the posture G2.
• The background replacement client sends a request to the representative posture library to obtain the posture map P2.
  • the representative posture library responds to the instruction to obtain the posture map P2 and sends the posture map P2 to the background replacement client.
  • the background replacement client receives the posture image P2 and sends it to the interactive display module.
  • the interactive display module receives the posture map P2 and displays the posture map P2.
  • the acquisition module obtains image T3 and sends image T3 to the background replacement client.
• The posture of the target object in the image T3 is the posture G2 in the posture diagram P2.
  • the background replacement client receives image T3 and sends it to the background replacement engine.
• The background replacement client receives segmentation result 4 and segmentation result 6, determines whether segmentation result 4 and segmentation result 6 satisfy the preset condition D2, and when the preset condition D2 is met, determines the image segmentation lightweight model M3 as the target image segmentation lightweight model.
  • the acquisition module obtains the original image T5 and sends the original image T5 to the background replacement client.
• After receiving the original image T5, the background replacement client sends the original image T5 to the background replacement engine.
  • the background replacement engine receives the original image T5, uses the target image segmentation lightweight model to segment the original image T5, and obtains the foreground area and background area. Then, the background area in the original image T5 is replaced with the preset background to obtain the replaced image T6. And send the replaced image T6 to the background replacement client.
  • the background replacement client receives the replaced image T6 and sends it to the interactive display module for display.
  • the interactive display module receives the replaced image T6 and displays the replaced image T6.
  • the above-mentioned background replacement client, representative gesture library and background replacement engine can be deployed on the same electronic device, or on different electronic devices.
  • the background replacement client can be deployed on one electronic device
  • the representative posture library and background replacement engine can be deployed on another electronic device, etc., and this application does not limit this.
  • Figure 16 shows a schematic diagram of a background replacement system provided by an embodiment of the present application.
  • the background replacement system includes an electronic device 200 and a server 300 .
• A communication connection may exist between the electronic device 200 and the server 300, enabling data communication between the two, where:
  • the electronic device 200 is used to obtain the first image and send the first image to the server;
  • the server 300 is configured to receive the first image, input the first image into the first image segmentation lightweight model, and determine the area where the target object is located and the area where the first background content is located in the first image; Replace the first background content with the second background content, obtain the second image, and send the second image to the electronic device;
  • the electronic device 200 is used to acquire a second image and display the second image; in response to the user's background replacement training operation, display a first posture map, and the first posture map is used to instruct the user to make the first posture;
  • the electronic device 200 is configured to obtain a third image, and send the third image to the server, where the user's posture in the third image is the first posture;
• The server 300 is used to obtain the third image, and train the first image segmentation lightweight model based on the third image to obtain the target image segmentation lightweight model; input the first image into the target image segmentation lightweight model to determine the area where the target object is located and the area where the first background content is located in the first image; replace the first background content with the second background content to obtain the fourth image, and send the fourth image to the electronic device.
  • the electronic device 200 is configured to receive the fourth image and display the fourth image.
• The electronic device 200 can also be used to perform the background replacement method that may be implemented in any one of the above steps S201, S202, S206, S208, S209, S210, and S211, which will not be described again here.
  • the server 300 may also be used to perform the background replacement method that may be implemented in any one of the above steps S203, S205, S207, S213, and S214, which will not be described again here.
• The electronic device 200 can also obtain the image T3 and send the image T3 to the server 300; the server 300 receives the image T3, and uses the image T3 and the full image segmentation model to train the image segmentation lightweight model M2 until the end condition of model training is met, to obtain the target image segmentation lightweight model.
  • the electronic device 200 may include an interactive display module, a collection module, and a background replacement client; the server 300 may include a background replacement engine and a representative gesture library.
  • the interactive display module 301 can also be used to perform any of the possible implementations of steps 1 to 2, step 6, step 16, and step 29 in the embodiment of FIG. 15 .
  • the collection module can also be used to perform any of the possible background replacement methods in steps 7, 17, and 23 in the embodiment of FIG. 15, which will not be described again here.
• The background replacement client can also be used to perform any possible background replacement method in steps 3, 5, 8, 12, 13, 15, 18, 22, 24, and 28, which will not be described again here.
• The background replacement engine 304 can also be used to perform any possible background replacement method in steps 9 to 11 and steps 19 to 21 in the above embodiment of FIG. 15, which will not be described again here.
  • the representative posture library can also be used to perform any possible background replacement method in steps 4 and 14 in the embodiment of FIG. 15 , which will not be described again here.
  • the background replacement method provided by the embodiment of the present application is described in detail above with reference to FIGS. 2A to 15 .
  • the background replacement device and electronic equipment provided by the embodiment of the present application will be described with reference to FIGS. 17A, 17B and 18 .
  • FIG 17A is a schematic diagram of a background replacement device provided by an embodiment of the present application.
  • the background replacement device 400 includes a display unit 401 and an acquisition unit 402, where,
  • the display unit 401 is used to display the second image obtained by performing the first background replacement on the first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image includes the area where the target object is located and the first background. The area where the content is located; the second image is obtained by replacing the first background content in the first image with the second background content; the second background content is different from the first background content;
  • the display unit 401 is also configured to, in response to the user's background replacement training operation, the electronic device display a first posture map, and the first posture map is used to instruct the user to make the first posture;
  • the display unit 401 is also used by the electronic device to obtain a third image, where the user's posture in the third image is the first posture;
  • the display unit 401 is also used for the electronic device to display the fourth image obtained by performing the second background replacement on the first image; the second background replacement is performed based on the target image segmentation lightweight model, and the target image segmentation lightweight model is trained based on the third image. Obtained; the fourth image is obtained by replacing the first background content in the first image with the second background content.
  • the background replacement device 400 in the embodiment of the present application can be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
• The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
• The background replacement device also includes an image segmentation unit 403, a determination unit 404, a model training unit 405, and a background replacement unit 406, where:
• The image segmentation unit 403 is used to input the third image into the full image segmentation model to obtain the first segmentation result, and is also used to input the third image into the first image segmentation lightweight model to obtain the second segmentation result. The number of model parameters in the full image segmentation model is greater than the number of model parameters in the first image segmentation lightweight model; the first segmentation result and the second segmentation result are used to indicate the area where the target object is located and the area where the background content is located in the first image.
  • the model training unit 405 is used to train the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain the second image segmentation lightweight model;
  • the image segmentation unit 403 is also used to input the third image to the second image segmentation lightweight model to obtain the third segmentation result;
• The determining unit 404 is configured to, when the first segmentation result and the third segmentation result are different, determine a second posture map based on the first segmentation result and the third segmentation result. The second posture map is used to instruct the user to make a second posture; the second posture map includes the first limb, and the area where the first limb is located in the first segmentation result is different from the area where the first limb is located in the third segmentation result.
  • the model training unit 405 is configured to train the second image segmentation lightweight model based on the fifth image to obtain the target image segmentation lightweight model, and the posture of the target object in the fifth image is the second posture.
  • the background replacement unit 406 is configured to: replace the initial background content in the area where the background content is located in the third image with the preset background content in the second segmentation result to obtain the seventh image;
  • the operation of the background replacement device 400 to implement background replacement may refer to the related operations of the electronic device in the above method embodiment, and will not be described in detail here.
• The above display unit 401, acquisition unit 402, image segmentation unit 403, determination unit 404, model training unit 405, and background replacement unit 406 may correspond to the above electronic device 100, and may perform the operations performed by the electronic device 100 in the above method embodiments, which will not be described again here.
  • the above-mentioned display unit 401, acquisition unit 402, and determination unit 404 may correspond to the above-mentioned electronic device 200, and the above-mentioned image segmentation unit 403, model training unit 405, and background replacement unit 406 may correspond to the above-mentioned server. 300.
  • FIG. 18 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 10 includes: a processor 11, a communication interface 12 and a memory 13.
  • the processor 11, the communication interface 12 and the memory 13 are connected to each other through a bus 14.
  • the processor 11 is used to execute instructions stored in the memory 13 .
  • the memory 13 stores program codes, and the processor 11 can call the program codes stored in the memory 13 to perform the following operations:
  • display the second image obtained by performing the first background replacement on the first image; the first background replacement is performed based on the first image segmentation lightweight model; the first image includes the area where the target object is located and the area where the first background content is located; the second image is obtained by replacing the first background content in the first image with the second background content; the second background content is different from the first background content;
  • display the fourth image obtained by performing the second background replacement on the first image; the second background replacement is performed based on the target image segmentation lightweight model, and the target image segmentation lightweight model is obtained through training based on the third image; the fourth image is obtained by replacing the first background content in the first image with the second background content.
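Both replacement operations above come down to compositing the camera frame with a preset background under the segmentation result. A minimal sketch of that compositing step, assuming a binary mask (1 = target object) and a nested-list pixel representation (both illustrative assumptions, not the patent's data format):

```python
def replace_background(image, mask, new_background):
    """Composite: keep foreground pixels where mask == 1, take the
    preset background pixels elsewhere. All inputs are H x W nested
    lists; 'image' and 'new_background' hold pixel values, 'mask'
    holds 0/1 from the segmentation model."""
    return [
        [fg if m == 1 else bg
         for fg, m, bg in zip(img_row, mask_row, bg_row)]
        for img_row, mask_row, bg_row in zip(image, mask, new_background)
    ]

# Hypothetical 2x3 frame: 'P' marks person pixels, '.' the original
# background, '#' the preset (second) background content.
image = [['P', 'P', '.'], ['P', '.', '.']]
mask = [[1, 1, 0], [1, 0, 0]]
background = [['#', '#', '#'], ['#', '#', '#']]
print(replace_background(image, mask, background))
# [['P', 'P', '#'], ['P', '#', '#']]
```

A mask error (a person pixel marked 0) is exactly what produces the visible replacement errors the method aims to reduce: that pixel would be overwritten with background content.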
  • the processor 11 can have a variety of specific implementation forms.
  • the processor 11 can be any one or a combination of CPU, GPU, TPU or NPU.
  • the processor 11 can also be a single-core processor or a multi-core processor.
  • the processor 11 may be a combination of a CPU (or a GPU, TPU, or NPU) and a hardware chip.
  • the above-mentioned hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the processor 11 can also be implemented solely using a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
  • the communication interface 12 can be a wired interface or a wireless interface for communicating with other modules or devices.
  • the wired interface can be an Ethernet interface, a controller area network (CAN) interface, or a local interconnect network (LIN) interface; the wireless interface can be a cellular network interface or a wireless LAN interface, etc.
  • the memory 13 may be a non-volatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.
  • the memory 13 may also be a volatile memory, and the volatile memory may be a random access memory (RAM), which is used as an external cache.
  • Memory 13 may also be used to store instructions and data. Additionally, the electronic device 10 may include more or fewer components than shown in FIG. 18 , or may have components configured differently.
  • the bus 14 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in FIG. 18, but this does not mean that there is only one bus or one type of bus.
  • the electronic device 10 may also include an input/output interface 15 connected with an input/output device for receiving input information and outputting operation results.
  • the electronic device 10 in the embodiment of the present application may correspond to the background replacement device 400 in the above embodiment, and may perform the operations performed by the electronic device 100 in the above method embodiment, which will not be described again.
  • the electronic device 10 may be the above-mentioned electronic device 100 or the above-mentioned electronic device 200.
  • FIG. 19 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 20 includes: a processor 21, a communication interface 22, and a memory 23.
  • the processor 21, the communication interface 22, and the memory 23 are connected to each other through a bus 24.
  • the processor 21 is used to execute instructions stored in the memory 23 .
  • the memory 23 stores program codes, and the processor 21 can call the program codes stored in the memory 23 to perform the following operations:
  • receive the first image; input the first image into the first image segmentation lightweight model, and determine the area where the target object in the first image is located and the area where the first background content is located;
  • replace the first background content with the second background content to obtain the second image;
  • the fourth image is obtained by replacing the first background content in the first image with the second background content.
  • the processor 21 can have a variety of specific implementation forms.
  • the processor 21 can be any one or a combination of CPU, GPU, TPU or NPU.
  • the processor 21 can also be a single-core processor or a multi-core processor.
  • the processor 21 may be a combination of a CPU (or a GPU, TPU, or NPU) and a hardware chip.
  • the above-mentioned hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
  • the processor 21 can also be implemented solely by using a logic device with built-in processing logic, such as an FPGA or a digital signal processor (DSP).
  • the communication interface 22 may be a wired interface or a wireless interface, used for communicating with other modules or devices.
  • the wired interface can be an Ethernet interface, a controller area network (CAN) interface or a local interconnect network (LIN) interface
  • the wireless interface can be a cellular network interface or use a wireless LAN interface, etc.
  • the memory 23 may be a non-volatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.
  • the memory 23 may also be a volatile memory, and the volatile memory may be a random access memory (RAM), which is used as an external cache.
  • Memory 23 may also be used to store instructions and data.
  • the electronic device 20 may include more or fewer components than shown in FIG. 19 , or may have different component configurations.
  • the bus 24 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in FIG. 19, but this does not mean that there is only one bus or one type of bus.
  • the electronic device 20 may also include an input/output interface 25.
  • the input/output interface 25 is connected with an input/output device and is used for receiving input information and outputting operation results.
  • the electronic device 20 may be the above-mentioned electronic device 100 or the above-mentioned server 300 .
  • Embodiments of the present application also provide a non-transitory computer-readable storage medium.
  • a computer program is stored in the computer-readable storage medium.
  • when the computer program is run on a processor, the operations performed by the electronic device in the above method embodiment can be realized.
  • for the specific implementation of the above method steps performed by a processor based on the computer storage medium, reference may be made to the specific operations of the electronic device in the above method embodiment, which will not be described again here.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or a data center that contains one or more sets of available media.
  • the usable media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided in the present application is a background replacement method. The method comprises: segmenting a first image on the basis of an image segmentation lightweight model, and determining an area where a target object in the first image is located and an area where the background content is located; and, when a user is not satisfied with the background replacement effect, the user training the image segmentation lightweight model by means of a background replacement training operation until a background replacement effect satisfactory to the user is achieved. Therefore, when the user is not satisfied with the replacement effect of background replacement performed on the basis of the current image segmentation lightweight model, training of the image segmentation lightweight model can be started by a user operation, so as to raise the segmentation accuracy of the image segmentation lightweight model, thereby raising the accuracy of background replacement, reducing the probability of replacement errors, and improving the user experience when background replacement is performed.

Description

Background Replacement Method and Electronic Device
This application claims priority to the Chinese patent application No. 202210271308.4, filed with the China Patent Office on March 18, 2022 and entitled "Background Replacement Method and Electronic Device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminals, and in particular, to a background replacement method and an electronic device.
Background
To reduce restrictions on the environments in which applications can be used and to protect user privacy, background replacement has become a necessity in many video applications, such as video calls, video conferencing, and video customer service. Background replacement is a technology that replaces the content of the background area contained in a video or image with specified background content. The core step in background replacement is image segmentation, that is, segmenting the input image into a target area and a background area through an image segmentation model.
Currently, an image segmentation model is trained by collecting a large amount of historical image or video data of the target object in advance to build an image data set, and then using that data set to train the model, obtaining an image segmentation model optimized for the target object. The accuracy of image segmentation therefore depends on the quality of the image data set used during training, and a pre-collected data set cannot cover all individual features. When the data set is of low quality, that is, when it does not contain certain individual features of the target object, and the trained image segmentation model is used to segment an image containing those individual features, the parts of the image's target area that contain those features will be treated as background area. This results in poor segmentation accuracy, which in turn causes replacement errors during background replacement and degrades the user experience.
Summary
This application provides a background replacement method and an electronic device. Implementing the method can improve the image segmentation accuracy of the segmentation model, thereby improving the accuracy of background replacement and the user experience.
According to a first aspect, embodiments of this application provide a background replacement method. The method includes: an electronic device displays a second image obtained by performing a first background replacement on a first image, where the first background replacement is performed based on a first image segmentation lightweight model, the first image includes an area where a target object is located and an area where first background content is located, the second image is obtained by replacing the first background content in the first image with second background content, and the second background content is different from the first background content; in response to a background replacement training operation of a user, the electronic device displays a first posture map, where the first posture map is used to instruct the user to make a first posture; the electronic device acquires a third image, where the posture of the user in the third image is the first posture; and the electronic device displays a fourth image obtained by performing a second background replacement on the first image, where the second background replacement is performed based on a target image segmentation lightweight model, the target image segmentation lightweight model is obtained through training based on the third image, and the fourth image is obtained by replacing the first background content in the first image with the second background content.
It can be understood that, after detecting the background replacement training operation, the electronic device displays the first posture map to instruct the user to make the first posture, and then acquires a third image including the user, in which the user's posture is the first posture. By using the third image as training data for the first image segmentation lightweight model, the electronic device can capture the user's latest individual features. In this way, the target image segmentation lightweight model trained with the third image segments images more accurately; that is, it can more accurately separate the target object from the background content in an image. Therefore, comparing the second image, obtained by performing the first background replacement on the first image based on the first image segmentation lightweight model, with the fourth image, obtained by performing the second background replacement on the first image based on the target image segmentation lightweight model, the size of the area where the target object is located in the fourth image is closer to the size of the area where the target object is located in the first image than it is in the second image. That is, the accuracy of the background replacement in the fourth image is higher than that in the second image, and the segmentation accuracy of the target image segmentation lightweight model is higher than that of the first image segmentation lightweight model.
By implementing the method of the first aspect, the electronic device uses an image segmentation lightweight model to segment the first image and determine the area where the target object is located and the area where the background content is located in the first image. When the user is not satisfied with the background replacement effect, the user can train the image segmentation lightweight model through the background replacement training operation until a satisfactory background replacement effect is achieved. In this way, when the user is dissatisfied with the effect of background replacement based on the current image segmentation lightweight model, training of the model can be started through a user operation to improve its segmentation accuracy, thereby improving the accuracy of background replacement and the user experience.
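The interaction described in this aspect amounts to a user-driven retraining loop. The sketch below summarizes that control flow; all callables are placeholders standing in for the operations described above (capture, training, replacement, posture selection), not APIs from the patent:

```python
def train_until_satisfied(model, capture, train, replace_background,
                          choose_posture, is_satisfied, max_rounds=10):
    """Hypothetical control flow for the first aspect: keep retraining
    the lightweight segmentation model until the user accepts the
    background replacement effect (or a round limit is reached)."""
    for _ in range(max_rounds):
        preview = replace_background(model)   # display replaced image
        if is_satisfied(preview):
            return model   # this is the target lightweight model
        posture = choose_posture()   # display a posture map to the user
        sample = capture(posture)    # user makes the indicated posture
        model = train(model, sample) # one training round on the new image
    return model
```

As a toy simulation, a "model" can be an integer that improves by one per round, with the user satisfied once it reaches 2:

```python
result = train_until_satisfied(
    0,
    capture=lambda posture: posture,
    train=lambda m, s: m + 1,
    replace_background=lambda m: m,
    choose_posture=lambda: "pose",
    is_satisfied=lambda preview: preview >= 2,
)
print(result)  # 2
```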
With reference to the first aspect, in some implementations, before the electronic device displays the fourth image obtained by performing the second background replacement on the first image, the method further includes: the electronic device inputs the third image into a full image segmentation model to obtain a first segmentation result; the electronic device inputs the third image into the first image segmentation lightweight model to obtain a second segmentation result, where the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first image segmentation lightweight model, and the first segmentation result and the second segmentation result are used to indicate the area where the target object is located and the area where third background content is located in the third image; the electronic device trains the first image segmentation lightweight model based on the first segmentation result and the second segmentation result to obtain a second image segmentation lightweight model; and the electronic device inputs the third image into the second image segmentation lightweight model to obtain a third segmentation result;
when the first segmentation result and the third segmentation result are different, the electronic device determines a second posture map based on the first segmentation result and the third segmentation result, where the second posture map is used to instruct the user to make a second posture, the second posture map includes a first limb, and the area where the first limb is located in the first segmentation result is different from the area where the first limb is located in the third segmentation result;
the electronic device trains the second image segmentation lightweight model based on a fifth image to obtain the target image segmentation lightweight model, where the posture of the target object in the fifth image is the second posture.
The image segmentation lightweight model is obtained by pruning and quantizing the full image segmentation model, and the number of model parameters in the full model is greater than that in the lightweight model. Generally, the segmentation results of the full model are more accurate than those of the lightweight model, but the full model requires much more computation. To reduce the amount of computation, a lightweight model with fewer parameters is usually deployed on the electronic device for image segmentation, but its segmentation effect is inferior to that of the full model. Therefore, during training of the lightweight model, the full model is used to guide the training, so that the performance of the lightweight model approaches that of the full model, that is, their segmentation effects become close. In other words, the electronic device uses the full image segmentation model and the image segmentation lightweight model to segment the third image separately, obtaining the first segmentation result and the second segmentation result. Based on the error between the two results, the model parameters of the lightweight model are adjusted so that its segmentation effect comes closer to that of the full model. In this way, the amount of computation on the electronic device is reduced while the segmentation effect is preserved.
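Guiding the lightweight (student) model with the full (teacher) model is a knowledge-distillation setup: the student's parameters are adjusted to shrink the discrepancy between the two models' outputs on the same image. A minimal sketch of one such discrepancy measure, assuming per-pixel foreground probabilities in nested lists and a mean-squared-error loss (both illustrative choices; the patent does not specify the loss):

```python
def distillation_loss(teacher_probs, student_probs):
    """Mean squared error between the full model's (teacher) and the
    lightweight model's (student) per-pixel foreground probabilities,
    given as equal-sized H x W nested lists of floats in [0, 1]."""
    total = count = 0
    for t_row, s_row in zip(teacher_probs, student_probs):
        for t, s in zip(t_row, s_row):
            total += (t - s) ** 2
            count += 1
    return total / count

teacher = [[1.0, 0.0], [1.0, 0.0]]   # full model: left column is foreground
student = [[0.8, 0.2], [1.0, 0.0]]   # lightweight model disagrees slightly
print(distillation_loss(teacher, student))  # ≈ 0.02
```

An optimizer would then nudge the student's parameters to reduce this loss, which is what "adjusting the model parameters based on the error between the first and second segmentation results" describes.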
In this way, after the electronic device trains the second image segmentation lightweight model using the first segmentation result and the second segmentation result, it verifies the trained model using the third image. When the third segmentation result differs from the first segmentation result, the electronic device determines a posture map based on the segmentation results and instructs the user to make the posture shown in that map, so that the resulting image serves as training data guiding the next round of training. The training data is thus actively screened, which improves its quality; training the model with the screened data improves the training effect and, in turn, the segmentation accuracy of the model.
With reference to the first aspect, in some implementations, the electronic device training the second image segmentation lightweight model based on the fifth image to obtain the target image segmentation lightweight model specifically includes:
the electronic device acquires the fifth image; the electronic device inputs the fifth image into the full image segmentation model to obtain a fourth segmentation result; the electronic device inputs the fifth image into the second image segmentation lightweight model to obtain a fifth segmentation result, where the fourth segmentation result and the fifth segmentation result are used to indicate the area where the target object is located and the area where fourth background content is located in the fifth image; the electronic device trains the second image segmentation lightweight model based on the fourth segmentation result and the fifth segmentation result to obtain a third image segmentation lightweight model; and the electronic device inputs the fifth image into the third image segmentation lightweight model to obtain a sixth segmentation result;
when the fourth segmentation result and the sixth segmentation result do not satisfy a first preset condition, the third image segmentation lightweight model is the target image segmentation lightweight model.
In this way, after training the second image segmentation lightweight model, the electronic device verifies it using the fifth image and judges whether the segmentation result of the lightweight model and that of the full image segmentation model satisfy the first preset condition. When the first preset condition is not satisfied, the electronic device determines the resulting lightweight model as the target image segmentation lightweight model and stops training. That is, after each round of training, the electronic device judges whether the segmentation result of the lightweight model meets the requirements and, if so, stops training, thereby avoiding unnecessary training.
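The stop/continue decision after each round can be sketched as a disagreement test between the full model's mask and the lightweight model's mask. The fraction-of-differing-pixels criterion and the threshold below are illustrative assumptions; the patent only requires some first preset condition:

```python
def should_stop_training(full_mask, light_mask, max_disagreement=0.01):
    """Return True when the two binary masks (H x W nested lists of
    0/1) disagree on at most a max_disagreement fraction of pixels,
    i.e. the lightweight model is already close enough to the full
    image segmentation model to stop training."""
    differing = total = 0
    for f_row, l_row in zip(full_mask, light_mask):
        for f, l in zip(f_row, l_row):
            differing += (f != l)
            total += 1
    return differing / total <= max_disagreement

print(should_stop_training([[1, 0], [1, 0]], [[1, 0], [1, 0]]))  # True
print(should_stop_training([[1, 0], [1, 0]], [[1, 1], [1, 0]]))  # False
```

When this returns False, the disagreeing pixels are exactly the input to the posture-map selection in the next implementation: training continues with images of the postures that cover the disagreement.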
With reference to the first aspect, in some implementations, after the electronic device inputs the fifth image into the third image segmentation lightweight model and obtains the sixth segmentation result, the method further includes: when the fourth segmentation result and the sixth segmentation result satisfy the first preset condition, the electronic device determines a third posture map based on the fourth segmentation result and the sixth segmentation result, where the third posture map is used to instruct the user to make a third posture, the third posture map includes a second limb, and the second limb in the fourth segmentation result is different from the second limb in the sixth segmentation result;
the electronic device trains the third image segmentation lightweight model based on a sixth image to obtain the target image segmentation lightweight model, where the posture of the target object in the sixth image is the third posture.
In this way, when the fourth segmentation result and the sixth segmentation result do not meet the requirements, that is, when the segmentation result of the third image segmentation lightweight model and that of the full image segmentation model do not meet the requirements, the electronic device determines the third posture map based on those segmentation results, to instruct the user to make the third posture. Through each round of training, the electronic device continuously screens the training data until the segmentation result of the model meets the requirements. This improves the quality of the training data and the training effect of the model, avoids wasting training data, and prevents excessively long training caused by invalid training data.
With reference to the first aspect, in some implementations, when the first segmentation result and the third segmentation result are different, the electronic device determining the second posture map based on the first segmentation result and the third segmentation result includes:
when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, the electronic device determines a target region of the first image based on the difference between the first segmentation result and the third segmentation result, and determines the second posture map based on the target region.
The target area is a poorly segmented area. The electronic device determines, based on the first segmentation result and the third segmentation result, whether a poorly segmented area exists in the third segmentation result. If a poorly segmented area exists in the third segmentation result, the electronic device determines, from a representative posture library, a posture map containing the poorly segmented area, which is used to instruct the user to assume the posture contained in that posture map, that is, the posture contained in the poorly segmented area. In this way, in the next round of training of the image segmentation lightweight model, user images containing these poorly segmented areas are acquired and training focuses on these poorly segmented areas; that is, personalized training is performed for the individual characteristics of the target object, which improves the training effect of the model and thus the segmentation accuracy of the model.
With reference to the first aspect, in some implementations, the first segmentation result and the third segmentation result include pixel information of pixels in the third image;
when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, the electronic device determining the target area of the first image based on the difference between the first segmentation result and the third segmentation result specifically includes: the electronic device determines first target pixels in the third image based on the difference between the pixel information of pixels in the first segmentation result and the pixel information of pixels in the third segmentation result, where a first target pixel is a pixel whose pixel-information difference between the first segmentation result and the third segmentation result is greater than a first threshold; the electronic device determines the areas in the third image where one or more limbs of the target object are located, the one or more limbs including a third limb; and when the number of first target pixels in the area of the third image where the third limb is located is greater than a second threshold, the electronic device determines that the area of the third image where the third limb is located is the target area.
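The per-limb comparison described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the representation of segmentation results as per-pixel score arrays, and the limb masks are all assumptions made for the example.

```python
import numpy as np

def find_target_areas(seg_a, seg_b, limb_masks, pixel_thresh, count_thresh):
    """Compare two segmentation results of the same image and return the
    limbs whose areas contain more than `count_thresh` "first target"
    pixels, i.e. pixels whose pixel-information difference between the two
    results exceeds `pixel_thresh` (hypothetical helper for illustration).

    seg_a, seg_b: per-pixel foreground scores of shape (H, W).
    limb_masks:   dict mapping a limb name to a boolean (H, W) mask of the
                  area where that limb is located.
    """
    # Pixels where the two segmentation results disagree beyond the first threshold.
    target_pixels = np.abs(seg_a.astype(float) - seg_b.astype(float)) > pixel_thresh

    poorly_segmented = []
    for limb, mask in limb_masks.items():
        # Count disagreeing pixels that fall inside this limb's area.
        if np.count_nonzero(target_pixels & mask) > count_thresh:
            poorly_segmented.append(limb)
    return poorly_segmented
```

A limb returned by this sketch corresponds to the "target area" of the claim, and would drive the selection of the next posture map.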
In some implementations, the limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, left foot, and other body parts.
With reference to the first aspect, in some implementations, the electronic device determining the second posture map based on the target area specifically includes: the electronic device determines the third limb of the target object contained in the target area; and the electronic device determines a second posture map containing the third limb.
In this way, the electronic device can filter out posture maps containing the poorly segmented limb and, in the next round of training, focus training on the poorly segmented area, which can improve the training effect of the model.
With reference to the first aspect, in some implementations, the electronic device determining the second posture map containing the third limb includes: the electronic device determines multiple posture maps containing the third limb; and the electronic device determines the second posture map from the multiple posture maps, where the second posture map is the posture map, among the multiple posture maps, whose area where the third limb is located contains the most first target pixels.
In some implementations, the posture map includes a contour map and a human body region map. Before the electronic device detects the first user operation, the method further includes: the electronic device acquires a training data set, where the training data set includes multiple images; the electronic device inputs the image data set into a human body posture estimation model to obtain multiple human body posture vectors corresponding to the image data set; the electronic device inputs the multiple human body posture vectors corresponding to the image data set into a clustering model to obtain one or more representative posture vectors; the electronic device inputs the one or more representative posture vectors into a human body contour detection model to obtain contour maps corresponding to the one or more representative posture vectors; and the electronic device inputs the image data set into a human body region detection model to obtain one or more limb region maps corresponding to the image data set.
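The clustering step that turns many human body posture vectors into a few representative posture vectors can be sketched with a minimal k-means loop. The source does not specify the clustering algorithm, so this is an assumption for illustration; the function name and the choice of the nearest actual member as each cluster's representative are hypothetical.

```python
import numpy as np

def representative_poses(pose_vectors, k, iters=20, seed=0):
    """Cluster pose vectors and return the index of one representative
    vector per non-empty cluster: the member closest to the cluster centre.
    Minimal hand-rolled k-means; a real system might use a library instead."""
    rng = np.random.default_rng(seed)
    x = np.asarray(pose_vectors, dtype=float)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pose vector to its nearest centre.
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centres; keep the old centre for an empty cluster.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = x[labels == j].mean(axis=0)
    reps = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members):
            best = members[np.argmin(np.linalg.norm(x[members] - centres[j], axis=1))]
            reps.append(int(best))
    return reps  # indices into pose_vectors
```

Each returned index identifies a representative posture vector, which would then be fed to the human body contour detection model to build the representative posture library.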
With reference to the first aspect, in some implementations, when the first segmentation result and the third segmentation result do not satisfy the first preset condition, the electronic device replaces, based on the third segmentation result, the third background content in the area of the third image where the background content is located with preset background content, to obtain a seventh image;
the electronic device displays the seventh image, a first control, and first prompt information, where the first prompt information is used to prompt training of the second image segmentation lightweight model.
In this way, when the segmentation result of the image segmentation lightweight model meets the requirements, the electronic device can replace the background content in the segmentation result with the preset background content to obtain a replaced image, and display the replaced image on the display screen. A retraining control can also be displayed; when the user is not satisfied with the effect of the background replacement, the user can use this control to retrain the image segmentation lightweight model. This can improve the user experience.
With reference to the first aspect, in some implementations, after the electronic device displays the seventh image, the first control, and the first prompt information, the method further includes: the electronic device detects an operation acting on the first control, and the electronic device determines a third threshold, where the third threshold is smaller than the first threshold;
when the difference between the first segmentation result and the third segmentation result satisfies a second preset condition, the electronic device determines a fourth posture map based on the first segmentation result and the third segmentation result, where the fourth posture map is used to indicate a fourth posture; the second preset condition is that the number of second target pixels in the area of the first image where a fourth limb is located is greater than the second threshold, where a second target pixel is a pixel whose difference between its pixel information in the first segmentation result and its pixel information in the second segmentation result is greater than the third threshold; and the electronic device trains the second image segmentation lightweight model based on an eighth image to obtain a target image segmentation lightweight model, where the posture of the target object in the eighth image is the fourth posture.
With reference to the first aspect, in some implementations, before the electronic device displays the first posture map in response to the user's background replacement training operation, the method further includes:
when the electronic device detects that the usage duration of the first image segmentation lightweight model is longer than a first duration, the electronic device displays second prompt information and a second control, where the second prompt information is used to prompt training of the first image segmentation lightweight model; the background replacement training operation is an operation acting on the second control.
In a second aspect, embodiments of this application provide a background replacement apparatus, including units for performing the background replacement method in the first aspect or any possible implementation of the first aspect.
In a third aspect, embodiments of this application provide an electronic device, including one or more processors and one or more memories, where the one or more memories are coupled to the one or more processors and are used to store computer program code, the computer program code includes computer instructions, and when the one or more processors execute the computer instructions, the electronic device is caused to perform the method described in the first aspect and any possible implementation of the first aspect.
In a fourth aspect, embodiments of this application provide a chip system applied to an electronic device, where the chip system includes one or more processors used to invoke computer instructions so that the electronic device performs the method described in the first aspect and any possible implementation of the first aspect.
In a fifth aspect, embodiments of this application provide a computer-readable storage medium including instructions that, when run on an electronic device, cause the electronic device to perform the method described in the first aspect and any possible implementation of the first aspect.
It can be understood that the background replacement apparatus provided in the second aspect, the electronic device provided in the third aspect, the chip system provided in the fourth aspect, and the computer storage medium provided in the fifth aspect are all used to perform the method provided in the embodiments of this application. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
Description of the Drawings
FIG. 1A and FIG. 1B are schematic diagrams of a user interface of a video conference on an electronic device provided by an embodiment of this application;
FIG. 2A is a flowchart of a background replacement method provided by an embodiment of this application;
FIG. 2B is a flowchart of a background replacement method provided by an embodiment of this application;
FIG. 3A to FIG. 3F are schematic diagrams of some user interfaces provided by embodiments of this application;
FIG. 4 is a schematic diagram of a segmentation result provided by an embodiment of this application;
FIG. 5 is a schematic diagram of the training process of an image segmentation lightweight model provided by an embodiment of this application;
FIG. 6 is a flowchart of a process in which an electronic device determines a target area, provided by an embodiment of this application;
FIG. 7 is a schematic diagram of another user interface provided by an embodiment of this application;
FIG. 8 is a schematic diagram in which an electronic device uses image segmentation lightweight model 1 and a target image segmentation lightweight model to segment an original image and then perform replacement to obtain a replaced image, provided by an embodiment of this application;
FIG. 9 is a flowchart in which an electronic device constructs a representative posture library, provided by an embodiment of this application;
FIG. 10 is a schematic diagram of a set of skeletal key point data provided by an embodiment of this application;
FIG. 11 is a schematic diagram of a process in which an electronic device obtains representative posture vectors through clustering, provided by an embodiment of this application;
FIG. 12 is a schematic diagram of obtaining a contour map from a representative posture vector, provided by an embodiment of this application;
FIG. 13 is a schematic diagram of limb regions provided by an embodiment of this application;
FIG. 14 is a schematic diagram of the software structure of the electronic device 100 provided by an embodiment of this application;
FIG. 15 shows the cooperative relationships among the modules in the electronic device provided by an embodiment of this application;
FIG. 16 is a schematic diagram of a background replacement system provided by an embodiment of this application;
FIG. 17A is a schematic diagram of a background replacement apparatus provided by an embodiment of this application;
FIG. 17B is a schematic diagram of another background replacement apparatus provided by an embodiment of this application;
FIG. 18 is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 19 is a schematic structural diagram of another electronic device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and in detail below with reference to the accompanying drawings. In the description of the embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" in the text merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, in the description of the embodiments of this application, "multiple" means two or more.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of this application, unless otherwise specified, "multiple" means two or more.
Currently, when a user conducts a video conference through an electronic device, the electronic device captures a video containing the user and displays video image frames in real time. Each video image frame includes an image of the user and an image of the environmental background around the user, which may lead to leakage of the user's privacy during the video conference. To protect the user's privacy, the user can choose to replace the environmental background image in the video image frames with a preset background image.
By way of example, FIG. 1A shows a user interface 110 for a video conference on an electronic device in an embodiment of this application. As shown in FIG. 1A, the user interface 110 of the video conference may include a video image frame 111 displayed in real time and a background replacement control 112. Each video image frame may include an area where the background is located and an area where the target object is located. The area where the target object is located may be called the foreground area, and the area where the background content is located may be called the background area. As shown in diagram (a) of FIG. 1A, the video image frame 111 includes an area 1111 and an area 1112. As shown in diagram (b) of FIG. 1A, the area 1111 is the background area, and the area 1112 is the foreground area, that is, the area where the target object is located. The user can tap the background replacement control 112; in response to this user operation, the electronic device can replace the background content in the background area and display a user interface 120.
As shown in FIG. 1B, the electronic device displays the user interface 120. The user interface 120 may include a replaced video image frame 121 and a background replacement control 122. The video image frame 121 includes an area 1211, which is the replaced background area, and an area 1212, which is the foreground area. It can be seen that the raised parts of the target object's hair are missing in areas 1211A and 1211B of the video image frame 121; that is, when the electronic device performed background replacement, some body parts of the target object, or portions of some body parts, were treated as background area and replaced, resulting in a replacement error. In background replacement, an image segmentation model is first used to separate the foreground area and the background area in the video or image to obtain the target object and the initial background content; the initial background content of the background area in the video or image is then replaced with the preset background content to obtain the replaced image or video. In the process of separating the foreground area from the background area, the image segmentation model may mistake some body parts of the target object, or portions of some body parts, for initial background content, that is, segment them into the background area, making the image segmentation inaccurate. Furthermore, during the background replacement process, some body parts of the target object, or portions of some body parts, are then replaced as background content. As a result, the body of the target object in the replaced image is incomplete; that is, some body parts of the target object, or portions of some body parts, are missing, for example, the raised parts of the hair in areas 1211A and 1211B in FIG. 1B.
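The segment-then-composite step described above can be sketched as a per-pixel selection between the original frame and the preset background. This is a simplified illustration, not the claimed implementation; the function name and the representation of the segmentation result as a boolean foreground mask are assumptions.

```python
import numpy as np

def replace_background(frame, fg_mask, new_background):
    """Composite a preset background behind the segmented target object.

    frame, new_background: (H, W, 3) images of the same size.
    fg_mask: (H, W) booleans, True where the segmentation model placed
             the foreground (the target object).
    Any foreground pixel the model wrongly labels as background, such as a
    raised tuft of hair, is overwritten by the new background, which is
    exactly the replacement error described in the text.
    """
    return np.where(fg_mask[..., None], frame, new_background)
```

This makes clear why segmentation accuracy bounds replacement accuracy: the compositing step itself is trivial, so every visible error comes from the mask.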
In background replacement technology, the accuracy of image segmentation affects the effect of background replacement. In general, the more accurate the image segmentation, the smaller the probability that the electronic device will mistakenly replace content in the target object's area with the preset background content during background replacement.
Currently, image segmentation is performed by pre-training an image segmentation model and then using that model to segment video image frames. Specifically, the electronic device first needs to collect a large number of images or videos containing the target object to build an image data set. The electronic device then uses this image data set to train an initial image segmentation model to obtain an optimized image segmentation model, which can segment the target object from a target image more accurately. The accuracy of image segmentation depends on the quality of the image data set used during training of the image segmentation model. Generally speaking, a pre-collected image data set cannot cover all individual characteristics, for example, the target object's hairstyle, headwear, and outward appearance, which can change over time.
When the image data set used for training is of low quality, that is, when it does not contain certain individual characteristics of the target object, then when the trained image segmentation model is used to segment an image containing those individual characteristics, the portions of the target object's area containing those characteristics will be treated as background area. This leads to poor image segmentation accuracy and affects the user experience. For example, suppose the images of the target object collected when the image segmentation model was trained show short hair; over time the target object's hairstyle changes, and when the original image segmentation model is then used for image segmentation, the areas where the hairstyle has changed are easily mis-segmented into the background area, resulting in inaccurate segmentation and subsequent replacement errors, which affects the user experience.
Therefore, embodiments of this application provide a background replacement method. In this method, the electronic device first replaces, based on a first image segmentation lightweight model, the first background content in an acquired first image with second background content to obtain a second image, and displays on the display screen the second image obtained by performing background replacement on the first image. When the user considers the background replacement accuracy of the second image to be low, the user can choose to retrain the first image segmentation lightweight model; that is, in response to the user's background replacement training operation, the electronic device displays a first posture map on the display screen. The first posture map is used to indicate a first posture and guides the user to record a video or image according to the first posture contained in the first posture map. The electronic device can then acquire a third image containing the user and retrain the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model. Finally, the electronic device uses the target image segmentation lightweight model to segment the first image to obtain a segmentation result, and replaces the first background content in the segmentation result with the second background content to obtain a fourth image. In other words, the electronic device uses the image segmentation lightweight model to segment the first image and determine the area where the target object is located and the area where the background content is located; when the user is not satisfied with the effect of the background replacement, the user can train the image segmentation lightweight model through the background replacement training operation until a background replacement effect that satisfies the user is achieved. In this way, when the user is not satisfied with the effect of background replacement based on the current image segmentation lightweight model, the user can start training of the image segmentation lightweight model through a user operation to improve its segmentation accuracy, thereby improving the accuracy of background replacement and enhancing the user experience.
In some implementations, the electronic device may train the image segmentation lightweight model as follows. The electronic device displays a posture map P1 on the display screen, where the posture map P1 guides the target object to shoot a video or image according to a posture G1 contained in the posture map P1. The electronic device then acquires an image T1 containing the target object and inputs the image T1 into the image segmentation full model and an image segmentation lightweight model M1, respectively, to obtain segmentation result 1 and segmentation result 2. The electronic device trains the image segmentation lightweight model M1 based on segmentation result 1 and segmentation result 2 to obtain an image segmentation lightweight model M2, and then inputs the image T1 into the image segmentation lightweight model M2 to obtain segmentation result 3. Next, the electronic device determines whether the error between segmentation result 1 and segmentation result 3 satisfies a preset condition D2. If it is satisfied, the image segmentation lightweight model M2 is the target image segmentation lightweight model; that is, the image segmentation lightweight model M2 can be used for subsequent image segmentation. If it is not satisfied, the electronic device determines, based on segmentation result 1 and segmentation result 3, the poorly segmented area in segmentation result 3; this poorly segmented area can serve as the area on which the next round of training needs to focus. The electronic device then determines a posture map P2 containing the poorly segmented area and displays the posture map P2 on the display screen, where the posture map P2 guides the user to shoot a video or image according to a posture G2 contained in the posture map P2. The electronic device acquires an image T2 and, following the above steps, uses the image T2 to continue training the image segmentation lightweight model M2, and so on, until the segmentation result of the image segmentation full model and the segmentation result of the image segmentation lightweight model satisfy the preset condition D2.
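The round-based control flow above can be sketched as follows. The model-specific operations are injected as callables because the source does not specify them; every name here (the function, its parameters, and the callables) is hypothetical and serves only to show the loop structure: capture, distill, compare, and either stop or pick the next posture map.

```python
def train_lightweight(image_for, segment_full, segment_lite, train_step,
                      error, pick_pose, first_pose, d2, max_rounds=10):
    """Iterative training loop guided by the full model's segmentation.

    image_for(pose):      capture a user image shot in the given posture.
    segment_full(img):    segmentation result of the full model (reference).
    segment_lite(img):    segmentation result of the lightweight model.
    train_step(img, ref): one distillation step toward the reference.
    error(ref, out):      scalar disagreement between two results.
    pick_pose(ref, out):  posture map covering the poorly segmented area.
    Returns True if the lightweight model met condition d2 within
    max_rounds rounds, False otherwise.
    """
    pose = first_pose
    for _ in range(max_rounds):
        img = image_for(pose)          # image T_n shot in posture G_n
        ref = segment_full(img)        # reference segmentation result
        train_step(img, ref)           # distill M_n into M_{n+1}
        out = segment_lite(img)        # lightweight model's new result
        if error(ref, out) <= d2:      # preset condition D2 satisfied
            return True                # this is the target lightweight model
        pose = pick_pose(ref, out)     # next round focuses on the weak area
    return False
```

The early return mirrors the point made later in the text: checking the condition after each round avoids unnecessary training once the lightweight model is good enough.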
其中,图像分割轻量模型是基于图像分割全量模型经过裁剪、量化得到的,图像分割全量模型中模型参数的数量大于图像分割轻量模型中模型参数的数量。一般来说,图像分割全量模型的分割结果相对于图像分割轻量模型的分割结果来说更准确,但是图像分割全量模型的计算量比较大。为了减少模型计算量,一般在电子设备上部署模型参数更少的图像分割轻量模型用来进行图像分割,但是图像分割轻量模型的分割效果不如图像分割全量模型的分割结果,因此,在图像分割轻量模型的训练过程中,利用图像分割全量模型对待训练的图像轻量模型进行指导训练,使得图像分割轻量模型的性能接近于图像分割全量模型的性能,即图像分割轻量模型的分割效果与图像分割全量模型的分割效果接近。这样,可以在保证图像分割的效果的情况下,又减少电子设备的计算量。关于图像分割轻量模型的训练具体参见下述实施例,此处不再赘述。Among them, the image segmentation lightweight model is obtained by cropping and quantifying based on the image segmentation full model. The number of model parameters in the image segmentation full model is greater than the number of model parameters in the image segmentation lightweight model. Generally speaking, the segmentation results of the full image segmentation model are more accurate than the segmentation results of the lightweight image segmentation model, but the calculation amount of the full image segmentation model is relatively large. In order to reduce the amount of model calculations, an image segmentation lightweight model with fewer model parameters is generally deployed on electronic devices for image segmentation. However, the segmentation effect of the image segmentation lightweight model is not as good as the segmentation results of the image segmentation full model. Therefore, in the image During the training process of the segmentation lightweight model, the image segmentation full model is used to guide the training of the image lightweight model to be trained, so that the performance of the image segmentation lightweight model is close to the performance of the image segmentation full model, that is, the segmentation of the image segmentation lightweight model The effect is close to that of the full image segmentation model. In this way, the calculation amount of the electronic device can be reduced while ensuring the effect of image segmentation. 
For the specific training of the image segmentation lightweight model, refer to the embodiments below; the details are not repeated here.
Segmentation result 1 and segmentation result 3 each indicate the region where the target object is located and the region where the background content is located in the image T1. The background-content region indicated in segmentation result 1 differs from the background-content region indicated in segmentation result 3.
The electronic device determines the poorly segmented regions based on segmentation result 1 and segmentation result 3, that is, based on the segmentation result of the previous round's lightweight model. It then filters the representative posture library for posture maps containing those poorly segmented regions and uses them as training data to guide the next round of training; the training data are thus actively screened, which improves their quality. In the next round of lightweight-model training, user images containing these poorly segmented regions are obtained and training focuses on those regions. In other words, personalized training is performed on the individual characteristics of the target object, which improves the training effect and hence the model's segmentation accuracy, reducing the probability of errors when the background is replaced. In addition, after each round of training, the electronic device checks whether the lightweight model's segmentation result meets the preset condition and stops training once it does, avoiding unnecessary training.
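The screening described above, which locates regions where the previous round's lightweight model disagrees with the full model, can be sketched as a block-wise comparison of the two masks. The patent does not fix a concrete criterion, so the block size and disagreement threshold below are hypothetical parameters.

```python
import numpy as np

def poorly_segmented_regions(full_mask, light_mask, block=8, threshold=0.2):
    """Return the top-left corners of blocks where the lightweight model's
    mask disagrees with the full model's mask more than `threshold`."""
    disagreement = (full_mask != light_mask).astype(np.float32)
    bad_blocks = []
    h, w = full_mask.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            if disagreement[y:y + block, x:x + block].mean() > threshold:
                bad_blocks.append((y, x))
    return bad_blocks
```

Posture maps whose postures cover the flagged blocks would then be selected from the representative posture library as the guidance data for the next round.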
In the embodiments of this application, the posture map P1 may also be called the first posture map, the posture G1 the first posture, the image T1 the third image, the image segmentation lightweight model 1 the first image segmentation lightweight model, segmentation result 1 the first segmentation result, segmentation result 2 the second segmentation result, segmentation result 3 the third segmentation result, and the image segmentation lightweight model 2 the second image segmentation lightweight model.
The following describes, by way of example and with reference to Figure 2A to Figure 13, a background replacement method provided by an embodiment of this application.
Figure 2A shows a flowchart of a background replacement method according to an embodiment of this application. As shown in Figure 2A, the method includes steps S101 to S104.
S101: The electronic device displays a second image obtained by performing first background replacement on a first image.
The image T1 may be uploaded by the user. For example, if the user needs to replace the background of a picture, the user can upload that picture on the electronic device. The image may also be one captured by the electronic device, or a frame of a video. For example, the electronic device can capture images of the user during a video conference held with video conferencing software or during a video call with other people.
Specifically, the first image includes the region where the target object is located and the region where the first background is located. Based on the first image segmentation lightweight model, the electronic device can determine the region of the target object and the region of the first background content in the first image; it then replaces the first background content in that region with second background content to obtain the second image. The first background content and the second background content are different; the second background content may be background content preset by the user or default background content set on the electronic device at the factory.
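The replacement step just described (keep the pixels of the target-object region and take the rest from the second background content) can be sketched as a mask-based composite. `replace_background` and its arguments are hypothetical names for illustration; the mask is assumed to be the per-pixel foreground map produced by the segmentation model.

```python
import numpy as np

def replace_background(image, foreground_mask, new_background):
    """Where the mask marks the target object, keep the original pixel;
    elsewhere take the pixel from the replacement background."""
    mask = foreground_mask.astype(bool)[..., None]  # H x W x 1, broadcast over RGB
    return np.where(mask, image, new_background)
```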
S102: In response to a background replacement training operation by the user, the electronic device displays a first posture map.
The first posture map instructs the user to assume a first posture. The background replacement training operation may be an operation on a background replacement training control. For example, after displaying the second image, the electronic device displays the background replacement training control on the display screen. If the user is dissatisfied with the replacement effect of the second image obtained by performing the first background replacement based on the first image segmentation lightweight model, the user can tap the training control to start training the first image segmentation lightweight model. The background replacement training operation may also be a voice command or a button press; this application does not limit it.
In some embodiments, in a video conference scenario, the user's background replacement training operation may also be interpreted as the sequence of tapping the background replacement training control and then ending the video conference. For example, after detecting that the user has acted on the control, the electronic device may display the first posture map once the video conference ends and then start training the image segmentation lightweight model.
S103: The electronic device obtains a third image, in which the user's posture is the first posture.
The third image may be an image shot by the user in the first posture indicated by the first posture map. It may also be a frame of a video: the target object records a video in the first posture indicated by the first posture map, and the electronic device obtains the video and extracts one frame as the third image.
In some optional embodiments, the third image may be a set of multiple images. For example, the electronic device can shoot multiple images at a time or extract multiple frames from a recorded video, and one of those frames can serve as the third image.
S104: The electronic device displays a fourth image obtained by performing second background replacement on the first image.
Specifically, the electronic device can train the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model. Inputting the first image into the target model determines the region where the target object is located and the region where the first background content is located in the first image. The electronic device then replaces the first background in that region with the second background content, obtains the fourth image, and displays it.
It may be understood that after detecting the background replacement training operation, the electronic device displays the first posture map to instruct the user to assume the first posture, and then obtains a third image of the user in that posture. Using the third image as training data for the first image segmentation lightweight model lets the electronic device capture the user's latest individual characteristics, so the target image segmentation lightweight model trained on it segments images more accurately; that is, it separates the target object from the background content more precisely. Therefore, comparing the second image, obtained by performing the first background replacement on the first image with the first image segmentation lightweight model, against the fourth image, obtained by performing the first background replacement with the target image segmentation lightweight model, the size of the target object's region in the fourth image is closer to its size in the first image than that in the second image is. In other words, the background replacement of the fourth image is more accurate than that of the second image.
For the specific process by which the electronic device trains the target image lightweight model based on the third image, refer to the description of the embodiment in Figure 2B; it is not repeated here.
In one possible implementation, Figure 2B shows, by way of example, how the electronic device trains the target image segmentation lightweight model. As shown in Figure 2B, the process is as follows:
S201: Based on a first operation, the electronic device displays a posture map P1, which indicates a posture G1.
In a possible implementation, before step S101, when the electronic device meets a preset condition D1, it displays a user interface A. A prompt box may be displayed in the user interface A to ask the target object whether to retrain the image segmentation lightweight model M1, with which the electronic device is preconfigured. The preset condition D1 may include the following:
the electronic device detects that the image segmentation lightweight model M1 has been in use for longer than a preset duration; or the electronic device periodically checks the segmentation quality of the image segmentation lightweight model M1 and detects that it meets a preset condition D2.
A preset duration may be configured on the electronic device, for example one month or one year; the embodiments of this application do not limit its specific value. That is, when the electronic device detects that the image segmentation lightweight model M1 has been in use for more than one month, it can restart training of the model M1.
For the preset condition D2, refer to the description in step S106 below; it is not repeated here.
For example, the user interface A may be the user interface 210 shown in Figure 3A. As shown in Figure 3A, the user interface 210 may include a status bar 211, a calendar indicator 212, a weather indicator 213, and a prompt box 214.
The status bar 211 may include one or more signal strength indicators for mobile communication signals, one or more signal strength indicators for wireless fidelity (Wi-Fi) signals, a battery status indicator, and a time indicator. The calendar indicator 212 may be used to indicate the current time. The weather indicator 213 may be used to indicate the weather type.
The prompt box 214 includes prompt information 214a, a confirm control 214b, and a cancel control 214c. The prompt information 214a asks the target object whether to retrain the image segmentation lightweight model M1; as shown in Figure 3A, it may read "Start the process of rebuilding the background replacement engine?". The confirm control 214b is used to confirm retraining the image segmentation lightweight model M1, and the cancel control 214c is used to cancel retraining. It may be understood that the prompt box 214 can be displayed as an overlay on the user interface 210.
It may be understood that the embodiments of this application do not limit the shape of the prompt box 214 or the specific content in it.
In some optional embodiments, the user interface A may be the user interface 220 shown in Figure 3B. As shown in Figure 3B, the user interface 220 includes a prompt box 221; for its description, see that of the prompt box 214 above, which is not repeated here.
In some optional embodiments, the user interface A may be the user interface 230 shown in Figure 3C. As shown in Figure 3C, the user interface 230 includes a calendar indicator 232 and a prompt box 231. For their descriptions, see those of the calendar indicator 212 and the prompt box 214 above; they are not repeated here.
The first operation may be a touch operation on the confirm control 214b in Figure 3A; in response to that touch operation, the electronic device displays the posture map P1. The posture map P1 indicates a posture G1, which may be, for example, adjusting the earphones with both hands. This posture is only an example; in practice, the posture G1 may be another posture such as raising a hand or holding the head, and this application does not limit the specific posture. In the embodiments of this application, the first operation may be called a background replacement training operation. In an optional implementation, the first operation may also be an operation on a background replacement training control. For example, when the electronic device segments an image with the image segmentation lightweight model M1, replaces the background, and displays a user interface containing the replaced image, it may display in that interface a background replacement training control used to train the model M1.
In some optional embodiments, in a video conference scenario, the first operation may also be interpreted as the sequence of tapping the background replacement training control and then ending the video conference. For example, after detecting that the user has acted on the control, the electronic device may display the first posture map once the video conference ends and then start training the image segmentation lightweight model.
For example, Figure 3D shows a user interface 240 on which the electronic device displays the posture map P1. As shown in Figure 3D, the user interface 240 includes a recording guidance box 241, prompt information 242, a confirm control 243, and a return control 244. The recording guidance box 241 displays the posture map P1, which indicates posture 1, the posture of adjusting the earphones with both hands. The prompt information 242 prompts the target object to perform the action of posture 1; for example, it may read "Action required: adjust the earphones with both hands" or "Please complete the specified action within the white area of the recording guidance box". The return control 244 exits the current user interface 240 and returns to the previous user interface, for example the user interface 220. The confirm control 243 is used to obtain a photo or video shot by the electronic device. When the electronic device detects a touch operation on the confirm control 243, it displays a user interface 250 in response.
It may be understood that the recording guidance box 241 and the prompt information 242 above are only examples; the embodiments of this application do not limit their shapes or specific contents.
As shown in Figure 3E, the user interface 250 includes a recording effect preview box 251, prompt information 252, a confirm control 253, and a return control 254. The recording effect preview box 251 displays the photo or video the electronic device is currently shooting. The return control 254 exits the current user interface 250 and returns to the previous user interface, for example the user interface 240. The confirm control 253 is used to obtain a photo or video shot by the electronic device. When the electronic device detects a touch operation on the confirm control 253, it obtains, in response, an image containing the target object; the image includes the region where the target object is located and the region where the background is located, and the target object's posture is the one shown in the posture map P1.
It may be understood that the recording effect preview box 251 and the prompt information 252 above are only examples; the embodiments of this application do not limit their shapes or specific contents.
In some optional embodiments, the electronic device may display the recording guidance box 241 and the recording effect preview box 251 in the same user interface. For example, when the electronic device detects a touch operation on the confirm control 214b, it displays a user interface 260 in response. As shown in Figure 3F, the user interface 260 includes a recording guidance box 261, a recording effect preview box 262, prompt information 263, a return control 264, and a confirm control 265, where:
For the description of the recording guidance box 261, see that of the recording guidance box 241 above; it is not repeated here.
For the description of the recording effect preview box 262, see that of the recording effect preview box 251 above, and for the description of the prompt information 263, see that of the prompt information 252 above; neither is repeated here. The return control 264 is used to return to the previous user interface, and the confirm control 265 is used to obtain a photo or video shot by the electronic device.
S202: The electronic device obtains an image T1, in which the posture of the target object is the posture G1 indicated in the posture map P1.
Specifically, the image T1 is an image shot of the target object in the posture G1 of the posture map P1; the target object in the image T1 is in the posture G1 indicated in the posture map P1. For example, the image T1 may be the image in the recording effect preview box 251 in the embodiment of Figure 3F above.
It may be understood that the image T1 may also be a frame of a video: the target object records a video in the posture G1 indicated in the posture map P1, and the electronic device obtains the video and extracts one frame as the image T1.
In some optional embodiments, the image T1 may be a set of multiple images. For example, the electronic device can shoot multiple images at a time or extract multiple frames from a recorded video, and one of those frames can serve as the image T1.
S203: The electronic device inputs the image T1 into the full image segmentation model to obtain a segmentation result 1, and inputs the image T1 into the image segmentation lightweight model M1 to obtain a segmentation result 2.
The full image segmentation model is a pre-trained machine learning model with high segmentation accuracy; that is, it is the model obtained after training an initial full image segmentation model to convergence. The image segmentation lightweight model M1 is a model trained from an initial image segmentation lightweight model. It may be understood that the full model and the initial lightweight model can be pre-trained on the electronic device or preconfigured on it; this application does not limit this.
In some embodiments, the initial image segmentation lightweight model is obtained by pruning and quantizing the initial full image segmentation model, and the initial full model contains more model parameters than the initial lightweight model. The full image segmentation model guides the training of the lightweight model being trained to obtain the target image segmentation lightweight model, so that the target model's performance approaches that of the full model; that is, their segmentation results are close or identical.
Specifically, a segmentation result indicates the region where the target object is located and the region where the background content is located in the segmented image, and may include pixel information for the pixels of that image. That is, segmentation result 1 indicates the target-object region and the background-content region of the image T1 as segmented by the full image segmentation model, and includes pixel information 1 for the pixels of the image T1. Segmentation result 2 indicates the target-object region and the background-content region of the image T1 as segmented by the image segmentation lightweight model 1, and includes pixel information 2 for the pixels of the image T1.
It may be understood that different image segmentation models produce different segmentation results for the same image; that is, pixel information 1 and pixel information 2 differ, so the pixel information for the image T1 included in segmentation result 1 differs from that included in segmentation result 2.
In some embodiments, the pixel information of a pixel may be the probability that the pixel is a foreground pixel. Specifically, the segmentation result may be a predicted foreground probability result for the image T1, which includes, for each pixel in the image T1, the probability that it is a foreground pixel, expressed as a real number between 0 and 1. For example, in the segmentation result, a pixel in the foreground region may have a foreground probability of 1 and a pixel in the background region a foreground probability of 0.
In other embodiments, the pixel information of a pixel may instead be its pixel value, for example an RGB value or a grayscale value. The segmentation result may be a binarized image corresponding to the image T1, used to distinguish the foreground region from the background region, in which pixels in the foreground region have a pixel value of 255 and pixels in the background region have a pixel value of 0. Alternatively, foreground pixels may have a pixel value of 0 and background pixels a pixel value of 255.
For example, Figure 4 shows a segmentation result in which the black part is the background region, whose pixels have a pixel value of 255, and the white part is the foreground region, that is, the region where the target object is located, whose pixels have a pixel value of 0.
In still other embodiments, the pixel information of a pixel may be a foreground label, which may be a numerical value such as 1 or 0. For example, after the image is input into the image segmentation lightweight model, a pixel predicted to be a foreground pixel is given the foreground label 1, and a pixel predicted to be a background pixel is given the label 0.
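A small sketch of how the three representations above relate, under the assumption of a 0.5 decision threshold (the patent does not fix one): a predicted foreground probability map can be reduced to 0/1 foreground labels, and those labels to the 255/0 binarized image.

```python
import numpy as np

def prob_to_labels(prob_map, threshold=0.5):
    """Foreground label 1 where the foreground probability clears the
    threshold, label 0 elsewhere (threshold is an assumed value)."""
    return (prob_map >= threshold).astype(np.uint8)

def labels_to_binary_image(labels, fg_value=255, bg_value=0):
    """Binarized image: foreground pixels fg_value, background bg_value."""
    return np.where(labels == 1, fg_value, bg_value).astype(np.uint8)
```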
S204: The electronic device trains the lightweight image segmentation model M1 based on segmentation result 1 and segmentation result 2 to obtain a lightweight image segmentation model M2.
Specifically, the electronic device computes an error value between segmentation result 1 and segmentation result 2 and uses this error value to train the lightweight image segmentation model M1, adjusting its model parameters to obtain the lightweight image segmentation model M2.
Specifically, Figure 5 shows the training process of the lightweight image segmentation model. As shown in Figure 5, image T1 is input into both the full image segmentation model and the lightweight image segmentation model M1, yielding segmentation result 1 and segmentation result 2. The error between segmentation result 1 and segmentation result 2 is then determined and used to correct the model parameters of the lightweight image segmentation model M1, yielding the trained, corrected lightweight model, that is, the lightweight image segmentation model M2.
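The error computation and parameter adjustment described above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the mean-squared-error loss, the per-pixel logit parameterisation of the student model, and the learning rate are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Teacher (full model) foreground probabilities for image T1: segmentation result 1.
rng = np.random.default_rng(0)
teacher = rng.random((4, 4))          # probabilities in [0, 1]

# Student (lightweight model M1) output, parameterised here by per-pixel logits.
logits = np.zeros((4, 4))             # student initially predicts 0.5 everywhere
student = sigmoid(logits)             # segmentation result 2

# Error value between segmentation result 1 and segmentation result 2 (MSE).
loss_before = np.mean((student - teacher) ** 2)

# One gradient-descent step on the logits to reduce the error,
# i.e. the "parameter adjustment" that turns M1 into M2.
grad = 2.0 * (student - teacher) * student * (1.0 - student) / student.size
logits -= 10.0 * grad                 # learning rate chosen for a visible change
loss_after = np.mean((sigmoid(logits) - teacher) ** 2)

assert loss_after < loss_before       # the adjusted model fits the teacher better
```

In a real system the student would be a pruned neural network trained with autograd; the sketch only shows the supervision signal, namely that the full model's output serves as the label for the lightweight model.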
Illustratively, the full image segmentation model and the lightweight image segmentation model may each be a deep neural network model, a convolutional neural network model, or the like; the embodiments of this application do not limit this. For example, the full image segmentation model may be a deep neural network model A1, and the lightweight image segmentation model may be a deep neural network model A2 obtained by pruning A1, where A2 has fewer model parameters than A1.
S205: The electronic device inputs image T1 into the lightweight image segmentation model M2 and outputs a segmentation result 3.
Specifically, after obtaining the lightweight image segmentation model M2 in the first round of training, the electronic device tests M2: it inputs image T1 into M2 and obtains segmentation result 3. Segmentation result 3 is used to indicate the region where the target object is located and the region where the background content is located in image T1; for the related description, see the descriptions of segmentation result 1 and segmentation result 2 above, which are not repeated here.
S206: The electronic device determines whether segmentation result 1 and segmentation result 3 satisfy a preset condition D2; if not, step S207 is performed; if so, step S209 is performed.
The preset condition D2 is that a target region, that is, a poorly segmented region, exists in segmentation result 3. In other words, using segmentation result 1 as the label, the electronic device first computes the error between segmentation result 1 and segmentation result 3; when segmentation result 3 contains a region that is poorly segmented relative to segmentation result 1, it determines that the difference between segmentation result 1 and segmentation result 3 satisfies the preset condition.
Specifically, the electronic device first computes the difference in pixel information between segmentation result 1 and segmentation result 3: for each pixel, it computes the difference between the pixel information of that same pixel in segmentation result 1 and in segmentation result 3, and determines the pixels whose difference is greater than a first threshold as first target pixels. The electronic device then matches the first target pixels against the limb regions of the target object in image T1 and determines whether the number of first target pixels in a limb region is greater than a second threshold. If the number of first target pixels in one of the target object's limb regions is greater than the second threshold, that limb region is the target region, that is, a poorly segmented region, and the electronic device determines that segmentation result 1 and segmentation result 3 satisfy the preset condition D2. In the embodiments of this application, the one or more limbs contained in the target region may be referred to as the third limb.
In some optional embodiments, the electronic device may capture multiple images, or a video, at one time and input the multiple images into the full image segmentation model and the lightweight image segmentation model to obtain multiple segmentation results from the full model and multiple segmentation results from the lightweight model. The electronic device determines whether the multiple differences between the full model's segmentation results and the lightweight model's segmentation results satisfy the preset condition D2; when, among the multiple differences, the number of differences satisfying the preset condition D2 is greater than a preset count threshold, the segmentation results are considered to satisfy the preset condition D3.
It can be understood that, in the embodiments of this application, the above preset condition D2 may also be referred to as the first preset condition.
Illustratively, taking as an example a segmentation result consisting of the probability that each pixel in image T1 is a foreground pixel, the specific process of determining the target region in the embodiments of this application is described below with reference to Figure 6.
S2061: The electronic device determines the first target pixels, where a first target pixel is a poorly segmented pixel in segmentation result 3.
Specifically, let Y1i be the probability that pixel i is foreground in segmentation result 1, and Z1i the probability that pixel i is foreground in segmentation result 3. The absolute value Hi of the difference for pixel i between segmentation result 1 and segmentation result 3 is computed as:
Hi = |Y1i − Z1i|
When the absolute value Hi of the difference for pixel i is greater than the first threshold, pixel i is a poorly segmented pixel, that is, a first target pixel.
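The computation of Hi and the selection of the first target pixels can be sketched as follows; the probability values and the first threshold of 0.5 are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Foreground probabilities for each pixel: segmentation result 1 (full model)
# and segmentation result 3 (lightweight model M2). Values are illustrative.
Y1 = np.array([[1.0, 0.9, 0.1],
               [0.8, 0.2, 0.0]])
Z1 = np.array([[0.9, 0.3, 0.1],
               [0.1, 0.2, 0.4]])

first_threshold = 0.5                 # the "first threshold" (assumed value)

H = np.abs(Y1 - Z1)                   # Hi = |Y1i - Z1i| for every pixel i
first_target_pixels = H > first_threshold

# Pixels (0, 1) and (1, 0) differ by 0.6 and 0.7 respectively, exceed the
# threshold, and are therefore flagged as poorly segmented.
```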
S2062: The electronic device inputs image T1 into a limb region detection model to obtain a limb region map corresponding to the target object in image T1; the limb region map includes the regions where one or more limbs of the target object in image T1 are located.
In some embodiments, the limb region detection model may also be called a human body region detection model. A limb region is the region where a limb is located; the limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, left foot, and so on.
It can be understood that the above division of limbs is merely an example; in practical applications, other divisions are possible, and this application is not limited in this respect.
S2063: The electronic device matches segmentation result 3 against the limb region map and determines the number of first target pixels in the regions where the limbs of the target object are located.
Specifically, the electronic device matches the pixels in segmentation result 3 against the regions where the limbs of the target object are located in image T1 and obtains the number of first target pixels in each of those regions.
S2064: The electronic device determines the target region based on the number of first target pixels in the regions where the limbs of the target object are located.
Specifically, when the number of first target pixels in the region where one of the limbs is located is greater than the second threshold, the region where that limb is located is the target region. For example, the second threshold may be 1000; when the region where the left hand is located contains more than 1000 first target pixels, the region of the left hand is considered the target region, that is, a poorly segmented region.
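Steps S2063 and S2064 can be sketched as follows. The small arrays, the limb ids (1 for the left hand, 2 for the torso), and the second threshold of 2 (1000 in the text) are illustrative assumptions.

```python
import numpy as np

# Boolean mask of first target pixels (from step S2061), and a limb region map
# (from step S2062) in which each pixel carries a limb id: 0 = no limb,
# 1 = left hand, 2 = torso.
target_mask = np.array([[1, 1, 0],
                        [1, 0, 0],
                        [0, 0, 1]], dtype=bool)
limb_map = np.array([[1, 1, 0],
                     [1, 2, 0],
                     [0, 2, 2]])

second_threshold = 2                  # the "second threshold" (assumed; 1000 in the text)

# Count the first target pixels falling inside each limb region (S2063) and
# keep the limbs whose count exceeds the threshold as the target region (S2064).
target_limbs = [limb for limb in (1, 2)
                if np.count_nonzero(target_mask & (limb_map == limb)) > second_threshold]

assert target_limbs == [1]            # only the left-hand region is poorly segmented
```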
S207: The electronic device replaces the initial background content in image T1 with preset background content to obtain a replaced image T2.
Specifically, when segmentation result 1 and segmentation result 3 do not satisfy the preset condition D2, the electronic device determines that the lightweight image segmentation model M2 is the target lightweight image segmentation model, that is, the model used for subsequent image segmentation. Based on segmentation result 3, the electronic device replaces the initial background content in image T1 with the preset background content to obtain the replaced image T2.
The preset background content may be background content set in advance by the target object, who may choose a preferred image background as the preset background content; the preset background content may also be the default background content of the electronic device's factory settings.
In the embodiments of this application, image T2 may be referred to as the seventh image, and the initial background content in image T1 may be referred to as the third background content.
S208: The electronic device displays the replaced image T2.
Illustratively, as shown in Figure 7, the electronic device may display a user interface 270 that includes a replacement-effect box 271, prompt information 272, a return control 273, and a confirm control 274. The replacement-effect box 271 is used to display the replaced image; the prompt information 272 is used to ask the target object whether the segmentation effect of the lightweight image segmentation model M2 is satisfactory; and the confirm control 274 is used to confirm the lightweight image segmentation model M2 as the target lightweight image segmentation model. When the target object is satisfied with the replacement effect, the target object can tap the confirm control 274; the electronic device then ends the training of the lightweight image segmentation model M2, and M2 is the target lightweight image segmentation model, that is, the lightweight model used for subsequent image segmentation. In this way, unnecessary training can be avoided. When the target object is not satisfied with the replacement effect, the target object can tap the return control 273, and the electronic device continues with the next round of training.
In some possible implementations, when the electronic device detects a touch operation acting on the return control 273, in response to that operation the electronic device changes the first threshold to a third threshold, where the third threshold is smaller than the first threshold. The electronic device then determines whether segmentation result 1 and segmentation result 3 satisfy a second preset condition; when the difference between segmentation result 1 and segmentation result 3 does not satisfy the second preset condition, the electronic device determines a pose map P based on segmentation result 1 and segmentation result 3 and uses pose map P to continue training the lightweight image segmentation model. For the related description of determining pose map P based on segmentation result 1 and segmentation result 3, see the related descriptions in the following embodiments; it is not described here.
The second preset condition is that the number of target pixels in the region where the third limb is located in the first image is greater than the second threshold, where a target pixel is a pixel for which the difference between its pixel information in the first segmentation result and its pixel information in the second segmentation result is greater than the third threshold.
Illustratively, the electronic device computes the pixel information of the pixels in segmentation result 1 and segmentation result 3 and determines the pixels for which the difference between the pixel information in segmentation result 1 and that in segmentation result 3 is greater than the third threshold; these are the poorly segmented pixels. The electronic device matches these poorly segmented pixels against the limb regions of the target object in image T1 and determines whether the number of poorly segmented pixels in a limb region is greater than the second threshold. If the number of poorly segmented pixels in one of the target object's limb regions is greater than the second threshold, that limb region is a poorly segmented region, and the electronic device determines that segmentation result 1 and segmentation result 3 satisfy the preset condition D3.
S209: The electronic device determines a pose map P2 based on segmentation result 1 and segmentation result 3.
Specifically, by comparing segmentation result 1 and segmentation result 3, the electronic device can determine the poorly segmented region in segmentation result 3; based on that poorly segmented region, the electronic device selects, from a representative pose library, a pose map containing the limb corresponding to the poorly segmented region. The electronic device may be configured with a representative pose library including multiple pose maps. The representative pose library may be constructed in advance by the electronic device; for the details of its construction, see the description of the embodiment of Figure 9 below, which is not repeated here.
The process by which the electronic device determines pose map P2 based on segmentation result 1 and segmentation result 3 may specifically include:
1. The electronic device determines the target region based on segmentation result 1 and segmentation result 3.
For the related operations of determining the target region based on segmentation result 1 and segmentation result 3, see the related operations in step S206 above; they are not repeated here.
2. The electronic device determines pose map P2 based on the target region.
Specifically, the target region corresponds to the limbs of the target object; the target region may correspond to one or more limbs of the target object.
When the target region contains one limb of the target object, the electronic device first determines, from the representative pose library, one or more pose maps containing that limb. When there is only one pose map containing the limb, that pose map is pose map P2. When there are multiple pose maps containing the limb, the electronic device randomly selects one of them as pose map P2, or the electronic device determines, among the multiple pose maps, the one in which the area of the region where the limb is located is largest as pose map P2. In some embodiments, the limb contained in pose map P2 may be called the first limb.
When the target region contains multiple limbs of the target object, the electronic device first determines, from the representative pose library, one or more pose maps containing those limbs. Likewise, when there is only one pose map containing the multiple limbs, that pose map is pose map P2. When there are multiple pose maps containing the multiple limbs, the electronic device randomly selects one of them as pose map P2, or the electronic device determines, among the multiple pose maps, the one in which the area of the regions where the multiple limbs are located is largest as pose map P2. Specifically, the electronic device counts, for each of the multiple pose maps, the number of target pixels corresponding to the regions where the multiple limbs are located, and the pose map with the largest total count is taken as pose map P2. In some embodiments, the multiple limbs contained in pose map P2 may be called the first limbs.
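The selection of pose map P2 by counting target pixels covered by the candidate pose maps' limb regions can be sketched as follows; the pose map names, the masks, and the tiny array sizes are illustrative assumptions.

```python
import numpy as np

# Mask of target pixels from the poorly segmented region (illustrative).
target_mask = np.array([[1, 1],
                        [0, 1]], dtype=bool)

# For each candidate pose map in the representative pose library, a boolean
# mask of the regions where the problem limbs are located (illustrative).
pose_library = {
    "pose_A": np.array([[1, 0], [0, 0]], dtype=bool),
    "pose_B": np.array([[1, 1], [0, 1]], dtype=bool),
}

# Pose map P2 is the candidate whose limb regions cover the most target pixels.
p2 = max(pose_library,
         key=lambda name: np.count_nonzero(pose_library[name] & target_mask))

assert p2 == "pose_B"                 # pose_B covers 3 target pixels, pose_A only 1
```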
In the embodiments of this application, pose map P2 may be referred to as the second pose map.
S210: The electronic device displays pose map P2, and pose map P2 indicates a posture G2.
Specifically, the electronic device displays pose map P2 on the display screen; pose map P2 is used to instruct the user to assume posture G2. Pose map P2 may be the pose map shown in Figure 3A above. Posture G2 may be the posture of adjusting earphones with both hands in the embodiment of Figure 2B. It can be understood that posture G2 is merely an example; in practical applications, posture G2 may also be another posture, for example raising the hands or holding the head, and this application does not limit the specific form of the posture.
In the embodiments of this application, posture G2 may also be referred to as the second posture.
S211: The electronic device acquires an image T3, in which the posture of the target object is the posture G2 indicated in pose map P2.
Specifically, image T3 is an image captured of the target object in the posture indicated in pose map P2, or a frame of a video so recorded. Image T3 contains the target object.
In the embodiments of this application, image T3 may be referred to as the fifth image.
S212: Following steps S203 to S209 above, the electronic device trains the lightweight image segmentation model M2 based on image T3 and the full image segmentation model until the end condition of model training is satisfied, obtaining the target lightweight image segmentation model.
Specifically, multiple rounds of iterative training are performed on the lightweight image segmentation model according to the above steps. In each round of iterative training, the model parameters of that round's initial lightweight image segmentation model are adjusted, and the model gradually converges, so as to obtain the target lightweight image segmentation model.
The end condition of model training may be that the number of training iterations of the lightweight image segmentation model reaches a preset number of iterations, or that an image segmentation performance indicator of the lightweight model after parameter adjustment reaches a preset indicator. For example, the preset indicator may be that the segmentation result of the lightweight image segmentation model and the segmentation result of the full image segmentation model satisfy the preset condition D2.
In some implementations, the electronic device inputs image T3 into the full image segmentation model and the lightweight image segmentation model M2 respectively to obtain a segmentation result 4 and a segmentation result 5, and uses segmentation result 4 and segmentation result 5 to train the lightweight image segmentation model M2, obtaining a lightweight image segmentation model M3. The electronic device then inputs image T3 into the lightweight image segmentation model M3 to obtain a segmentation result 6 and determines whether segmentation result 4 and segmentation result 6 satisfy the preset condition D2. When segmentation result 4 and segmentation result 6 satisfy the preset condition D2, the lightweight image segmentation model M3 is the target lightweight image segmentation model. The electronic device may, based on segmentation result 6, replace the initial background content in the region where the background content is located in image T3 with the preset background content.
In some implementations, when segmentation result 4 and segmentation result 6 do not satisfy the preset condition D2, the electronic device determines the poorly segmented region based on segmentation result 4 and segmentation result 6 and then, based on that poorly segmented region, determines a pose map P3 from the representative pose library; pose map P3 is used to instruct the user to assume a posture G3. For the related operations of determining pose map P3 based on segmentation result 4 and segmentation result 6, see the related operations in step S209 above; they are not repeated here.
After determining pose map P3, the electronic device may display pose map P3. For the related description of the electronic device displaying pose map P3, see the related descriptions in the embodiments of Figures 3D to 3F above; it is not repeated here.
The electronic device acquires an image T4, where image T4 is an image, or a frame of a video, captured while the user assumes the posture G3 shown in pose map P3; the posture of the target object in image T4 is the posture G3 indicated in pose map P3.
The electronic device may train the lightweight image segmentation model M3 based on image T4 to obtain a lightweight image segmentation model M4, and uses image T4 to test the segmentation effect of the lightweight image segmentation model M4. When the segmentation result of the lightweight image segmentation model M4 satisfies the preset condition D2, the lightweight image segmentation model M4 is the target lightweight image segmentation model. If the segmentation result of the lightweight image segmentation model M4 does not satisfy the preset condition D2, the electronic device re-determines a pose map based on the segmentation result and acquires images to train the lightweight image segmentation model M4, until the segmentation result of the lightweight image segmentation model M4 satisfies the preset condition D2.
In the embodiments of this application, segmentation result 4 may be referred to as the fourth segmentation result, segmentation result 5 as the fifth segmentation result, the lightweight image segmentation model M3 as the third lightweight image segmentation model, and segmentation result 6 as the sixth segmentation result. Pose map P3 may be referred to as the third pose map, posture G3 as the third posture, and image T4 may also be referred to as the sixth image.
S213: The electronic device acquires an original image T5, inputs the original image T5 into the target lightweight image segmentation model, and determines the region where the target object is located and the original background content in the original image.
The original image may be an image uploaded by the target object or a frame of an uploaded video, or it may be an image, or a frame of a video, captured by the electronic device that includes the target object.
In the embodiments of this application, the original image T5 may also be referred to as the first image, and the original background content may also be referred to as the first background content.
S214: The electronic device replaces the original background content in the original background content region of the original image T5 to obtain a replaced image T6.
Specifically, after separating the region where the original background content is located from the region where the target object is located in the original image T5, the electronic device composites the region where the target object is located with a preset background into a new image, that is, the replaced image T6, in which the background content differs from the original background content. The preset background may be set by the target object or may be a default of the electronic device, for example a landscape image.
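Using the foreground probability mask produced by the target lightweight image segmentation model, the compositing step can be sketched as a per-pixel linear blend; the array sizes, the colors, and the soft-edge blending at values between 0 and 1 are illustrative assumptions.

```python
import numpy as np

# Per-pixel foreground probability from the segmentation model (values in [0, 1]),
# plus the original image T5 and the preset background, all illustrative 2x2 RGB data.
alpha = np.array([[1.0, 0.0],
                  [0.8, 0.2]])[..., None]              # broadcast over RGB channels
original = np.full((2, 2, 3), 200, dtype=np.float64)   # image T5 (target object)
preset_bg = np.full((2, 2, 3), 50, dtype=np.float64)   # preset background content

# Composite: keep the target object where alpha is high, use the preset
# background elsewhere; soft edges such as protruding hair are blended.
replaced = alpha * original + (1.0 - alpha) * preset_bg   # image T6

assert replaced[0, 0, 0] == 200.0     # pure foreground keeps the target object
assert replaced[0, 1, 0] == 50.0      # pure background is replaced
assert np.isclose(replaced[1, 0, 0], 170.0)   # soft edge: 0.8*200 + 0.2*50
```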
In the embodiments of this application, image T6 may also be referred to as the fourth image.
Illustratively, Figure 8 is a schematic diagram of segmenting the original image with the lightweight image segmentation model M1 and with the target lightweight image segmentation model, and then performing replacement to obtain the replaced images.
As shown in (a) of Figure 8, when the user taps the background replacement control 112, the electronic device segments the foreground region and the background region in video image frame 811 and can separate the two. As shown in (b) of Figure 8, segmenting video image frame 811 with the lightweight image segmentation model M1 yields a segmentation result 7. For better illustration, the foreground region and the background region in segmentation result 7 shown in (b) of Figure 8 are distinguished by different colors: the white region represents the foreground region after segmentation by the lightweight image segmentation model M1, and the black region represents the background region after that segmentation. In the segmentation result shown in (b) of Figure 8, it can be seen that the lightweight image segmentation model M1 mistakes the edge region of the target object for the background region, that is, it treats the protruding hair in regions 8111A and 8111B as background content of frame 811. As shown in (c) of Figure 8, after the background content in video image frame 811 is replaced with the preset background content, the replaced video image frame 821 is obtained; the protruding hair in regions 8111A and 8111B has been replaced as background content, and no protruding hair is present in regions 8211A and 8211B of the replaced video image frame 821.
As shown in (d) of Figure 8, when the user taps the background replacement control 812, the electronic device segments the foreground region and the background region in video image frame 811 and can separate the two. As shown in (e) of Figure 8, segmenting video image frame 811 with the target lightweight image segmentation model yields a segmentation result 8. It can be seen that the target lightweight image segmentation model distinguishes the protruding hair of the target object in regions 8111A and 8111B from the background region well. Therefore, as shown in (f) of Figure 8, after video image frame 811 is replaced, video image frame 821 is obtained, in which the protruding hair in regions 8111A and 8111B is retained.
It should be noted that, before step S101, the electronic device may further build a representative posture library.
For example, as shown in FIG. 9, building the representative posture library by the electronic device may include the following steps:
S301: The electronic device obtains an image data set.
The image data set may be a large number of pre-collected images of the target object, or image frames contained in pre-collected video data of the target object. This is not limited in the embodiments of this application.
In an optional embodiment, the image data set may be crawled from public websites or obtained from a large public image database.
The image data set contains posture features and contour features of users. Posture features refer to a user's actions, such as turning the head, turning the body, or standing up and sitting down. Contour features refer to the lines that form the outer edge of the user.
S302: The electronic device inputs the image data set into a human posture estimation model to obtain a plurality of human posture vectors corresponding to the image data set.
Specifically, the human posture estimation model can identify the skeletal key points of a human body in an image, as well as the limb vectors formed by those skeletal key points. The skeletal key points represent the skeletal information of the human body and can be used to describe human posture.
The number and types of skeletal key points are determined by the human posture estimation model; different human posture estimation models output different numbers and types of skeletal key points. In the embodiments of this application, dividing the human body into 15 skeletal key points is used as an example for illustration; in practical applications, the human body may also be divided into 9, 17, or another number of skeletal key points, which is not limited in this application. The 15 skeletal key points can be connected to form 14 limb vectors, and a limb vector can be computed from the coordinate positions of the skeletal key points.
For example, FIG. 10 shows a set of skeletal key point data; only some of the skeletal key points and limb vectors are shown. As shown in FIG. 10, each circular point in the figure is a skeletal key point represented by coordinates (X, Y); adjacent key points are connected to form a limb vector, and the set of limb vectors of the target object in one image may be called a posture vector. For example, skeletal key point 3 has coordinates (X3, Y3) and skeletal key point 4 has coordinates (X4, Y4); connecting skeletal key point 3 and skeletal key point 4 yields the limb vector (X3-X4, Y3-Y4), which represents one limb and may be called the left shoulder.
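The limb-vector computation described above can be sketched as follows. The key-point indices, names, and coordinates below are illustrative assumptions, not values taken from the patent figures.

```python
# Sketch of computing a limb vector from two adjacent skeletal key points.
# Indices, names, and coordinates are illustrative assumptions.
KEYPOINTS = {
    3: (120.0, 80.0),   # skeletal key point 3, e.g. near the neck
    4: (95.0, 85.0),    # skeletal key point 4, e.g. the left shoulder joint
}

def limb_vector(kp_a, kp_b):
    """A limb vector is the coordinate difference (Xa-Xb, Ya-Yb)
    of two adjacent skeletal key points."""
    (xa, ya), (xb, yb) = kp_a, kp_b
    return (xa - xb, ya - yb)

left_shoulder = limb_vector(KEYPOINTS[3], KEYPOINTS[4])
print(left_shoulder)  # (25.0, -5.0)
```

The full posture vector of one image would simply concatenate such limb vectors over all adjacent key-point pairs.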
S303: The electronic device inputs the plurality of human posture vectors corresponding to the image data set into a clustering model to obtain one or more representative posture vectors.
Specifically, a plurality of limb vectors can be obtained from one image, and the limb vectors of that image form one posture vector; since the image data set includes a plurality of images, a plurality of posture vectors can be obtained. The electronic device maps the posture vectors into a vector space, where each posture vector is one point, and then computes the similarity between the points. Posture vectors with high similarity gather into a cluster, and the vector at the center of each cluster (that is, the cluster center) is selected as a representative posture vector.
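The clustering step can be sketched as below, assuming plain k-means as the clustering model (the embodiment does not name a specific algorithm). The toy two-dimensional posture vectors and the deterministic initialization are illustrative assumptions.

```python
import numpy as np

def representative_postures(posture_vectors, k, iters=20):
    """Cluster posture vectors with plain k-means and return the
    cluster centers as the representative posture vectors."""
    pts = np.asarray(posture_vectors, dtype=float)
    centers = pts[:k].copy()              # simple deterministic initialization
    for _ in range(iters):
        # distance of every posture vector to every cluster center
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)         # nearest-center assignment
        for j in range(k):
            if (labels == j).any():
                centers[j] = pts[labels == j].mean(axis=0)
    return centers

# Toy 2-D "posture vectors" forming two obvious groups
pts = [(0, 0), (5, 5), (0.1, 0.2), (5.2, 4.9)]
centers = representative_postures(pts, k=2)
```

In the embodiment, each point would be a high-dimensional posture vector rather than a 2-D pair, but the cluster-center selection is the same.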
For example, FIG. 11 is a schematic diagram of the process in which the electronic device obtains representative posture vectors by clustering. (a) of FIG. 11 shows four clusters as an example; each circular point in a cluster represents one posture vector, that is, one human posture, for example, holding a headset with both hands or holding a headset with one hand. The black five-pointed star in a cluster is the cluster center, and the cluster center vector of each cluster is selected as a representative posture vector. As shown in (b) of FIG. 11, the posture represented by the cluster center vector of cluster 1 is holding the headset with one hand; as shown in (c) of FIG. 11, the posture represented by the cluster center vector of cluster 2 is holding the headset with both hands.
S304: The electronic device inputs the one or more representative posture vectors into a human body contour detection model to obtain contour maps corresponding to the one or more representative posture vectors.
For example, FIG. 12 is a schematic diagram of obtaining contour maps from representative posture vectors. (a) and (c) of FIG. 12 show two representative posture vectors, which represent the posture of holding the headset with one hand and the posture of holding the headset with both hands, respectively. (b) and (d) of FIG. 12 show the contour maps obtained from the two representative posture vectors.
S305: The electronic device inputs the image data set into a limb region detection model to obtain one or more limb region maps corresponding to the image data set.
For example, FIG. 13 is a schematic diagram of limb regions. As shown in FIG. 13, different color regions in the figure represent different limb regions; for example, dark gray represents the region where the head is located, and light gray represents the region where the left hand is located.
The limbs may include the head, neck, right shoulder, right upper arm, right forearm, right hand, left shoulder, left upper arm, left forearm, left hand, torso, right hip, right thigh, right calf, right foot, left hip, left thigh, left calf, and left foot.
S306: The electronic device matches the one or more limb region maps corresponding to the image data set with the contour maps corresponding to the one or more representative posture vectors to obtain one or more posture maps.
The representative posture library may include the one or more posture maps corresponding to the image data set.
It should be noted that, for ease of description, the foregoing method embodiments are expressed as a series of action combinations. However, persons skilled in the art should understand that the present invention is not limited by the described order of actions. In addition, persons skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the present invention.
It should also be noted that the electronic device involved in the foregoing embodiments may be referred to as electronic device 100. The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device. The specific type of the electronic device 100 is not specially limited in the embodiments of this application.
The software architecture of the electronic device 100 in the embodiments of this application is described next.
FIG. 14 is a schematic diagram of the software structure of the electronic device 100 provided in an embodiment of this application.
As shown in FIG. 14, the software structure of the electronic device 100 may include a background replacement engine, a background replacement client, a collection module, an interactive display module, and a representative posture library. Specifically:
The interactive display module is configured to receive user operations, display posture maps, and display the images before and after background replacement. For example, the interactive display module may receive a first operation and display a posture map P1 based on the first operation, for example, the user interfaces in FIG. 3A to FIG. 3F described above. Specifically, the interactive display module is configured to display the posture map P1. For another example, the interactive display module may display a first image, a second image obtained by background replacement based on the first image segmentation lightweight model, or a fourth image obtained by background replacement based on the target image segmentation lightweight model.
The collection module is configured to obtain images or videos of the target object. For example, it may obtain the image T1 in step S201, where the image T1 is an image of the target object taken in the posture G1 of the posture map P1, or a frame of a video of the target object taken in the posture G1 of the posture map P1.
The background replacement client is configured to, after receiving the first operation, obtain the posture map P1 from the representative posture library according to a preset configuration and send it to the interactive display module for display.
It is further configured to receive the image including the target object sent by the collection module and forward it to the background replacement engine. For example, the image including the target object may be the image T1 in step S202 or the image T2 in step S211 described above.
It is further configured to receive the segmentation results of the image segmentation full model and of the image segmentation lightweight model sent by the background replacement engine, parse the segmentation results, and obtain, based on them, the posture map information required for the next round of training. The posture map information is then sent to the representative posture library to obtain the corresponding posture map, which is sent to the interactive display module for display.
The background replacement engine is configured to obtain the image of the target object from the background replacement client and use that image together with the preset image segmentation full model to train the image segmentation lightweight model. Specifically, the background replacement engine first receives the image T1 of the target object sent by the background replacement client, and trains the image segmentation lightweight model M1 with the image T1 and the preset image segmentation full model to obtain the image segmentation lightweight model M2. Then it segments the image T1 with the preset image segmentation full model and the image segmentation lightweight model M2 to obtain segmentation result 1 and segmentation result 3, and determines whether the difference between segmentation result 1 and segmentation result 3 satisfies the preset condition D2. If the preset condition D2 is satisfied, the image segmentation lightweight model M2 is stored for subsequent image segmentation; if not, segmentation result 1 and segmentation result 3 are sent to the background replacement client, so that the background replacement client can determine the posture to be displayed in the user interface for the next round of training.
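The engine's train-then-check loop might look like the following sketch. The model objects and their `distill`/`segment` methods are hypothetical placeholders, and measuring the difference as 1 - IoU of the two foreground masks against a numeric threshold is only one plausible way to realize the preset condition D2.

```python
def mask_difference(mask_a, mask_b):
    """1 - IoU between two binary foreground masks (flat lists of 0/1)."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    union = sum(a | b for a, b in zip(mask_a, mask_b))
    return 1.0 - (inter / union if union else 1.0)

def train_until_condition(light_model, full_model, next_image,
                          d2=0.05, max_rounds=10):
    """One round per user image: distill the light model against the full
    model, then compare their segmentations; stop when they agree closely
    enough (a stand-in for the preset condition D2)."""
    for _ in range(max_rounds):
        image = next_image()                    # image of user in the requested posture
        light_model.distill(full_model, image)  # hypothetical training step
        diff = mask_difference(light_model.segment(image),
                               full_model.segment(image))
        if diff <= d2:                          # condition D2 satisfied
            return light_model                  # store as target lightweight model
    return light_model

# Example of the difference measure on toy masks:
diff = mask_difference([1, 1, 0, 0], [1, 0, 0, 0])
print(diff)  # 0.5
```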
The representative posture library is configured to store posture maps, receive request instructions from the background replacement client, and send the posture map corresponding to a request instruction to the background replacement client.
The following uses the embodiment in FIG. 14 as an example to describe in detail the cooperation among the modules of the electronic device 100 in the embodiments of this application. Referring to FIG. 15, FIG. 15 shows an example of the cooperation among the modules of the electronic device 100. As shown in FIG. 15, the electronic device 100 includes a background replacement engine, a background replacement client, a collection module, an interactive display module, and a representative posture library. In the embodiment of FIG. 15, obtaining the target image segmentation lightweight model through two rounds of training is used as an example, described as follows:
1. The interactive display module detects a first operation. The first operation may be a touch operation on the confirm control 214b in FIG. 3A.
2. The interactive display module sends a background replacement instruction to the background replacement client.
3. In response to the background replacement instruction sent by the interactive display module, the background replacement client sends an instruction requesting a posture map to the representative posture library.
4. In response to the background replacement client's request for a posture map, the representative posture library sends the posture map P1 to the background replacement client.
5. The background replacement client receives the posture map P1 sent by the representative posture library and sends it to the interactive display module.
6. The interactive display module receives and displays the posture map P1, where the posture map P1 indicates a posture G1.
7. The collection module obtains the image T1 and sends it to the background replacement client.
8. The background replacement client receives the image T1 and sends it to the background replacement engine.
9-11. The image segmentation lightweight model M1 is trained with the image T1 and the image segmentation full model to obtain the image segmentation lightweight model M2. The image T1 is then input into the image segmentation lightweight model M2 and the image segmentation full model, respectively, to obtain segmentation result 1 and segmentation result 3, which are sent to the background replacement client.
12. The background replacement client determines whether segmentation result 1 and segmentation result 3 satisfy the preset condition D2. If the preset condition D2 is not satisfied, the client determines a posture map P2 based on segmentation result 1 and segmentation result 3, where the posture map P2 indicates a posture G2.
13. The background replacement client sends an instruction requesting the posture map P2 to the representative posture library.
14. In response to the instruction requesting the posture map P2, the representative posture library sends the posture map P2 to the background replacement client.
15. The background replacement client receives the posture map P2 and sends it to the interactive display module.
16. The interactive display module receives and displays the posture map P2.
17. The collection module obtains the image T3 and sends it to the background replacement client, where the posture of the target object in the image T3 is the posture G2 indicated by the posture map P2.
18. The background replacement client receives the image T3 and sends it to the background replacement engine.
19-21. The image segmentation lightweight model M2 is trained with the image T3 and the image segmentation full model to obtain the image segmentation lightweight model M3. The image T3 is then input into the image segmentation lightweight model M3 and the image segmentation full model, respectively, to obtain segmentation result 4 and segmentation result 6, which are sent to the background replacement client.
22. The background replacement client receives segmentation result 4 and segmentation result 6 and determines whether they satisfy the preset condition D2. If the preset condition D2 is satisfied, the client determines that the image segmentation lightweight model M3 is the target image segmentation lightweight model.
23. The collection module obtains the original image T5 and sends it to the background replacement client.
24. After receiving the original image T5, the background replacement client sends it to the background replacement engine.
25-27. The background replacement engine receives the original image T5 and segments it with the target image segmentation lightweight model to obtain a foreground region and a background region. The background region in the original image T5 is then replaced with the preset background to obtain a replaced image T6, which is sent to the background replacement client.
28. The background replacement client receives the replaced image T6 and sends it to the interactive display module for display.
29. The interactive display module receives and displays the replaced image T6.
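Steps 25 to 27 above (segment with the target lightweight model, then composite the preset background) can be sketched as follows. Representing images as flat pixel lists and compositing with a binary foreground mask are illustrative simplifications, not the embodiment's actual image representation.

```python
def replace_background(image, mask, new_background):
    """Keep foreground pixels where mask == 1 and take pixels from the
    preset background elsewhere (all inputs are flat pixel lists)."""
    return [fg if m == 1 else bg
            for fg, m, bg in zip(image, mask, new_background)]

frame  = ["hair", "face", "wall", "lamp"]   # original image T5 (toy pixels)
mask   = [1, 1, 0, 0]                       # foreground/background from the model
preset = ["sky", "sky", "sky", "sky"]       # preset background
replaced = replace_background(frame, mask, preset)
print(replaced)  # ['hair', 'face', 'sky', 'sky']
```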
It is worth noting that the background replacement client, the representative posture library, and the background replacement engine may be deployed on the same electronic device or on different electronic devices. For example, the background replacement client may be deployed on one electronic device, while the representative posture library and the background replacement engine are deployed on another; this is not limited in this application.
A background replacement system provided in an embodiment of this application is described next.
FIG. 16 is a schematic diagram of a background replacement system provided in an embodiment of this application. As shown in FIG. 16, the background replacement system includes an electronic device 200 and a server 300. A communication connection may exist between the electronic device 200 and the server 300, enabling data communication between the two. Specifically:
The electronic device 200 is configured to obtain a first image and send the first image to the server.
The server 300 is configured to receive the first image and input it into a first image segmentation lightweight model to determine the region of the first image where the target object is located and the region where first background content is located; to replace the first background content in the first image with second background content to obtain a second image; and to send the second image to the electronic device.
The electronic device 200 is configured to obtain the second image and display it, and, in response to a background replacement training operation of the user, to display a first posture map, where the first posture map instructs the user to make a first posture.
The electronic device 200 is configured to obtain a third image and send it to the server, where the posture of the user in the third image is the first posture.
The server 300 is configured to obtain the third image and train the first image segmentation lightweight model based on the third image to obtain a target image segmentation lightweight model; to input the first image into the target image segmentation lightweight model to determine the region of the first image where the target object is located and the region where the first background content is located; to replace the first background content in the first image with the second background content to obtain a fourth image; and to send the fourth image to the electronic device.
The electronic device 200 is configured to receive the fourth image and display it.
Optionally, in a possible implementation, the electronic device 200 may be further configured to perform the background replacement method of any one of the foregoing steps S201, S202, S206, S208, S209, S210, and S211, which is not described again here.
In a possible implementation, the server 300 may be further configured to perform the background replacement method of any one of the foregoing steps S203, S205, S207, S213, and S214, which is not described again here.
In some possible implementations, the electronic device 200 may also obtain the image T3 and send it to the server 300; the server 300 receives the image T3 and trains the image segmentation lightweight model M2 with the image T3 and the image segmentation full model until the end condition of model training is satisfied, obtaining the target image segmentation lightweight model.
The electronic device 200 may include the interactive display module, the collection module, and the background replacement client; the server 300 may include the background replacement engine and the representative posture library.
Optionally, in a possible implementation, the interactive display module 301 may be further configured to perform any one of step 1, step 2, step 6, step 16, and step 29 of the embodiment in FIG. 15. The collection module may be further configured to perform any one of step 7, step 17, and step 23 of the embodiment in FIG. 15, which is not described again here.
The background replacement client may be further configured to perform any one of step 3, step 5, step 8, step 12, step 13, step 15, step 18, step 22, step 24, and step 28 of the embodiment in FIG. 15, which is not described again here.
In a possible implementation, the background replacement engine 304 may be further configured to perform any one of steps 9 to 11 and steps 19 to 21 of the embodiment in FIG. 15, which is not described again here.
The representative posture library may be further configured to perform step 4 or step 14 of the embodiment in FIG. 15, which is not described again here.
The background replacement method provided in the embodiments of this application is described in detail above with reference to FIG. 2A to FIG. 15. The background replacement apparatus and electronic device provided in the embodiments of this application are described below with reference to FIG. 17A, FIG. 17B, and FIG. 18.
FIG. 17A is a schematic diagram of a background replacement apparatus provided in an embodiment of this application. The background replacement apparatus 400 includes a display unit 401 and an obtaining unit 402. Specifically:
The display unit 401 is configured to display a second image obtained by performing first background replacement on a first image. The first background replacement is performed based on a first image segmentation lightweight model. The first image includes a region where a target object is located and a region where first background content is located. The second image is obtained by replacing the first background content in the first image with second background content, where the second background content is different from the first background content.
The display unit 401 is further configured to display, in response to a background replacement training operation of the user, a first posture map, where the first posture map instructs the user to make a first posture.
The obtaining unit 402 is configured to obtain a third image, where the posture of the user in the third image is the first posture.
The display unit 401 is further configured to display a fourth image obtained by performing second background replacement on the first image. The second background replacement is performed based on a target image segmentation lightweight model, which is trained based on the third image. The fourth image is obtained by replacing the first background content in the first image with the second background content.
It should be understood that the background replacement apparatus 400 of the embodiments of this application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the background replacement method shown in FIG. 2A to FIG. 13 is implemented by software, the background replacement apparatus 400 and its modules may also be software modules.
In a possible implementation, as shown in FIG. 17B, the background replacement apparatus further includes an image segmentation unit 403, a determination unit 404, a model training unit 405, and a background replacement unit 406. Specifically:
The image segmentation unit 403 is configured to input the third image into a full image segmentation model to obtain a first segmentation result, and to input the third image into a first lightweight image segmentation model to obtain a second segmentation result, where the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first lightweight image segmentation model, and the first segmentation result and the second segmentation result are used to indicate the region in which the target object is located and the region in which the background content is located in the third image.
The model training unit 405 is configured to train the first lightweight image segmentation model based on the first segmentation result and the second segmentation result to obtain a second lightweight image segmentation model.
The image segmentation unit 403 is further configured to input the third image into the second lightweight image segmentation model to obtain a third segmentation result.
The determination unit 404 is configured to: when the first segmentation result and the third segmentation result are different, determine a second posture image based on the first segmentation result and the third segmentation result, where the second posture image is used to instruct the user to make a second posture, the second posture image includes a first limb, and the region in which the first limb is located in the first segmentation result is different from the region in which the first limb is located in the third segmentation result.
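The comparison performed by the determination unit — counting, per limb region, the pixels on which the two segmentation results disagree — can be sketched as follows. This is an illustrative reconstruction under assumptions: `limb_masks` stands in for per-limb regions that would come from a pose estimator, and both thresholds are arbitrary example values:

```python
import numpy as np

def find_disputed_limb_regions(seg_full, seg_light, limb_masks,
                               pixel_threshold=0.5, count_threshold=10):
    """Return names of limbs whose regions the two models disagree on.

    seg_full, seg_light: H x W per-pixel foreground scores from the full
        and lightweight models (the first / third segmentation results).
    limb_masks: dict mapping limb name -> H x W boolean region mask
        (hypothetical helper input, e.g. from a pose estimator).
    """
    # "First target pixels": per-pixel difference above the first threshold.
    disputed = np.abs(seg_full - seg_light) > pixel_threshold
    # A limb region qualifies as a target region if it contains more
    # disputed pixels than the second threshold.
    return [name for name, region in limb_masks.items()
            if np.count_nonzero(disputed & region) > count_threshold]

# Toy example: only the "left_arm" block disagrees between the two results.
h, w = 8, 8
full = np.zeros((h, w))
light = np.zeros((h, w))
light[:4, :4] = 1.0                      # lightweight model mislabels a block
masks = {"left_arm": np.zeros((h, w), bool), "torso": np.zeros((h, w), bool)}
masks["left_arm"][:4, :4] = True
masks["torso"][4:, 4:] = True
bad = find_disputed_limb_regions(full, light, masks)
```

A posture image containing the disputed limb would then be selected to prompt the user for new training images.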
The model training unit 405 is further configured to train the second lightweight image segmentation model based on a fifth image to obtain the target lightweight image segmentation model, where the posture of the target object in the fifth image is the second posture.
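Training a lightweight model against the output of a full model, as the model training unit does, is essentially knowledge distillation: the full model's segmentation result serves as a soft label for the lightweight model. The sketch below distills a per-pixel teacher probability into a tiny logistic "student"; the real models are neural segmentation networks, so everything here (features, optimizer, sizes) is a simplified stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: the "full model" is represented only by the
# per-pixel foreground probabilities it outputs (the first segmentation
# result); the "lightweight model" is a single logistic unit over
# simple per-pixel features.
features = rng.normal(size=(256, 3))               # one feature row per pixel
teacher_prob = 1 / (1 + np.exp(-(features @ np.array([2.0, -1.0, 0.5]))))

w = np.zeros(3)                                     # lightweight model weights
for _ in range(500):                                # distillation loop
    student_prob = 1 / (1 + np.exp(-(features @ w)))
    # Gradient of per-pixel cross-entropy against the teacher's soft labels.
    grad = features.T @ (student_prob - teacher_prob) / len(features)
    w -= 0.5 * grad                                 # plain gradient step

student_prob = 1 / (1 + np.exp(-(features @ w)))
mean_gap = np.abs(student_prob - teacher_prob).mean()
```

After training, the student's segmentation closely tracks the teacher's on these pixels (small `mean_gap`), which is the property the method relies on when it later compares the two models' results.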
The background replacement unit 406 is configured to replace, based on the second segmentation result, the initial background content in the region in which the background content is located in the third image with preset background content to obtain a seventh image.
Specifically, for the operations performed by the background replacement apparatus 400 to implement background replacement, refer to the related operations of the electronic device in the foregoing method embodiments; details are not described again here.
In some optional implementations, the display unit 401, the acquisition unit 402, the image segmentation unit 403, the determination unit 404, the model training unit 405, and the background replacement unit 406 may correspond to the electronic device 100 described above and may perform the operations performed by the electronic device 100 in the foregoing method embodiments; details are not described again here.
In some optional implementations, the display unit 401, the acquisition unit 402, and the determination unit 404 may correspond to the electronic device 200 described above, and the image segmentation unit 403, the model training unit 405, and the background replacement unit 406 may correspond to the server 300 described above.
FIG. 18 is a schematic structural diagram of an electronic device according to an embodiment of this application. The electronic device 10 includes a processor 11, a communication interface 12, and a memory 13, which are connected to one another through a bus 14. The processor 11 is configured to execute instructions stored in the memory 13. The memory 13 stores program code, and the processor 11 may invoke the program code stored in the memory 13 to perform the following operations:
displaying a second image obtained by performing first background replacement on a first image, where the first background replacement is performed based on a first lightweight image segmentation model, the first image includes a region in which a target object is located and a region in which first background content is located, the second image is obtained by replacing the first background content in the first image with second background content, and the second background content is different from the first background content;
displaying, in response to a user's background replacement training operation, a first posture image, where the first posture image is used to instruct the user to make a first posture;
acquiring a third image, where the user's posture in the third image is the first posture; and
displaying a fourth image obtained by performing second background replacement on the first image, where the second background replacement is performed based on a target lightweight image segmentation model, the target lightweight image segmentation model is trained based on the third image, and the fourth image is obtained by replacing the first background content in the first image with the second background content.
In this embodiment of this application, the processor 11 may take many specific forms. For example, the processor 11 may be any one or a combination of processors such as a CPU, a GPU, a TPU, or an NPU, and may be a single-core or a multi-core processor. The processor 11 may also be a combination of a CPU (or GPU, TPU, or NPU) and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 11 may also be implemented solely by a logic device with built-in processing logic, for example, an FPGA or a digital signal processor (DSP).
The communication interface 12 may be a wired interface or a wireless interface and is used to communicate with other modules or devices. A wired interface may be an Ethernet interface, a controller area network (CAN) interface, or a local interconnect network (LIN) interface; a wireless interface may be a cellular network interface, a wireless local area network interface, or the like.
The memory 13 may be a non-volatile memory, for example, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 13 may also be a volatile memory; a volatile memory may be a random access memory (RAM), which is used as an external cache.
The memory 13 may also be used to store instructions and data. In addition, the electronic device 10 may include more or fewer components than those shown in FIG. 18, or may have a different arrangement of components.
The bus 14 may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one bold line is used in FIG. 18, but this does not mean that there is only one bus or only one type of bus.
Optionally, the electronic device 10 may further include an input/output interface 15. The input/output interface 15 is connected to an input/output device and is used to receive input information and output operation results.
In some possible implementations, the electronic device 10 in this embodiment of this application may correspond to the background replacement apparatus 400 in the foregoing embodiments and may perform the operations performed by the electronic device 100 in the foregoing method embodiments; details are not described again here.
In some possible implementations, the electronic device 10 may be the electronic device 100 described above, or may be the electronic device 200 described above.
FIG. 19 is a schematic structural diagram of an electronic device according to an embodiment of this application. The electronic device 20 includes a processor 21, a communication interface 22, and a memory 23, which are connected to one another through a bus 24. The processor 21 is configured to execute instructions stored in the memory 23. The memory 23 stores program code, and the processor 21 may invoke the program code stored in the memory 23 to perform the following operations:
receiving a first image, inputting the first image into a first lightweight image segmentation model, determining the region in which a target object is located and the region in which first background content is located in the first image, and replacing the first background content in the first image with second background content to obtain a second image;
acquiring a third image, and training the first lightweight image segmentation model based on the third image to obtain a target lightweight image segmentation model;
inputting the first image into the target lightweight image segmentation model, and determining the region in which the target object is located and the region in which the first background content is located in the first image; and
replacing the first background content in the first image with the second background content to obtain a fourth image.
In this embodiment of this application, the processor 21 may take many specific forms. For example, the processor 21 may be any one or a combination of processors such as a CPU, a GPU, a TPU, or an NPU, and may be a single-core or a multi-core processor. The processor 21 may also be a combination of a CPU (or GPU, TPU, or NPU) and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 21 may also be implemented solely by a logic device with built-in processing logic, for example, an FPGA or a digital signal processor (DSP).
The communication interface 22 may be a wired interface or a wireless interface and is used to communicate with other modules or devices. A wired interface may be an Ethernet interface, a controller area network (CAN) interface, or a local interconnect network (LIN) interface; a wireless interface may be a cellular network interface, a wireless local area network interface, or the like.
The memory 23 may be a non-volatile memory, for example, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The memory 23 may also be a volatile memory; a volatile memory may be a random access memory (RAM), which is used as an external cache.
The memory 23 may also be used to store instructions and data. In addition, the electronic device 20 may include more or fewer components than those shown in FIG. 19, or may have a different arrangement of components.
The bus 24 may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one bold line is used in FIG. 19, but this does not mean that there is only one bus or only one type of bus.
Optionally, the electronic device 20 may further include an input/output interface 25. The input/output interface 25 is connected to an input/output device and is used to receive input information and output operation results.
In some possible implementations, the electronic device 20 may be the electronic device 100 described above, or may be the server 300 described above.
An embodiment of this application further provides a non-transitory computer-readable storage medium. The computer-readable storage medium stores a computer program; when the computer program is run on a processor, the method steps performed by the electronic device in the foregoing method embodiments can be implemented. For the specific implementation of these method steps by the processor, refer to the specific operations of the electronic device in the foregoing method embodiments; details are not described again here.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, refer to the related descriptions in other embodiments.
The foregoing embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, the foregoing embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
The steps in the methods of the embodiments of this application may be reordered, combined, or deleted according to actual needs; the modules in the apparatuses of the embodiments of this application may be divided, combined, or deleted according to actual needs.
The embodiments of this application have been described in detail above. The principles and implementations of this application are described herein through specific examples; the descriptions of the foregoing embodiments are merely intended to help understand the method and core idea of this application. In addition, a person of ordinary skill in the art may make changes to the specific implementations and application scope based on the idea of this application. In conclusion, the content of this specification shall not be construed as a limitation on this application.

Claims (26)

  1. A background replacement method, wherein the method comprises:
    displaying, by an electronic device, a second image obtained by performing first background replacement on a first image, wherein the first background replacement is performed based on a first lightweight image segmentation model, the first image comprises a region in which a target object is located and a region in which first background content is located, the second image is obtained by replacing the first background content in the first image with second background content, and the second background content is different from the first background content;
    displaying, by the electronic device in response to a user's background replacement training operation, a first posture image, wherein the first posture image is used to instruct the user to make a first posture;
    acquiring, by the electronic device, a third image, wherein the user's posture in the third image is the first posture; and
    displaying, by the electronic device, a fourth image obtained by performing second background replacement on the first image, wherein the second background replacement is performed based on a target lightweight image segmentation model, the target lightweight image segmentation model is trained based on the third image, and the fourth image is obtained by replacing the first background content in the first image with the second background content.
  2. The method according to claim 1, wherein before the electronic device displays the fourth image obtained by performing the second background replacement on the first image, the method further comprises:
    inputting, by the electronic device, the third image into a full image segmentation model to obtain a first segmentation result;
    inputting, by the electronic device, the third image into the first lightweight image segmentation model to obtain a second segmentation result, wherein the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first lightweight image segmentation model, and the first segmentation result and the second segmentation result are used to indicate the region in which the target object is located and the region in which third background content is located in the third image;
    training, by the electronic device, the first lightweight image segmentation model based on the first segmentation result and the second segmentation result to obtain a second lightweight image segmentation model;
    inputting, by the electronic device, the third image into the second lightweight image segmentation model to obtain a third segmentation result;
    when the first segmentation result and the third segmentation result are different, determining, by the electronic device, a second posture image based on the first segmentation result and the third segmentation result, wherein the second posture image is used to instruct the user to make a second posture, the second posture image comprises a first limb, and the region in which the first limb is located in the first segmentation result is different from the region in which the first limb is located in the third segmentation result; and
    training, by the electronic device, the second lightweight image segmentation model based on a fifth image to obtain the target lightweight image segmentation model, wherein the posture of the target object in the fifth image is the second posture.
  3. The method according to claim 2, wherein the training, by the electronic device, of the second lightweight image segmentation model based on the fifth image to obtain the target lightweight image segmentation model comprises:
    acquiring, by the electronic device, the fifth image;
    inputting, by the electronic device, the fifth image into the full image segmentation model to obtain a fourth segmentation result;
    inputting, by the electronic device, the fifth image into the second lightweight image segmentation model to obtain a fifth segmentation result, wherein the fourth segmentation result and the fifth segmentation result are used to indicate the region in which the target object is located and the region in which fourth background content is located in the fifth image;
    training, by the electronic device, the first lightweight image segmentation model based on the fourth segmentation result and the fifth segmentation result to obtain a third lightweight image segmentation model;
    inputting, by the electronic device, the fifth image into the third lightweight image segmentation model to obtain a sixth segmentation result; and
    when the fourth segmentation result and the sixth segmentation result do not satisfy a first preset condition, using the third lightweight image segmentation model as the target lightweight image segmentation model.
  4. The method according to claim 3, wherein after the electronic device inputs the fifth image into the third lightweight image segmentation model to obtain the sixth segmentation result, the method further comprises:
    when the fourth segmentation result and the sixth segmentation result satisfy the first preset condition, determining, by the electronic device, a third posture image based on the fourth segmentation result and the sixth segmentation result, wherein the third posture image is used to instruct the user to make a third posture, the third posture image comprises a second limb, and the second limb in the fourth segmentation result is different from the second limb in the sixth segmentation result; and
    training, by the electronic device, the third lightweight image segmentation model based on a sixth image to obtain the target lightweight image segmentation model, wherein the posture of the target object in the sixth image is the third posture.
  5. The method according to any one of claims 2 to 4, wherein, when the first segmentation result and the third segmentation result are different, the determining, by the electronic device, of the second posture image based on the first segmentation result and the third segmentation result comprises:
    when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, determining, by the electronic device, a target region of the first image based on the difference between the first segmentation result and the third segmentation result; and
    determining, by the electronic device, the second posture image based on the target region.
  6. The method according to claim 5, wherein the first segmentation result and the third segmentation result comprise pixel information of pixels in the third image; and
    the determining, by the electronic device when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, of the target region of the first image based on the difference between the first segmentation result and the third segmentation result comprises:
    determining, by the electronic device, first target pixels in the third image based on the difference between the pixel information of pixels in the first segmentation result and the pixel information of pixels in the third segmentation result, wherein a first target pixel is a pixel for which the difference in pixel information between the first segmentation result and the third segmentation result is greater than a first threshold;
    determining, by the electronic device, the region in which one or more limbs of the target object are located in the third image, wherein the one or more limbs comprise a third limb; and
    when the number of first target pixels in the region in which the third limb is located in the third image is greater than a second threshold, determining, by the electronic device, that the region in which the third limb is located in the third image is the target region.
  7. The method according to claim 6, wherein the determining, by the electronic device, of the second posture image based on the target region comprises:
    determining, by the electronic device, the third limb of the target object contained in the target region; and
    determining, by the electronic device, a second posture image containing the third limb.
  8. The method according to claim 7, wherein the determining, by the electronic device, of the second posture image containing the third limb comprises:
    determining, by the electronic device, a plurality of posture images containing the third limb; and
    determining, by the electronic device, the second posture image from the plurality of posture images, wherein the second posture image is the posture image, among the plurality of posture images, in which the region where the third limb is located contains the most first target pixels.
  9. The method according to claim 2, wherein:
    when the first segmentation result and the third segmentation result do not satisfy a first preset condition, the electronic device replaces, based on the third segmentation result, the third background content in the region in which the background content is located in the third image with preset background content to obtain a seventh image; and
    the electronic device displays the seventh image, a first control, and first prompt information, wherein the first prompt information is used to prompt training of the second lightweight image segmentation model.
  10. The method according to claim 9, wherein after the electronic device displays the seventh image, the first control, and the first prompt information, the method further comprises:
    the electronic device detects an operation acting on the first control, and the electronic device determines a third threshold, the third threshold being smaller than the first threshold;
    when the difference between the first segmentation result and the third segmentation result satisfies a second preset condition, the electronic device determines a fourth posture map based on the first segmentation result and the third segmentation result, the fourth posture map being used to instruct the user to make a fourth posture; the second preset condition is that the number of second target pixels in the region of the first image where a fourth limb is located is greater than the second threshold; the second target pixels are pixels for which the difference between the pixel information in the first segmentation result and the pixel information in the second segmentation result is greater than the third threshold;
    the electronic device trains the second lightweight image segmentation model based on an eighth image to obtain the target lightweight image segmentation model, the posture of the target object in the eighth image being the fourth posture.
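The "second preset condition" above (count the pixels inside a limb region that disagree by more than the lowered third threshold, then compare that count against the second threshold) can be sketched as follows; the grayscale-mask representation and all names are assumptions, not the patent's implementation:

```python
import numpy as np

def meets_second_preset_condition(result_a, result_b, limb_region,
                                  pixel_threshold, count_threshold):
    # "Second target pixels": pixels inside the limb region whose values
    # differ between the two segmentation results by more than the
    # (lowered) third threshold, pixel_threshold.
    diff = np.abs(result_a.astype(np.int32) - result_b.astype(np.int32))
    mismatched = int(np.count_nonzero((diff > pixel_threshold) & limb_region))
    # The condition holds when the mismatch count exceeds the second
    # threshold, count_threshold, triggering another training round.
    return mismatched > count_threshold
```

Lowering the pixel threshold makes the check stricter, so residual disagreements that passed the first round can still trigger further training.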
  11. The method according to claim 1, wherein before the electronic device displays the first posture map in response to the user's background replacement training operation, the method further comprises:
    when the electronic device detects that the usage duration of the first lightweight image segmentation model is greater than a first duration, the electronic device displays second prompt information and a second control, the second prompt information being used to prompt training of the first lightweight image segmentation model; the background replacement training operation is an operation acting on the second control.
  12. A background replacement apparatus, comprising:
    a display unit, configured to display a second image obtained by performing first background replacement on a first image, wherein the first background replacement is performed based on a first lightweight image segmentation model; the first image includes a region where a target object is located and a region where first background content is located; the second image is obtained by replacing the first background content in the first image with second background content; and the second background content is different from the first background content;
    the display unit being further configured to display, in response to a user's background replacement training operation, a first posture map, the first posture map being used to instruct the user to make a first posture;
    an acquisition unit, configured to acquire a third image, the user's posture in the third image being the first posture;
    the display unit being further configured to display a fourth image obtained by performing second background replacement on the first image, wherein the second background replacement is performed based on a target lightweight image segmentation model, the target lightweight image segmentation model being trained based on the third image; and the fourth image is obtained by replacing the first background content in the first image with the second background content.
  13. The apparatus according to claim 12, wherein the apparatus further comprises:
    an image segmentation unit, configured to input the third image into a full image segmentation model to obtain a first segmentation result;
    the image segmentation unit being further configured to input the third image into the first lightweight image segmentation model to obtain a second segmentation result, wherein the number of model parameters in the full image segmentation model is greater than the number of model parameters in the first lightweight image segmentation model, and the first segmentation result and the second segmentation result are used to indicate the region where the target object is located and the region where third background content is located in the third image;
    a model training unit, configured to train the first lightweight image segmentation model based on the first segmentation result and the second segmentation result to obtain a second lightweight image segmentation model;
    the image segmentation unit being further configured to input the third image into the second lightweight image segmentation model to obtain the third segmentation result;
    a determining unit, configured to determine, when the first segmentation result and the third segmentation result are different, a second posture map based on the first segmentation result and the third segmentation result, the second posture map being used to instruct the user to make a second posture, the second posture map including a first limb, and the region where the first limb is located in the first segmentation result being different from the region where the first limb is located in the third segmentation result;
    the model training unit being further configured to train the second lightweight image segmentation model based on a fifth image to obtain the target lightweight image segmentation model, the posture of the target object in the fifth image being the second posture.
  14. The apparatus according to claim 13, wherein:
    the acquisition unit is further configured to acquire the fifth image;
    the image segmentation unit is further configured to input the fifth image into the full image segmentation model to obtain a fourth segmentation result;
    the image segmentation unit is further configured to input the fifth image into the second lightweight image segmentation model to obtain a fifth segmentation result, the fourth segmentation result and the fifth segmentation result being used to indicate the region where the target object is located and the region where fourth background content is located in the fifth image;
    the model training unit is further configured to train the first lightweight image segmentation model based on the fourth segmentation result and the fifth segmentation result to obtain a third lightweight image segmentation model;
    the image segmentation unit is further configured to input the fifth image into the third lightweight image segmentation model to obtain a sixth segmentation result;
    the determining unit is further configured to determine, when the fourth segmentation result and the sixth segmentation result do not satisfy the first preset condition, that the third lightweight image segmentation model is the target lightweight image segmentation model.
  15. The apparatus according to claim 14, wherein:
    the determining unit is further configured to determine, when the fourth segmentation result and the sixth segmentation result satisfy the first preset condition, a third posture map based on the fourth segmentation result and the sixth segmentation result, the third posture map being used to instruct the user to make a third posture, the third posture map including a second limb, and the second limb in the fourth segmentation result being different from the second limb in the sixth segmentation result;
    the model training unit is further configured to train the third lightweight image segmentation model based on a sixth image to obtain the target lightweight image segmentation model, the posture of the target object in the sixth image being the third posture.
  16. The apparatus according to any one of claims 13-15, wherein:
    the determining unit is further configured to: when the difference between the first segmentation result and the third segmentation result satisfies the first preset condition, determine a target region of the first image based on the difference between the first segmentation result and the third segmentation result;
    the electronic device determines the second posture map based on the target region.
  17. The apparatus according to claim 16, wherein the first segmentation result and the third segmentation result include pixel information of pixels in the third image;
    the determining unit is specifically configured to: determine first target pixels in the third image based on the difference between the pixel information of pixels in the first segmentation result and the pixel information of pixels in the third segmentation result, the first target pixels being pixels for which the difference in pixel information between the first segmentation result and the third segmentation result is greater than a first threshold;
    determine a region in the third image where one or more limbs of the target object are located, the one or more limbs including a third limb;
    when the number of first target pixels in the region of the third image where the third limb is located is greater than a second threshold, determine that the region of the third image where the third limb is located is the target region.
  18. The apparatus according to claim 17, wherein:
    the determining unit is specifically configured to: determine the third limb of the target object contained in the target region;
    determine a second posture map comprising the third limb.
  19. The apparatus according to claim 18, wherein:
    the determining unit is specifically configured to: determine a plurality of posture maps comprising the third limb;
    determine the second posture map from the plurality of posture maps, the second posture map being the posture map, among the plurality of posture maps, in which the region where the third limb is located contains the most first target pixels.
  20. The apparatus according to claim 13, wherein the apparatus further comprises a replacement unit,
    the replacement unit being configured to replace, when the first segmentation result and the third segmentation result do not satisfy the first preset condition, the third background content in the region of the third image where the background content is located with preset background content based on the third segmentation result, to obtain a seventh image;
    the display unit being further configured to display the seventh image, a first control, and first prompt information, the first prompt information being used to prompt training of the second lightweight image segmentation model.
  21. The apparatus according to claim 20, wherein:
    the determining unit is further configured to detect an operation acting on the first control, and the electronic device determines a third threshold, the third threshold being smaller than the first threshold;
    the determining unit is further configured to determine, when the difference between the first segmentation result and the third segmentation result satisfies a second preset condition, a fourth posture map based on the first segmentation result and the third segmentation result, the fourth posture map being used to instruct the user to make a fourth posture; the second preset condition is that the number of second target pixels in the region of the first image where a fourth limb is located is greater than the second threshold; the second target pixels are pixels for which the difference between the pixel information in the first segmentation result and the pixel information in the second segmentation result is greater than the third threshold;
    the model training unit is further configured to train the second lightweight image segmentation model based on an eighth image to obtain the target lightweight image segmentation model, the posture of the target object in the eighth image being the fourth posture.
  22. The apparatus according to claim 12, wherein:
    the display unit is further configured to display, when the electronic device detects that the usage duration of the first lightweight image segmentation model is greater than a first duration, second prompt information and a second control, the second prompt information being used to prompt training of the first lightweight image segmentation model; the background replacement training operation is an operation acting on the second control.
  23. A background replacement system, the system comprising an electronic device and a server, wherein:
    the electronic device is configured to acquire a first image and send the first image to the server;
    the server is configured to receive the first image, input the first image into a first lightweight image segmentation model, and determine the region where a target object is located and the region where first background content is located in the first image;
    the server replaces the first background content in the first image with second background content to obtain a second image, and sends the second image to the electronic device;
    the electronic device is configured to acquire the second image and display the second image;
    the electronic device is configured to display, in response to a user's background replacement training operation, a first posture map, the first posture map being used to instruct the user to make a first posture;
    the electronic device is configured to acquire a third image and send the third image to the server, the user's posture in the third image being the first posture;
    the server is configured to acquire the third image and train the first lightweight image segmentation model based on the third image to obtain a target lightweight image segmentation model;
    the server is configured to input the first image into the target lightweight image segmentation model and determine the region where the target object is located and the region where the first background content is located in the first image;
    the server is configured to replace the first background content in the first image with the second background content to obtain a fourth image, and send the fourth image to the electronic device;
    the electronic device receives the fourth image and displays the fourth image.
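The server-side steps of the system claim above (segment the image, then swap the background) can be sketched as follows. This is a minimal stand-in, not the patent's method: the thresholding "model" and mask-based compositing are illustrative assumptions replacing the neural segmentation models the claim names:

```python
import numpy as np

def lightweight_segment(image):
    # Hypothetical stand-in for the lightweight segmentation model:
    # treat pixels brighter than mid-gray as the target object.
    return image > 127

def replace_background(image, new_background):
    # Segment, then keep target-object pixels and take everything
    # else from the new background content.
    foreground = lightweight_segment(image)
    return np.where(foreground, image, new_background)
```

In the claimed system this composition runs on the server, which then returns the composited image to the electronic device for display.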
  24. An electronic device, comprising one or more processors and one or more memories, wherein the one or more memories are coupled to the one or more processors and are configured to store computer program code, the computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the method according to any one of claims 1-11.
  25. A chip system, applied to an electronic device, the chip system comprising one or more processors, the processors being configured to invoke computer instructions to cause the electronic device to perform the method according to any one of claims 1-11.
  26. A computer-readable storage medium, comprising instructions which, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 1-11.
PCT/CN2023/079248 2022-03-18 2023-03-02 Background replacement method and electronic device WO2023174063A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210271308.4 2022-03-18
CN202210271308.4A CN116823869A (en) 2022-03-18 2022-03-18 Background replacement method and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023174063A1 2023-09-21

Family

ID=88022343

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/079248 WO2023174063A1 (en) 2022-03-18 2023-03-02 Background replacement method and electronic device

Country Status (2)

Country Link
CN (1) CN116823869A (en)
WO (1) WO2023174063A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200258236A1 (en) * 2017-10-24 2020-08-13 Hewlett-Packard Development Company, L.P. Person segmentations for background replacements
CN112150499A (en) * 2019-06-28 2020-12-29 华为技术有限公司 Image processing method and related device
CN112529073A (en) * 2020-12-07 2021-03-19 北京百度网讯科技有限公司 Model training method, attitude estimation method and apparatus, and electronic device
CN113160231A (en) * 2021-03-29 2021-07-23 深圳市优必选科技股份有限公司 Sample generation method, sample generation device and electronic equipment
CN113194254A (en) * 2021-04-28 2021-07-30 上海商汤智能科技有限公司 Image shooting method and device, electronic equipment and storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746305A (en) * 2024-02-21 2024-03-22 四川大学华西医院 Medical care operation training method and system based on automatic evaluation
CN117746305B (en) * 2024-02-21 2024-04-19 四川大学华西医院 Medical care operation training method and system based on automatic evaluation

Also Published As

Publication number Publication date
CN116823869A (en) 2023-09-29

Similar Documents

Publication Publication Date Title
US11195283B2 (en) Video background substraction using depth
JP7058373B2 (en) Lesion detection and positioning methods, devices, devices, and storage media for medical images
US11798132B2 (en) Image inpainting method and apparatus, computer device, and storage medium
US11282207B2 (en) Image processing method and apparatus, and storage medium
WO2019128508A1 (en) Method and apparatus for processing image, storage medium, and electronic device
US11481869B2 (en) Cross-domain image translation
CN108205655B (en) Key point prediction method and device, electronic equipment and storage medium
CN108765278B (en) Image processing method, mobile terminal and computer readable storage medium
WO2020103647A1 (en) Object key point positioning method and apparatus, image processing method and apparatus, and storage medium
JP2020522285A (en) System and method for whole body measurement extraction
US11563885B2 (en) Adaptive system for autonomous machine learning and control in wearable augmented reality and virtual reality visual aids
JP2020523711A (en) Method and apparatus for generating medical report
KR20200145827A (en) Facial feature extraction model learning method, facial feature extraction method, apparatus, device, and storage medium
CN111862124A (en) Image processing method, device, equipment and computer readable storage medium
WO2023174063A1 (en) Background replacement method and electronic device
CN112200041A (en) Video motion recognition method and device, storage medium and electronic equipment
CN111080746A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111814749A (en) Human body feature point screening method and device, electronic equipment and storage medium
US20210279928A1 (en) Method and apparatus for image processing
CN113221695A (en) Method for training skin color recognition model, method for recognizing skin color and related device
WO2024041108A1 (en) Image correction model training method and apparatus, image correction method and apparatus, and computer device
US11232616B2 (en) Methods and systems for performing editing operations on media
CN112036307A (en) Image processing method and device, electronic equipment and storage medium
US20230071291A1 (en) System and method for a precise semantic segmentation
US20240161382A1 (en) Texture completion

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23769569

Country of ref document: EP

Kind code of ref document: A1