CN116156092A - Background replacement method, device, computer equipment and storage medium - Google Patents

Background replacement method, device, computer equipment and storage medium

Info

Publication number
CN116156092A
Authority
CN
China
Prior art keywords
background
attribute
image
sample
video session
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310154433.1A
Other languages
Chinese (zh)
Inventor
张驰
徐雪
杨洁琼
江文乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310154433.1A
Publication of CN116156092A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/272 Means for inserting a foreground image in a background image, i.e. inlay, outlay

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of communications technology and provides a background replacement method, apparatus, computer device, storage medium and computer program product, which can be applied to the financial field or other related fields. The method and the apparatus can improve the efficiency of background generation and background replacement. The method comprises the following steps: acquiring an image of a first object in a video session, the video session being a video session between the first object and a second object; identifying the image to obtain a first attribute feature of the first object, and obtaining a second attribute feature of the first object, wherein the first attribute feature is used for representing object information of the first object in the image and the second attribute feature is used for representing basic information of the first object; performing feature fusion processing on the first attribute feature and the second attribute feature to obtain a fusion attribute feature; inputting the fusion attribute feature into a pre-trained background generation model to generate a target background; and replacing the video session background of the second object in the video session with the target background.

Description

Background replacement method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of communications technologies, and in particular, to a background replacement method, apparatus, computer device, storage medium, and computer program product.
Background
Banking sites are one of the media through which banks interface with society, and they need to handle various kinds of business for objects (such as clients). To facilitate business handling, staff familiar with banking business are usually arranged at a banking site to guide objects through the corresponding business. To further facilitate business handling, video session services (e.g., video review) may be provided, through agents (e.g., customer service personnel) of the site, for objects who find it inconvenient to visit the site in person. In order to provide a more professional and rigorous video session service, a background wall needs to be set up at the remote agent center for background replacement, so that irrelevant backgrounds are kept out of the video.
In the prior art, the background wall is a physical background wall: a designer first designs a background picture, and the physical background wall is then manufactured according to the background picture and placed behind the agent, thereby achieving background replacement. This process is slow and the cost of manufacturing the physical background wall is high, and the resulting background is fixed rather than adapted to the object being served.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide a background replacement method, an apparatus, a computer device, a computer-readable storage medium and a computer program product.
In a first aspect, the present application provides a background replacement method. The method comprises the following steps:
acquiring an image of a first object in a video session; the video session is a video session between a first object and a second object;
identifying the image to obtain a first attribute feature of the first object, and obtaining a second attribute feature of the first object; the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object;
performing feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object;
inputting the fusion attribute characteristics into a pre-trained background generation model to generate a target background;
and replacing the video session background of the second object in the video session with the target background.
In one embodiment, the object information includes dressing information and emotion information;
identifying the image to obtain a first attribute feature of the first object, including:
performing dressing dimension identification on the image to obtain dressing information of the first object in the image, and performing emotion dimension identification on the image to obtain emotion information of the first object in the image;
performing feature recognition on the dressing information to obtain dressing features of the first object in the image, and performing feature recognition on the emotion information to obtain emotion features of the first object in the image;
a first attribute feature is determined based on the dressing feature and the emotional feature.
In one embodiment, obtaining the second attribute characteristic of the first object includes:
acquiring basic information of a first object;
mapping the basic information to a preset semantic space to obtain semantic features corresponding to the basic information;
and determining a second attribute characteristic according to the semantic characteristic.
In one embodiment, inputting the fusion attribute feature into a pre-trained background generation model to generate a target background includes:
inputting the fusion attribute characteristics into a background generation model, and predicting the first object based on the fusion attribute characteristics through the background generation model to obtain a prediction result of the first object; the prediction result is used for representing the expected style of the first object;
and generating a target background according to the prediction result through a background generation model.
In one embodiment, the pre-trained background generation model is trained by:
acquiring a first sample attribute characteristic and a second sample attribute characteristic of a sample object; the first sample attribute features are used for representing sample object information of the sample object in the sample image, and the second sample attribute features are used for representing sample basic information of the sample object;
performing feature fusion processing on the first sample attribute features and the second sample attribute features to obtain sample fusion attribute features of the sample object;
inputting the sample fusion attribute characteristics into a background generation model to be trained to obtain a sample background;
training the background generation model to be trained according to the sample background and the satisfaction of the sample object with the sample background, to obtain a pre-trained background generation model.
In one embodiment, acquiring an image of a first object in a video session includes:
acquiring a video frame of a video session;
inputting the video frame into a pre-trained object recognition model, recognizing an image area of the first object in the video frame through the object recognition model, and cutting the video frame according to the image area to obtain an image of the first object.
In one embodiment, before replacing the video session background of the second object in the video session with the target background, the method further comprises:
in the video session, identifying a video session background of the second object;
replacing the video session background of the second object in the video session with the target background includes:
and replacing the video session background with the target background according to the replacement relation between the video session background and the target background.
In a second aspect, the present application also provides a background replacement device. The device comprises:
the image acquisition module is used for acquiring an image of a first object in the video session; the video session is a video session between a first object and the second object;
the image recognition module is used for recognizing the image to obtain a first attribute characteristic of the first object and obtaining a second attribute characteristic of the first object; the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object;
the feature fusion module is used for carrying out feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object;
the target background generation module is used for inputting the fusion attribute characteristics into a pre-trained background generation model to generate a target background;
and the target background replacing module is used for replacing the video session background of the second object in the video session with the target background.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
Acquiring an image of a first object in a video session; the video session is a video session between a first object and a second object; identifying the image to obtain a first attribute feature of the first object, and obtaining a second attribute feature of the first object; the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object; performing feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object; inputting the fusion attribute characteristics into a pre-trained background generation model to generate a target background; and replacing the video session background of the second object in the video session with the target background.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring an image of a first object in a video session; the video session is a video session between a first object and a second object; identifying the image to obtain a first attribute feature of the first object, and obtaining a second attribute feature of the first object; the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object; performing feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object; inputting the fusion attribute characteristics into a pre-trained background generation model to generate a target background; and replacing the video session background of the second object in the video session with the target background.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring an image of a first object in a video session; the video session is a video session between a first object and a second object; identifying the image to obtain a first attribute feature of the first object, and obtaining a second attribute feature of the first object; the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object; performing feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object; inputting the fusion attribute characteristics into a pre-trained background generation model to generate a target background; and replacing the video session background of the second object in the video session with the target background.
The above background replacement method, apparatus, computer device, storage medium and computer program product acquire an image of a first object in a video session, the video session being a video session between the first object and a second object; identify the image to obtain a first attribute feature of the first object and obtain a second attribute feature of the first object, the first attribute feature being used for representing object information of the first object in the image and the second attribute feature being used for representing basic information of the first object; perform feature fusion processing on the first attribute feature and the second attribute feature to obtain a fusion attribute feature of the first object; input the fusion attribute feature into a pre-trained background generation model to generate a target background; and replace the video session background of the second object in the video session with the target background. According to this scheme, the image of the first object is acquired in the video session between the first object and the second object, the first attribute feature of the first object is identified from the image, the second attribute feature representing the basic information of the first object is acquired, and the two features are fused to obtain the fusion attribute feature of the first object; the background generation model then obtains the target background based on the fusion attribute feature, and the original background of the second object in the video session is replaced with the target background. This improves the efficiency of background generation and background replacement, effectively simplifies the background generation process, overcomes the drawback that manufacturing a physical background wall is costly, and thereby saves a large amount of cost. In addition, the style of the generated two-dimensional background matches the object information of the first object, so that the background blends better with the first object and the realism of the generated background is improved.
Drawings
FIG. 1 is a flow diagram of a background replacement method in one embodiment;
FIG. 2 is a schematic diagram of a video session in one embodiment;
FIG. 3 is a flowchart illustrating a first attribute feature step of obtaining a first object in one embodiment;
FIG. 4 is a flowchart illustrating a step of acquiring a second attribute of the first object in one embodiment;
FIG. 5 is a flow diagram of the step of generating a target background in one embodiment;
FIG. 6 is a block diagram of a background replacement device in one embodiment;
fig. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a background replacement method is provided. This embodiment is described by taking the application of the method to a terminal (it may also be a server) as an example, and the method includes the following steps:
step S101, an image of a first object in a video session is acquired.
In this step, as shown in fig. 2, the video session is a video session between a first object and a second object. The video session may be a video call; for example, the video session may be a video call between the first object and the second object. The video session may also be a video session between a first terminal and a second terminal, where the first object may be the object corresponding to the first terminal and the second object may be the object corresponding to the second terminal.
It should be noted that, the execution body of the embodiment may be a first terminal, a second terminal or a server, which is not limited herein, where the first terminal and the second terminal may be connected in communication, the server and the first terminal may be connected in communication, and the server and the second terminal may be connected in communication.
Specifically, as shown in fig. 2, the terminal identifies an image area of the first object in a video frame of the video session, and takes the image area of the first object as an image of the first object.
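By way of illustration only, and not as part of the disclosed embodiments, the identification and cropping described above could be sketched as follows. The detector choice (a pre-trained torchvision Faster R-CNN), the assumption that the first object is the most confident detected person, and all helper names are illustrative assumptions rather than the object recognition model of this application (Python, assuming PyTorch and torchvision are available):

    # Illustrative sketch only: crop the first object's region from a video frame.
    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor

    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    detector.eval()

    def crop_first_object(frame_rgb):
        """frame_rgb: HxWx3 uint8 array taken from a frame of the video session."""
        with torch.no_grad():
            pred = detector([to_tensor(frame_rgb)])[0]
        # keep the most confident 'person' box (COCO label 1) as the first object
        keep = [i for i, label in enumerate(pred["labels"]) if label == 1]
        if not keep:
            return None
        best = max(keep, key=lambda i: pred["scores"][i])
        x1, y1, x2, y2 = pred["boxes"][best].int().tolist()
        return frame_rgb[y1:y2, x1:x2]  # image of the first object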
Step S102, the image is identified, a first attribute feature of the first object is obtained, and a second attribute feature of the first object is obtained.
In this step, the first attribute feature is used to characterize the object information of the first object in the image; for example, the first attribute feature is obtained by performing feature recognition on the object information of the first object in the image. The second attribute feature is used to characterize the basic information of the first object; for example, the second attribute feature results from feature recognition of the basic information of the first object.
Specifically, the terminal identifies an image of the first object to obtain object information of the first object in the image, performs feature identification on the object information to obtain first attribute features of the first object, acquires basic information of the first object, and performs feature identification on the basic information to obtain second attribute features of the first object.
And step S103, carrying out feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object.
Specifically, the terminal dynamically fuses the first attribute feature and the second attribute feature to obtain a fused attribute feature of the first object.
Step S104, inputting the fusion attribute features into a pre-trained background generation model to generate a target background.
In this step, the background generation model may be a style migration-based background generation model (e.g., generator).
Specifically, the terminal inputs the fusion attribute feature into a pre-trained background generation model, and the target background is generated by the background generation model based on the fusion attribute feature.
Step S105, replacing the video session background of the second object in the video session with the target background.
In this step, as shown in fig. 2, the video session background may refer to the original background of the second object in the video session.
Specifically, in the video session, the terminal identifies the video session background of the second object, and replaces the video session background of the second object with the target background.
In the above background replacement method, an image of a first object in a video session is acquired, the video session being a video session between the first object and a second object; the image is identified to obtain a first attribute feature of the first object, and a second attribute feature of the first object is obtained, where the first attribute feature is used for representing object information of the first object in the image and the second attribute feature is used for representing basic information of the first object; feature fusion processing is performed on the first attribute feature and the second attribute feature to obtain a fusion attribute feature of the first object; the fusion attribute feature is input into a pre-trained background generation model to generate a target background; and the video session background of the second object in the video session is replaced with the target background. According to this scheme, the image of the first object is acquired in the video session between the first object and the second object, the first attribute feature of the first object is identified from the image, the second attribute feature representing the basic information of the first object is acquired, and the two features are fused to obtain the fusion attribute feature of the first object; the background generation model then obtains the target background based on the fusion attribute feature, and the original background of the second object in the video session is replaced with the target background. This improves the efficiency of background generation and background replacement, effectively simplifies the background generation process, overcomes the drawback that manufacturing a physical background wall is costly, and thereby saves a large amount of cost. In addition, the style of the generated two-dimensional background matches the object information of the first object, so that the background blends better with the first object and the realism of the generated background is improved.
In one embodiment, as shown in fig. 3, the identifying the image in step S102 to obtain the first attribute feature of the first object specifically includes: step S301, performing dressing dimension recognition on the image to obtain dressing information of the first object in the image, and performing emotion dimension recognition on the image to obtain emotion information of the first object in the image; step S302, performing feature recognition on the dressing information to obtain dressing features of the first object in the image, and performing feature recognition on the emotion information to obtain emotion features of the first object in the image; step S303, determining the first attribute feature according to the dressing features and the emotion features.
In this embodiment, the object information includes dressing information and emotion information. The dressing information may be an image area representing the dressing of the first object in the image, or may be information describing the dressing of the first object; the emotion information may be an image area representing the emotion of the first object in the image, or may be information describing the emotion of the first object. The dressing dimension may refer to the dressing aspect, and the emotion dimension may refer to the emotional aspect. The dressing feature may be a feature of the dressing, such as the dressing style; the emotion feature may be a feature of the emotion, such as an emotion representation.
Specifically, the terminal performs dressing dimension recognition on the image, determines dressing information of the first object in the image, performs emotion dimension recognition on the image, determines emotion information of the first object in the image, performs feature recognition on the dressing information to obtain dressing features of the first object in the image, performs feature recognition on the emotion information to obtain emotion features of the first object in the image, and combines the dressing features and the emotion features to obtain first attribute features.
As shown in fig. 2, the terminal extracts a video frame (for example, a video review picture) from the video session (for example, a video review), inputs the video frame into a pre-trained target detection model, performs target detection on the video frame to output an image marked with a rectangular box around the image region of the first object, and crops the image region of the first object out of the marked image to obtain the image of the first object. The terminal then inputs the image of the first object into a pre-trained image network model (such as a model constructed based on VGG, ResNet or Inception, where VGG refers to a deep convolutional neural network, ResNet to a residual network and Inception to a deep network), and extracts the first attribute feature (such as information on the dressing style, emotion representation and the like of the first object) from the image of the first object through the pre-trained image network model.
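As a minimal sketch only, extracting a first attribute feature from the cropped image with a pre-trained backbone might look as follows; the ResNet backbone, the feature dimensions, and the linear dressing/emotion heads are assumptions for illustration, not the trained models described in this application (Python, assuming PyTorch and torchvision are available):

    # Illustrative sketch only: derive dressing and emotion features from the
    # cropped image of the first object with a pre-trained image backbone.
    import torch
    import torchvision
    from torchvision import transforms

    backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
    backbone.fc = torch.nn.Identity()           # keep the 2048-d pooled feature
    backbone.eval()

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Resize((224, 224)),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    dressing_head = torch.nn.Linear(2048, 128)  # hypothetical dressing-feature head
    emotion_head = torch.nn.Linear(2048, 128)   # hypothetical emotion-feature head

    def first_attribute_feature(object_image_rgb):
        x = preprocess(object_image_rgb).unsqueeze(0)
        with torch.no_grad():
            feat = backbone(x)                  # shared visual representation
        # concatenate dressing and emotion features as the first attribute feature
        return torch.cat([dressing_head(feat), emotion_head(feat)], dim=-1)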
According to the technical scheme of this embodiment, the dressing features and the emotion features of the first object are obtained by identifying the image and are used together as the first attribute feature, so that a more diverse and more accurate first attribute feature is obtained, which improves the accuracy of background generation and background replacement.
In one embodiment, as shown in fig. 4, the acquiring the second attribute feature of the first object in step S102 specifically includes: step S401, basic information of a first object is acquired; step S402, mapping the basic information to a preset semantic space to obtain semantic features corresponding to the basic information; step S403, determining a second attribute feature according to the semantic feature.
In this embodiment, the basic information may be textual attribute information of the first object; for example, the basic information of the first object may include work information, age information, and/or personalized information of the first object. The preset semantic space may be a semantic space (e.g., a vector space) determined in advance from a word vector model.
Specifically, the terminal acquires fixed basic information of the first object from the database, maps the basic information to a preset semantic space, performs feature mapping processing to obtain semantic features corresponding to the basic information, and takes the semantic features as second attribute features of the first object. For example, the terminal may perform object recognition on the image of the first object to obtain the name of the first object, query the database according to the name of the first object, and obtain the fixed basic information and the personalized basic information of the first object as the basic information of the first object.
The terminal performs high-dimensional mapping on the basic information of the first object through a word vector model to obtain corresponding mapping features, which serve as the second attribute feature. Based on an attention mechanism, the terminal then performs dynamic fusion processing on the obtained mapping features and the first attribute features (such as the dressing style and emotion representation of the first object extracted from the image of the first object) to obtain a final fused feature, which serves as the fusion attribute feature of the first object.
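A minimal sketch of such attention-based fusion is given below; the feature dimensions and the use of a standard multi-head attention layer are assumptions for illustration rather than the specific attention network of this application (Python, assuming PyTorch is available):

    # Illustrative sketch only: fuse the visual first attribute feature and the
    # textual second attribute feature with an attention layer.
    import torch
    import torch.nn as nn

    class AttributeFusion(nn.Module):
        def __init__(self, text_dim=300, img_dim=256, fused_dim=256):
            super().__init__()
            self.text_proj = nn.Linear(text_dim, fused_dim)   # second attribute feature
            self.img_proj = nn.Linear(img_dim, fused_dim)     # first attribute feature
            self.attn = nn.MultiheadAttention(fused_dim, num_heads=4, batch_first=True)

        def forward(self, first_attr, second_attr):
            tokens = torch.stack(
                [self.img_proj(first_attr), self.text_proj(second_attr)], dim=1)
            fused, _ = self.attn(tokens, tokens, tokens)      # dynamic weighting
            return fused.mean(dim=1)                          # fusion attribute feature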
According to the technical scheme, based on the word vector model, feature mapping is carried out on the basic information of the first object to obtain the second attribute features, and the efficiency and accuracy of obtaining the second attribute features are improved, so that the efficiency and accuracy of background generation and background replacement are improved.
In one embodiment, as shown in fig. 5, the step S104 of inputting the fusion attribute feature into the pre-trained background generation model, and generating the target background specifically includes: step S501, inputting the fusion attribute characteristics into a background generation model, and predicting a first object based on the fusion attribute characteristics through the background generation model to obtain a prediction result of the first object; step S502, generating a target background according to a prediction result by a background generation model.
In this embodiment, the prediction result is used to represent a desired style of the first object, for example, the prediction result may refer to feature information or feature elements of the desired style of the first object, and the desired style may refer to a style (such as a cartoon style) preferred by the first object; the target background may be a target style two-dimensional background.
Specifically, the terminal inputs the fusion attribute feature into the background generation model, predicts the expected style of the first object based on the fusion attribute feature through the background generation model to obtain a target style, takes the features of the target style as the prediction result of the first object, and generates the target background according to the prediction result through the background generation model.
The terminal inputs the fusion attribute feature into the pre-trained background generation model based on style migration, and the fusion attribute feature is processed by the background generation model. Based on the fusion attribute feature, the background generation model can automatically analyze the feature information of the corresponding object, such as the object's emotion, age, work information and color preference, and generate a corresponding two-dimensional background of the target style in a deep-learning decoding manner (the generated two-dimensional background can be effectively applied to switching the portrait background of an object in real life).
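A minimal sketch of a decoder that turns the fusion attribute feature into a two-dimensional background is given below; the network shape, feature sizes, and output resolution are assumptions for illustration and do not represent the disclosed model architecture (Python, assuming PyTorch is available):

    # Illustrative sketch only: decode the fusion attribute feature into a
    # two-dimensional background image, in the spirit of a style-transfer generator.
    import torch
    import torch.nn as nn

    class BackgroundGenerator(nn.Module):
        def __init__(self, fused_dim=256):
            super().__init__()
            self.fc = nn.Linear(fused_dim, 128 * 8 * 8)
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
            )

        def forward(self, fusion_feature):
            x = self.fc(fusion_feature).view(-1, 128, 8, 8)
            return self.decoder(x)   # target background, 3 x 64 x 64, values in [-1, 1]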
According to the technical scheme, the expected style of the first object is determined based on the fusion attribute characteristics by utilizing the background generation model, the target background is generated according to the expected style, and the generated style of the two-dimensional background is matched with the attribute information of the first object, so that the accuracy of background generation and background replacement is improved, and the fusion degree of the background and the first object is improved.
In one embodiment, the pre-trained background generation model is trained by the following method, which specifically includes: acquiring a first sample attribute feature and a second sample attribute feature of a sample object; performing feature fusion processing on the first sample attribute feature and the second sample attribute feature to obtain a sample fusion attribute feature of the sample object; inputting the sample fusion attribute feature into a background generation model to be trained to obtain a sample background; and training the background generation model to be trained according to the sample background and the satisfaction of the sample object with the sample background, to obtain the pre-trained background generation model.
In this embodiment, the first sample attribute feature is used to characterize sample object information of the sample object in a sample image, and the second sample attribute feature is used to characterize sample basic information of the sample object; a sample may refer to data serving as a training sample, e.g., a sample object may be a first object that serves as a training sample.
Specifically, the terminal obtains sample object information of a sample object in a sample image and performs feature recognition on the sample object information to obtain a first sample attribute feature of the sample object; it performs feature recognition on sample basic information of the sample object to obtain a second sample attribute feature of the sample object, and performs feature fusion processing on the first sample attribute feature and the second sample attribute feature to obtain a sample fusion attribute feature of the sample object. The terminal inputs the sample fusion attribute feature into the background generation model to be trained to obtain a sample background, obtains the satisfaction of the sample object with the sample background, and trains the background generation model to be trained according to the sample background and the satisfaction of the sample object with the sample background, to obtain the pre-trained background generation model.
Illustratively, the terminal performs end-to-end joint training on the attention network and the style-migration-based background generation model (the background generation model to be trained). The terminal collects video frame data (sample images) from sample videos and decomposes the video frame data into sample object information (the object image portion) and background information (the background portion); at the same time it collects attribute information of the sample objects in the video frames, and collects two-dimensional background pictures of the target style as the migration style type (e.g., a cartoon style). The main purpose of the attention network is to organically combine the sample object information (the first sample attribute features) and the fixed attribute information (the second sample attribute features) to extract corresponding comprehensive features (the sample fusion attribute features). The comprehensive features serve as the input of the style-migration background generation model, which generates corresponding target two-dimensional background pictures; a loss is then computed for the generated backgrounds using a target-background classifier, and the optimization process of the model selects the parameters that minimize this loss. After every fixed number of training rounds, the sample objects independently evaluate and verify whether the conversion effect of the background pictures in the video frames of the validation set is satisfactory; if the satisfaction reaches a certain threshold, training is stopped, so that the entire training process is learned end to end. For example, the terminal performs style migration by means of a generative adversarial network: a classifier is trained by taking the two-dimensional background pictures of the target style (such as the cartoon style) as positive samples, that is, the two-dimensional cartoon backgrounds are taken as positive samples and the backgrounds of the video frames as negative samples. The generator first encodes the video frame to obtain a picture representation and generates a corresponding target background picture with the picture representation and the aggregated fusion feature as input, where, in the generation stage, the target background picture needs to minimize the error of the classification result with respect to the two-dimensional background pictures; finally, the generated background picture is matched against manual annotations in a supervised learning manner.
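A minimal sketch of one adversarial training step in the spirit of the joint training described above is given below; the generator and discriminator interfaces, the data batches, and the loss weighting are assumptions for illustration, not the disclosed training procedure (Python, assuming PyTorch is available):

    # Illustrative sketch only: one adversarial training step with cartoon-style
    # backgrounds as positives and raw frame backgrounds plus generated
    # backgrounds as negatives.
    import torch
    import torch.nn.functional as F

    def train_step(generator, discriminator, opt_g, opt_d,
                   fusion_features, cartoon_backgrounds, frame_backgrounds):
        # Discriminator step
        with torch.no_grad():
            fake = generator(fusion_features)
        pos = discriminator(cartoon_backgrounds)
        neg = discriminator(torch.cat([frame_backgrounds, fake]))
        d_loss = (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos)) +
                  F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator step: make generated backgrounds classify as the target style
        fake = generator(fusion_features)
        pred = discriminator(fake)
        g_loss = F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
        return d_loss.item(), g_loss.item()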
According to the technical scheme, the sample background is obtained through the first sample attribute features and the second sample attribute features of the sample object, the background generation model to be trained is trained by using the sample background and the satisfaction degree of the sample object on the sample background, and the pre-trained background generation model with higher accuracy and efficiency is obtained, so that the accuracy and efficiency of background generation and background replacement are improved.
In one embodiment, the capturing the image of the first object in the video session in step S101 specifically includes: acquiring a video frame of a video session; inputting the video frame into a pre-trained object recognition model, recognizing an image area of the first object in the video frame through the object recognition model, and cutting the video frame according to the image area to obtain an image of the first object.
In this embodiment, the object recognition model may be an object detection model.
Specifically, the terminal acquires a video frame from a video session, inputs the video frame to a pre-trained object recognition model, recognizes an image area of a first object in the video frame through the object recognition model, and performs clipping processing on the image area of the first object on the video frame to obtain an image of the first object.
According to the technical scheme, the image area of the first object is identified from the video frame, and the video frame is cut to obtain the image of the first object, so that the image of the first object can be obtained more efficiently and accurately, and the accuracy and the efficiency of background generation and background replacement can be improved subsequently.
In one embodiment, the method may further determine a video session background by: in the video session, identifying a video session background of the second object; the replacing the video session background of the second object in the video session with the target background in step S105 specifically includes: and replacing the video session background with the target background according to the replacement relation between the video session background and the target background.
Specifically, the terminal identifies the video session background of the second object in the video session, constructs a replacement relationship between the video session background and the target background, and replaces the video session background in the video session with the target background by using this replacement relationship; that is, the area of the video review frame where the original background appears is replaced with the target two-dimensional background.
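A minimal sketch of such replacement using a person-segmentation mask is given below; the segmentation model (a pre-trained torchvision DeepLabV3) and the simple compositing rule are assumptions for illustration, not the background identification method of this application (Python, assuming PyTorch, torchvision and NumPy are available):

    # Illustrative sketch only: substitute the generated target background for
    # everything outside the person region in a video frame.
    import numpy as np
    import torch
    import torchvision
    from torchvision import transforms

    segmenter = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
    segmenter.eval()

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    def replace_background(frame_rgb, target_background_rgb):
        """Both inputs are HxWx3 uint8 arrays of the same size."""
        with torch.no_grad():
            out = segmenter(preprocess(frame_rgb).unsqueeze(0))["out"][0]
        mask = (out.argmax(0) == 15).numpy()        # VOC-style class 15 = person
        mask3 = np.repeat(mask[:, :, None], 3, axis=2)
        # keep the person pixels, substitute the target background elsewhere
        return np.where(mask3, frame_rgb, target_background_rgb)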
According to the technical scheme of the embodiment, the video session background in the video session is replaced by the target background according to the replacement relation between the video session background and the target background, so that the accuracy and the efficiency of background replacement are improved.
The following describes, through an embodiment, the background replacement method provided in the present application. This embodiment is described by taking the application of the method to a terminal (it may also be a server) as an example, and the main steps include:
the method comprises the steps that a terminal obtains a first sample attribute characteristic and a second sample attribute characteristic of a sample object.
And secondly, the terminal performs feature fusion processing on the first sample attribute features and the second sample attribute features to obtain sample fusion attribute features of the sample object.
Thirdly, the terminal inputs the sample fusion attribute characteristics to a background generation model to be trained, and a sample background is obtained.
And fourthly, the terminal trains the background generation model to be trained according to the sample background and the degree of satisfaction of the sample object with the sample background, to obtain a pre-trained background generation model.
And fifthly, the terminal acquires video frames of the video session.
And sixthly, inputting the video frame into a pre-trained object recognition model by the terminal, recognizing an image area of the first object in the video frame through the object recognition model, and cutting the video frame according to the image area to obtain an image of the first object.
Seventh, the terminal performs dressing dimension recognition on the image to obtain dressing information of the first object in the image, and performs emotion dimension recognition on the image to obtain emotion information of the first object in the image.
Eighth, the terminal performs feature recognition on the dressing information to obtain dressing features of the first object in the image, and performs feature recognition on the emotion information to obtain emotion features of the first object in the image.
And ninth, the terminal determines the first attribute characteristic according to the dressing characteristic and the emotion characteristic.
And tenth step, the terminal acquires the basic information of the first object.
Eleventh, the terminal maps the basic information to a preset semantic space to obtain semantic features corresponding to the basic information.
And twelfth, the terminal determines second attribute characteristics according to the semantic characteristics.
And thirteenth, the terminal performs feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object.
And fourteenth step, the terminal inputs the fusion attribute characteristics into a background generation model, predicts the first object based on the fusion attribute characteristics through the background generation model, and obtains a prediction result of the first object.
And fifteenth, the terminal generates a target background according to the prediction result by using a background generation model.
Sixteenth, the terminal identifies a video session background of the second object in the video session.
Seventeenth, the terminal replaces the video session background with the target background according to the replacement relation between the video session background and the target background.
The first sample attribute features are used for representing sample object information of the sample object in the sample image, and the second sample attribute features are used for representing sample basic information of the sample object; the video session is a video session between a first object and a second object; the object information includes dressing information and emotion information; the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object; the prediction results are used to characterize a desired style of the first object.
According to the technical scheme, in a video session between a first object and a second object, an image of the first object is acquired, first attribute features of the first object are identified from the image, second attribute features of basic information of the first object are acquired, the first attribute features and the second attribute features are fused to obtain fused attribute features of the first object, a background generation model is utilized, a target background is obtained based on the fused attribute features, and an original background of the second object in the video session is replaced by the target background, so that the background generation and background replacement efficiency is improved.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include multiple steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times; the order of these steps or stages is not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiments of the present application also provide a background replacement apparatus for implementing the above-mentioned background replacement method. The implementation of the solution provided by the device is similar to that described in the above method, so the specific limitation in one or more embodiments of the background replacement device provided below may be referred to above for limitation of the background replacement method, which is not repeated here.
In one embodiment, as shown in FIG. 6, a context replacement device is provided, the device 600 may include:
an image acquisition module 601, configured to acquire an image of a first object in a video session; the video session is a video session between a first object and the second object;
an image recognition module 602, configured to recognize the image, obtain a first attribute feature of the first object, and obtain a second attribute feature of the first object; the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object;
a feature fusion module 603, configured to perform feature fusion processing on the first attribute feature and the second attribute feature, to obtain a fused attribute feature of the first object;
the target background generation module 604 is configured to input the fusion attribute feature to a pre-trained background generation model, and generate a target background;
and the target background replacing module 605 is configured to replace the video session background of the second object in the video session with the target background.
In one embodiment, the object information includes dressing information and emotion information; the image recognition module 602 is further configured to perform dressing dimension recognition on the image to obtain the dressing information of the first object in the image, and perform emotion dimension recognition on the image to obtain the emotion information of the first object in the image; perform feature recognition on the dressing information to obtain dressing features of the first object in the image, and perform feature recognition on the emotion information to obtain emotion features of the first object in the image; and determine the first attribute feature according to the dressing features and the emotion features.
In one embodiment, the image recognition module 602 is further configured to obtain basic information of the first object; mapping the basic information to a preset semantic space to obtain semantic features corresponding to the basic information; and determining the second attribute characteristic according to the semantic characteristic.
In one embodiment, the target background generating module 604 is further configured to input the fusion attribute feature to the background generating model, and predict, by using the background generating model, the first object based on the fusion attribute feature, so as to obtain a prediction result of the first object; the prediction result is used for representing the expected style of the first object; and generating the target background according to the prediction result by the background generation model.
In one embodiment, the apparatus 600 further comprises: the model training module is used for acquiring a first sample attribute characteristic and a second sample attribute characteristic of the sample object; the first sample attribute features are used for representing sample object information of the sample object in a sample image, and the second sample attribute features are used for representing sample basic information of the sample object; performing feature fusion processing on the first sample attribute features and the second sample attribute features to obtain sample fusion attribute features of the sample object; inputting the sample fusion attribute characteristics to a background generation model to be trained to obtain a sample background; and training the background generation model to be trained according to the satisfaction degree of the sample background and the sample object on the sample background to obtain the pre-trained background generation model.
In one embodiment, the image acquisition module 601 is further configured to acquire a video frame of the video session; inputting the video frame into a pre-trained object recognition model, recognizing an image area of the first object in the video frame through the object recognition model, and cutting the video frame according to the image area to obtain an image of the first object.
In one embodiment, the apparatus 600 further comprises: the background recognition module is used for recognizing the video session background of the second object in the video session; the target background replacing module 605 is further configured to replace the video session background with the target background according to a replacement relationship between the video session background and the target background.
The various modules in the above-described background replacement apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
It should be noted that the method and apparatus for background replacement provided in the present application may be used in the application field related to background replacement in the financial field, and may also be used in the processing related to background replacement in any field other than the financial field, where the application field of the method and apparatus for background replacement provided in the present application is not limited.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a context replacement method. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium, and the computer program, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided in the present application may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided in the present application may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to fall within the scope of this specification.
The above embodiments merely represent several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the concept of the present application, and such modifications and improvements all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (11)

1. A background replacement method, the method comprising:
acquiring an image of a first object in a video session, wherein the video session is a video session between the first object and a second object;
identifying the image to obtain first attribute features of the first object, and obtaining second attribute features of the first object, wherein the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object;
performing feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object;
inputting the fusion attribute features into a pre-trained background generation model to generate a target background;
and replacing the video session background of the second object in the video session with the target background.
2. The method according to claim 1, wherein the object information includes dressing information and emotion information;
the identifying the image to obtain the first attribute features of the first object includes:
performing dressing dimension identification on the image to obtain dressing information of the first object in the image, and performing emotion dimension identification on the image to obtain emotion information of the first object in the image;
performing feature recognition on the dressing information to obtain dressing features of the first object in the image, and performing feature recognition on the emotion information to obtain emotion features of the first object in the image;
and determining the first attribute features according to the dressing features and the emotion features.
3. The method of claim 1, wherein the obtaining the second attribute features of the first object comprises:
acquiring basic information of the first object;
mapping the basic information to a preset semantic space to obtain semantic features corresponding to the basic information;
and determining the second attribute features according to the semantic features.
4. The method of claim 1, wherein the inputting the fusion attribute features into a pre-trained background generation model to generate a target background comprises:
inputting the fusion attribute features into the background generation model, and predicting the first object based on the fusion attribute features through the background generation model to obtain a prediction result of the first object, wherein the prediction result is used for representing an expected style of the first object;
and generating the target background according to the prediction result by the background generation model.
5. The method of claim 1, wherein the pre-trained background generation model is trained by:
acquiring first sample attribute features and second sample attribute features of a sample object, wherein the first sample attribute features are used for representing sample object information of the sample object in a sample image, and the second sample attribute features are used for representing sample basic information of the sample object;
performing feature fusion processing on the first sample attribute features and the second sample attribute features to obtain sample fusion attribute features of the sample object;
inputting the sample fusion attribute features into a background generation model to be trained to obtain a sample background;
and training the background generation model to be trained according to the sample background and a satisfaction degree of the sample object with the sample background, to obtain the pre-trained background generation model.
6. The method of claim 1, wherein the acquiring an image of the first object in the video session comprises:
acquiring a video frame of the video session;
inputting the video frame into a pre-trained object recognition model, recognizing an image area of the first object in the video frame through the object recognition model, and cropping the video frame according to the image area to obtain the image of the first object.
7. The method of claim 1, further comprising, prior to the replacing the video session background of the second object in the video session with the target background:
identifying the video session background of the second object in the video session;
wherein the replacing the video session background of the second object in the video session with the target background comprises:
replacing the video session background with the target background according to the replacement relationship between the video session background and the target background.
8. A background replacement device, the device comprising:
the image acquisition module is used for acquiring an image of a first object in a video session, wherein the video session is a video session between the first object and a second object;
the image recognition module is used for recognizing the image to obtain first attribute features of the first object and obtaining second attribute features of the first object, wherein the first attribute features are used for representing object information of the first object in the image, and the second attribute features are used for representing basic information of the first object;
the feature fusion module is used for carrying out feature fusion processing on the first attribute features and the second attribute features to obtain fusion attribute features of the first object;
the target background generation module is used for inputting the fusion attribute features into a pre-trained background generation model to generate a target background;
and the target background replacing module is used for replacing the video session background of the second object in the video session with the target background.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
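Purely as an illustrative sketch of how the feature fusion and background generation steps of the claimed method might be wired together, the following assumes a simple concatenation as the fusion operator and a hypothetical background_generator object standing in for the pre-trained background generation model; neither the names nor the fusion choice are prescribed by this application.

import numpy as np

def generate_target_background(first_attribute_features: np.ndarray,
                               second_attribute_features: np.ndarray,
                               background_generator) -> np.ndarray:
    # Feature fusion processing: concatenation is assumed here only for
    # illustration; the application does not specify a fusion operator.
    fused = np.concatenate([first_attribute_features, second_attribute_features])
    # background_generator is a hypothetical stand-in for the pre-trained
    # background generation model; it is assumed to map the fused attribute
    # features to a target background image.
    return background_generator.generate(fused)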
CN202310154433.1A 2023-02-22 2023-02-22 Background replacement method, device, computer equipment and storage medium Pending CN116156092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310154433.1A CN116156092A (en) 2023-02-22 2023-02-22 Background replacement method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310154433.1A CN116156092A (en) 2023-02-22 2023-02-22 Background replacement method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116156092A true CN116156092A (en) 2023-05-23

Family

ID=86356056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310154433.1A Pending CN116156092A (en) 2023-02-22 2023-02-22 Background replacement method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116156092A (en)

Similar Documents

Publication Publication Date Title
CN111767228B (en) Interface testing method, device, equipment and medium based on artificial intelligence
CN109345130A (en) Method, apparatus, computer equipment and the storage medium of Market Site Selection
CN110515986B (en) Processing method and device of social network diagram and storage medium
CN116049397B (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
CN117332766A (en) Flow chart generation method, device, computer equipment and storage medium
CN116030466B (en) Image text information identification and processing method and device and computer equipment
CN116977336A (en) Camera defect detection method, device, computer equipment and storage medium
CN115758271A (en) Data processing method, data processing device, computer equipment and storage medium
CN116156092A (en) Background replacement method, device, computer equipment and storage medium
CN116612474B (en) Object detection method, device, computer equipment and computer readable storage medium
CN114490996B (en) Intention recognition method and device, computer equipment and storage medium
CN114781557B (en) Image information acquisition method and device and computer-readable storage medium
CN117676270A (en) Video data generation method, device, computer equipment and storage medium
CN117975473A (en) Bill text detection model training and detection method, device, equipment and medium
CN117670686A (en) Video frame enhancement method, device, computer equipment and storage medium
CN117407418A (en) Information acquisition method, information acquisition device, computer apparatus, storage medium, and program product
CN116881122A (en) Test case generation method, device, equipment, storage medium and program product
CN115630973A (en) User data processing method, device, computer equipment and storage medium
CN115658899A (en) Text classification method and device, computer equipment and storage medium
CN116049009A (en) Test method, test device, computer equipment and computer readable storage medium
CN116861071A (en) Information pushing method, information pushing device, computer equipment, storage medium and program product
CN116823384A (en) Product recommendation method, device, apparatus, storage medium and computer program product
CN117390098A (en) Data analysis method, device, computer equipment and storage medium
CN117495191A (en) Quality of service evaluation method, device, computer equipment and storage medium
CN117150311A (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination