CN117376597A - Digital human video processing method, electronic equipment and medium


Info

Publication number
CN117376597A
CN117376597A (application number CN202210763176.7A)
Authority
CN
China
Prior art keywords
feedback information
digital
video
adjustment
digital human
Prior art date
Legal status
Pending
Application number
CN202210763176.7A
Other languages
Chinese (zh)
Inventor
郑洛
李蒙蒙
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority claimed from CN202210763176.7A
Publication of CN117376597A
Legal status: Pending

Classifications

    • H04N21/23412 Processing of video elementary streams for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • G06T13/00 Animation
    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/2343 Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/44008 Client-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44012 Client-side processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N21/4402 Client-side processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/4756 End-user interface for inputting end-user data for rating content, e.g. scoring a recommended movie
    • H04N21/4781 Supplemental services: games
    • H04N21/4788 Supplemental services: communicating with other users, e.g. chatting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application relates to the technical field of information communication, and discloses a digital human video processing method, an electronic device, and a medium. The digital human video processing method comprises the following steps: a first electronic device obtains users' feedback information on a first digital human video, where the first digital human video comprises a plurality of video elements; feature feedback information corresponding to at least one video element is obtained from each piece of feedback information; and the corresponding video elements are adjusted based on the obtained feature feedback information, and a second digital human video is generated based on the adjusted video elements. With this scheme, video elements in a digital human video, such as the digital person or the virtual scene, can be adjusted based on users' feedback on the video, which effectively improves audience satisfaction. The current video elements can also be adjusted and updated promptly, shortening the update cycle.

Description

Digital human video processing method, electronic equipment and medium
Technical Field
The present disclosure relates to the field of information communication technologies, and in particular, to a digital human video processing method, an electronic device, and a medium.
Background
With the continuous development of information and communication technology (Information Communication Technology, ICT), more and more video content in various industries (e.g., film and television variety programs, live-streaming platforms) is created by replacing real people with digital humans and real scenes with virtual scenes, and video content created from digital assets (digital humans and virtual scenes) is increasingly popular with the public. Here, a digital human refers to a synthetic entity that exists in the non-physical world, is created and used by computer means, and has multiple human characteristics (e.g., appearance, human performance capability, interactivity).
In the prior art, a digital human video (i.e., a video containing a digital human and virtual scenes) may be adjusted continuously according to different customer requirements. As shown in fig. 1, one prior-art adjustment procedure is as follows: an offline training platform 10 adjusts the digital assets corresponding to the digital human video, namely the digital human and the virtual scene in which it appears, based on requirements raised by customers or internally; for example, the digital human's nose or makeup may be adjusted, as may the lighting or colors of the virtual scene. The adjusted digital assets are then manually updated to a digital asset platform 11. When playback is required, a digital content production platform 12 can load the adjusted digital assets from the digital asset platform 11, produce the corresponding video-on-demand, live, or interactive live content based on them, and send the video to a playback platform for playing.
In this adjustment process, updating the digital assets corresponding to the digital human video must be done offline, which takes a long time and makes real-time adjustment difficult. In addition, the viewing experience of the audience is not taken into account, so audience satisfaction is poor in some cases.
Disclosure of Invention
In order to solve the above problems, embodiments of the present application provide a digital human video processing method, an electronic device, and a medium.
In a first aspect, an embodiment of the present application provides a digital human video processing method, including: a first electronic device obtains users' feedback information on a first digital human video, where the first digital human video includes a plurality of video elements; the first electronic device obtains, from the feedback information, feature feedback information corresponding to at least one video element; and the first electronic device adjusts the corresponding video element based on the obtained feature feedback information and generates a second digital human video based on the adjusted video element.
It can be appreciated that with this method, video elements in the digital human video, for example the digital person or the virtual scene, can be adjusted based on the feedback of users watching the video, which effectively improves audience satisfaction.
In addition, in this scheme the current video elements can be adjusted and updated promptly, shortening the update cycle. After the next play instruction is obtained, the corresponding video is generated from the adjusted video elements in time and sent to the terminal playback platform for playing.
In one possible implementation, the video element includes a digital person and a virtual scene, and the feature feedback information of the video element includes feature feedback information of the digital person and feature feedback information of the virtual scene.
In a possible implementation, adjusting the corresponding video element based on the obtained feature feedback information includes: obtaining the user adjustment intent corresponding to each piece of feature feedback information; when the number of identical first adjustment intents among the users' adjustment intents reaches a set number, determining the feature to be adjusted and the adjustment mode of the video element based on the first adjustment intent; and adjusting the corresponding video element based on the feature to be adjusted and the adjustment mode.
It can be appreciated that in the embodiment of the present application, the video elements are adjusted only after the number of users sharing an adjustment intent reaches the set number, so that the adjusted video satisfies the wishes of most users and user satisfaction is improved.
In a possible implementation, obtaining the feature feedback information corresponding to at least one video element from each piece of feedback information includes: obtaining keywords corresponding to the feedback information; taking feedback information whose keywords correspond to features of the digital person as the feature feedback information of the digital person; and taking feedback information whose keywords correspond to features of the virtual scene as the feature feedback information of the virtual scene.
In one possible implementation, the feedback information includes bullet screen (danmaku) information, post information, comment information, and questionnaire information submitted by users.
In one possible implementation, the feature feedback information of the digital person includes appearance feature feedback information, stature feature feedback information, and image feature feedback information of the digital person.
In one possible implementation, the feature feedback information of the virtual scene includes color, lighting, and layout feature feedback information of the virtual scene.
In one possible implementation, the method further includes: when the next play instruction is obtained, sending the second digital human video to a second electronic device.
It can be understood that in the embodiment of the application, the digital person is not updated in real time during a live broadcast or other program currently being played; instead, the digital human video is played based on the adjusted digital person when the next play instruction is received. This effectively avoids the digital human model changing between the earlier and later parts of the same program, which would give the audience a poor experience.
In a second aspect, an embodiment of the present application provides an electronic device, including: a memory for storing instructions to be executed by one or more processors of the electronic device; and a processor, being one of the one or more processors of the electronic device, for performing the digital human video processing method mentioned in the embodiments of the present application.
In a third aspect, an embodiment of the present application provides a readable storage medium storing instructions that, when executed on an electronic device, cause the electronic device to perform the digital human video processing method mentioned in the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a digital human video processing system, including: an information collection module for obtaining users' feedback information on a first digital human video, the first digital human video including a plurality of video elements; an information extraction module for obtaining, from the feedback information, feature feedback information corresponding to at least one video element; and an automatic optimization module for adjusting the corresponding video element based on the obtained feature feedback information and generating a second digital human video based on the adjusted video element. A minimal interface sketch of these modules is given below.
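For illustration only (the original text contains no code), the following Python sketch shows one possible layout of the three modules; every class and method name here is a hypothetical stand-in, not part of the patent:

```python
class InformationCollectionModule:
    """Gathers users' feedback on the first digital human video
    (bullet screens, posts, comments, questionnaires)."""
    def collect(self) -> list[str]:
        raise NotImplementedError

class InformationExtractionModule:
    """Maps each feedback message to the video-element feature it concerns."""
    def group_by_feature(self, feedback: list[str]) -> dict[str, list[str]]:
        raise NotImplementedError

class AutomaticOptimizationModule:
    """Adjusts video elements from the grouped feature feedback and triggers
    generation of the second digital human video."""
    def decide_adjustments(self, per_feature: dict[str, list[str]]):
        raise NotImplementedError
    def apply(self, plan) -> None:
        raise NotImplementedError
```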
Drawings
FIG. 1 illustrates a flow diagram of a digital human video adjustment method, according to some embodiments of the present application;
FIG. 2 illustrates a schematic diagram showing a digital human video playback interface, according to some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of a digital human video processing system, according to some embodiments of the present application;
FIG. 4 illustrates a flow diagram of a digital human video processing method, according to some embodiments of the present application;
FIG. 5 illustrates a process schematic of a digital human video processing method, according to some embodiments of the present application;
FIG. 6 illustrates an interactive flow diagram of a digital human video processing method, according to some embodiments of the present application;
fig. 7 illustrates a schematic structural diagram of an electronic device, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, a digital human video processing method, an electronic device, and a medium.
In order to solve the technical problems described in the background, an embodiment of the present application provides a digital human video processing method that can be used in a digital human video processing system. The method includes: while the current digital human video is playing, obtaining in real time the feedback information of users watching it; at set time intervals, obtaining from that feedback the information corresponding to each feature of the digital person or of the virtual scene in the video, and deriving the feature to be adjusted and the adjustment mode from the feedback corresponding to each feature. The digital person and the virtual scene in the video are then adjusted based on the feature to be adjusted and the adjustment mode, yielding the adjusted digital person and virtual scene. After the next play instruction is obtained, the digital human video processing system can directly generate the corresponding video based on the adjusted digital person and virtual scene and send it to the terminal playback platform for playing.
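A minimal sketch of this periodic collect-extract-adjust loop, assuming the hypothetical module interfaces sketched above (the interval and all names are illustrative, not specified by the patent):

```python
import time

def processing_cycle(collector, extractor, optimizer, interval_s: float = 60.0):
    """Hypothetical periodic feedback loop for the digital human video system."""
    while True:
        time.sleep(interval_s)                    # "at set time intervals"
        feedback = collector.collect()            # bullet screens, posts, surveys
        per_feature = extractor.group_by_feature(feedback)
        plan = optimizer.decide_adjustments(per_feature)
        if plan is not None:                      # enough users share an intent
            optimizer.apply(plan)                 # update digital person / scene assets
        # the currently playing program keeps the old assets; the adjusted ones
        # are used only when the next play instruction arrives
```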
It is understood that a digital human video is a video that contains a digital person and virtual scenes. The digital human video processing system may be a server that adjusts and generates digital human videos.
Feedback information from users watching a digital human video may include bullet screen information sent while watching, post and comment information, questionnaire information filled out after watching, and the like.
It can be understood that in the embodiment of the application, the way the digital person or virtual scene in a digital human video is adjusted can be determined based on the feedback information of the audience watching the video, which effectively improves audience satisfaction.
In addition, the digital human video processing system can adjust and update the current digital person and virtual scene promptly according to the determined adjustment modes, shortening the update cycle. After the next play instruction is obtained, the corresponding video is generated in time based on the adjusted digital person and virtual scene and sent to the terminal playback platform for playing.
Digital person features in a digital human video may include any features related to the digital person, such as appearance features (e.g., glasses, nose, mouth), stature features (e.g., height, weight), and image features (e.g., clothing, accessories). Scene features include the color, lighting, and layout features of the virtual scene.
It can be understood that obtaining the feature to be adjusted and the adjustment mode based on the feedback information corresponding to each feature may include: first, obtaining the user adjustment intents corresponding to the feedback information of each feature; when the number of identical user adjustment intents reaches the set number, determining the feature to be adjusted and the adjustment mode based on that intent; if not, not obtaining a feature to be adjusted or an adjustment mode.
It can be understood that in the embodiment of the present application, the corresponding user adjustment intents may be obtained from the feedback information corresponding to each feature through a classification algorithm, a semantic understanding algorithm, and the like.
For example, when users watching the digital human video publish the bullet screen messages "digital person A's face is too round" and "the scene lighting is too dark", the digital human video processing system can determine through a classification algorithm that the former is feedback about the digital person and the latter is feedback about the scene.
Through a semantic understanding algorithm, feature extraction on the bullet screen message "digital person A's face is too round" yields the adjustment parameters digital person A, face, width, and narrower; that is, the adjustment intent is determined to be "narrow the face width of digital person A". It can be appreciated that one or more user adjustment intents may be extracted from each piece of feedback information. When the number of users whose adjustment intent for the facial feature is "narrow the face width of digital person A" reaches the set number, the adjustment mode is determined to be narrowing the face width of digital person A.
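As a toy illustration of this semantic extraction step (a phrase table stands in for the semantic understanding algorithm; a real system would use a trained model, and every name below is hypothetical):

```python
from typing import NamedTuple

class AdjustIntent(NamedTuple):
    target: str      # e.g. "digital person A"
    feature: str     # e.g. "face"
    attribute: str   # e.g. "width"
    direction: str   # e.g. "narrower"

# Toy phrase rules standing in for the semantic understanding algorithm.
PHRASE_RULES = {
    "face is too round": AdjustIntent("digital person A", "face", "width", "narrower"),
    "lighting is too dark": AdjustIntent("virtual scene", "lighting", "brightness", "brighter"),
}

def extract_intents(message: str) -> list[AdjustIntent]:
    """Return the zero or more adjustment intents found in one feedback message."""
    text = message.lower()
    return [intent for phrase, intent in PHRASE_RULES.items() if phrase in text]
```

Identical intents extracted from different users' messages can then be tallied against the set number, as sketched after step 404 below.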
For example, fig. 2 illustrates a schematic diagram of a digital human video playback interface, according to some embodiments of the present application. As shown in fig. 2, a user logs in to a live platform through a mobile phone 100 to watch digital human video, and a display screen of the mobile phone 100 displays a digital human video interface 101.
With continued reference to fig. 2, while watching digital person A's live stream, a user who considers digital person A's face too round can evaluate this aspect of the presentation on the live platform, for example by sending the comment 1011 "the digital face is somewhat round". After the digital human video processing system obtains this evaluation information from the live platform, it can determine through a classification algorithm that "the digital face is somewhat round" is feedback about the digital person, and extract from it, through a semantic understanding algorithm, the adjustment parameters digital person A, face, width, and narrower; that is, the adjustment intent is determined to be "narrow the face width of digital person A".
It can be appreciated that one or more user adjustment intents may be extracted from each piece of feedback information. When the number of user adjustment intents "narrow the face width of digital person A" reaches the set number, the feature to be adjusted is determined to be the facial feature and the adjustment mode is to narrow the face width of digital person A. The adjustment is then performed to obtain the adjusted digital person.
It can be understood that, to avoid the digital human model changing between the earlier and later parts of the same program and giving the audience a poor experience, the digital person is not updated in real time in the live broadcast or other program currently being played; the digital human video is played based on the adjusted digital person when the next play instruction is received.
For the purpose of making the objects, technical solutions, and advantages of the present application more apparent, the digital human video processing system of the present application will be described in further detail with reference to the accompanying drawings.
Fig. 3 illustrates a schematic diagram of a digital human video processing system 2, according to some embodiments of the present application. As shown in fig. 3, the digital human video processing system 2 includes a video content production platform 20, a digital asset platform 21, and a media distribution platform 22. The video content production platform 20 includes an information collection extraction module 200, an automatic optimization module 203, and a video content production module 204. The information collection and extraction module 200 includes an information collection module 201 and an information extraction module 202.
The information collection module 201 is configured to obtain feedback information from users watching a digital human video. The real-time feedback information may include bullet screen information sent while watching, post and comment information, questionnaire information filled out after watching, and the like. It can be understood that in the embodiment of the present application, the feedback information may be any information capable of representing the audience's opinion, and it may be obtained in any implementable manner.
The information extraction module 202 is configured to obtain feedback information corresponding to each feature of a digital person in the digital person video or each feature in a scene from feedback information of a user watching the digital person video.
The digital person features in a digital human video may include any features related to the digital person, such as appearance features, stature features, and image features; for example, appearance features may include eyebrows, glasses, nose, mouth, arms, legs, etc.; stature features may include height, weight, etc.; and image features may include clothing, accessories, makeup, etc. Scene features include the color, lighting, and layout features of the virtual scene.
It may be appreciated that in some embodiments, keyword extraction may be performed on users' feedback information through a neural network model or a related algorithm, and feedback information containing keywords that correspond to individual features of the digital person or the virtual scene may be used as the feedback information corresponding to those features.
For example, when a user's bullet screen message is "digital person A's face is too round", the keywords obtained by keyword extraction include "digital person A", "face", and "round". The keyword "face" corresponds to the facial feature of the digital person, so "digital person A's face is too round" is used as feedback information corresponding to the facial feature of the digital person.
For example, when a user's bullet screen message is "digital person A's nose is skewed to the right", the keywords obtained include "digital person A", "nose", and "skewed right". The keyword "nose" corresponds to the nose feature of the digital person, so the message is used as feedback information corresponding to the nose feature of the digital person.
For example, if a user's bullet screen message is "the stage lighting is too dark", the keywords obtained include "stage", "lighting", and "dark". The keyword "lighting" corresponds to the lighting feature of the virtual scene, so "the stage lighting is too dark" is used as feedback information corresponding to the lighting feature of the virtual scene.
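A toy sketch of this keyword-to-feature routing (the table and all names are hypothetical; real keyword extraction would come from a neural model over the raw feedback text):

```python
# Hypothetical keyword -> (video element, feature) routing table.
KEYWORD_TO_FEATURE = {
    "face":     ("digital_person", "facial"),
    "nose":     ("digital_person", "nose"),
    "clothing": ("digital_person", "image"),
    "lighting": ("virtual_scene", "lighting"),
    "color":    ("virtual_scene", "color"),
}

def route_feedback(messages: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group feedback messages under every feature their keywords mention."""
    grouped: dict[tuple[str, str], list[str]] = {}
    for msg in messages:
        low = msg.lower()
        for keyword, element_feature in KEYWORD_TO_FEATURE.items():
            if keyword in low:
                grouped.setdefault(element_feature, []).append(msg)
    return grouped
```

For instance, route_feedback(["the stage lighting is too dark"]) would file the message under ("virtual_scene", "lighting").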
The automatic optimization module 203 is configured to determine whether a user adjustment intent reaches the set adjustment threshold; if so, it triggers the process of automatically adjusting the digital human model parameters and determines the feature to be adjusted and the adjustment mode based on that intent; if not, no feature to be adjusted or adjustment mode is obtained.
The automatic optimization module 203 is further configured to adjust the digital person and the virtual scene in the digital human video based on the feature to be adjusted and the adjustment mode, obtain the adjusted digital person and virtual scene, and store them.
It can be appreciated that in the embodiment of the present application, the automatic optimization module 203 may adjust parameters corresponding to the digital person and the virtual scene in the digital person video when determining the feature to be adjusted and the adjustment mode.
For example, after the adjustment mode is determined to be narrowing the face width of the digital person, the parameter corresponding to the face width can be adjusted, for example reduced by a small amount. The adjusted digital human model is then generated by any implementable means, such as MESH deformation.
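A minimal sketch of such a small, bounded parameter nudge (the parameter name, step size, and bounds are all illustrative; the patent leaves them unspecified):

```python
def narrow_face_width(params: dict[str, float], step: float = 0.02,
                      lo: float = 0.8, hi: float = 1.2) -> dict[str, float]:
    """Reduce a face-width parameter by a small fraction, clamped to a safe
    range so that repeated feedback cannot distort the model."""
    adjusted = dict(params)
    adjusted["face_width"] = min(hi, max(lo, adjusted["face_width"] * (1.0 - step)))
    return adjusted

# e.g. narrow_face_width({"face_width": 1.0}) -> {"face_width": 0.98}
```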
The automatic optimizing module 203 is further configured to update the adjusted digital person and virtual scene to the digital asset platform 21.
Digital asset platform 21 is used to store and manage digital assets, such as digital assets for adjusted digital persons and virtual scenes.
The video content production module 204 is configured to obtain digital assets such as the adjusted digital person and virtual scene, produce the corresponding digital human video based on the adjusted assets and the corresponding video requirements, and send the digital human video to the media distribution platform 22.
The media distribution platform 22 is used to send the adjusted digital personal video to the various playback platforms.
The digital personal video processing method provided in the embodiment of the present application will be described in detail with reference to the above digital personal video processing system 2. Fig. 4 illustrates a flow diagram of a digital human video processing method, according to some embodiments of the present application. The method may be performed by the digital human video processing system 2. As shown in fig. 4, the flow includes the steps of:
401: and acquiring feedback information of the user watching the digital human video.
It is understood that digital person video refers to video that contains digital persons and virtual scenes.
The real-time feedback information of users watching the digital human video may include bullet screen information sent while watching, post and comment information, questionnaire information filled out after watching, and the like. It can be understood that in the embodiment of the present application, the feedback information may be any information capable of representing the audience's opinion, and it may be obtained in any implementable manner.
It will be appreciated that in some embodiments, the digital human video processing system may extract the bullet screen information sent by users through optical character recognition (Optical Character Recognition, OCR), and may obtain posts and comments sent by users, questionnaires filled out after watching, and the like through a background monitoring module.
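A minimal OCR sketch, assuming the pytesseract wrapper with the Tesseract engine and its simplified-Chinese language pack installed (a production pipeline would also crop the bullet screen region and deduplicate text across frames):

```python
import pytesseract          # assumes the Tesseract OCR engine is installed
from PIL import Image

def read_bullet_screens(frame_path: str) -> list[str]:
    """Extract on-screen bullet comment text from one video frame via OCR."""
    frame = Image.open(frame_path)
    text = pytesseract.image_to_string(frame, lang="chi_sim")  # simplified Chinese
    return [line.strip() for line in text.splitlines() if line.strip()]
```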
402: and acquiring feedback information corresponding to each feature of the digital person in the digital person video or each feature in the scene from feedback information of the user watching the digital person video.
The digital person features in a digital human video may include any features related to the digital person, such as appearance features, stature features, and image features; for example, appearance features may include eyebrows, glasses, nose, mouth, arms, legs, etc.; stature features may include height, weight, etc.; and image features may include clothing, accessories, makeup, etc. Scene features include the color, lighting, and layout features of the virtual scene.
It may be appreciated that in some embodiments, keyword extraction may be performed on users' feedback information through a neural network model or a related algorithm, and feedback information containing keywords that correspond to individual features of the digital person or the virtual scene may be used as the feedback information corresponding to those features.
For example, when a user's bullet screen message is "digital person A's face is too round", the keywords obtained by keyword extraction include "digital person A", "face", and "round". The keyword "face" corresponds to the facial feature of the digital person, so "digital person A's face is too round" is used as feedback information corresponding to the facial feature of the digital person.
For example, when a user's bullet screen message is "digital person A's nose is skewed to the right", the keywords obtained include "digital person A", "nose", and "skewed right". The keyword "nose" corresponds to the nose feature of the digital person, so the message is used as feedback information corresponding to the nose feature of the digital person.
For example, if a user's bullet screen message is "the stage lighting is too dark", the keywords obtained include "stage", "lighting", and "dark". The keyword "lighting" corresponds to the lighting feature of the virtual scene, so "the stage lighting is too dark" is used as feedback information corresponding to the lighting feature of the virtual scene.
403: and acquiring user adjustment willingness corresponding to the feedback information corresponding to each feature based on the feedback information corresponding to each feature.
It can be understood that, in the embodiment of the present application, as shown in fig. 5, the corresponding user adjustment will may be obtained from the feedback information corresponding to each feature through a classification algorithm, a semantic understanding algorithm, and the like.
Specifically, the feedback information corresponding to each feature may be filtered and classified by using a classification algorithm according to the classification algorithm, for example, the feedback information corresponding to each feature in the digital mannequin, the feedback information corresponding to each feature in the virtual scene, and other types of feedback information.
For example, when the bullet screen information published by the user watching the digital human video is that the digital human A face is too round and the scene light is too dark, the electronic equipment can determine that the piece of information of the digital human A face is too round is feedback information of the digital human and the piece of information of the scene light is too dark through a classification algorithm is feedback information of the scene.
It will be appreciated that in the embodiments of the present application, there may be a variety of classification schemes, for example, classification into digital mannequins (or digital mannequins and scenes), classification into digital mannequins, sound, smart design, or scenes, and so on.
And then converting the feedback information into qualitative parameters of adjustment such as a digital human model or a virtual scene through semantic understanding, and carrying out parameter adjustment according to a set threshold. For example, feedback information corresponding to each feature in the digital human model can be converted into qualitative parameters for digital human model adjustment through semantic understanding, and whether the user adjustment willingness reaches a set adjustment threshold value is judged, so that adjustment of corresponding parameter values is performed.
For example, the adjusting parameters for extracting the characteristics of the bullet screen information of the digital human A face too round through the semantic understanding algorithm are digital human A, face, width and narrowing. Namely, the adjustment willingness is determined as "the face shape width of the digital person a is narrowed". It can be appreciated that each piece of feedback information may extract one or more pieces of user adjustment intent. It can be appreciated that the user's adjustment will be used to reflect the qualitative ratings of the digital person or virtual scene, etc. that need to be adjusted.
It can be appreciated that in the embodiment of the present application, the semantic understanding algorithm may be iterated continuously, so as to enable more accurate user adjustment will to be obtained.
404: judging whether the adjustment willingness of the user reaches a set adjustment threshold value, if so, turning to 405, and determining the feature to be adjusted and the adjustment mode based on the adjustment willingness; if not, go to 408, and do not perform the acquisition of the feature to be adjusted and the adjustment mode.
The mode of judging whether the user adjustment will reach the set adjustment threshold value can judge that the same number of user adjustment will reaches the set number, if so, the user adjustment will is determined to reach the set adjustment threshold value; if not, determining that the user adjustment will not reach the set adjustment threshold.
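A minimal sketch of this tally-and-threshold check (the threshold value is illustrative; the patent speaks only of a "set number"):

```python
from collections import Counter

def decide_adjustment(intents: list, threshold: int = 100):
    """Return the most common adjustment intent if at least `threshold`
    users expressed it, else None."""
    if not intents:
        return None
    intent, count = Counter(intents).most_common(1)[0]  # identical intents hash together
    return intent if count >= threshold else None
```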
405: the feature to be adjusted and the adjustment mode are determined based on the adjustment intent.
In this embodiment of the present application, when any target adjustment intent is determined to have reached the set adjustment threshold, the feature related to that target adjustment intent may be taken as the feature to be adjusted, and the adjustment manner related to that intent may be taken as the adjustment mode of the feature to be adjusted.
For example, when the user adjustment intent "narrow the face width of digital person A" reaches the set adjustment threshold, the facial feature related to "narrow the face width of digital person A" becomes the feature to be adjusted, and narrowing the face width becomes the adjustment mode of the facial feature.
406: and adjusting the digital person and the virtual scene in the digital person video based on the characteristics to be adjusted and the adjustment mode, and acquiring the adjusted digital person and virtual scene.
It can be understood that in the embodiment of the present application, once the feature to be adjusted and the adjustment mode are determined, the parameters corresponding to the digital person and the virtual scene in the digital human video may be adjusted.
For example, after the adjustment mode is determined to be narrowing the face width of the digital person, the parameter corresponding to the face width can be reduced by a small amount. Then, as shown in fig. 5, the adjusted digital human model is generated by any implementable means, such as MESH deformation, and the adjusted model is applied.
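A crude numerical stand-in for the MESH deformation step (a real pipeline would drive blendshapes or a parametric head model; all names here are illustrative):

```python
import numpy as np

def narrow_face_region(vertices: np.ndarray, face_idx: np.ndarray,
                       scale: float = 0.98) -> np.ndarray:
    """Pull the x-coordinates of the face-region vertices toward their
    centroid, narrowing the face of an (N, 3) vertex array."""
    out = vertices.copy()
    cx = out[face_idx, 0].mean()               # x-centroid of the face region
    out[face_idx, 0] = cx + (out[face_idx, 0] - cx) * scale
    return out
```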
407: after the playing instruction is acquired, generating a corresponding video based on the adjusted digital person and the virtual scene, and sending the video to a terminal playing platform for playing.
It can be appreciated that, to avoid the digital human model changing between the earlier and later parts of the same program and giving the audience a poor experience, the digital person in a live broadcast, on-demand program, or other program currently being played may not be updated in real time; the digital human video is played based on the adjusted digital person when the next play instruction is received.
408: Do not obtain the feature to be adjusted and the adjustment mode.
It can be understood that in the embodiment of the application, the way the digital person or virtual scene in a digital human video is adjusted can be determined based on the feedback information of the audience watching the video, which effectively improves audience satisfaction.
In addition, the digital human video processing system can adjust and update the current digital person and virtual scene promptly according to the determined adjustment modes, shortening the update cycle. After the next play instruction is obtained, the corresponding video is generated in time based on the adjusted digital person and virtual scene and sent to the terminal playback platform for playing.
Fig. 6 illustrates a schematic diagram of an interactive flow for digital human video processing in a digital human video processing system, according to some embodiments of the present application. As shown in fig. 6, the flow includes the steps of:
601: the operator uploads digital human video data to digital asset platform 21 and digital asset platform 21 receives the digital human video data uploaded by the operator.
It will be appreciated that an operator may upload digital human video data prepared offline, such as an initially produced digital human video, to the digital asset platform 21.
602: video content production module 204 loads digital human video data from digital asset platform 21.
603: the video content production module 204 receives the acquisition request from the media distribution platform 22 and transmits digital personal video data to the media distribution platform 22.
For example, the media distribution platform 22 sends a corresponding acquisition request to the video content production module 204 in response to a play request from a live platform or an on-demand platform.
604: the media distribution platform 22 sends the digital personal video to a playback platform such as a live platform or an on-demand platform.
It will be appreciated that in embodiments of the present application, the media distribution platform 22 may send digital personal video to a playback platform such as a live platform or an on-demand platform for viewing by viewers.
605: the information collection module 201 obtains feedback information of the user viewing the digital human video.
606: the information collection module 201 sends feedback information of the user viewing the digital human video to the information extraction module 202.
607: the information extraction module 202 obtains feedback information corresponding to each feature of a digital person in the digital person video or each feature in a scene from feedback information of a user viewing the digital person video.
608: the information extraction module 202 sends feedback information corresponding to each feature to the automatic optimization module 203.
609: the automatic optimization module 203 obtains user adjustment willingness corresponding to the feedback information corresponding to each feature based on the feedback information corresponding to each feature.
610: the automatic optimizing module 203 judges whether the adjustment willingness of the user reaches the set adjustment threshold, if so, the process goes to 611, and the feature to be adjusted and the adjustment mode are determined based on the adjustment willingness; if not, go to 614 and do not acquire the feature to be adjusted and the adjustment mode.
611: the automatic optimization module 203 determines the feature to be adjusted and the adjustment mode based on the adjustment will.
It can be appreciated that in the embodiment of the present application, when the automatic optimization module 203 determines that the adjustment intent of the user reaches the set adjustment threshold, the flow of automatically adjusting the digital mannequin parameter may be triggered, that is, the feature to be adjusted and the adjustment mode are determined based on the adjustment intent.
612: the automatic optimization module 203 adjusts the digital person and the virtual scene in the digital person video based on the feature to be adjusted and the adjustment mode, and obtains and stores the adjusted digital person and virtual scene.
613: the automatic optimization module 203 updates the adjusted digital person and virtual scene to the digital asset platform 21.
614: the automatic optimization module 203 does not perform the acquisition of the feature to be adjusted and the adjustment mode.
The application provides an electronic device, comprising: a memory and a processor; the memory is used for storing program instructions; the processor is configured to invoke the program instructions in the memory to cause the electronic device to perform the digital human video processing method described above.
The application provides a chip system, which is applied to electronic equipment comprising a memory, a display screen and a sensor; the chip system includes: a processor; the electronic device performs the digital human video processing method described above when the processor executes the computer instructions stored in the memory.
The present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes an electronic device to perform the digital human video processing method described above.
The present application provides a computer program product including execution instructions stored in a readable storage medium. At least one processor of an electronic device can read the execution instructions from the readable storage medium, and execution of the instructions by the at least one processor causes the electronic device to implement the digital human video processing method described above.
Referring now to fig. 7, shown is a block diagram of a system 1400 in accordance with one embodiment of the present application. Fig. 7 schematically illustrates an example system 1400 in accordance with various embodiments. In one embodiment, the system 1400 may include one or more processors 1404, system control logic 1408 coupled to at least one of the processors 1404, a system memory 1412 coupled to the system control logic 1408, a non-volatile memory (NVM) 1416 coupled to the system control logic 1408, and a network interface 1420 coupled to the system control logic 1408.
In some embodiments, the processor 1404 may include one or more single-core or multi-core processors. In some embodiments, the processor 1404 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In embodiments where the system 1400 employs an enhanced Node B (eNB) 101 or a radio access network (Radio Access Network, RAN) controller 102, the processor 1404 may be configured to perform the digital human video processing method described above.
In some embodiments, the system control logic 1408 may include any suitable interface controller to provide any suitable interface to at least one of the processors 1404 and/or any suitable device or component in communication with the system control logic 1408.
In some embodiments, the system control logic 1408 may include one or more memory controllers to provide an interface to the system memory 1412. The system memory 1412 may be used for loading and storing data and/or instructions. In some embodiments, the memory 1412 of the system 1400 may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
NVM/memory 1416 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, NVM/memory 1416 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as at least one of a hard disk drive (HDD), a compact disc (CD) drive, or a digital versatile disc (DVD) drive.
The NVM/memory 1416 may include a portion of the storage resources of the device on which the system 1400 is installed, or it may be accessible by the device without necessarily being part of it. For example, NVM/storage 1416 may be accessed over a network via the network interface 1420.
In particular, the system memory 1412 and NVM/storage 1416 may include: a temporary copy and a permanent copy of instructions 1424. The instructions 1424 may include: instructions that, when executed by at least one of the processors 1404, cause the system 1400 to implement the methods shown in fig. 3-4. In some embodiments, instructions 1424, hardware, firmware, and/or software components thereof may additionally/alternatively be disposed in system control logic 1408, network interface 1420, and/or processor 1404.
Network interface 1420 may include a transceiver to provide a radio interface for system 1400 to communicate over one or more networks with any other suitable devices (e.g., front-end modules, antennas, etc.). In some embodiments, the network interface 1420 may be integrated with other components of the system 1400. For example, the network interface 1420 may be integrated into at least one of the processor 1404, the system memory 1412, the NVM/storage 1416, and a firmware device (not shown) having instructions which, when executed by at least one of the processors 1404, cause the system 1400 to implement the digital human video processing method described above.
The network interface 1420 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, network interface 1420 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of the processors 1404 may be packaged together with logic for one or more controllers of the system control logic 1408 to form a System In Package (SiP). In one embodiment, at least one of the processors 1404 may be integrated on the same die with logic for one or more controllers of the System control logic 1408 to form a System on Chip (SoC).
The system 1400 may further include input/output (I/O) devices 1432. The I/O devices 1432 may include a user interface to enable a user to interact with the system 1400, and a peripheral component interface designed so that peripheral components can also interact with the system 1400. In some embodiments, the system 1400 further includes a sensor for determining at least one of environmental conditions and location information associated with the system 1400.
In some embodiments, the user interface may include, but is not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., still image cameras and/or video cameras), a flashlight (e.g., light emitting diode flash), and a keyboard.
In some embodiments, the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
In some embodiments, the sensors may include, but are not limited to, gyroscopic sensors, accelerometers, proximity sensors, ambient light sensors, and positioning units. The positioning unit may also be part of the network interface 1420 or interact with the network interface 1420 to communicate with components of a positioning network, such as Global Positioning System (GPS) satellites.
Embodiments disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the present application may be implemented as a computer program or program code that is executed on a programmable system including at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and to generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
The program code may be implemented in a high-level procedural or object-oriented programming language to communicate with a processing system. The program code may also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in the present application are not limited in scope to any particular programming language. In any case, the language may be a compiled or an interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or through other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including but not limited to floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet in an electrical, optical, acoustical, or other form of propagated signal (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some structural or methodological features may be shown in a particular arrangement and/or order. However, it should be understood that such a particular arrangement and/or ordering may not be required. Rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative figures. Additionally, the inclusion of structural or methodological features in a particular figure does not imply that such features are required in all embodiments; in some embodiments, these features may be omitted or may be combined with other features.
It should be noted that, in the embodiments of the present application, each unit/module is a logic unit/module. Physically, one logic unit/module may be one physical unit/module, may be a part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logic unit/module itself is not what matters most, and the combination of functions implemented by the logic units/modules is the key to solving the technical problem posed by the present application. Furthermore, to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are less closely related to solving the technical problem presented by the present application, which does not mean that the above device embodiments contain no other units/modules.
It should be noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the present application.

Claims (10)

1. A digital human video processing method, comprising:
acquiring, by first electronic equipment, feedback information of a user on a first digital human video, wherein the first digital human video comprises a plurality of video elements;
acquiring, by the first electronic equipment, characteristic feedback information corresponding to at least one of the video elements from the feedback information; and
adjusting, by the first electronic equipment, the corresponding video element based on the acquired characteristic feedback information, and generating a second digital human video based on the adjusted video element.
2. The method of claim 1, wherein the video elements comprise a digital person and a virtual scene, and wherein the characteristic feedback information of the video elements comprises characteristic feedback information of the digital person and characteristic feedback information of the virtual scene.
3. The method according to claim 1 or 2, wherein the adjusting the corresponding video element based on the acquired characteristic feedback information comprises:
acquiring an adjustment willingness of each user corresponding to each piece of the characteristic feedback information, and, when the number of first adjustment willingnesses among the users' adjustment willingnesses reaches a set number, determining a feature to be adjusted and an adjustment mode of the video element based on the first adjustment willingness; and
adjusting the corresponding video element based on the feature to be adjusted and the adjustment mode.
4. The method according to claim 2, wherein the acquiring of the characteristic feedback information corresponding to at least one of the video elements from the feedback information comprises the following steps:
acquiring keywords corresponding to the feedback information;
taking feedback information whose keywords correspond to characteristics of the digital person as the characteristic feedback information of the digital person; and
taking feedback information whose keywords correspond to characteristics of the virtual scene as the characteristic feedback information of the virtual scene.
5. The method of claim 1 or 2, wherein the feedback information includes bullet screen information, post information, comment information, and questionnaire information transmitted by the user.
6. The method of claim 2, wherein the characteristic feedback information of the digital person comprises appearance characteristic feedback information of the digital person, stature characteristic feedback information of the digital person, and styling characteristic feedback information of the digital person.
7. The method of claim 2, wherein the characteristic feedback information of the virtual scene comprises color characteristic feedback information, light characteristic feedback information, and layout characteristic feedback information of the virtual scene.
8. The method according to claim 1, further comprising: when a next playing instruction is acquired, sending the second digital human video to second electronic equipment.
9. An electronic device, comprising: a memory storing instructions for execution by one or more processors of the electronic device; and a processor, being one of the one or more processors of the electronic device, configured to perform the digital human video processing method of any one of claims 1 to 8.
10. A readable storage medium having instructions stored thereon that, when executed on an electronic device, cause the electronic device to perform the digital human video processing method of any one of claims 1 to 8.
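
For illustration only, and not as part of the claims or of the patented implementation, the following Python sketch approximates the flow recited in claims 1, 3, and 4: feedback is bucketed per video-element feature by keyword matching, an adjustment is adopted once enough users express the same adjustment willingness, and a second video is generated from the adjusted elements. The keyword tables, the set number, the Feedback fields, and the apply/render calls are all hypothetical assumptions introduced for this sketch.

```python
# Illustrative sketch only; not the patented implementation. All names,
# keyword tables, and thresholds below are hypothetical assumptions.
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword tables mapping feedback text to element features
# (cf. claims 4, 6, 7: digital-person features vs. virtual-scene features).
DIGITAL_PERSON_KEYWORDS = {
    "appearance": ["face", "look"],
    "stature": ["height", "build"],
    "styling": ["outfit", "hairstyle"],
}
VIRTUAL_SCENE_KEYWORDS = {
    "color": ["color", "tone"],
    "light": ["bright", "dark", "lighting"],
    "layout": ["layout", "background"],
}

@dataclass
class Feedback:
    text: str   # bullet-screen, post, comment, or questionnaire text (claim 5)
    wish: str   # the user's adjustment willingness, e.g. "brighter"

def classify(feedbacks):
    """Bucket feedback per (element, feature) by keyword matching (claim 4)."""
    buckets = {}
    for fb in feedbacks:
        for element, table in (("digital_person", DIGITAL_PERSON_KEYWORDS),
                               ("virtual_scene", VIRTUAL_SCENE_KEYWORDS)):
            for feature, words in table.items():
                if any(w in fb.text.lower() for w in words):
                    buckets.setdefault((element, feature), []).append(fb)
    return buckets

def decide_adjustments(buckets, set_number=10):
    """Adopt the most common willingness for a feature once its count
    reaches the set number (claim 3)."""
    decisions = {}
    for key, fbs in buckets.items():
        wish, count = Counter(fb.wish for fb in fbs).most_common(1)[0]
        if count >= set_number:
            decisions[key] = wish   # feature to be adjusted + adjustment mode
    return decisions

def regenerate(first_video, decisions):
    """Adjust the flagged video elements and synthesize the second digital
    human video (claim 1); rendering is stubbed with hypothetical calls."""
    for (element, feature), wish in decisions.items():
        first_video.elements[element].apply(feature, wish)  # hypothetical API
    return first_video.render()                             # hypothetical API
```

In this sketch the set_number parameter plays the role of the "set number" of claim 3; in practice it would presumably be scaled to the size of the audience providing feedback.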
CN202210763176.7A 2022-06-29 2022-06-29 Digital human video processing method, electronic equipment and medium Pending CN117376597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210763176.7A CN117376597A (en) 2022-06-29 2022-06-29 Digital human video processing method, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN117376597A 2024-01-09

Family

ID=89400906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210763176.7A Pending CN117376597A (en) 2022-06-29 2022-06-29 Digital human video processing method, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN117376597A (en)

Legal Events

Date Code Title Description
PB01 Publication