CN113691860A - UGC media content generation method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113691860A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110811872.6A
Other languages
Chinese (zh)
Other versions
CN113691860B (en)
Inventor
黄旭
潘兴德
Current Assignee
Beijing Panoramic Sound Information Technology Co., Ltd.
Original Assignee
Beijing Panoramic Sound Information Technology Co., Ltd.
Application filed by Beijing Panoramic Sound Information Technology Co., Ltd.
Priority to CN202110811872.6A
Publication of CN113691860A
Application granted
Publication of CN113691860B
Legal status: Active

Classifications

    • H04N: Pictorial communication, e.g. television; H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/4398: Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N21/4344: Remultiplexing of multiplex streams, e.g. by modifying time stamps or remapping the packet identifiers
    • H04N21/440218: Reformatting operations of video signals by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

The invention provides a method, an apparatus, a device, and a storage medium for generating UGC media content. The method includes: selecting first media content from PGC media content; parsing the selected content in the first media content according to a preset parsing mode to obtain parsed media content, where the parsing mode corresponds to the category of the selected content; editing the parsed media content to obtain edited media content; and encapsulating the edited media content according to a preset encapsulation mode to obtain the UGC media content. This solves the prior-art problems of poor quality and limited variety of generated UGC media content caused by the inability to fully separate the individual audio components contained in PGC media content, and enriches the diversity and interactivity of UGC media content.

Description

UGC media content generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of digital media production, and in particular, to a method, an apparatus, a device, and a storage medium for generating UGC media content.
Background
At present, the audio and video field is flourishing: a wide variety of media content (movies, TV dramas, variety shows, self-media, etc.) is produced with media production tools, providing people with a rich audio-visual experience. Media content can be divided into two broad categories: the first is PGC (Professional Generated Content), and the second is UGC (User Generated Content). In recent years, PGC and UGC have begun to converge: users can take PGC programs distributed on the Internet and perform simple secondary creation (such as dubbing, remixing, accompaniment), making media content more diverse.
However, in the prior art, when UGC media content is produced from PGC media content, the individual audio components contained in the PGC media content cannot be fully separated. A single audio component (such as dialogue, music, or ambient sound) therefore cannot be deleted or replaced on its own; the mixed audio can only be kept or removed in its entirety, which results in poor quality of the produced UGC media content.
Disclosure of Invention
The invention provides a method, an apparatus, a device, and a storage medium for generating UGC (User Generated Content) media content, so as to solve the prior-art problems of poor quality and limited variety of generated UGC media content caused by the inability to fully separate the individual audio components contained in PGC media content.
In one aspect, the present invention provides a method for generating UGC media content, including:
selecting first media content from PGC media content, wherein the first media content can comprise one content segment, a plurality of content segments or all content segments in the PGC media content;
analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, wherein the analysis mode corresponds to the type of the selected content;
editing the analyzed media content to obtain edited media content;
and packaging the edited media content according to a preset packaging mode to obtain UGC media content.
Optionally, the selecting the first media content from the PGC media content includes:
selecting the first media content from the PGC media content by specifying a start time and an end time; or,
selecting the first media content from the PGC media content by specifying a start time and a duration.
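The two optional selection modes above can be sketched as a small helper. The function name `select_first_media` and the numeric time window are illustrative assumptions; the patent does not prescribe an API.

```python
def select_first_media(pgc_duration, start, end=None, duration=None):
    """Return the (start, end) window of the first media content.

    Mode 1: caller supplies a start time and an end time.
    Mode 2: caller supplies a start time and a duration.
    (Illustrative sketch; times are in seconds.)
    """
    if end is None and duration is None:
        raise ValueError("give either an end time or a duration")
    if end is None:
        end = start + duration
    if not (0 <= start < end <= pgc_duration):
        raise ValueError("window must lie inside the PGC content")
    return (start, end)

# Selecting seconds 10-25 of a 60-second PGC program, both ways:
assert select_first_media(60, 10, end=25) == (10, 25)
assert select_first_media(60, 10, duration=15) == (10, 25)
```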
Optionally, the selected content may include all or part of the first media content;
the parsing the selected content in the first media content according to a preset parsing mode to obtain parsed media content, where the parsing mode corresponds to the category of the selected content, includes:
acquiring the categories contained in all or part of the first media content;
and parsing the content of each category in all or part of the first media content according to the parsing mode corresponding to that category to obtain parsed media content, and taking the remaining content of the first media content as unparsed media content.
Optionally, the category includes audio content, video content, or audio-video content, and the parsing manner includes: audio decoding, video decoding or audio and video demultiplexing;
the analyzing all the content or the content of each category in part of the content of the first media content according to the analyzing mode corresponding to each category to obtain the analyzed media content, and taking the rest content of the first media content as the unresolved media content, includes:
when the selected content in the first media content only comprises audio content, carrying out audio decoding on the selected content to obtain audio data and/or auxiliary data;
when the selected content in the first media content only comprises video content, performing video decoding on the selected content to obtain video data;
and when the selected content in the first media content comprises audio and video content, performing audio and video multiplexing on the selected content to obtain first audio content and first video content, and performing audio decoding and/or video decoding on the first audio content and the first video content based on an editing requirement.
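The three branches above are a dispatch from the category of the selected content to a parsing mode. A minimal sketch, with illustrative string labels not taken from the patent:

```python
def parse_selected_content(category, editing_requires_video=True):
    """Map the category of the selected content to its parsing steps,
    mirroring the three claim branches (labels are illustrative)."""
    if category == "audio":
        return ["audio_decode"]        # -> audio data and/or auxiliary data
    if category == "video":
        return ["video_decode"]        # -> video data
    if category == "audio+video":
        steps = ["demultiplex", "audio_decode"]  # split the container first
        if editing_requires_video:               # decode video only if edited
            steps.append("video_decode")
        return steps
    raise ValueError(f"unknown category: {category}")

assert parse_selected_content("audio") == ["audio_decode"]
assert parse_selected_content("audio+video", editing_requires_video=False) == ["demultiplex", "audio_decode"]
```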
Optionally, the audio decoding and/or video decoding the first audio content and the first video content based on the editing requirement includes:
if the editing requirement comprises editing the first audio content without editing the first video content, determining that the selected content comprises part of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and taking the first video content as unparsed media content;
if the editing requirement comprises editing both the first audio content and the first video content, determining that the selected content comprises all of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and performing video decoding on the first video content to obtain first video data.
Optionally, the editing the analyzed media content to obtain an edited media content includes:
and editing the first audio data and/or the first auxiliary data and/or the first video data to generate edited media content, wherein the edited media content comprises second audio data modified based on the first audio data and/or second auxiliary data modified based on the first auxiliary data and/or second video data modified based on the first video data.
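The editing step can be illustrated with a toy model in which the parsed audio is held as separately decoded components, which is the separation this patent relies on: only the targeted component changes, and the rest carry over unchanged into the second audio data. All names and sample values below are made up for illustration.

```python
# Parsed first audio data, modeled as separately decoded components (stems).
first_audio_data = {
    "dialogue": [0.2, 0.3, 0.1],
    "music":    [0.5, 0.5, 0.5],
    "ambience": [0.1, 0.1, 0.1],
}

def edit_media(audio_data, component, new_samples):
    """Replace one audio component, returning the edited media content."""
    edited = dict(audio_data)       # copy, so the parsed data is untouched
    edited[component] = new_samples
    return edited

# Replace only the dialogue; music and ambience survive intact.
second_audio_data = edit_media(first_audio_data, "dialogue", [0.4, 0.4, 0.4])
assert second_audio_data["dialogue"] == [0.4, 0.4, 0.4]
assert second_audio_data["music"] == first_audio_data["music"]
```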
Optionally, the encapsulation mode includes audio coding, video coding, or audio-video multiplexing;
the step of encapsulating the edited media content according to a preset encapsulation mode to obtain UGC media content includes:
packaging the edited media content to obtain second media content;
if the second media content only comprises audio content, audio coding is carried out on the audio data and/or the auxiliary data in the second media content to obtain UGC media content;
if the second media content only comprises video content, video coding is carried out on video data in the second media content to obtain UGC media content;
if the second media content comprises audio and video content, audio coding is carried out on audio data and/or auxiliary data in the second media content to obtain coded audio content, video coding is carried out on video data in the second media content to obtain coded video content, and the coded audio content and the coded video content are multiplexed into UGC media content.
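The three encapsulation branches above amount to a dispatch on which categories the edited content contains. A hedged sketch, with illustrative labels and no real codecs:

```python
def encapsulate(second_media_content):
    """Pick encapsulation steps from the categories present in the edited
    content, mirroring the three branches above (labels are illustrative)."""
    has_audio = "audio" in second_media_content
    has_video = "video" in second_media_content
    if has_audio and has_video:
        # encode each stream, then multiplex them into one container
        return ["audio_encode", "video_encode", "av_multiplex"]
    if has_audio:
        return ["audio_encode"]
    if has_video:
        return ["video_encode"]
    raise ValueError("edited media content contains no audio or video")

assert encapsulate({"audio": b""}) == ["audio_encode"]
assert encapsulate({"audio": b"", "video": b""}) == ["audio_encode", "video_encode", "av_multiplex"]
```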
Optionally, the method further comprises:
and packaging all or part of the unparsed media content to obtain the UGC media content.
In another aspect, the present invention provides a UGC media content generation apparatus, including:
the selecting module is configured to select first media content from PGC media content, and the first media content may include one content segment, a plurality of content segments, or all content segments in the PGC media content;
the analysis module is used for analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, and the analysis mode corresponds to the category of the selected content;
the editing module is used for editing the analyzed media content to obtain edited media content;
and the packaging module is used for packaging the edited media content according to a preset packaging mode to obtain UGC media content.
In another aspect, the present invention provides a UGC media content generating device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory to cause the at least one processor to perform the method for generating UGC media content described above.
In another aspect, the present invention provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the computer-executable instructions are executed by a processor, the method for generating the UGC media content is implemented.
In another aspect, the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements the method of generating UGC media content as described above.
The invention provides a method, an apparatus, a device, and a storage medium for generating UGC media content. The method includes: selecting first media content from PGC media content; parsing the selected content in the first media content according to a preset parsing mode to obtain parsed media content, where the parsing mode corresponds to the category of the selected content; editing the parsed media content to obtain edited media content; and encapsulating the edited media content according to a preset encapsulation mode to obtain the UGC media content. This solves the prior-art problems of poor quality and limited variety of generated UGC media content caused by the inability to fully separate the individual audio components contained in PGC media content, and enriches the diversity and interactivity of UGC media content.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic structural diagram of a system for generating UGC media content according to an embodiment of the present invention;
fig. 2 is a schematic view of an application scenario implemented by a generation system based on UGC media content according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a method for generating UGC media content according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of another UGC media content generation method according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of an analysis method according to an embodiment of the present invention;
fig. 6 is a schematic flow chart of another analysis method according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of another parsing method according to an embodiment of the present invention;
fig. 8 is a schematic flowchart of another parsing method according to an embodiment of the present invention;
fig. 9 is a flowchart illustrating an editing method according to an embodiment of the present invention;
FIG. 10 is a flowchart illustrating another editing method according to an embodiment of the present invention;
FIG. 11 is a flowchart illustrating another editing method according to an embodiment of the present invention;
fig. 12 is a schematic flowchart of a packaging method according to an embodiment of the present invention;
fig. 13 is a schematic flow chart of another packaging method according to an embodiment of the present invention;
fig. 14 is a schematic flow chart of another packaging method according to an embodiment of the present invention;
fig. 15 is a schematic flow chart of another packaging method according to an embodiment of the present invention;
fig. 16 is a schematic flow chart of another packaging method according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of another UGC media content generation apparatus according to an embodiment of the present invention;
fig. 18 is a block diagram of a system for generating UGC media content according to an embodiment of the present invention.
The above drawings illustrate certain embodiments of the invention, which are described in more detail below. The drawings and the description are not intended to limit the scope of the inventive concept in any way, but rather to illustrate it for those skilled in the art with reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terms to which the present invention relates will be explained first:
professional production content: the Content includes movies, synthesis, dramas, etc., and the radio and television practitioners collect, edit, encode and output audios and videos, and the main production means is a Professional audio and video production tool (such as Pro Tools, Nuendo, Adobe Premiere, etc.). For audio, content diversity is particularly obvious, and due to different technical characteristics of different audio codecs, the production tool can implement production of traditional multi-channel audio (such as MP3, AAC, AC3, AC4, WANOS, AVS, FLAC, APE, and the like), and can add various auxiliary data (such as spatial position of audio, reverberation parameter, rendering angle, and the like), implement audio production based on objects (such as ATMOS, WANOS, AVS, MPEG-H, and the like) and scenes (such as FOA, HOA, and the like), so as to produce PGC media content with more stereoscopic sensation, immersion feeling, and higher quality.
User Generated Content (UGC): mainly self-media content. Users record material with mobile devices such as phones and tablets, then edit the material with professional or non-professional audio and video production tools to produce distinctive streaming-media works that can be shared on the Internet immediately. Users can also download works released by others and perform secondary creation on that basis, so UGC is highly interactive and operable.
Audio coding: PCM audio sample data (uncompressed data or data corresponding to an "audio waveform") is compressed into audio byte stream data of a certain format.
Audio decoding: the compressed byte stream data is parsed into PCM audio sample data.
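As a concrete, minimal illustration of the sample-data/byte-stream boundary in these two definitions, the Python stdlib `wave` module can pack PCM samples into a WAV byte stream and parse them back. Note that WAV stores uncompressed PCM, unlike the lossy codecs named in the text (AAC, AC3, etc.), so this round trip only illustrates the packing/parsing direction, not compression.

```python
import io
import struct
import wave

# "Encode": pack 16-bit PCM samples into a WAV byte stream.
pcm = [0, 1000, -1000, 32767, -32768]
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples
    w.setframerate(48000)
    w.writeframes(struct.pack("<5h", *pcm))

# "Decode": parse the byte stream back into PCM sample data.
buf.seek(0)
with wave.open(buf, "rb") as r:
    n = r.getnframes()
    decoded = list(struct.unpack(f"<{n}h", r.readframes(n)))

assert decoded == pcm
```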
Video coding: video pixel data (uncompressed data, data corresponding to each frame of picture in a video) such as RGB and YUV are compressed into video byte stream data of a certain format.
Video decoding: and analyzing the compressed video byte stream data into video pixel data such as RGB, YUV and the like.
Audio and video multiplexing: the encoded/compressed audio data and the encoded/compressed video data are encapsulated into a media container (e.g., MP4, TS, etc.) to form a media container byte stream.
Audio and video demultiplexing: the media container byte stream is parsed into compressed audio data and compressed video data.
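The multiplexing/demultiplexing pair can be illustrated with a deliberately toy container that length-prefixes the audio and video byte streams. Real containers such as MP4 or TS are far more elaborate (timestamps, packet interleaving, indexes), so this sketch only shows the round-trip contract between the two operations.

```python
def av_multiplex(audio_bytes, video_bytes):
    """Toy container: 4-byte length prefix + audio, then the same for video."""
    return (len(audio_bytes).to_bytes(4, "big") + audio_bytes +
            len(video_bytes).to_bytes(4, "big") + video_bytes)

def av_demultiplex(container):
    """Parse the toy container back into (audio_bytes, video_bytes)."""
    a_len = int.from_bytes(container[:4], "big")
    audio = container[4:4 + a_len]
    rest = container[4 + a_len:]
    v_len = int.from_bytes(rest[:4], "big")
    return audio, rest[4:4 + v_len]

a, v = b"\x01\x02", b"\x03\x04\x05"
assert av_demultiplex(av_multiplex(a, v)) == (a, v)
```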
Next, the current way of generating UGC media content from PGC media content, and its existing shortcomings, are briefly introduced:
at present, the method for generating UGC media content based on PGC media content mainly includes that a user uses a PGC program published on the internet to perform simple secondary creation (such as dubbing, ghost stock, accompaniment and the like), so that the generated UGC media content is more diversified. However, the current way of generating UGC media content based on PGC media content has obvious limitations, which are specifically embodied in that:
First, UGC audio quality is generally not high. If a user wants to use a PGC program as material for a personalized UGC program, only PGC programs already distributed on the Internet can serve as creation material. These programs are stereo versions mixed by professionals, and the individual audio components they contain cannot be fully separated, so a single audio component (such as dialogue, background music, or ambient sound) cannot be deleted or replaced on its own; the audio can only be kept or deleted in its entirety. For example, if a user wants to replace part of a dialogue segment of a PGC program with his or her own dubbing, the ambient sound and background music contained in the replaced segment are deleted at the same time, which makes the ambient sound and background music discontinuous and results in poor quality of the generated UGC audio.
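The discontinuity described here can be shown numerically with a toy two-sample mix: once components are summed into a stereo mix, splicing new dubbing over "the dialogue" necessarily discards the music mixed into the same samples, whereas fully separated components (the approach this patent targets) keep the music intact. All sample values are made up for illustration.

```python
# A professionally mixed program: each sample is the sum of its components.
dialogue = [0.25, 0.25]
music    = [0.5, 0.5]
mixed    = [d + m for d, m in zip(dialogue, music)]   # what the user downloads

new_dialogue = [0.125, 0.125]

# Naively splicing the new dubbing over the mixed segment drops the music:
naive = new_dialogue

# With separated components, the music in the same samples survives:
remixed = [n + m for n, m in zip(new_dialogue, music)]
assert naive != remixed
assert remixed == [0.625, 0.625]
```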
Second, the number of tracks supported by current UGC production tools is very limited: a user can place audio material on at most two tracks, i.e., overlay it on the original PGC program material, and cannot add any auxiliary data. As a result, users can only produce stereo audio, not object-based or scene-based multi-channel audio; the user's true creative intent cannot be fully expressed, and the generated UGC media content is monotonous.
Therefore, the present invention provides a method for generating UGC media content that allows free creation on the basis of PGC media content. With this method, a user can use an entire piece of PGC media content, or intercept one or more of its content segments, and edit those segments at will according to his or her production intent, thereby producing multi-channel, object-based, and scene-based UGC media content. This solves the prior-art problems of poor quality and limited variety of generated UGC media content caused by the inability to fully separate the individual audio components, while enriching the diversity and interactivity of UGC media content.
The following introduces an architectural schematic diagram of a UGC media content generation system provided by an embodiment of the present invention:
fig. 1 is a schematic structural diagram of a system for generating UGC media content according to an embodiment of the present invention, where the system 100 for generating UGC media content includes: extraction section 101, analysis section 102, editing section 103, and packaging section 104.
The extracting unit 101 is configured to select first media content from PGC media content, where the first media content may include one content segment, multiple content segments, or all content segments in the PGC media content.
The parsing unit 102 is configured to parse the selected content in the first media content according to a preset parsing manner to obtain parsed media content, where the parsing manner corresponds to the category of the selected content.
The editing unit 103 is configured to edit the parsed media content to obtain edited media content.
The packaging unit 104 is configured to package the edited media content according to a preset packaging manner to obtain the UGC media content.
The following introduces an application scenario for generating the UGC media content based on the above-mentioned UGC media content generation system:
as shown in fig. 2, the PGC media content S1 is input to the extracting unit 101, and at the same time, the entire content or a part of the content of the PGC media content S1 may also be input to the encapsulating unit 104; the extracting unit 101 extracts the PGC media content S1 to obtain a first media content S2, and inputs the first media content S2 to the parsing unit 102, wherein the first media content S2 may be all or part of the PGC media content S1; the parsing unit 102 parses all or part of the first media content S2 to obtain parsed media content S3; the parsed media content S3 is input to the editing unit 103, while the unresolved media content S4 may be input to the packaging unit 104 or directly discarded (dashed line in fig. 2); the editing unit 103 performs editing operation on the analyzed media content S3 to obtain edited media content S5, and inputs the edited media content S5 to the encapsulating unit 104; the packaging unit 104 packages all or part of the edited media content S5 to obtain the final UGC media content S6, and the packaging unit 104 may further package all or part of the PGC media content S1 and/or the unresolved media content S4 to obtain the UGC media content S6.
In this scenario, the extracting unit 101 selects all or part of the PGC media content S1 to obtain a first media content S2, where the extracting method includes: the first media content S2 is selected from the PGC media content S1 by setting a start time and an end time or selecting a start time and a duration of the first media content S2, wherein the first media content S2 may be the PGC media content S1 itself, a segment of the PGC media content S1, or a combination of multiple segments of the PGC media content S1.
In this scenario, the parsing unit 102 parses all or part of the first media content S2 to obtain parsed media content S3 and/or unparsed media content S4, where the parsing manner includes: audio decoding, video decoding, audio-video demultiplexing, and the like.
In this scenario, the editing unit 103 may perform an editing operation on the parsed media content S3 according to the editing requirement of the user, so as to obtain an edited media content S5. The editing manner includes, but is not limited to, adding, deleting, replacing, and the like.
In this scenario, the encapsulation manner of the encapsulation unit 104 includes audio encoding, video encoding, audio and video multiplexing, and the like. For example, when it is determined that the edited media content includes only audio content, the packaging unit 104 needs to package the edited media content S5 by audio coding.
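The S1 to S6 data flow of fig. 2 can be sketched as four stubbed stages. The class and method names are illustrative; real units would invoke actual codecs rather than pass dictionaries around.

```python
class UgcPipeline:
    """Sketch of fig. 2's data flow S1 -> S6 (stage behaviour is stubbed)."""

    def extract(self, s1, start, end):        # extracting unit 101
        return {"window": (start, end), "payload": s1}

    def parse(self, s2):                      # parsing unit 102
        return {"parsed": s2, "unparsed": None}

    def edit(self, s3, edit_fn):              # editing unit 103
        return edit_fn(s3)

    def package(self, s5):                    # packaging unit 104
        return ("UGC", s5)

p = UgcPipeline()
s2 = p.extract("PGC-program", 10, 25)
s3 = p.parse(s2)
s5 = p.edit(s3, lambda c: {**c, "edited": True})
s6 = p.package(s5)
assert s6[0] == "UGC" and s5["edited"]
```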
In this application scenario, first media content is selected from the PGC media content, where the first media content may include one content segment, multiple content segments, or all content segments in the PGC media content; the selected content in the first media content is parsed according to a preset parsing manner to obtain parsed media content, where the parsing manner corresponds to the category of the selected content; the parsed media content is edited to obtain edited media content; and the edited media content is packaged according to a preset packaging manner to obtain the UGC media content. This solves the prior-art problems of poor quality and limited variety of generated UGC media content caused by the inability to fully separate the individual audio components contained in the PGC media content, while enriching the diversity and interactivity of UGC media content.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 3 is a schematic flowchart of a method for generating UGC media content according to an embodiment of the present invention, and as shown in fig. 3, the method according to the embodiment may include:
s101, selecting first media content from PGC media content.
In this step, the first media content may include one content segment, a plurality of content segments, or all of the content segments in the PGC media content.
In this embodiment of the present invention, the first media content may include all or part of the PGC media content, for example, the PGC media content includes 10 content segments, and the first media content may include 5 content segments.
S102, analyzing the selected content in the first media content according to a preset analyzing mode to obtain the analyzed media content.
In this step, the preset parsing manner includes, but is not limited to, audio decoding, video decoding, or audio-video demultiplexing. The specific analysis method is determined according to the category of the selected content in the first media content, that is, the analysis method corresponds to the category of the selected content.
In this embodiment of the present invention, for example, the content category may include audio content, video content, or audio/video content, and when the content selected in the first media content includes the audio content, the content selected in the first media content is subjected to audio decoding to obtain the parsed media content.
S103, editing the analyzed media content to obtain the edited media content.
In this step, the editing mode can be set according to the user requirement.
In the embodiment of the present invention, for example, new audio data is added to the parsed media content by importing a file, recording, adding a special sound effect, and the like, to obtain the edited media content; other manners may also be included, which the present invention does not limit.
And S104, packaging the edited media content according to a preset packaging mode to obtain UGC media content.
In this step, the preset encapsulation manner includes, but is not limited to, audio encoding, video encoding, or audio-video multiplexing. Specifically, the encapsulation manner to be adopted is determined by the category of the content included in the edited media content; that is, the encapsulation manner corresponds to the content included in the edited media content.
In the embodiment of the present invention, for example, the content category may include audio content, video content, or audio-video content, and when the content included in the edited media content only includes the audio content, the edited media content is subjected to audio coding to obtain the UGC media content.
In the embodiment of the UGC media content generation method provided by the invention, a first media content is selected from PGC media content; the first media content may include one content segment, a plurality of content segments, or all content segments in the PGC media content. The selected content in the first media content is parsed according to a preset parsing manner to obtain parsed media content, where the parsing manner corresponds to the category of the selected content; the parsed media content is edited to obtain edited media content; and the edited media content is encapsulated according to a preset encapsulation manner to obtain the UGC media content. This solves the problems of poor quality and monotonous content of the generated UGC media content caused by the inability, in the prior art, to fully disassemble the individual audio components, and enriches the diversity and interactivity of UGC media content.
Fig. 4 is a flowchart illustrating another UGC media content generation method according to an embodiment of the present invention, and as shown in fig. 4, the method according to this embodiment may include:
S201, selecting a first media content from the PGC media content, where the first media content may include one content segment, multiple content segments, or all content segments in the PGC media content.
In the embodiment of the present invention, as an alternative, step S201 may include:
S2011, selecting the first media content from the PGC media content by way of a selected start time and end time.
In the embodiment of the present invention, specifically, the process of selecting a first media content from the PGC media content may be implemented by obtaining the total duration of the PGC media content, determining the start time and end time of at least one specified segment to be extracted from the PGC media content, and using the content segment(s) formed by the start and end time(s) as the first media content. When the first media content includes a plurality of content segments, the content segments may be different from each other, partially overlapped, or completely the same, and the content duration corresponding to the start time and end time of each content segment is less than or equal to the total duration.
That is, if a user wants to extract one content segment from the PGC media content, the start and end times of that segment must be determined, and the content segment they define is used as the first media content; in this case the first media content includes only one content segment. If the user wants to extract a plurality of content segments, the start and end times of each segment are determined, and the segments formed from these times constitute the first media content; in this case the first media content includes a plurality of content segments. If the user wants to extract all specified segments, that is, to obtain the entire PGC media content, the content segment spanning the total duration is the first media content; in this case the first media content includes all content segments.
For example, the total duration of the PGC media content is denoted as T. The start time and end time of a specified segment to be extracted are determined from the PGC media content as T1 and T2, and the content segment [T1, T2] formed by this start time and end time is used as the first media content; that is, the first media content includes the content segment [T1, T2].
Further, the above operations may be repeated to select a plurality of content segments from the PGC media content, with the plurality of content segments used together as the first media content. For example, a content segment [T1, T2] and a content segment [T3, T4] are selected from the PGC media content, where the two segments may be different from each other, partially overlapped, or completely the same, and the content duration corresponding to each segment is less than or equal to the total duration T, i.e. 0 ≤ T1 ≤ T2 ≤ T and 0 ≤ T3 ≤ T4 ≤ T. For instance, for PGC media content of 60 s, the content segments [10, 20] and [15, 30] are selected and used as the first media content. Here [10, 20] indicates a segment with start time 10 s and end time 20 s, and [15, 30] indicates a segment with start time 15 s and end time 30 s. The two segments partially overlap; each segment's start time is less than its end time, which in turn is less than the total duration (0 < 10 < 20 < T, 0 < 15 < 30 < T); and the content durations of the two segments, 10 s and 15 s respectively, are both less than the total duration of 60 s.
Alternatively, as an extreme case, when T1 = 0 and T2 = T, all content of the PGC media content is extracted; it can be understood that the first media content in this case is equal to the PGC media content.
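The start/end selection described above reduces to validating each [start, end] span against the constraint 0 ≤ T1 ≤ T2 ≤ T and collecting the spans. A minimal sketch in Python, with all function and variable names hypothetical (the patent specifies no implementation):

```python
def select_segments(total_duration, spans):
    """Return validated (start, end) content segments.

    Segments may differ, partially overlap, or coincide; each must
    satisfy 0 <= start <= end <= total_duration.
    """
    selected = []
    for start, end in spans:
        if not (0 <= start <= end <= total_duration):
            raise ValueError(
                f"segment [{start}, {end}] lies outside [0, {total_duration}]")
        selected.append((start, end))
    return selected

# The worked example from the text: T = 60 s, segments [10, 20] and [15, 30].
first_media_content = select_segments(60, [(10, 20), (15, 30)])
```

The extreme case T1 = 0, T2 = T simply yields the whole PGC content as a single segment.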
S2012, selecting the first media content from the PGC media content by way of a selected start time and duration.
In an embodiment of the present invention, specifically, the total duration of the PGC media content may be obtained, and the content segment(s) defined by the start time and duration of at least one specified segment to be extracted from the PGC media content are used as the first media content. When the first media content includes a plurality of content segments, the content segments may be different from each other, partially overlapped, mutually contained, or completely the same, and the content duration corresponding to each content segment is less than or equal to the total duration.
For example, for PGC media content of 60 s, one content segment is selected with start time 10 s and duration 10 s, and another with start time 15 s and duration 20 s; the content segments [10, 10] and [15, 20] are then used as the first media content. Here [10, 10] indicates a segment with start time 10 s and duration 10 s, and [15, 20] indicates a segment with start time 15 s and duration 20 s. The two segments partially overlap, and their content durations of 10 s and 20 s are both less than the total duration of 60 s.
In the embodiment of the present invention, the two selection manners of step S2011 and step S2012 allow the user to select content segments according to the user's own needs while accurately controlling the duration of each required segment, so that subsequent editing operations can be performed directly.
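The start/duration manner of S2012 can be reduced to the start/end manner by computing end = start + duration and checking the result against the total duration. A hedged sketch, with hypothetical names, using the 60 s example above:

```python
def spans_from_start_duration(total_duration, picks):
    """Convert (start, duration) selections into (start, end) spans,
    rejecting any segment that would exceed the total duration."""
    segments = []
    for start, duration in picks:
        end = start + duration
        if not (0 <= start <= end <= total_duration):
            raise ValueError(
                f"segment (start={start}, duration={duration}) "
                f"exceeds total duration {total_duration}")
        segments.append((start, end))
    return segments

# The example from the text: 60 s content, picks (10 s, 10 s) and (15 s, 20 s).
segments = spans_from_start_duration(60, [(10, 10), (15, 20)])
```

After this conversion, both selection manners can share the same downstream editing path.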
S202, acquiring the category contained in all or part of the first media content.
In this step, the category includes audio content, video content, or audio and video content, where audio and video content includes both audio content and video content.
In the embodiment of the present invention, all or part of the content of the first media content may include only audio content, only video content, or audio and video content. All or part of the content of the first media content is classified by category so that, subsequently, the content of each category can be parsed based on the parsing manner corresponding to that category to obtain the parsed media content.
In the embodiment of the invention, the first media content is classified so that different analysis modes can be set for different types of content in the following steps, thereby improving the analysis efficiency.
S203, parsing the content of each category in all or part of the content of the first media content according to the parsing manner corresponding to each category to obtain the parsed media content, and taking the remaining content of the first media content as the unresolved media content.
In this step, the parsing means includes audio decoding, video decoding, or audio-video demultiplexing. The analysis mode corresponding to the audio content is audio decoding, the analysis mode corresponding to the video content is video decoding, and the analysis mode corresponding to the audio and video content is audio and video demultiplexing.
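The category-to-parser correspondence just described is a simple dispatch: each category maps to its parsing manner. A minimal sketch with placeholder decoder functions standing in for real codecs (all names hypothetical, not from the patent):

```python
def audio_decode(content):
    # Placeholder for a real audio decoder: yields audio data (+ auxiliary data).
    return {"audio_data": content, "auxiliary_data": None}

def video_decode(content):
    # Placeholder for a real video decoder.
    return {"video_data": content}

def av_demultiplex(content):
    # Placeholder for an audio-video demultiplexer: splits the two streams.
    return {"audio": content["audio"], "video": content["video"]}

# Parsing manner corresponding to each category (S202/S203).
PARSERS = {
    "audio": audio_decode,
    "video": video_decode,
    "audio_video": av_demultiplex,
}

def parse(category, content):
    return PARSERS[category](content)
```

In a real system the placeholders would be actual decoders for the formats listed below (MP3, AAC, MP4, and so on).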
In the embodiment of the present invention, as an alternative, step S203 may specifically include:
S2031, when the selected content in the first media content includes only audio content, performing audio decoding on the selected content to obtain audio data and/or auxiliary data.
In this embodiment of the present invention, when the selected content in the first media content includes only audio content, the audio content includes uncompressed audio data (e.g., PCM), auxiliary data (e.g., spatial coordinates, equalization parameters, reverberation order, etc.), and in addition, the audio content may further include other data, which is not limited in this disclosure. In addition, the audio format corresponding to the audio content includes, but is not limited to, a channel-based audio format (e.g., MP3, AAC, AC3, AC4, WANOS, AVS, FLAC, APE, etc.), an object-based audio format (e.g., ATMOS, WANOS, AVS, MPEG-H), a scene-based audio format (e.g., FOA, HOA), and the like.
Alternatively, in practical applications, as shown in fig. 5, when the first media content S2 includes only audio content, the parsing unit 102 performs audio decoding on the first media content S2 to obtain the audio data S9 and/or the auxiliary data S10; in this case the parsed media content S3 is the audio data S9 + the auxiliary data S10, and there is no unresolved media content.
S2032, when the selected content in the first media content only includes video content, performing video decoding on the selected content to obtain video data.
In the embodiment of the present invention, the video content includes video data, and in addition, the video content may further include other data, which is not limited in the present invention. In addition, the video format corresponding to the video content includes, but is not limited to, microsoft video (e.g., WMV, ASF, ASX), Real Player (e.g., RM, RMVB), MPEG video (e.g., MP4), cell phone video (e.g., 3GP), Apple video (e.g., MOV, M4V), other common video (e.g., AVI, DAT, MKV, FLV, VOB, etc.).
Alternatively, in practical applications, as shown in fig. 6, when the first media content S2 only includes video content, the parsing unit 102 performs video decoding on the first media content S2 to obtain video data S11, where the parsed media content S3 is equal to the video data S11, and no unresolved media content is included at this time.
S2033, when the selected content in the first media content includes audio and video content, performing audio-video demultiplexing on the selected content to obtain first audio content and first video content, and performing audio decoding and/or video decoding on the first audio content and the first video content based on editing requirements.
In the embodiment of the present invention, if the editing requirement includes editing the first audio content but not the first video content, it is determined that the selected content includes part of the first media content: the first audio content is audio-decoded to obtain first audio data and/or first auxiliary data, and the first video content is treated as unresolved media content. If the editing requirement includes editing both the first audio content and the first video content, it is determined that the selected content includes all of the first media content: the first audio content is audio-decoded to obtain first audio data and/or first auxiliary data, and the first video content is video-decoded to obtain first video data.
Optionally, in an actual application, in step S2033, when the first media content S2 includes audio content and video content at the same time, the parsing unit 102 performs audio/video demultiplexing on the first media content to obtain the audio content and the video content, and then performs audio decoding and/or video decoding according to an editing requirement of the editing unit 103, where a specific implementation process may include the following two scenarios:
In one scenario, as shown in fig. 7, if the editing unit 103 only needs to edit the audio content, the parsing unit 102 obtains the audio content S7 and the video content S8 by audio-video demultiplexing, performs audio decoding on the audio content S7 according to the editing requirement (only the audio content needs editing) to obtain the audio data S9 and/or the auxiliary data S10, and does not decode the video content S8. In this case, the parsed media content S3 is the audio data S9 + the auxiliary data S10, and the unresolved media content S4 is the video content S8. It should be noted that the present invention determines the parsing object by considering the editing requirement of the editing unit 103; that is, the purpose of this step is still to perform audio decoding and/or video decoding, i.e. parsing, on the first audio content and/or the first video content, and therefore this step is still performed by the parsing unit 102.
In another scenario, as shown in fig. 8, if the editing unit 103 needs to edit audio and video simultaneously, the audio and video are first demultiplexed, then the audio content S7 is decoded into audio data S9 and/or auxiliary data S10, and the video content S8 is decoded into video data S11. In this case the parsed media content S3 is the audio data S9 + the auxiliary data S10 + the video data S11, and the unresolved media content S4 is empty.
It should be noted that, in addition, the implementation process of step S2033 may include other scenarios, which are not limited by the present invention. For example, when the editing unit 103 only needs to edit the video content, the parsing unit 102 first obtains the audio content S7 and the video content S8 by audio-video demultiplexing, then performs video decoding on the video content S8 according to the editing requirement (only the video content needs editing) to obtain the video data S11, and does not decode the audio content S7. In this case, the parsed media content S3 is the video data S11, and the unresolved media content S4 is the audio content S7.
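The scenarios of step S2033 share one pattern: after demultiplexing, only the streams the editing requirement asks for are decoded (parsed media content S3), and the rest passes through untouched (unresolved media content S4). A sketch under that reading, with hypothetical names and a string stand-in for decoding:

```python
def parse_audiovisual(av_content, edit_audio, edit_video):
    """Demultiplexed audio/video: decode only what will be edited."""
    audio = av_content["audio"]   # first audio content (S7)
    video = av_content["video"]   # first video content (S8)
    parsed, unresolved = {}, {}
    if edit_audio:
        parsed["audio_data"] = f"decoded({audio})"   # S9 (+ S10)
    else:
        unresolved["audio"] = audio
    if edit_video:
        parsed["video_data"] = f"decoded({video})"   # S11
    else:
        unresolved["video"] = video
    return parsed, unresolved

# Scenario of fig. 7: edit the audio only; the video stays undecoded.
parsed, unresolved = parse_audiovisual(
    {"audio": "S7", "video": "S8"}, edit_audio=True, edit_video=False)
```

The fig. 8 scenario corresponds to edit_audio=True, edit_video=True (unresolved empty), and the video-only scenario to the reverse flags.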
In the embodiment of the invention, the first media content is classified, so that different analysis modes are set for different types of content, accurate analysis is ensured, and the analysis efficiency can be improved.
And S204, editing the analyzed media content to obtain the edited media content.
In this embodiment of the present invention, based on the execution results of steps S2031 to S2033, step S204 may specifically include: and editing the first audio data and/or the first auxiliary data and/or the first video data to generate edited media content, wherein the edited media content comprises second audio data modified based on the first audio data and/or second auxiliary data modified based on the first auxiliary data and/or second video data modified based on the first video data.
The editing manner may include audio editing and/or video editing, where the editing operations corresponding to the editing manner include, but are not limited to, any combination of the following: changing the start time and/or end time of all or part of the first audio data and/or the first video data; adding new audio data to the first audio data by importing a file, recording, adding a special sound effect (such as applause or laughter), and the like; deleting all or part of the first audio data and/or the first auxiliary data and/or the first video data; adding new video data to the first video data by importing a video file and the like; changing the auxiliary data (e.g., spatial coordinates, equalization parameters, reverberation order, etc.) of all or part of the first auxiliary data; and adding auxiliary data to the first auxiliary data by manually writing auxiliary information, importing a configuration file, and the like. The above editing operations may be repeated.
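The editing operations listed above can be modeled as small functions over the decoded data. A hedged sketch, with the sample-list and key names purely illustrative (the patent does not prescribe data structures):

```python
def trim(data, start, end):
    """Change the start/end of the content (here: slice a sample list)."""
    return data[start:end]

def add_audio(audio_data, new_samples):
    """Add new audio data, e.g. an imported file or a special sound effect."""
    return audio_data + new_samples

def set_auxiliary(aux, **changes):
    """Change auxiliary data such as spatial coordinates or reverberation order."""
    updated = dict(aux)
    updated.update(changes)
    return updated

# First audio data / auxiliary data produced by parsing (values illustrative).
first_audio_data = [0.1, 0.2, 0.3, 0.4]
edited_audio = add_audio(trim(first_audio_data, 1, 3), [0.9])
edited_aux = set_auxiliary({"reverb_order": 1}, reverb_order=2)
```

Repeating or combining such operations yields the edited media content (S12/S13/S14 in the figures).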
Optionally, in practical applications, step S204 may include the following scenarios:
In one scenario, as shown in fig. 9, the editing unit 103 performs an editing operation on the first audio content (the first audio data S9 and the first auxiliary data S10) to obtain edited audio data S12 and/or edited auxiliary data S13. The editing operations for this scenario may include, but are not limited to, adding new audio data to the first audio data S9 by importing a file, recording, or adding a special sound effect, and adding auxiliary data to the first auxiliary data by manually writing auxiliary information, importing a configuration file, and the like. Note that in this scenario, when the editing unit 103 edits only the audio content, the edited media content S5 is the edited audio data S12 (for channel-based audio) or the edited audio data S12 + the edited auxiliary data S13 (for object- and scene-based audio).
In another scenario, as shown in fig. 10, the editing unit 103 performs an editing operation on the first video content (the first video data S11) to obtain edited video data S14; the editing operations for this scenario may include adding new video data to the first video data S11 by importing a video file, and the like. In this scenario, when the editing unit 103 edits only the first video content, the edited media content S5 is the edited video data S14.
In another scenario, as shown in fig. 11, the editing unit 103 performs editing operations on the first audio content (the first audio data S9 and the first auxiliary data S10) and the first video content (the first video data S11), resulting in edited audio data S12 and/or edited auxiliary data S13 and edited video data S14. The editing operations for this scenario may include, but are not limited to: adding new audio data to the first audio data S9 by importing a file, recording, adding a special sound effect, and the like; adding auxiliary data to the first auxiliary data by manually writing auxiliary information, importing a configuration file, and the like; and adding new video data to the first video data S11 by importing a video file and the like. Note that in this scenario, when the editing unit 103 edits the first audio content and the first video content at the same time, the edited media content S5 is the edited audio data S12 + the edited video data S14 (for channel-based audio) or the edited audio data S12 + the edited auxiliary data S13 + the edited video data S14 (for object- and scene-based audio).
And S205, packaging the edited media content according to a preset packaging mode to obtain UGC media content.
In the embodiment of the present invention, as an alternative, step S205 may specifically include:
S2051, encapsulating the edited media content to obtain a second media content.
In the embodiment of the present invention, for example, the encapsulating unit 104 encapsulates the edited media content S5 into the second media content S6; the encapsulation manner includes, but is not limited to, audio coding, video coding, audio-video multiplexing, and the like.
And S2052, if the second media content only comprises audio content, carrying out audio coding on the audio data and/or the auxiliary data in the second media content to obtain UGC media content.
In this step, when the encapsulation method includes audio coding, the coding method corresponding to the audio coding includes, but is not limited to: a regular encoding scheme (encoding all audio data and/or auxiliary data), an incremental encoding scheme (encoding only modified audio data and/or auxiliary data), and the like.
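The two coding schemes named above differ only in which frames are re-encoded. A sketch contrasting them, with the frame model and the "encode" step as illustrative stand-ins for a real audio coder:

```python
def regular_encode(frames):
    """Regular scheme: encode all audio data and/or auxiliary data."""
    return [("enc", f) for f in frames]

def incremental_encode(frames, modified_indices):
    """Incremental scheme: re-encode only the modified frames;
    untouched frames are reused as-is."""
    return [("enc", f) if i in modified_indices else ("reuse", f)
            for i, f in enumerate(frames)]

frames = ["f0", "f1", "f2", "f3"]
full = regular_encode(frames)
delta = incremental_encode(frames, modified_indices={1, 3})
```

The incremental scheme is attractive for UGC editing, where most of the PGC source material is typically left unmodified.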
In the embodiment of the present invention, as shown in fig. 12, when the second media content S6 contains only audio content, the encapsulating unit 104 performs audio encoding on the audio data and/or auxiliary data in the second media content S6 to obtain the UGC audio content S15, and finally outputs the UGC audio content S15. (For example, the media content obtained after the editing of step S204 includes the edited audio data S12 and/or the edited auxiliary data S13, which are encapsulated into the second media content; that is, the second media content includes the edited audio data S12 and/or the edited auxiliary data S13.) Here the UGC media content S15 is the second media content S6, and the UGC media content S15 in this case may also be referred to as UGC audio content.
Further, in addition to the application scenario of fig. 12 above, the following scenario may also arise from step S2052: as shown in fig. 13, if the second media content includes only audio content, the encapsulating unit 104 obtains the encoded audio content S16 by performing audio encoding on the audio data and/or auxiliary data in the second media content (for example, the edited audio data S12 and/or the edited auxiliary data S13), performs audio-video multiplexing on the unresolved media content S4 and the encoded audio content S16 to obtain the UGC media content S15, and finally outputs the UGC media content S15, which equals the second media content S6 + the unresolved media content S4.
And S2053, if the second media content only comprises video content, video coding is carried out on the video data in the second media content to obtain UGC media content.
In the embodiment of the present invention, as shown in fig. 14, when the second media content S6 includes only video content, the encapsulating unit 104 performs video encoding on the video content (e.g., the edited video data S14) in the second media content S6 to obtain the UGC video content S15, and finally outputs the UGC video content S15. Here the UGC media content S15 is the second media content S6, and the UGC media content S15 in this case may also be referred to as UGC video content.
S2054, if the second media content comprises audio and video content, audio coding is carried out on the audio data and/or the auxiliary data in the second media content to obtain coded audio content, video coding is carried out on the video data in the second media content to obtain coded video content, and the coded audio content and the coded video content are multiplexed into UGC media content.
In the embodiment of the present invention, as shown in fig. 15, when the second media content S6 includes both audio content and video content, the encapsulating unit 104 first performs audio encoding on the audio content (e.g., the edited audio data S12 and/or the edited auxiliary data S13) in the second media content S6 to obtain the encoded audio content S16, performs video encoding on the video content (e.g., the edited video data S14) in the second media content S6 to obtain the encoded video content S17, multiplexes the encoded audio content S16 and the encoded video content S17 into the UGC media content S15, and finally outputs the UGC media content S15 = the encoded audio content S16 + the encoded video content S17; that is, the UGC media content S15 = the second media content S6.
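Steps S2052 through S2054 form a dispatch mirroring the parsing side: audio-only content is audio-encoded, video-only content is video-encoded, and mixed content is encoded per stream and then multiplexed. A minimal sketch with string stand-ins for the real encoders and multiplexer (names hypothetical):

```python
def encapsulate(second_media_content):
    """Choose the encapsulation manner from the content categories present."""
    has_audio = "audio" in second_media_content
    has_video = "video" in second_media_content
    if has_audio and has_video:
        encoded_audio = f"aenc({second_media_content['audio']})"  # S16
        encoded_video = f"venc({second_media_content['video']})"  # S17
        return f"mux({encoded_audio},{encoded_video})"            # S15
    if has_audio:
        return f"aenc({second_media_content['audio']})"           # audio-only S15
    return f"venc({second_media_content['video']})"               # video-only S15

# Mixed case of fig. 15: edited audio (S12/S13) plus edited video (S14).
ugc = encapsulate({"audio": "S12+S13", "video": "S14"})
```

A production system would substitute real codec and container calls for the string placeholders.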
In the embodiment of the invention, the media contents are classified, and different packaging modes are set for different types of contents, so that the accurate packaging of each type of contents is ensured, and the packaging efficiency can be improved.
It should be noted that, besides the schemes in fig. 12 to fig. 15, the UGC media content may be obtained through other schemes, which is not limited in the embodiment of the present invention. For example, the scheme may further include:
S206, encapsulating all or part of the unresolved media content to obtain the UGC media content.
In the embodiment of the present invention, as an alternative, when the encapsulating unit 104 performs audio-video multiplexing, in addition to encapsulating the edited media content S5, it may encapsulate all or part of the unresolved media content S4, encapsulating the edited media content S5 and the unresolved media content S4 together into the UGC media content S15. The edited media content S5 and the unresolved media content S4 may each include audio content and/or video content (for example, the edited media content S5 includes only audio content, and the unresolved media content S4 includes only video content).
As another alternative, all or part of the unresolved media content S4 may be encapsulated separately to obtain UGC media content S15.
It should be noted that, in addition to encapsulating all or part of the unresolved media content S4, the encapsulating unit 104 may also encapsulate all or part of the PGC media content S1 to obtain the UGC media content; in this case the entire content of the PGC media content S1 may be synthesized, including by insertion, replacement, combination, splicing, and the like. For example, as shown in fig. 16, the encapsulating unit 104 first encapsulates the second media content S6 into the UGC media content S15 (specifically, it performs audio encoding on the edited audio data S12 and/or the edited auxiliary data S13 in the second media content S6 to obtain the encoded audio content S16, performs video encoding on the edited video data S14 in the second media content S6 to obtain the encoded video content S17, and then multiplexes the encoded audio content S16 and the encoded video content S17 into the UGC media content S15). It then replaces one or more content segments in the PGC media content S1 with the UGC media content S15, or inserts the UGC media content S15 at one or more positions in the PGC media content S1, multiplexes the UGC media content S15 and the PGC media content S1 into the final UGC media content S18, and outputs the final UGC media content S18, which includes all or part of the content of the PGC media content S1 together with the UGC media content S15.
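The synthesis operations named above (replacement and insertion of UGC content into the PGC timeline) can be sketched over a list of content segments; the segment model and names are illustrative, not from the patent:

```python
def replace_segment(pgc_segments, index, ugc_segment):
    """Replace one PGC content segment with the generated UGC content (S15)."""
    out = list(pgc_segments)
    out[index] = ugc_segment
    return out

def insert_segment(pgc_segments, position, ugc_segment):
    """Insert the UGC content at a chosen position in the PGC content."""
    out = list(pgc_segments)
    out.insert(position, ugc_segment)
    return out

pgc = ["P0", "P1", "P2"]                      # PGC media content S1 as segments
replaced = replace_segment(pgc, 1, "UGC")     # replacement synthesis
inserted = insert_segment(pgc, 2, "UGC")      # insertion synthesis
```

Combination and splicing follow the same pattern, with the result multiplexed into the final UGC media content S18.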
It should be noted that, in addition to the encapsulation manners of the above embodiments, other encapsulation manners may be included; the present invention is not limited in this regard, and any encapsulation manner based on the UGC media content generation method of the present invention falls within the scope of the present invention.
In the embodiment of the UGC media content generation method provided by the invention, a first media content is selected from PGC media content; the first media content may include one content segment, a plurality of content segments, or all content segments in the PGC media content. The selected content in the first media content is parsed according to a preset parsing manner to obtain parsed media content, where the parsing manner corresponds to the category of the selected content; the parsed media content is edited to obtain edited media content; and the edited media content is encapsulated according to a preset encapsulation manner to obtain the UGC media content. This solves the problems of poor quality and monotonous content of the generated UGC media content caused by the inability, in the prior art, to fully disassemble the individual audio components, and enriches the diversity and interactivity of UGC media content.
Fig. 17 is a schematic structural diagram of a UGC media content generating device according to an embodiment of the present invention. As shown in fig. 17, the UGC media content generating device 10 includes:
a selecting module 11, configured to select a first media content from PGC media contents, where the first media content may include one content segment, multiple content segments, or all content segments in the PGC media contents;
the analysis module 12 is configured to analyze the selected content in the first media content according to a preset analysis manner to obtain an analyzed media content, where the analysis manner corresponds to a category of the selected content;
the editing module 13 is configured to edit the analyzed media content to obtain an edited media content;
and the packaging module 14 is configured to package the edited media content according to a preset packaging manner, so as to obtain the UGC media content.
Optionally, in this embodiment of the present invention, the selecting module 11 of the apparatus is configured to select, by means of the selected start time and end time, a first media content from the PGC media content; or, the first media content is selected from the PGC media contents by means of the selected start time and duration.
Optionally, in the embodiment of the present invention, the selecting module 11 of the apparatus is configured to obtain the total duration of the PGC media content, determine the start time and end time of at least one specified segment to be extracted from the PGC media content, and take the content segment(s) formed by the start and end time(s) as the first media content, where, when the first media content includes a plurality of content segments, the content segments may be different from each other, partially overlapped, or completely the same, and the content duration corresponding to the start time and end time of each content segment is less than or equal to the total duration.
Optionally, in the embodiment of the present invention, the selecting module 11 of the apparatus is configured to obtain a total duration of the PGC media content; determine a start time and a duration of at least one designated segment to be extracted from the PGC media content, and take the content segment formed by each start time and duration as the first media content, where when the first media content includes a plurality of content segments, the content segments may be different from each other, partially overlapped, mutually contained, or completely the same, and the content duration corresponding to each content segment is less than or equal to the total duration.
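As a non-authoritative illustration of the two selection modes described above (start time plus end time, or start time plus duration), the following sketch validates segment specifications against the total duration. Overlapping or identical segments are permitted, matching the text; the helper name and data shapes are hypothetical:

```python
# Hypothetical helper for the selection step: segments may be given
# either as (start, end) or as (start, duration) pairs, may overlap
# or coincide, and each must fit within the total PGC duration.

def select_segments(total_duration, specs, mode="start_end"):
    """Return validated (start, end) pairs forming the first media content."""
    segments = []
    for a, b in specs:
        start, end = (a, b) if mode == "start_end" else (a, a + b)
        if not (0 <= start < end <= total_duration):
            raise ValueError(f"segment ({start}, {end}) exceeds total duration")
        segments.append((start, end))
    return segments

# Overlapping segments are allowed:
print(select_segments(120.0, [(0, 30), (20, 50)]))
# (start, duration) mode:
print(select_segments(120.0, [(10, 20)], mode="start_duration"))
```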
Optionally, in this embodiment of the present invention, the selected content may include all of the content or part of the content of the first media content; the analysis module 12 of the device is configured to obtain the categories contained in all or part of the content of the first media content, where the categories include audio content, video content, or audio-video content; and analyze the content of each category in all or part of the content of the first media content according to the analysis mode corresponding to each category to obtain analyzed media content, and take the rest of the first media content as unresolved media content.
Optionally, in the embodiment of the present invention, the parsing manner includes audio decoding, video decoding, or audio/video demultiplexing; when the selected content in the first media content includes only audio content, the parsing module 12 of the apparatus is specifically configured to perform audio decoding on the selected content to obtain audio data and/or auxiliary data; when the selected content in the first media content includes only video content, the parsing module 12 of the apparatus is specifically configured to perform video decoding on the selected content to obtain video data; when the selected content in the first media content includes audio/video content, the parsing module 12 of the apparatus is specifically configured to perform audio/video demultiplexing on the selected content to obtain first audio content and first video content, and perform audio decoding and/or video decoding on the first audio content and the first video content based on an editing requirement.
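The category-dependent parsing dispatch described above can be sketched as follows. The decode and demultiplex functions here are trivial placeholders standing in for a real codec toolchain (for example, FFmpeg bindings); only the control flow mirrors the text:

```python
# Sketch of the category-dependent parsing dispatch. Placeholder
# decoders/demuxer only demonstrate the control flow; they are not
# a real codec API.

def parse_selected(content):
    kind = content["category"]  # "audio", "video" or "av"
    if kind == "audio":
        return {"audio_data": audio_decode(content)}   # plus optional aux data
    if kind == "video":
        return {"video_data": video_decode(content)}
    if kind == "av":
        audio_part, video_part = av_demux(content)     # demultiplex first
        return {"audio_data": audio_decode(audio_part),
                "video_data": video_decode(video_part)}
    raise ValueError(f"unknown category: {kind}")

# Trivial placeholder implementations for demonstration only:
def audio_decode(c): return f"pcm({c['name']})"
def video_decode(c): return f"frames({c['name']})"
def av_demux(c):
    return {"name": c["name"] + ".a"}, {"name": c["name"] + ".v"}
```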
Optionally, in this embodiment of the present invention, if the editing requirement includes editing the first audio content but not the first video content, it is determined that the selected content includes part of the first media content, and the parsing module 12 of the apparatus is further configured to perform audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and to use the first video content as unresolved media content;
if the editing requirement includes editing both the first audio content and the first video content, it is determined that the selected content includes all of the first media content, and the parsing module 12 of the apparatus is further configured to perform audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and to perform video decoding on the first video content to obtain first video data.
Optionally, in this embodiment of the present invention, the editing module 13 of the apparatus is specifically configured to perform an editing operation on the first audio data and/or the first auxiliary data and/or the first video data, and generate an edited media content, where the edited media content includes second audio data modified based on the first audio data and/or second auxiliary data modified based on the first auxiliary data and/or second video data modified based on the first video data.
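A minimal sketch of the editing step above, turning first (decoded) data into second (modified) data. The concrete operation shown, a gain change on raw audio samples, is an arbitrary example for illustration rather than an edit mandated by this embodiment:

```python
# Illustrative editing step: produce "second" data modified from the
# "first" decoded data. The gain change is an arbitrary example edit.

def edit_media(parsed, gain=0.5):
    edited = dict(parsed)
    if "audio_data" in parsed:
        # second audio data modified based on the first audio data
        edited["audio_data"] = [s * gain for s in parsed["audio_data"]]
    # first video data and auxiliary data would be transformed analogously
    return edited

print(edit_media({"audio_data": [1.0, -2.0]}))
```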
Optionally, in the embodiment of the present invention, the encapsulation manner includes audio coding, video coding, or audio-video multiplexing; the packaging module 14 of the apparatus is configured to package the edited media content to obtain a second media content; if the second media content only includes audio content, the encapsulation module 14 is further configured to perform audio coding on the audio data and/or the auxiliary data in the second media content to obtain UGC media content; if the second media content only includes video content, the encapsulation module 14 is further configured to perform video coding on the video data in the second media content to obtain UGC media content; if the second media content includes audio/video content, the encapsulation module 14 is further configured to perform audio coding on the audio data and/or the auxiliary data in the second media content to obtain coded audio content, perform video coding on the video data in the second media content to obtain coded video content, and multiplex the coded audio content and the coded video content into UGC media content.
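The encapsulation dispatch described above (audio coding for audio-only content, video coding for video-only content, or per-stream coding followed by multiplexing for combined content) can be sketched as follows. The encoder and muxer functions are placeholder stand-ins, not a real codec API:

```python
# Sketch of the encapsulation dispatch: audio-only content is audio
# coded, video-only content is video coded, and combined content is
# coded per stream and then multiplexed into UGC media content.

def package_media(edited):
    has_audio = "audio_data" in edited
    has_video = "video_data" in edited
    if has_audio and has_video:
        return mux(audio_encode(edited["audio_data"]),
                   video_encode(edited["video_data"]))
    if has_audio:
        return audio_encode(edited["audio_data"])
    if has_video:
        return video_encode(edited["video_data"])
    raise ValueError("nothing to package")

# Placeholder "encoders" and "muxer" for demonstration only:
def audio_encode(d): return f"aac[{d}]"
def video_encode(d): return f"h264[{d}]"
def mux(a, v): return f"mp4({a}+{v})"
```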
Optionally, in this embodiment of the present invention, the packaging module 14 of the apparatus is further configured to package all or part of the unresolved media content to obtain the UGC media content.
Fig. 18 is a block diagram of a device for generating UGC media content according to an embodiment of the present invention. The device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, data communication, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on the device 800. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in the position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The apparatus 800 may access a wireless network based on a communication standard. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method for generating UGC media content of the foregoing method embodiments. For example, the memory 804 includes instructions executable by the processor 820 of the device 800 to perform the methods described above. The non-transitory computer-readable storage medium may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium, wherein the instructions in the storage medium, when executed by a processor of a client, enable the client to perform the above method of generating UGC media content.
Embodiments of the present invention also provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for generating the UGC media content as described above is implemented.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (12)

1. A method for generating UGC media content, comprising:
selecting first media content from PGC media content, wherein the first media content can comprise one content segment, a plurality of content segments or all content segments in the PGC media content;
analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, wherein the analysis mode corresponds to the type of the selected content;
editing the analyzed media content to obtain edited media content;
and packaging the edited media content according to a preset packaging mode to obtain UGC media content.
2. The method of claim 1, wherein selecting the first media content from the PGC media contents comprises:
selecting first media content from PGC media content by means of a selected start time and end time; or,
and selecting the first media content from the PGC media contents in a mode of the selected starting time and the selected duration.
3. The method of claim 1, wherein the selected content comprises all or part of the first media content;
the analyzing the selected content in the first media content according to a preset analyzing mode to obtain an analyzed media content, wherein the analyzing mode corresponds to the category of the selected content, and the analyzing method comprises the following steps:
acquiring categories contained in all contents or part of contents of the first media content, wherein the categories comprise audio contents, video contents or audio and video contents;
and analyzing the content of each category in all or part of the first media content according to the analysis mode corresponding to each category to obtain analyzed media content, and taking the rest of the first media content as unresolved media content.
4. The method of claim 3, wherein the parsing comprises audio decoding, video decoding, or audio-video demultiplexing;
the analyzing the content of each category in all or part of the content of the first media content according to the analysis mode corresponding to each category to obtain analyzed media content, and taking the rest of the first media content as unresolved media content, includes:
when the selected content in the first media content only comprises audio content, carrying out audio decoding on the selected content to obtain audio data and/or auxiliary data;
when the selected content in the first media content only comprises video content, performing video decoding on the selected content to obtain video data;
and when the selected content in the first media content comprises audio and video content, performing audio and video demultiplexing on the selected content to obtain first audio content and first video content, and performing audio decoding and/or video decoding on the first audio content and the first video content based on an editing requirement.
5. The method of claim 4, wherein said audio and/or video decoding the first audio content and the first video content based on the editing requirement comprises:
if the editing requirement comprises editing the first audio content but not the first video content, determining that the selected content comprises part of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and taking the first video content as unresolved media content;
if the editing requirement comprises editing the first audio content and the first video content, determining that the selected content comprises all the content of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and performing video decoding on the first video content to obtain first video data.
6. The method of claim 5, wherein editing the parsed media content to obtain edited media content comprises:
and editing the first audio data and/or the first auxiliary data and/or the first video data to generate edited media content, wherein the edited media content comprises second audio data modified based on the first audio data and/or second auxiliary data modified based on the first auxiliary data and/or second video data modified based on the first video data.
7. The method of claim 6, wherein the encapsulation mode comprises audio coding, video coding, or audio-video multiplexing;
the step of encapsulating the edited media content according to a preset encapsulation mode to obtain UGC media content includes:
packaging the edited media content to obtain second media content;
if the second media content only comprises audio content, audio coding is carried out on the audio data and/or the auxiliary data in the second media content to obtain UGC media content;
if the second media content only comprises video content, video coding is carried out on video data in the second media content to obtain UGC media content;
if the second media content comprises audio and video content, audio coding is carried out on audio data and/or auxiliary data in the second media content to obtain coded audio content, video coding is carried out on video data in the second media content to obtain coded video content, and the coded audio content and the coded video content are multiplexed into UGC media content.
8. The method of any one of claims 4-7, further comprising:
and packaging all or part of the unresolved media content to obtain UGC media content.
9. An apparatus for generating UGC media content, comprising:
the system comprises a selecting module, a selecting module and a content selecting module, wherein the selecting module is used for selecting first media content from PGC media content, and the first media content can comprise one content segment, a plurality of content segments or all content segments in the PGC media content;
the analysis module is used for analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, and the analysis mode corresponds to the category of the selected content;
the editing module is used for editing the analyzed media content to obtain edited media content;
and the packaging module is used for packaging the edited media content according to a preset packaging mode to obtain UGC media content.
10. A generation device of UGC media content, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of generating UGC media content according to any of claims 1 to 8.
11. A computer-readable storage medium having computer-executable instructions stored therein which, when executed by a processor, implement the UGC media content generation method of any one of claims 1 to 8.
12. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the UGC media content generation method according to any one of claims 1 to 8.
CN202110811872.6A 2021-07-19 2021-07-19 UGC media content generation method, device, equipment and storage medium Active CN113691860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110811872.6A CN113691860B (en) 2021-07-19 2021-07-19 UGC media content generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110811872.6A CN113691860B (en) 2021-07-19 2021-07-19 UGC media content generation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113691860A true CN113691860A (en) 2021-11-23
CN113691860B CN113691860B (en) 2023-12-08

Family

ID=78577291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110811872.6A Active CN113691860B (en) 2021-07-19 2021-07-19 UGC media content generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113691860B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105336348A (en) * 2015-11-16 2016-02-17 合一网络技术(北京)有限公司 Processing system and method for multiple audio tracks in video editing
CN105721440A (en) * 2016-01-21 2016-06-29 成都索贝数码科技股份有限公司 Use method of media content business flow integrated management and control application cloud platform
WO2017065503A1 (en) * 2015-10-15 2017-04-20 (주)노바빈 Distributed multimedia editing system and editing method
CN109376253A (en) * 2018-09-14 2019-02-22 传线网络科技(上海)有限公司 Multimedia resource edit methods and device
CN111445914A (en) * 2020-03-23 2020-07-24 全景声科技南京有限公司 Processing method and device capable of disassembling and re-editing audio signal
US20200326197A1 (en) * 2019-04-11 2020-10-15 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus, computer device and storage medium for determining poi alias
CN112565923A (en) * 2020-11-30 2021-03-26 北京达佳互联信息技术有限公司 Audio and video stream processing method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KYOUNG-SOOK KIM; KOJI ZETTSU; YUTAKA KIDAWARA; YASUSHI KIYOKI: "Sticker: Searching and Aggregating User-Generated Contents along with Trajectories of Moving Phenomena", 《2009 TENTH INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT: SYSTEMS, SERVICES AND MIDDLEWARE》, pages 365 - 366 *
WANG Huifen; CAI Shuya: "Analysis of the Application of Blockchain Technology in UGC Copyright Management", Media, pages 71 - 73 *
YAN Yanqin; QIAN Yongjiang; YU Dingguo: "Research on Automated Production Technology of TV Data News Based on Artificial Intelligence", Radio & TV Broadcast Engineering, vol. 46, no. 6, pages 38 - 41 *

Also Published As

Publication number Publication date
CN113691860B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
US9135953B2 (en) Method for creating, editing, and reproducing multi-object audio contents files for object-based audio service, and method for creating audio presets
CN109168078B (en) Video definition switching method and device
EP2136370B1 (en) Systems and methods for identifying scenes in a video to be edited and for performing playback
Armstrong et al. Object-based broadcasting-curation, responsiveness and user experience
US8739240B2 (en) Authoring system for IPTV network
US20150062353A1 (en) Audio video playback synchronization for encoded media
KR20070091962A (en) Method for offerring naration of data channel dmb using animation and recording media implementing the same
JP2006115457A (en) System and its method for embedding multimedia editing information into multimedia bit stream
CN109068163B (en) Audio and video synthesis system and synthesis method thereof
CN113259740A (en) Multimedia processing method, device, equipment and medium
CN102868862A (en) Method and equipment for dubbing video applied to mobile terminal
CN105763923A (en) Video and video template editing methods and device thereof
KR101850285B1 (en) Device and method for generating video script, and video producing system and method based on video script, computer program media
CN113691860B (en) UGC media content generation method, device, equipment and storage medium
CN108574860A (en) Multimedia resource playback method and device
CN112995530A (en) Video generation method, device and equipment
CN105847994A (en) Multimedia file playing method and device
WO2002086760A1 (en) Meta data creation apparatus and meta data creation method
CN112055253B (en) Method and device for adding and multiplexing independent subtitle stream
CN106663435A (en) Coding device and method, decoding device and method, and program
Ohanian How artificial intelligence and machine learning may eventually change content creation methodologies
US11769531B1 (en) Content system with user-input based video content generation feature
KR101040086B1 (en) Method and apparatus for generating audio and method and apparatus for reproducing audio
Faria et al. An Overview of Audio Technologies, Immersion and Personalization Features envisaged for the TV3. 0
CN117376636A (en) Video processing method, apparatus, device, storage medium, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant