CN113691860B - UGC media content generation method, device, equipment and storage medium - Google Patents

UGC media content generation method, device, equipment and storage medium

Info

Publication number
CN113691860B
CN113691860B CN202110811872.6A CN202110811872A
Authority
CN
China
Prior art keywords
content
media content
audio
video
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110811872.6A
Other languages
Chinese (zh)
Other versions
CN113691860A (en)
Inventor
黄旭
潘兴德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Panoramic Sound Information Technology Co ltd
Original Assignee
Beijing Panoramic Sound Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Panoramic Sound Information Technology Co ltd filed Critical Beijing Panoramic Sound Information Technology Co ltd
Priority to CN202110811872.6A priority Critical patent/CN113691860B/en
Publication of CN113691860A publication Critical patent/CN113691860A/en
Application granted granted Critical
Publication of CN113691860B publication Critical patent/CN113691860B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4398Processing of audio elementary streams involving reformatting operations of audio signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4344Remultiplexing of multiplex streams, e.g. by modifying time stamps or remapping the packet identifiers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

According to the method, device, equipment, and storage medium for generating UGC media content, first media content is selected from PGC media content; the first media content may comprise one content segment, a plurality of content segments, or all content segments of the PGC media content. The selected content within the first media content is parsed according to a preset parsing mode corresponding to the category of the selected content, yielding parsed media content. The parsed media content is edited to obtain edited media content, and the edited media content is packaged according to a preset packaging mode to obtain the UGC media content. This solves the prior-art problems of poor quality and monotonous content in generated UGC media content, caused by the inability to fully separate the individual audio components contained in the PGC media content, and at the same time enriches the diversity and interactivity of UGC media content.

Description

UGC media content generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of digital media production, and in particular, to a method, an apparatus, a device, and a storage medium for generating UGC media content.
Background
Currently, the audio and video field is developing vigorously, and various media contents (such as movies, television dramas, variety shows, self-media, and the like) are produced with media production tools, providing people with a rich audio-visual experience. Media content can be divided into two main categories: the first is PGC (Professional Generated Content) and the second is UGC (User Generated Content). In recent years, PGC and UGC have begun to be combined in a preliminary way: users can take PGC programs published on the Internet and perform simple secondary creation on them (such as dubbing, remixing, accompaniment, etc.), making media content more diversified.
However, in the current technology, when UGC media content is produced from PGC media content, the individual audio components contained in the PGC media content cannot be fully separated, so a single audio component (such as dialogue, soundtrack, or ambient sound) cannot be deleted or replaced on its own; the audio can only be kept or deleted in its entirety, resulting in poor quality of the produced UGC media content.
Disclosure of Invention
The invention provides a method, a device, equipment, and a storage medium for generating UGC media content, which are used to solve the prior-art problems of poor quality and monotonous content of the generated UGC media content caused by the inability to fully separate the individual audio components contained in the PGC media content.
In one aspect, the present invention provides a method for generating UGC media content, including:
selecting first media content from the PGC media content, wherein the first media content can comprise one content segment, a plurality of content segments or all content segments in the PGC media content;
analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, wherein the analysis mode corresponds to the category of the selected content;
editing the parsed media content to obtain edited media content;
and packaging the edited media content according to a preset packaging mode to obtain UGC media content.
Optionally, selecting the first media content from the PGC media content includes:
selecting the first media content from the PGC media content by specifying a start time and an end time; or,
selecting the first media content from the PGC media content by specifying a start time and a duration.
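As an illustration only (not part of the claimed method), the two selection modes above are interchangeable: a (start time, duration) pair can be normalized to a (start time, end time) pair before extraction. The function name and time representation below are assumptions for the sketch.

```python
def select_segment(pgc_duration, start, end=None, duration=None):
    """Select a segment [start, end) from PGC media content of length
    pgc_duration (all times in seconds). Exactly one of end/duration
    must be given, mirroring the two selection modes above."""
    if (end is None) == (duration is None):
        raise ValueError("specify exactly one of end or duration")
    if end is None:
        end = start + duration  # normalize the start/duration mode
    if not (0 <= start < end <= pgc_duration):
        raise ValueError("segment out of range")
    return (start, end)
```

Either call form yields the same segment boundaries, e.g. `select_segment(100, 10, end=15)` and `select_segment(100, 10, duration=5)`.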
Optionally, the selected content may include all or part of the first media content;
parsing the selected content in the first media content according to a preset parsing mode to obtain parsed media content, wherein the parsing mode corresponds to the category of the selected content, includes:
acquiring the categories contained in all or part of the content of the first media content;
and parsing all or part of the content of each category in the first media content according to the parsing mode corresponding to that category to obtain parsed media content, and treating the remaining content of the first media content as unparsed media content.
Optionally, the category includes audio content, video content, or audio-visual content, and the parsing modes include: audio decoding, video decoding, or audio-video demultiplexing;
parsing all or part of the content of each category in the first media content according to the parsing mode corresponding to that category to obtain parsed media content, and treating the remaining content of the first media content as unparsed media content, includes:
When the selected content in the first media content only comprises audio content, audio decoding is carried out on the selected content to obtain audio data and/or auxiliary data;
when the selected content in the first media content only comprises video content, video decoding is carried out on the selected content to obtain video data;
when the selected content in the first media content comprises audio and video content, audio and video demultiplexing is carried out on the selected content to obtain first audio content and first video content, and audio decoding and/or video decoding are carried out on the first audio content and the first video content based on editing requirements.
Optionally, the audio decoding and/or video decoding of the first audio content and the first video content based on the editing requirement includes:
if the editing requirement comprises editing the first audio content and not editing the first video content, determining that the selected content can comprise part of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and taking the first video content as unresolved media content;
If the editing requirement comprises editing the first audio content and the first video content, determining that the selected content can comprise all the content of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and performing video decoding on the first video content to obtain first video data.
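The category-to-parser mapping described above can be sketched as a simple dispatch (illustrative only; the category labels and operation names are assumptions, not terms defined by the claims):

```python
def parse_selected(category, edit_audio=False, edit_video=False):
    """Choose parsing operations for the selected content: audio-only
    content is audio-decoded, video-only content is video-decoded, and
    audio-visual content is first demultiplexed, then each resulting
    stream is decoded only if the editing requirement covers it."""
    if category == "audio":
        return ["audio_decode"]
    if category == "video":
        return ["video_decode"]
    if category == "audiovisual":
        ops = ["av_demultiplex"]
        if edit_audio:
            ops.append("audio_decode")
        if edit_video:
            ops.append("video_decode")
        return ops
    raise ValueError(f"unknown category: {category}")
```

A stream that is demultiplexed but not decoded corresponds to the "unparsed media content" above, which can be passed through to packaging unchanged.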
Optionally, the editing the parsed media content to obtain edited media content includes:
editing the first audio data and/or the first auxiliary data and/or the first video data to generate edited media content, wherein the edited media content comprises second audio data modified based on the first audio data and/or second auxiliary data modified based on the first auxiliary data and/or second video data modified based on the first video data.
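The editing step above derives "second" (edited) data from the "first" (parsed) data, stream by stream. A minimal sketch, assuming streams are held in a dictionary keyed by illustrative names:

```python
def edit_parsed(parsed, edits):
    """Produce edited ('second') data from parsed ('first') data.
    `parsed` maps stream names ('audio', 'auxiliary', 'video') to data;
    `edits` maps a subset of those names to modifier functions.
    Streams without a registered edit pass through unchanged."""
    return {name: edits[name](data) if name in edits else data
            for name, data in parsed.items()}
```

For example, editing only the audio stream leaves the video data untouched, matching the case where the video content is treated as unparsed media content.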
Optionally, the packaging mode includes audio coding, video coding or audio-video multiplexing;
the step of packaging the edited media content according to a preset packaging mode to obtain UGC media content, including:
packaging the edited media content to obtain second media content;
If the second media content only comprises audio content, audio encoding is carried out on audio data and/or auxiliary data in the second media content to obtain UGC media content;
if the second media content only comprises video content, video coding is carried out on video data in the second media content to obtain UGC media content;
and if the second media content comprises audio and video content, audio encoding is carried out on the audio data and/or auxiliary data in the second media content to obtain encoded audio content, video encoding is carried out on the video data in the second media content to obtain encoded video content, and the encoded audio content and the encoded video content are multiplexed into UGC media content.
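The three packaging cases above reduce to a small decision rule (a sketch with invented operation names, not the patented implementation):

```python
def package_edited(has_audio, has_video):
    """Pick packaging steps per the rules above: audio-only content is
    audio-encoded; video-only content is video-encoded; content with
    both is encoded per stream and then multiplexed into one container."""
    if has_audio and has_video:
        return ["audio_encode", "video_encode", "av_multiplex"]
    if has_audio:
        return ["audio_encode"]
    if has_video:
        return ["video_encode"]
    raise ValueError("edited content contains no media streams")
```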
Optionally, the method further comprises:
and packaging all or part of the unresolved media content to obtain UGC media content.
In another aspect, the present invention provides a device for generating UGC media content, including:
a selecting module, configured to select a first media content from PGC media contents, where the first media content may include one content segment, a plurality of content segments, or all content segments in the PGC media content;
The analysis module is used for analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, wherein the analysis mode corresponds to the category of the selected content;
the editing module is used for editing the parsed media content to obtain edited media content;
and the packaging module is used for packaging the edited media content according to a preset packaging mode to obtain UGC media content.
In another aspect, the present invention provides a device for generating UGC media content, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory such that the at least one processor performs the method of generating UGC media content described above.
In another aspect, the present invention provides a computer readable storage medium having stored therein computer executable instructions that when executed by a processor implement the method for generating UGC media content described above.
In another aspect, the present invention provides a computer program product comprising a computer program which when executed by a processor implements the method of generating UGC media content described above.
According to the method, device, equipment, and storage medium for generating UGC media content, first media content is selected from PGC media content, and may comprise one content segment, a plurality of content segments, or all content segments of the PGC media content. The selected content within the first media content is parsed according to a preset parsing mode corresponding to the category of the selected content, yielding parsed media content; the parsed media content is edited to obtain edited media content; and the edited media content is packaged according to a preset packaging mode to obtain the UGC media content. This solves the prior-art problems of poor quality and monotonous content in generated UGC media content, caused by the inability to fully separate the individual audio components contained in the PGC media content, and at the same time enriches the diversity and interactivity of UGC media content.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram of a system for generating UGC media content according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application scenario implemented by a UGC-based media content generation system according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for generating UGC media content according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating another method for generating UGC media content according to an embodiment of the present invention;
fig. 5 is a flow chart of an analysis method according to an embodiment of the present invention;
FIG. 6 is a flow chart of another parsing method according to an embodiment of the present invention;
FIG. 7 is a flow chart of another parsing method according to an embodiment of the present invention;
FIG. 8 is a flow chart of another parsing method according to an embodiment of the present invention;
FIG. 9 is a schematic flow chart of an editing method according to an embodiment of the present invention;
FIG. 10 is a flowchart of another editing method according to an embodiment of the present invention;
FIG. 11 is a flowchart of another editing method according to an embodiment of the present invention;
fig. 12 is a flow chart of a packaging method according to an embodiment of the present invention;
FIG. 13 is a flow chart of another packaging method according to an embodiment of the present invention;
Fig. 14 is a flow chart of another packaging method according to an embodiment of the present invention;
FIG. 15 is a flow chart of another packaging method according to an embodiment of the present invention;
FIG. 16 is a flow chart of another packaging method according to an embodiment of the present invention;
FIG. 17 is a schematic structural diagram of another UGC media content generation device according to an embodiment of the invention;
FIG. 18 is a block diagram of a system for generating UGC media content according to an embodiment of the invention.
Specific embodiments of the present invention have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
First, the terms involved in the present invention will be explained:
the professional production content is as follows: (Professional Generated Content, abbreviated as PGC), such contents include movies, variety, television shows, etc., and the broadcast and television practitioners collect, clip, and output codes of audio and video, and the main production means is professional audio and video production Tools (such as Pro Tools, nuendo, adobe premier, etc.). For audio, the content diversity is particularly obvious, and because different audio codecs have different technical characteristics, the manufacturing tool can not only realize the manufacturing of traditional multi-channel audio (such as MP3, AAC, AC3, AC4, WANOS, AVS, FLAC, APE and the like), but also add various auxiliary data (such as the spatial position, reverberation parameters, rendering angles and the like of the audio) to realize the manufacturing of audio based on objects (such as ATMOS, WANOS, AVS, MPEG-H and the like) and scenes (such as FOA, HOA and the like), so that PGC media content with more stereoscopic impression, immersion impression and higher quality can be produced.
User generated content (User Generated Content, UGC for short): mainly self-media content; material is typically recorded with mobile devices such as phones and tablets, edited with professional or non-professional audio/video production tools into vivid streaming-media works, and can be shared on the Internet immediately. Users can download works published by others and perform secondary creation based on them, which gives UGC strong interactivity and operability.
Audio coding: PCM audio sample data (uncompressed data or data corresponding to an "audio waveform") is compressed into audio byte stream data in a certain format.
Audio decoding: the compressed byte stream data is parsed into PCM audio sample data.
Video coding: video pixel data (uncompressed data, data corresponding to each frame picture in video) such as RGB, YUV and the like is compressed into video byte stream data in a certain format.
Video decoding: and analyzing the compressed video byte stream data into video pixel data such as RGB, YUV and the like.
Audio and video multiplexing: the encoded/compressed audio data and the encoded/compressed video data are encapsulated into a media container (e.g., MP4, TS, etc.) to form a media container byte stream.
Audio and video demultiplexing: the media container byte stream is parsed into compressed audio data and compressed video data.
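The container (multiplexing) concept defined above can be demonstrated with Python's standard-library `wave` module. Note this is a simplified analogy chosen because it needs no third-party codecs: WAV stores PCM uncompressed, so the example illustrates wrapping sample data in a container and parsing it back out, not actual audio compression.

```python
import io
import wave

def wrap_pcm_in_wav(pcm_bytes, channels=1, sample_rate=16000, sampwidth=2):
    """Wrap raw PCM sample data in a WAV container ('multiplexing' it
    into a byte stream), then parse the container and recover the
    samples ('demultiplexing'). The round trip is lossless because WAV
    stores PCM without compression."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:       # write the container
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(sample_rate)
        w.writeframes(pcm_bytes)
    buf.seek(0)
    with wave.open(buf, "rb") as r:       # parse the container back
        return r.readframes(r.getnframes())
```

Compressed formats such as MP4 or TS add codec-specific encoding before this container step, but the encapsulate/parse symmetry is the same.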
Next, a brief description will be given of a manner of generating UGC media content based on PGC media content and defects existing in the prior art:
the current method for generating UGC media content based on PGC media content mainly comprises the steps that a user uses a published PGC program on the Internet to perform simple secondary creation (such as dubbing, ghost, accompaniment and the like), so that the generated UGC media content is more diversified. However, the current manner of generating UGC media content based on PGC media content has significant limitations, particularly in:
UGC audio quality is generally not high. If a user wants to make personalized UGC programs by using PGC programs as materials, only PGC programs published on the internet can be used as authoring materials, but these programs are stereo versions mixed by professionals, each audio component contained in these programs cannot be completely disassembled, so that individual audio components (such as white, soundtrack, environmental sound, etc.) cannot be deleted or replaced, and only all the programs can be reserved or deleted. For example, when a user wants to replace a part of a PGC program with its own dubbing, the ambient sound and background music contained in the replaced dubbing will be deleted at the same time, resulting in discontinuous content of the ambient sound and background music, and thus the generated UGC audio is poor in quality.
Secondly, the number of audio tracks supported by the current UGC making tool is very limited, a user can only put audio materials on at most 2 audio tracks, namely, the audio materials are overlapped with original PGC program materials, and meanwhile, any auxiliary data cannot be added, so that the user can only make stereo audio, cannot make multi-channel audio based on objects and scenes, the real authoring intention of the user cannot be comprehensively displayed, and the generated UGC media content is single.
Therefore, the invention provides a method for generating UGC media content, which can freely create on the basis of PGC media content, and a user can use the whole PGC media content or intercept one or more content fragments of the PGC media content by the method for generating the UGC media content, and randomly edit the content fragments according to the manufacturing intention of the user, thereby generating multichannel, object-based and scene-based UGC media content, and solving the problems of poor quality and single content of the generated UGC media content caused by the fact that each audio component contained in the UGC media content cannot be completely disassembled in the prior art.
The architecture schematic diagram of a UGC media content generation system provided by the embodiment of the present invention is described below:
Fig. 1 is a schematic architecture diagram of a system for generating UGC media content according to an embodiment of the present invention, where the system 100 for generating UGC media content includes: an extraction unit 101, a parsing unit 102, an editing unit 103, and a packaging unit 104.
The extraction unit 101 is configured to select first media content from the PGC media content, where the first media content may include one content segment, a plurality of content segments, or all content segments in the PGC media content.
The parsing unit 102 is configured to parse the selected content in the first media content according to a preset parsing mode to obtain parsed media content, where the parsing mode corresponds to the category of the selected content.
The editing unit 103 is configured to edit the parsed media content to obtain edited media content.
The packaging unit 104 is configured to package the edited media content according to a preset packaging mode to obtain UGC media content.
The following introduces an application scene of the generation of UGC media content based on the generation system of UGC media content:
as shown in fig. 2, the PGC media content S1 is input to the extraction unit 101, and the whole content or part of the content of the PGC media content S1 may be input to the encapsulation unit 104; the extracting unit 101 extracts the PGC media content S1 to obtain a first media content S2, and inputs the first media content S2 to the parsing unit 102, where the first media content S2 may be all or part of the PGC media content S1; the parsing unit 102 parses all or part of the first media content S2 to obtain parsed media content S3; the parsed media content S3 is input to the editing unit 103, while the unresolved media content S4 may be input to the packaging unit 104 or directly discarded (dashed line in fig. 2); editing unit 103 performs editing operation on parsed media content S3 to obtain edited media content S5, and inputs edited media content S5 to packaging unit 104; the packaging unit 104 packages all or part of the edited media content S5 to obtain the final UGC media content S6, and the packaging unit 104 may package all or part of the PGC media content S1 and/or the unresolved media content S4 to obtain the UGC media content S6.
In this scenario, the extracting unit 101 selects all or part of the PGC media content S1 to obtain the first media content S2, where the extracting manner includes: the first media content S2 is selected from the PGC media content S1 by setting a start time and an end time of the first media content S2 or selecting a start time and a duration, where the first media content S2 may be the PGC media content S1 itself, a piece of the PGC media content S1, or a combination of a plurality of pieces of the PGC media content S1.
In this scenario, the parsing unit 102 parses all or part of the content in the media content S2 to obtain parsed media content S3 and/or unresolved media content S4, where the parsing method includes: audio decoding, video decoding, audio-video demultiplexing, etc.
In this scenario, the editing unit 103 may perform editing operation on the parsed media content S3 according to the editing requirement of the user, to obtain edited media content S5. Among them, editing modes include, but are not limited to, addition, deletion, substitution, and the like.
In this scenario, the encapsulation mode of the encapsulation unit 104 includes audio encoding, video encoding, audio-video multiplexing, and the like. For example, when it is determined that the edited media content includes only audio content, the encapsulation unit 104 needs to encapsulate the edited media content S5 by audio encoding.
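The S1-to-S6 data flow of the scenario above can be sketched as a minimal pipeline. The class and all processing are stand-ins (plain list operations in place of real decoding/encoding), used only to show how data moves between the four units, including the pass-through of unparsed content S4 into packaging:

```python
class UGCPipeline:
    """Minimal sketch of the Fig. 2 data flow; method names mirror the
    extraction (101), parsing (102), editing (103) and packaging (104)
    units. Media content is modeled as a list of labeled segments."""

    def extract(self, s1, start, end):
        return s1[start:end]                              # S2: selected segment(s)

    def parse(self, s2, selected):
        parsed = [x for x in s2 if x in selected]         # S3: parsed content
        unparsed = [x for x in s2 if x not in selected]   # S4: passed through or discarded
        return parsed, unparsed

    def edit(self, s3, edit_fn):
        return [edit_fn(x) for x in s3]                   # S5: edited content

    def package(self, s5, s4_passthrough=()):
        return list(s5) + list(s4_passthrough)            # S6: final UGC media content
```

Discarding S4 (the dashed line in Fig. 2) corresponds to calling `package(s5)` without the pass-through argument.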
In the application scenario of the generation of the UGC media content, a first media content is selected from the PGC media content, where the first media content may include one content segment, a plurality of content segments, or all content segments in the PGC media content, the selected content in the first media content is parsed according to a preset parsing manner, so as to obtain parsed media content, the parsing manner corresponds to a category of the selected content, the parsed media content is edited, so as to obtain edited media content, and the edited media content is packaged according to a preset packaging manner, so that the UGC media content is obtained, and thus the problems of poor quality and single content of the generated UGC media content due to the fact that each audio component contained in the PGC media content cannot be completely disassembled in the prior art are solved, and meanwhile, the diversity and interactivity of the UGC media content are enriched.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 3 is a flow chart of a method for generating UGC media content according to an embodiment of the present invention, as shown in fig. 3, where the method in this embodiment may include:
s101, selecting first media content from PGC media content.
In this step, the first media content may include one content segment, a plurality of content segments, or all content segments in the PGC media content.
In the embodiment of the present invention, the first media content may include all or part of the content of the PGC media content, for example, the PGC media content includes 10 content segments, and the first media content may include 5 content segments therein.
S102, analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content.
In this step, the preset parsing scheme includes, but is not limited to, audio decoding, video decoding, or audio-video demultiplexing. The parsing scheme is determined according to the category of the selected content in the first media content, that is, the parsing scheme corresponds to the category of the selected content.
In the embodiment of the present invention, for example, the content category may include audio content, video content, or audiovisual content, and when the selected content in the first media content includes audio content, the selected content in the first media content is subjected to audio decoding, so as to obtain parsed media content.
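As a rough illustration of how a parsing manner can be chosen from the category of the selected content, the following sketch maps each category to an operation name. The category strings, the dict representation, and the operation names are assumptions made for this example, not part of the patented implementation:

```python
def choose_parsing_op(selected):
    """Return the parsing operation for the selected content's category:
    audio -> audio decoding, video -> video decoding,
    audiovisual -> audio-video demultiplexing (step S102)."""
    category = selected["category"]
    ops = {
        "audio": "audio_decode",
        "video": "video_decode",
        "audiovisual": "av_demultiplex",
    }
    if category not in ops:
        raise ValueError(f"unknown content category: {category}")
    return ops[category]
```

For instance, a selection whose category is `"audio"` is routed to audio decoding, matching the example in the paragraph above.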
S103, editing the analyzed media content to obtain edited media content.
In this step, the editing mode can be set according to the user's requirement.
In the embodiment of the present invention, for example, new audio data is added to the parsed media content by importing a file, recording, adding special sound effects, etc., so as to obtain edited media content; other editing modes may also be used, and the present invention is not limited in this respect.
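A minimal sketch of one such editing operation, assuming audio is represented as a plain list of 16-bit PCM samples: new audio data (e.g. an imported file or a recorded sound effect) is mixed into the parsed samples. The representation and the helper name are illustrative only:

```python
def mix_audio(parsed_samples, new_samples, offset=0):
    """Mix new_samples into parsed_samples starting at `offset`,
    clipping each sum to the 16-bit signed sample range."""
    out = list(parsed_samples)
    # Extend the output if the new track runs past the original end.
    end = offset + len(new_samples)
    if end > len(out):
        out.extend([0] * (end - len(out)))
    for i, sample in enumerate(new_samples):
        mixed = out[offset + i] + sample
        out[offset + i] = max(-32768, min(32767, mixed))
    return out
```

Mixing past the end of the original track simply lengthens it, and the sum is clipped so the result stays valid 16-bit audio.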
S104, packaging the edited media content according to a preset packaging mode to obtain UGC media content.
In this step, the preset packaging mode includes, but is not limited to, audio coding, video coding or audio-video multiplexing. The specific packaging method is determined by the type of the content contained in the media content after editing, that is, the packaging method corresponds to the content contained in the media content after editing.
In the embodiment of the present invention, for example, the content category may include audio content, video content or audio-video content, and when the content included in the edited media content includes only audio content, audio encoding is performed on the edited media content to obtain UGC media content.
In the embodiment of the method for generating the UGC media content, the first media content is selected from the PGC media content and may comprise one content segment, a plurality of content segments, or all content segments in the PGC media content. The selected content in the first media content is parsed according to the preset parsing manner to obtain the parsed media content, the parsing manner corresponding to the category of the selected content; the parsed media content is edited to obtain the edited media content; and the edited media content is packaged according to the preset packaging manner to obtain the UGC media content. This solves the prior-art problems of poor quality and monotonous content in generated UGC media content, which arise because the individual audio components contained in the PGC media content cannot be fully disassembled, and at the same time enriches the diversity and interactivity of the UGC media content.
Fig. 4 is a flowchart of another method for generating UGC media content according to an embodiment of the present invention, as shown in fig. 4, where the method in this embodiment may include:
S201, selecting first media content from PGC media content, wherein the first media content may comprise one content segment, a plurality of content segments, or all content segments in the PGC media content.
In the embodiment of the present invention, as an alternative, step S201 may include:
S2011, selecting first media content from PGC media content by means of a selected starting time and ending time.
In the embodiment of the present invention, specifically, when selecting the first media content from the PGC media content by means of a selected start time and end time, the total duration of the PGC media content is obtained, the start time and end time of at least one designated segment to be extracted are determined from the PGC media content, and the content segment(s) formed by the at least one start time and end time are used as the first media content. When the first media content includes a plurality of content segments, the content segments may be mutually different, partially overlapping, or completely identical, and the content duration corresponding to the start time and end time of each content segment is less than or equal to the total duration.
That is, if a user wants to extract a content segment from the PGC media content, it is necessary to determine the start and stop time of extracting the content segment, and take the content segment formed by the start and stop time as the first media content, that is, the first media content at this time includes only one content segment. If a user wants to extract a plurality of content segments from the PGC media content, it is necessary to determine the start and stop time of each content segment, and form a plurality of content segments based on the start and stop times, that is, the first media content at this time includes a plurality of content segments. If the user wants to extract all the designated segments from the PGC media content, that is, obtain the entire PGC media content, the content segment formed by the total duration is the first media content, that is, the first media content at this time includes all the content segments.
For example, the total duration of the PGC media content is denoted as T, the start time and the end time of a specific piece to be extracted are determined from the PGC media content to be T1 and T2, and at least one piece of content [ T1, T2] formed by the start time and the end time is used as the first media content, that is, the first media content at this time includes the piece of content [ T1, T2].
Further, the above operations may be repeated so that a plurality of content segments are selected from the PGC media content and used together as the first media content. For example, the content segments [T1, T2] and [T3, T4] are selected from the PGC media content, where the content segments [T1, T2] and [T3, T4] may be mutually different, partially overlapping, or completely identical, and the content durations corresponding to the content segments [T1, T2] and [T3, T4] are less than or equal to the total duration T, that is, 0 ≤ T1 ≤ T2 ≤ T and 0 ≤ T3 ≤ T4 ≤ T. For example, if the PGC media content lasts 60s and the content segments [10, 20] and [15, 30] are selected from it and taken as the first media content, then the content segment [10, 20] has a start time of 10s and an end time of 20s, and the content segment [15, 30] has a start time of 15s and an end time of 30s. The two content segments partially overlap; for each segment the start time is less than the end time, which is less than the total duration (0 < 10 < 20 < T, 0 < 15 < 30 < T); and the content durations of the segments, 10s for [10, 20] and 15s for [15, 30], are both less than the total duration of 60s.
Alternatively, as an extreme case, when T1 = 0 and T2 = T, the entire content of the PGC media content is extracted, and it can be understood that the first media content at this time is equal to the PGC media content.
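The selection by start and end time described above can be sketched as follows. Representing the PGC media content by just its total duration, and each designated segment as a (t1, t2) pair, is an assumption for illustration; the check enforces 0 ≤ T1 ≤ T2 ≤ T for every segment:

```python
def select_by_start_end(total_duration, segments):
    """Validate (start, end) pairs against the PGC total duration and
    return them as the first media content (step S2011). Segments may
    differ, partially overlap, or coincide."""
    for t1, t2 in segments:
        if not (0 <= t1 <= t2 <= total_duration):
            raise ValueError(
                f"invalid segment [{t1}, {t2}] for total duration {total_duration}")
    return list(segments)
```

With total duration T = 60s, the segments (10, 20) and (15, 30) from the example above are both accepted, while a segment ending after 60s is rejected.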
S2012, selecting a first media content from the PGC media content by way of the selected start time and duration.
In this embodiment of the present invention, specifically, the total duration of the PGC media content may be obtained, then the starting time and the duration of at least one designated segment to be extracted in the PGC media content are set, and the content segment formed by at least one starting time and the duration is used as the first media content.
For example, the PGC media content lasts 60s, the starting time of one content segment selected from the PGC media content is 10s with a duration of 10s, and the starting time of another content segment is 15s with a duration of 20s; then the content segments [10, 10] and [15, 20] are taken as the first media content, where [10, 10] denotes a content segment with a starting time of 10s and a duration of 10s, and [15, 20] denotes a content segment with a starting time of 15s and a duration of 20s. The two content segments partially overlap; the content duration corresponding to the content segment [10, 10] is 10s, the content duration corresponding to the content segment [15, 20] is 20s, and both are less than the total duration of 60s.
In the embodiment of the present invention, the two ways of selecting the first media content in step S2011 or step S2012 allow the user to select exactly the content segments they need while accurately controlling the duration of each segment, which facilitates the subsequent editing operations.
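The second selection mode, by start time and duration (step S2012), can be sketched in the same style; converting each (start, duration) pair into a (start, end) pair makes the two modes interchangeable. The list-of-pairs representation is again an assumption:

```python
def select_by_start_duration(total_duration, segments):
    """Validate (start, duration) pairs against the total duration and
    return the equivalent (start, end) pairs (step S2012)."""
    result = []
    for start, duration in segments:
        end = start + duration
        if not (0 <= start <= end <= total_duration):
            raise ValueError(
                f"segment (start={start}, duration={duration}) "
                f"exceeds total duration {total_duration}")
        result.append((start, end))
    return result
```

With the 60s example above, the segments (10, 10) and (15, 20) become the time ranges [10, 20] and [15, 35], which partially overlap.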
S202, acquiring all or part of the categories contained in the first media content.
In this step, the category includes audio content, video content, or audiovisual content, wherein the audiovisual content includes both audio content and video content.
In the embodiment of the invention, all or part of the first media content may comprise only audio content, only video content, or audio-video content. By classifying all or part of the first media content, the content of each category in the first media content can be parsed based on the parsing manner corresponding to that category, so as to obtain the parsed media content.
In the embodiment of the invention, the first media content is classified, so that different analysis modes are set for different types of content in the subsequent step, and the analysis efficiency can be improved.
S203, according to the analysis mode corresponding to each category, analyzing the whole content or part of the content of each category in the first media content to obtain analyzed media content, and taking the rest of the content of the first media content as unresolved media content.
In this step, the parsing scheme includes audio decoding, video decoding, or audio-video demultiplexing. The analysis mode corresponding to the audio content is audio decoding, the analysis mode corresponding to the video content is video decoding, and the analysis mode corresponding to the audio and video content is audio and video demultiplexing.
In the embodiment of the present invention, as an alternative, step S203 may specifically include:
S2031, when the selected content in the first media content includes only audio content, performing audio decoding on the selected content to obtain audio data and/or auxiliary data.
In the embodiment of the present invention, when the selected content in the first media content includes only audio content, the audio content includes uncompressed audio data (such as PCM), auxiliary data (such as spatial coordinates, equalization parameters, reverberation order, etc.), and other data may be included in the audio content, which is not limited in the present invention. Further, audio formats corresponding to the audio content include, but are not limited to, channel-based audio formats (e.g., MP3, AAC, AC3, AC4, WANOS, AVS, FLAC, APE, etc.), object-based audio formats (e.g., ATMOS, WANOS, AVS, MPEG-H), scene-based audio formats (e.g., FOA, HOA), etc.
Optionally, in practical application, as shown in fig. 5, for example, when the first media content S2 includes only audio content, the parsing unit 102 performs audio decoding on the first media content S2 to obtain audio data S9 and/or auxiliary data S10, where the parsed media content S3 = audio data S9 + auxiliary data S10, and no unresolved media content is produced.
S2032, when the selected content in the first media content includes only video content, video decoding is performed on the selected content to obtain video data.
In the embodiment of the present invention, the video content includes video data, and in addition to the video data, the video content may also include other data. In addition, video formats corresponding to the video content include, but are not limited to, microsoft video (e.g., WMV, ASF, ASX), real Player (e.g., RM, RMVB), MPEG video (e.g., MP 4), cell phone video (e.g., 3 GP), apple video (e.g., MOV, M4V), other common video (e.g., AVI, DAT, MKV, FLV, VOB, etc.).
Optionally, in practical application, as shown in fig. 6, when the first media content S2 includes only video content, the parsing unit 102 performs video decoding on the first media content S2 to obtain video data S11, where the parsed media content S3 = video data S11, and no unresolved media content is produced.
S2033, when the selected content in the first media content includes audio-video content, performing audio-video demultiplexing on the selected content to obtain first audio content and first video content, and performing audio decoding and/or video decoding on the first audio content and the first video content based on editing requirements.
In the embodiment of the present invention, if the editing requirement includes editing the first audio content and not editing the first video content, it is determined that the selected content may include a part of the first media content, and the first audio content is subjected to audio decoding to obtain first audio data and/or first auxiliary data, and the first video content is used as unresolved media content; if the editing requirement comprises editing the first audio content and the first video content, determining that the selected content can comprise all the content of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and performing video decoding on the first video content to obtain first video data.
Optionally, in the practical application, when the first media content S2 includes both the audio content and the video content, the parsing unit 102 performs audio-video demultiplexing on the first media content to obtain the audio content and the video content, and then performs audio decoding and/or video decoding according to the editing requirement of the editing unit 103, and the specific implementation process may include the following two scenarios:
In one scenario, as shown in fig. 7, if the editing unit 103 only needs to edit the audio content, the parsing unit 102 obtains the audio content S7 and the video content S8 after audio-video demultiplexing, performs audio decoding on the audio content S7 according to the editing requirement (only the audio content needs to be edited) to obtain the audio data S9 and/or the auxiliary data S10, and does not decode the video content S8. At this time, the parsed media content S3 = audio data S9 + auxiliary data S10, and the unresolved media content S4 = video content S8. It should be noted that the present invention determines the objects to be parsed by considering the editing requirement of the editing unit 103; the purpose of this step is still to perform audio decoding and/or video decoding on the first audio content and the first video content, that is, to parse the first audio content and/or the first video content, so this step is still performed by the parsing unit 102.
In another scenario, as shown in fig. 8, if the editing unit 103 needs to edit audio and video simultaneously, the audio and video are demultiplexed, the audio content S7 is then decoded into the audio data S9 and/or the auxiliary data S10, and the video content S8 is decoded into the video data S11. At this time, the parsed media content S3 = audio data S9 + auxiliary data S10 + video data S11, and the unresolved media content S4 is empty.
It should be noted that, in addition to this, the implementation process of step S2033 may also include other scenarios, which are not limited in this invention. For example, when the editing unit 103 only needs to edit video content, the parsing unit 102 obtains the audio content S7 and the video content S8 after audio-video demultiplexing, then video decodes the video content S8 according to the editing requirement (only the video content needs to be edited) to obtain the video data S11, and the audio content S7 is not decoded. At this time, the parsed media content S3 = video data S11, and the unresolved media content S4 = audio content S7.
In the embodiment of the invention, the first media content is classified, so that different analysis modes are set for different types of content, and the analysis efficiency can be improved while the accurate analysis is ensured.
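Steps S2031 to S2033 can be summarized in a small sketch: an audiovisual selection is demultiplexed into audio and video elementary streams, and only the streams the editing unit needs are decoded, while the rest become unresolved media content. The dict representation and the string-based stand-ins for real demultiplexing and decoding are assumptions for illustration:

```python
def parse_audiovisual(av_content, edit_audio, edit_video):
    """Demultiplex, then decode only what the editing requirement asks for
    (step S2033); undecoded streams are kept as unresolved media content."""
    audio_es, video_es = av_content["audio"], av_content["video"]  # demultiplex
    parsed, unresolved = {}, {}
    if edit_audio:
        parsed["audio_data"] = "decoded:" + audio_es   # audio decoding stand-in
    else:
        unresolved["audio"] = audio_es
    if edit_video:
        parsed["video_data"] = "decoded:" + video_es   # video decoding stand-in
    else:
        unresolved["video"] = video_es
    return parsed, unresolved
```

With `edit_audio=True` and `edit_video=False` this reproduces the fig. 7 scenario: the audio stream is decoded while the video stream passes through unresolved.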
S204, editing the analyzed media content to obtain edited media content.
In the embodiment of the present invention, based on the execution results of the steps S2031 to S2033, step S204 may specifically include: editing the first audio data and/or the first auxiliary data and/or the first video data to generate edited media content, wherein the edited media content comprises second audio data modified based on the first audio data and/or second auxiliary data modified based on the first auxiliary data and/or second video data modified based on the first video data.
Wherein, the editing mode may include audio editing and/or video editing, wherein, the editing operation corresponding to the editing mode includes, but is not limited to, any combination of the following modes: changing a start time and/or an end time of the entire content or a part of the content of the first audio data and/or the first video data; adding new audio data into the first audio data by means of importing files, recording, adding special sound effects (such as applause, laughter and the like) and the like; deleting all or part of the content of the first audio data and/or the first auxiliary data and/or the first video data; adding new video data into the first video data by means of importing video files and the like; auxiliary data (e.g., spatial coordinates, equalization parameters, reverberation orders, etc.) that change the entire content or a portion of the content of the first auxiliary data; auxiliary data is added to the first auxiliary data by means of manually writing auxiliary information, importing configuration files and the like, and the editing operation can be repeated.
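One of the listed operations, changing the start and/or end time of audio data, can be sketched as a simple trim over a list of PCM samples; the sample-list representation and the helper name are assumptions for illustration:

```python
def change_start_end(samples, sample_rate, new_start_s, new_end_s):
    """Keep only the samples between new_start_s and new_end_s (in seconds),
    i.e. change the start and end time of the audio data."""
    first = int(new_start_s * sample_rate)
    last = int(new_end_s * sample_rate)
    return samples[first:last]
```

The same slicing idea applies to video data when frames are indexed by a frame rate instead of a sample rate.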
Optionally, in practical applications, step S204 may include the following scenarios:
In one scenario, as shown in fig. 9, the editing unit 103 performs an editing operation on the first audio content (the first audio data S9, the first auxiliary data S10) to obtain edited audio data S12 and/or edited auxiliary data S13, where the editing operation of this scenario may include, but is not limited to, adding new audio data to the first audio data S9 by importing a file, recording a sound, adding a special sound effect, etc., and adding auxiliary data to the first auxiliary data by manually writing auxiliary information, importing a configuration file, etc. In this scenario, when the editing unit 103 edits only the audio content, the edited media content S5 = edited audio data S12 (channel-based audio) or the edited media content S5 = edited audio data S12 + edited auxiliary data S13 (object- and scene-based audio).
In another scenario, as shown in fig. 10, the editing unit 103 performs an editing operation on the first video content (the first video data S11) to obtain edited video data S14, where the editing operation of this scenario may include adding new video data to the first video data S11 by importing a video file, or the like. In this scenario, when the editing unit 103 edits only the first video content, the edited media content S5 = edited video data S14.
In another scenario, as shown in fig. 11, the editing unit 103 performs editing operations on the first audio content (first audio data S9, first auxiliary data S10) and the first video content (first video data S11) to obtain edited audio data S12 and/or edited auxiliary data S13 and edited video data S14. The editing operation of this scenario may include, but is not limited to: adding new audio data to the first audio data S9 by importing a file, recording, adding a special sound effect, etc.; adding auxiliary data to the first auxiliary data by manually writing auxiliary information, importing a configuration file, etc.; and adding new video data to the first video data S11 by importing a video file, etc. In this scenario, when the editing unit 103 edits the first audio content and the first video content simultaneously, the edited media content S5 = edited audio data S12 + edited video data S14 (channel-based audio) or the edited media content S5 = edited audio data S12 + edited auxiliary data S13 + edited video data S14 (object- and scene-based audio).
S205, packaging the edited media content according to a preset packaging mode to obtain UGC media content.
In the embodiment of the present invention, as an alternative, step S205 may specifically include:
S2051, packaging the edited media content to obtain second media content.
In the embodiment of the present invention, for example, the packaging unit 104 packages the edited media content S5 into the second media content S6, where the packaging manner includes, but is not limited to: audio coding, video coding, audio-video multiplexing, etc.
S2052, if the second media content only comprises audio content, audio encoding is performed on the audio data and/or auxiliary data in the second media content, so as to obtain UGC media content.
In this step, when the encapsulation mode includes audio coding, the coding modes corresponding to the audio coding include, but are not limited to: conventional coding (coding of all audio data and/or auxiliary data), incremental coding (coding of only modified audio data and/or auxiliary data), etc.
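The difference between conventional and incremental coding mentioned here can be illustrated as follows, modeling each audio frame as a dict with a `modified` flag; the flag, the frame representation, and the string stand-in for the encode step are assumptions:

```python
def encode_conventional(frames):
    """Conventional coding: encode all audio frames."""
    return ["enc:" + f["data"] for f in frames]

def encode_incremental(frames):
    """Incremental coding: encode only the frames modified during editing."""
    return ["enc:" + f["data"] for f in frames if f["modified"]]
```

When only a small part of the audio was edited, incremental coding substantially reduces the encoding work, since unmodified frames can be reused from the original bitstream.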
In the embodiment of the present invention, as shown in fig. 12, when the second media content S6 only includes audio content, the encapsulation unit 104 performs audio encoding on the audio data and/or the auxiliary data in the second media content S6 (for example, the edited media content obtained in step S204 includes the edited audio data S12 and/or the edited auxiliary data S13, which are encapsulated into the second media content, so that the second media content includes the edited audio data S12 and/or the edited auxiliary data S13), so as to obtain and finally output the UGC media content S15, where UGC media content S15 = second media content S6; the UGC media content S15 at this time may also be called UGC audio content.
Further, in addition to the application scenario of fig. 12 above, the following scenario may be included based on step S2052: as shown in fig. 13, if the second media content includes only audio content, the encapsulation unit 104 performs audio encoding on the audio data and/or the auxiliary data in the second media content (e.g., the edited audio data S12 and/or the edited auxiliary data S13) to obtain the encoded audio content S16, performs audio-video multiplexing on the unresolved media content S4 and the encoded audio content S16 to obtain the UGC media content S15, and finally outputs the UGC media content S15 = second media content S6 + unresolved media content S4.
S2053, if the second media content only comprises video content, video encoding is carried out on the video data in the second media content to obtain UGC media content.
In the embodiment of the present invention, as shown in fig. 14, when the second media content S6 only includes video content, the encapsulation unit 104 performs video encoding on the video content (for example, the edited video data S14) in the second media content S6 to obtain the UGC media content S15 and finally outputs it, where UGC media content S15 = second media content S6; the UGC media content S15 at this time may also be called UGC video content.
S2054, if the second media content comprises audio and video content, audio encoding is performed on audio data and/or auxiliary data in the second media content to obtain encoded audio content, video encoding is performed on video data in the second media content to obtain encoded video content, and the encoded audio content and the encoded video content are multiplexed into UGC media content.
In the embodiment of the present invention, as shown in fig. 15, when the second media content S6 includes both audio content and video content, the encapsulation unit 104 first performs audio encoding on the audio content (for example, the edited audio data S12 and/or the edited auxiliary data S13) in the second media content S6 to obtain the encoded audio content S16, performs video encoding on the video content (for example, the edited video data S14) in the second media content S6 to obtain the encoded video content S17, multiplexes the encoded audio content S16 and the encoded video content S17 into the UGC media content S15, and finally outputs it; then UGC media content S15 = encoded audio content S16 + encoded video content S17, that is, UGC media content S15 = second media content S6.
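The step S2054 pipeline (encode audio, encode video, then multiplex) can be sketched with string stand-ins for the real codecs and multiplexer; the naming and the dict container model are illustrative assumptions:

```python
def package_audiovisual(edited):
    """Encode the edited audio and video, then multiplex the two encoded
    streams into one UGC container (step S2054)."""
    encoded_audio = "aenc:" + edited["audio"]   # audio coding stand-in (-> S16)
    encoded_video = "venc:" + edited["video"]   # video coding stand-in (-> S17)
    # Audio-video multiplexing of S16 and S17 into the UGC media content S15.
    return {"streams": [encoded_audio, encoded_video]}
```

The returned container corresponds to S15 in fig. 15: both encoded streams side by side in one package.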
In the embodiment of the invention, the media content is classified, different packaging modes are set for the content of different categories, so that the content of each category can be accurately packaged, and the packaging efficiency can be improved.
It should be noted that, besides the schemes of fig. 12 to 15, UGC media content may be obtained through other schemes, which is not limited by the embodiment of the present invention. For example, the scheme may further include:
S206, packaging all or part of the unresolved media content to obtain UGC media content.
In this embodiment of the present invention, as an alternative, when the packaging unit 104 performs audio and video multiplexing, besides packaging the edited media content S5, all or part of the unresolved media content S4 may be packaged, and the packaged edited media content S5 and the unresolved media content S4 are jointly packaged into the UGC media content S15, where the edited media content S5 and the unresolved media content S4 may include audio content and/or video content, respectively (for example, the edited media content S5 includes only audio content and the unresolved media content S4 includes only video content).
As another alternative, the whole content or part of the content of the unresolved media content S4 may be packaged separately, to obtain UGC media content S15.
It should be noted that the encapsulation unit 104 may, in addition to encapsulating all or part of the unresolved content S4, encapsulate all or part of the content in the PGC media content S1 to obtain UGC media content; at this time, the entire content of the PGC media content S1 may be synthesized, including insertion, replacement, combination, splicing, and the like. For example, as shown in fig. 16, the encapsulation unit 104 encapsulates the second media content S6 into UGC media content S15 (the specific process includes: the encapsulation unit 104 first performs audio encoding on the edited audio data S12 and/or the edited auxiliary data S13 in the second media content S6 to obtain encoded audio content S16, performs video encoding on the edited video data S14 in the second media content S6 to obtain encoded video content S17, and multiplexes the encoded audio content S16 and the encoded video content S17 into the UGC media content S15), then replaces one or more segments of the PGC media content S1 with the UGC media content S15, or inserts the UGC media content S15 at one or more positions in the PGC media content S1, and multiplexes the UGC media content S15 and the PGC media content S1 into the final UGC media content S18, which is output; the UGC media content S15 forms the whole or a part of the final UGC media content S18.
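The replacement and insertion operations described here can be sketched over a PGC timeline modeled as a list of labeled segments; the list model and the helper names are assumptions for illustration:

```python
def replace_segment(pgc_segments, index, ugc_segment):
    """Replace the PGC segment at `index` with a UGC segment,
    leaving the original timeline untouched."""
    result = list(pgc_segments)
    result[index] = ugc_segment
    return result

def insert_segment(pgc_segments, position, ugc_segment):
    """Insert a UGC segment at `position` in the PGC timeline,
    leaving the original timeline untouched."""
    result = list(pgc_segments)
    result.insert(position, ugc_segment)
    return result
```

Both helpers copy their input, so the original PGC timeline S1 is preserved and can be reused for further synthesis (combination, splicing, etc.).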
It should be noted that, in addition to the encapsulation methods according to the embodiments described above, other encapsulation methods may be included, which are not limited in this regard, and encapsulation methods based on the UGC media content generation method of the present invention are all within the scope of the present invention.
In the embodiment of the method for generating the UGC media content, the first media content is selected from the PGC media content and may comprise one content segment, a plurality of content segments, or all content segments in the PGC media content. The selected content in the first media content is parsed according to the preset parsing manner to obtain the parsed media content, the parsing manner corresponding to the category of the selected content; the parsed media content is edited to obtain the edited media content; and the edited media content is packaged according to the preset packaging manner to obtain the UGC media content. This solves the prior-art problems of poor quality and monotonous content in generated UGC media content, which arise because the individual audio components contained in the PGC media content cannot be fully disassembled, and at the same time enriches the diversity and interactivity of the UGC media content.
Fig. 17 is a schematic structural diagram of a device for generating UGC media content according to an embodiment of the present invention. As shown in fig. 17, the device 10 for generating UGC media content includes:
a selecting module 11, configured to select a first media content from PGC media contents, where the first media content may include one content segment, a plurality of content segments, or all content segments in the PGC media content;
the parsing module 12 is configured to parse the selected content in the first media content according to a preset parsing manner, so as to obtain parsed media content, where the parsing manner corresponds to the category of the selected content;
the editing module 13 is configured to edit the parsed media content to obtain edited media content;
and the packaging module 14 is configured to package the edited media content according to a preset packaging manner to obtain UGC media content.
Optionally, in the embodiment of the present invention, the selecting module 11 of the device is configured to select the first media content from the PGC media contents by means of the selected start time and end time; alternatively, the first media content is selected from among the PGC media content by way of a selected start time and duration.
Optionally, in the embodiment of the present invention, the selecting module 11 of the device is configured to obtain a total duration of the PGC media content; and determining the starting time and the ending time of at least one designated segment to be extracted from the PGC media content, and taking at least one content segment formed by the starting time and the ending time as the first media content, wherein when the first media content comprises a plurality of content segments, the content segments can be mutually different, partially overlapped or completely identical, and the content duration corresponding to the starting time and the ending time of each content segment is smaller than or equal to the total duration.
Optionally, in the embodiment of the present invention, the selecting module 11 of the device is configured to obtain a total duration of the PGC media content, set the starting time and the duration of at least one designated segment to be extracted in the PGC media content, and take the content segment(s) formed by the at least one starting time and duration as the first media content, wherein, when the first media content comprises a plurality of content segments, the content segments may be mutually different, partially overlapping, or completely identical, and the content duration corresponding to each content segment is less than or equal to the total duration.
Optionally, in an embodiment of the present invention, the selected content may include all or part of the first media content; the parsing module 12 of the device is configured to obtain the categories contained in all or part of the first media content, where the categories include audio content, video content, or audiovisual content; and to parse all or part of the content of each category in the first media content according to the parsing manner corresponding to that category to obtain parsed media content, taking the rest of the first media content as unresolved media content.
Optionally, in the embodiment of the present invention, the parsing manner includes audio decoding, video decoding, or audio-video demultiplexing. When the selected content in the first media content only includes audio content, the parsing module 12 of the device is specifically configured to perform audio decoding on the selected content to obtain audio data and/or auxiliary data; when the selected content in the first media content only includes video content, the parsing module 12 of the device is specifically configured to perform video decoding on the selected content to obtain video data; when the selected content in the first media content includes audio-video content, the parsing module 12 of the device is specifically configured to perform audio-video demultiplexing on the selected content to obtain first audio content and first video content, and to perform audio decoding and/or video decoding on the first audio content and the first video content based on editing requirements.
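The category-dependent parsing described above can be sketched as a simple dispatch in Python. This is an illustrative sketch only: the decoder and demultiplexer stubs stand in for a real media library (e.g. FFmpeg), and every name here is hypothetical rather than taken from the patent.

```python
def parse_selected_content(category, payload):
    """Dispatch parsing by content category, mirroring the embodiment:
    audio -> audio decoding, video -> video decoding, and audiovisual ->
    demultiplex into elementary streams, then decode each one."""
    if category == "audio":
        return {"audio_data": decode_audio(payload)}
    if category == "video":
        return {"video_data": decode_video(payload)}
    if category == "audiovisual":
        audio_es, video_es = demultiplex(payload)  # split container into streams
        return {"audio_data": decode_audio(audio_es),
                "video_data": decode_video(video_es)}
    raise ValueError(f"unknown category: {category}")


# Placeholder codec hooks; a real implementation would call into a media
# framework such as FFmpeg or GStreamer rather than build strings.
def decode_audio(es):
    return f"pcm({es})"


def decode_video(es):
    return f"frames({es})"


def demultiplex(container):
    return (f"{container}/aac", f"{container}/h264")
```

A usage example: parsing an audiovisual selection first demultiplexes it, then decodes the audio and video elementary streams separately, matching the third branch of the embodiment.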
Optionally, in the embodiment of the present invention, if the editing requirement includes editing the first audio content but not the first video content, it is determined that the selected content may include part of the first media content, and the parsing module 12 of the device is further configured to perform audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and to treat the first video content as unresolved media content;
if the editing requirement includes editing both the first audio content and the first video content, it is determined that the selected content may include all of the first media content, and the parsing module 12 of the device is further configured to perform audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and to perform video decoding on the first video content to obtain first video data.
Optionally, in an embodiment of the present invention, the editing module 13 of the apparatus is specifically configured to perform an editing operation on the first audio data and/or the first auxiliary data and/or the first video data, and generate edited media content, where the edited media content includes second audio data modified based on the first audio data and/or second auxiliary data modified based on the first auxiliary data and/or second video data modified based on the first video data.
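The editing step described above, which turns first audio/auxiliary/video data into second data, can be illustrated with a small Python sketch. The function and field names are assumptions for illustration; a real editor would operate on PCM buffers and frame sequences rather than plain lists, and the auxiliary fields (spatial coordinates, equalization parameters) are shown only schematically.

```python
import copy


def edit_media(parsed, trim=None, new_audio=None, aux_updates=None):
    """Produce 'second' data from 'first' data, as in the embodiment:
    trim changes the start/end of the audio data, new_audio appends audio
    (e.g. from a file import or recording), and aux_updates rewrites
    auxiliary fields such as spatial coordinates or equalization
    parameters. The input is left untouched."""
    edited = copy.deepcopy(parsed)  # keep the first data intact
    if trim is not None:
        start, end = trim
        edited["audio_data"] = edited["audio_data"][start:end]
    if new_audio is not None:
        edited["audio_data"] = edited["audio_data"] + new_audio
    if aux_updates:
        edited.setdefault("aux_data", {}).update(aux_updates)
    return edited
```

For example, trimming the first audio data and moving the source to a new spatial coordinate yields second audio data and second auxiliary data while the first data remain available for further edits.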
Optionally, in an embodiment of the present invention, the packaging manner includes audio encoding, video encoding, or audio-video multiplexing; the packaging module 14 of the device is configured to package the edited media content to obtain a second media content; if the second media content only includes audio content, the encapsulation module 14 is further configured to perform audio encoding on the audio data and/or the auxiliary data in the second media content to obtain UGC media content; if the second media content only includes video content, the encapsulation module 14 is further configured to perform video encoding on video data in the second media content to obtain UGC media content; if the second media content includes audio and video content, the encapsulation module 14 is further configured to perform audio encoding on the audio data and/or the auxiliary data in the second media content to obtain encoded audio content, perform video encoding on the video data in the second media content to obtain encoded video content, and multiplex the encoded audio content and the encoded video content into UGC media content.
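The packaging step mirrors the parsing step: encode each elementary stream, and multiplex when both audio and video are present. The following sketch uses stand-in encoder and muxer functions; in practice these would be actual codecs (e.g. AAC, H.264) and a container muxer, and all names here are hypothetical.

```python
def package(second_media):
    """Package edited (second) media content into UGC media content,
    following the three branches of the embodiment: audio only, video
    only, or audio plus video multiplexed into one container."""
    has_audio = "audio_data" in second_media
    has_video = "video_data" in second_media
    if has_audio and has_video:
        return multiplex(encode_audio(second_media["audio_data"]),
                         encode_video(second_media["video_data"]))
    if has_audio:
        return encode_audio(second_media["audio_data"])
    if has_video:
        return encode_video(second_media["video_data"])
    raise ValueError("nothing to package")


# Stand-ins for real encoders and a muxer; labels mark which stage ran.
def encode_audio(audio):
    return f"aac<{audio}>"


def encode_video(video):
    return f"h264<{video}>"


def multiplex(audio_stream, video_stream):
    return f"mp4[{audio_stream}|{video_stream}]"
```

Together with the earlier sketches, this completes the pipeline: select segments, parse by category, edit the parsed data, then package the result as UGC media content.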
Optionally, in the embodiment of the present invention, the packaging module 14 of the device is further configured to package all or part of the unresolved media content to obtain UGC media content.
FIG. 18 is a block diagram of a system for generating UGC media content according to an embodiment of the invention. The apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, data communication, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 800 is in an operational mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the apparatus 800. For example, the sensor assembly 814 may detect an on/off state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the apparatus 800 and other devices, either in a wired or wireless manner. The apparatus 800 may access a wireless network based on a communication standard. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
The embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the UGC media content generation method of the method embodiments is implemented. For example, the memory 804 includes instructions executable by the processor 820 of the device 800 to perform the methods described above. The non-transitory computer-readable storage medium may be, for example, a ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, or optical data storage device.
A non-transitory computer-readable storage medium whose instructions, when executed by a processor of a client, cause the client to perform the method of generating UGC media content described above.
The embodiment of the invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the UGC media content generation method described above.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method for generating user generated content UGC media content, comprising:
extracting first media content from professionally generated content, PGC, media content, the first media content may comprise one content segment or a plurality of content segments of the PGC media content; when the extracted first media content comprises a plurality of content segments, the content segments are different from each other, partially overlapped, or completely identical;
analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, wherein the analysis mode corresponds to the category of the selected content;
editing the parsed media content to obtain edited media content;
packaging the edited media content according to a preset packaging mode to obtain a first UGC media content;
replacing one or more content segments of the PGC media content with the first UGC media content; alternatively, inserting the first UGC media content at one or more locations in the PGC media content;
multiplexing the first UGC media content and the PGC media content to obtain second UGC media content;
Wherein the parsed media content includes: first audio data and first auxiliary data, or, first audio data, first auxiliary data and first video data; the first assistance data comprises at least one of: spatial coordinates, equalization parameters, reverberation orders;
the editing operation corresponding to the editing comprises any combination of the following modes:
changing the start time and/or end time of all or part of the content of the first audio data and/or the first video data; adding new audio data to the first audio data through file importing, recording, or adding special sound effects; deleting all or part of the content of the first audio data and/or the first auxiliary data and/or the first video data; adding new video data to the first video data through video file importing; changing the auxiliary data of all or part of the content of the first auxiliary data; adding auxiliary data to the first auxiliary data by manually writing auxiliary information or importing configuration files;
the extracting the first media content from the PGC media content includes:
extracting first media content from the PGC media content by way of the selected start time and end time; or,
The first media content is extracted from the PGC media content by way of a selected start time and duration.
2. The method of claim 1, wherein the selected content may include all or part of the first media content;
analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, wherein the analysis mode corresponds to the category of the selected content and comprises the following steps:
acquiring categories contained in all or part of the first media content, wherein the categories comprise audio content, video content or audio-video content;
and respectively analyzing all or part of the content of each category in the first media content according to the analysis mode corresponding to each category to obtain analyzed media content, and taking the rest of the content of the first media content as unresolved media content.
3. The method of claim 2, wherein the parsing means comprises audio decoding, video decoding, or audio-video demultiplexing;
analyzing the whole content or part of the content of each category of the first media content according to the analysis mode corresponding to each category to obtain the analyzed media content, and taking the rest of the content of the first media content as the unresolved media content, wherein the method comprises the following steps:
When the selected content in the first media content only comprises audio content, audio decoding is carried out on the selected content to obtain audio data and/or auxiliary data;
when the selected content in the first media content only comprises video content, video decoding is carried out on the selected content to obtain video data;
when the selected content in the first media content comprises audio-video content, audio-video demultiplexing is carried out on the selected content to obtain first audio content and first video content, and audio decoding and/or video decoding are carried out on the first audio content and the first video content based on editing requirements.
4. A method according to claim 3, wherein the audio decoding and/or video decoding of the first audio content and first video content based on editing requirements comprises:
if the editing requirement comprises editing the first audio content and not editing the first video content, determining that the selected content can comprise part of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and taking the first video content as unresolved media content;
If the editing requirement comprises editing the first audio content and the first video content, determining that the selected content can comprise all the content of the first media content, performing audio decoding on the first audio content to obtain first audio data and/or first auxiliary data, and performing video decoding on the first video content to obtain first video data.
5. The method of claim 4, wherein editing the parsed media content to obtain edited media content comprises:
editing the first audio data and/or the first auxiliary data and/or the first video data to generate edited media content, wherein the edited media content comprises second audio data modified based on the first audio data and/or second auxiliary data modified based on the first auxiliary data and/or second video data modified based on the first video data.
6. The method of claim 5, wherein the encapsulation comprises audio coding, video coding, or audio-video multiplexing;
the step of packaging the edited media content according to a preset packaging mode to obtain UGC media content, including:
Packaging the edited media content to obtain second media content;
if the second media content only comprises audio content, audio encoding is carried out on audio data and/or auxiliary data in the second media content to obtain UGC media content;
if the second media content only comprises video content, video coding is carried out on video data in the second media content to obtain UGC media content;
and if the second media content comprises audio and video content, audio encoding is carried out on the audio data and/or auxiliary data in the second media content to obtain encoded audio content, video encoding is carried out on the video data in the second media content to obtain encoded video content, and the encoded audio content and the encoded video content are multiplexed into UGC media content.
7. The method according to any one of claims 3-6, further comprising:
and packaging all or part of the unresolved media content to obtain UGC media content.
8. A user generated content UGC media content generating apparatus, comprising:
An extraction module for extracting first media content from professionally generated content, PGC, media content, the first media content may comprise one content segment or a plurality of content segments of the PGC media content; when the extracted first media content comprises a plurality of content segments, the content segments are different from each other, partially overlapped, or completely identical;
the analysis module is used for analyzing the selected content in the first media content according to a preset analysis mode to obtain analyzed media content, wherein the analysis mode corresponds to the category of the selected content;
the editing module is used for editing the parsed media content to obtain edited media content;
the packaging module is used for packaging the edited media content according to a preset packaging mode to obtain a first UGC media content;
the editing module is further used for replacing one or more content segments of the PGC media content with the first UGC media content; alternatively, inserting the first UGC media content at one or more locations in the PGC media content;
the packaging module is further used for multiplexing the first UGC media content and the PGC media content to obtain second UGC media content;
Wherein the parsed media content includes: first audio data and first auxiliary data, or, first audio data, first auxiliary data and first video data; the first assistance data comprises at least one of: spatial coordinates, equalization parameters, reverberation orders;
the editing operation corresponding to the editing comprises any combination of the following modes:
changing the start time and/or end time of all or part of the content of the first audio data and/or the first video data; adding new audio data to the first audio data through file importing, recording, or adding special sound effects; deleting all or part of the content of the first audio data and/or the first auxiliary data and/or the first video data; adding new video data to the first video data through video file importing; changing the auxiliary data of all or part of the content of the first auxiliary data; adding auxiliary data to the first auxiliary data by manually writing auxiliary information or importing configuration files;
the extracting module is specifically configured to extract a first media content from the PGC media content by means of a selected start time and end time; or,
The first media content is extracted from the PGC media content by way of a selected start time and duration.
9. A user generated content UGC media content generating device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method for generating user generated content, UGC, media content according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the method for generating user generated content, UGC, media content according to any one of claims 1 to 7.
CN202110811872.6A 2021-07-19 2021-07-19 UGC media content generation method, device, equipment and storage medium Active CN113691860B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110811872.6A CN113691860B (en) 2021-07-19 2021-07-19 UGC media content generation method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113691860A CN113691860A (en) 2021-11-23
CN113691860B true CN113691860B (en) 2023-12-08

Family

ID=78577291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110811872.6A Active CN113691860B (en) 2021-07-19 2021-07-19 UGC media content generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113691860B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105336348A (en) * 2015-11-16 2016-02-17 合一网络技术(北京)有限公司 Processing system and method for multiple audio tracks in video editing
CN105721440A (en) * 2016-01-21 2016-06-29 成都索贝数码科技股份有限公司 Use method of media content business flow integrated management and control application cloud platform
WO2017065503A1 (en) * 2015-10-15 2017-04-20 (주)노바빈 Distributed multimedia editing system and editing method
CN109376253A (en) * 2018-09-14 2019-02-22 传线网络科技(上海)有限公司 Multimedia resource edit methods and device
CN111445914A (en) * 2020-03-23 2020-07-24 全景声科技南京有限公司 Processing method and device capable of disassembling and re-editing audio signal
CN112565923A (en) * 2020-11-30 2021-03-26 北京达佳互联信息技术有限公司 Audio and video stream processing method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008300B (en) * 2019-04-11 2021-07-09 北京百度网讯科技有限公司 Method and device for determining alias of POI (Point of interest), computer equipment and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Kyoung-Sook Kim; Koji Zettsu; Yutaka Kidawara; Yasushi Kiyoki. Sticker: Searching and Aggregating User-Generated Contents along with Trajectories of Moving Phenomena. 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware. 2009, 365-366. *
Yan Yanqin; Qian Yongjiang; Yu Dingguo. Research on automated production technology of TV data news based on artificial intelligence. Radio & TV Broadcast Engineering, Vol. 46, No. 6, 38-41. *
Wang Huifen; Cai Shuya. Analysis of the application of blockchain technology in UGC copyright management. Media. 2021, 71-73, 75. *

Also Published As

Publication number Publication date
CN113691860A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
US8930817B2 (en) Theme-based slideshows
EP2136370B1 (en) Systems and methods for identifying scenes in a video to be edited and for performing playback
US9071815B2 (en) Method, apparatus and computer program product for subtitle synchronization in multimedia content
EP1648172A1 (en) System and method for embedding multimedia editing information in a multimedia bitstream
CN105612743A (en) Audio video playback synchronization for encoded media
KR20070091962A (en) Method for offerring naration of data channel dmb using animation and recording media implementing the same
CN109068163B (en) Audio and video synthesis system and synthesis method thereof
CN102868862A (en) Method and equipment for dubbing video applied to mobile terminal
US9928876B2 (en) Recording medium recorded with multi-track media file, method for editing multi-track media file, and apparatus for editing multi-track media file
CN108471554A (en) Multimedia resource synthetic method and device
CN111797061B (en) Multimedia file processing method and device, electronic equipment and storage medium
CN114040255A (en) Live caption generating method, system, equipment and storage medium
JP2018509007A (en) Audio reproduction method, apparatus, program, and recording medium
CN113691860B (en) UGC media content generation method, device, equipment and storage medium
CN113886612A (en) Multimedia browsing method, device, equipment and medium
EP3419281A1 (en) Image processing device, image processing method, and program
CN108574860A (en) Multimedia resource playback method and device
CN105847994A (en) Multimedia file playing method and device
CN112055253B (en) Method and device for adding and multiplexing independent subtitle stream
US20080196080A1 (en) Display device and method, and program
CN1980368A (en) Multi-media document generating method of media playing apparatus and apparatus thereof
Corral García et al. Enabling interactive and interoperable semantic music applications
KR101930488B1 (en) Metadata Creating Method and Apparatus for Linkage Type Service
JP2015115802A (en) Electronic apparatus, method and computer readable recording medium
CN112153463B (en) Multi-material video synthesis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant