CN115119069A - Multimedia content processing method, electronic device and computer storage medium - Google Patents

Multimedia content processing method, electronic device and computer storage medium

Info

Publication number
CN115119069A
CN115119069A (application CN202110285818.2A)
Authority
CN
China
Prior art keywords
information
multimedia
sharing
multimedia content
content
Prior art date
Legal status
Pending
Application number
CN202110285818.2A
Other languages
Chinese (zh)
Inventor
詹亚威
吴玥
Current Assignee
Alibaba Innovation Co
Original Assignee
Alibaba Singapore Holdings Pte Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Singapore Holdings Pte Ltd
Priority to CN202110285818.2A
Publication of CN115119069A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/34Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • H04N21/2335Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/2355Processing of additional data, e.g. scrambling of additional data or processing content descriptors involving reformatting operations of additional data, e.g. HTML pages
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44222Analytics of user selections, e.g. selection of programs or purchase activity

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a multimedia content processing method, an electronic device, and a computer storage medium. The multimedia content processing method includes: receiving input information of a user for a multimedia file; selecting, according to the input information, a content portion corresponding to the input information from the multimedia file as designated multimedia content; acquiring text information corresponding to the designated multimedia content; and acquiring sharing information of the multimedia file generated according to the text information, and sharing the multimedia file through the sharing information. With the method and the device, sharing by the sharing user is more targeted, the shared multimedia content can be understood through the sharing information even when it is inconvenient to play the multimedia, and more efficient multimedia content sharing is achieved.

Description

Multimedia content processing method, electronic device and computer storage medium
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a multimedia content processing method, an electronic device, and a computer storage medium.
Background
With the development of computer technology, information dissemination and interaction through multimedia content sharing are increasingly widely applied in people's life and work.
At present, when multimedia content is shared, one approach is to share a link address (URL) corresponding to the multimedia, and another is to directly share the complete multimedia content. In either approach, however, there are situations in which it is inconvenient to play the multimedia, for example during a meeting or while exercising, and effective multimedia content sharing cannot be realized in these situations. In addition, in both approaches the shared user must play the complete multimedia content to acquire the corresponding information, which makes it inconvenient for the shared user to quickly obtain the key information of long multimedia.
Therefore, how to effectively process multimedia content to realize more effective multimedia content sharing becomes a problem to be solved urgently.
Disclosure of Invention
In view of the above, embodiments of the present application provide a multimedia content processing scheme to at least partially solve the above problems.
According to a first aspect of the embodiments of the present application, there is provided a multimedia content processing method, including: receiving input information of a user for a multimedia file; selecting, according to the input information, a content portion corresponding to the input information from the multimedia file as designated multimedia content; acquiring text information corresponding to the designated multimedia content; and acquiring sharing information of the multimedia file generated according to the text information, and sharing the multimedia file through the sharing information.
According to a second aspect of the embodiments of the present application, there is provided another multimedia content processing method, including: acquiring designated multimedia content, where the designated multimedia content is part or all of the content of a multimedia file; performing speech recognition on the designated multimedia content to obtain a recognition result, and generating text information corresponding to the designated multimedia content according to the recognition result; and generating and sending sharing information of the multimedia file according to the text information.
According to a third aspect of the embodiments of the present application, there is provided a further multimedia content processing method, including: providing a multimedia sharing interface for sharing multimedia content, wherein the multimedia sharing interface at least includes a sharing setting option; receiving input information of a user corresponding to the sharing setting option; selecting at least part of the content from the corresponding multimedia file as designated multimedia content according to the input information; acquiring text information corresponding to the designated multimedia content; and acquiring sharing information of the multimedia file generated according to the text information, and sharing the multimedia file using the sharing information.
According to a fourth aspect of the embodiments of the present application, there is provided an electronic device, including: a display, an input device, a processor, a memory, a communication interface, and a communication bus, where the display, the input device, the processor, the memory, and the communication interface communicate with one another through the communication bus; the display is configured to display a multimedia sharing interface for sharing multimedia content, wherein the multimedia sharing interface at least includes a sharing setting option; the input device is configured for a user to enter input information corresponding to the sharing setting option; and the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform operations corresponding to the multimedia content processing method according to the first aspect or the third aspect.
According to a fifth aspect of the embodiments of the present application, there is provided another electronic device, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus; the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform operations corresponding to the multimedia content processing method according to the second aspect.
According to a sixth aspect of the embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the multimedia content processing method according to the first, second, or third aspect.
According to the multimedia content processing scheme provided by the embodiments of the present application, when a multimedia file is shared, first, the portion of content that the user actually wants to share, that is, the designated multimedia content, can be determined according to the user's input information for the multimedia file, so that the sharing is more targeted. Second, the sharing information generated by this scheme is generated according to the text information corresponding to the designated multimedia content; that is, the sharing information contains not only the designated multimedia content to be shared, but also its corresponding text information. Therefore, even in a scenario where it is inconvenient to play multimedia, for example during a meeting or while exercising, the user can learn the shared multimedia content through the text information in the sharing information. In addition, with either the text information or the shared designated multimedia content, the shared user can quickly obtain the key information without playing the complete multimedia content. More efficient multimedia content sharing is thus achieved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description cover only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings.
Fig. 1A is a flowchart illustrating steps of a multimedia content processing method according to a first embodiment of the present application;
FIG. 1B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 1A;
FIG. 2A is a flowchart illustrating steps of a method for processing multimedia content according to a second embodiment of the present application;
FIG. 2B is a diagram of a user interaction interface in the embodiment of FIG. 2A;
FIG. 2C is a diagram illustrating sharing information according to the embodiment shown in FIG. 2A;
FIG. 2D is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 2A;
FIG. 3A is a flowchart illustrating steps of a method for processing multimedia content according to a third embodiment of the present application;
FIG. 3B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 3A;
fig. 4A is a flowchart illustrating steps of a method for processing multimedia contents according to a fourth embodiment of the present application;
FIG. 4B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 4A;
fig. 5A is a flowchart illustrating steps of a method for processing multimedia contents according to a fifth embodiment of the present application;
FIG. 5B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 5A;
fig. 6 is a flowchart illustrating steps of a method for processing multimedia contents according to a sixth embodiment of the present application;
fig. 7A is a flowchart illustrating steps of a method for processing multimedia contents according to a seventh embodiment of the present application;
FIG. 7B is a diagram illustrating a multimedia sharing interface according to the embodiment shown in FIG. 7A;
FIG. 7C is a diagram illustrating another multimedia sharing interface in the embodiment shown in FIG. 7A;
FIG. 7D is a diagram illustrating another multimedia sharing interface according to the embodiment shown in FIG. 7A;
fig. 8 is a schematic structural diagram of an electronic device according to an eighth embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to a ninth embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present application, these technical solutions will be described clearly and completely below with reference to the drawings. It is obvious that the described embodiments are only a part, and not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application shall fall within the protection scope of the embodiments of the present application.
The following further describes specific implementations of embodiments of the present application with reference to the drawings of the embodiments of the present application.
Example one
Referring to fig. 1A, a flowchart illustrating steps of a multimedia content processing method according to a first embodiment of the present application is shown.
The multimedia content processing method of the embodiment comprises the following steps:
step S102, receiving input information of a user aiming at the multimedia file.
In the embodiment of the present application, the multimedia file may be any suitable multimedia file, including but not limited to: a multimedia file in audio-only form (hereinafter also referred to as an audio file), a multimedia file in video-only form (hereinafter also referred to as a video file), and a multimedia file in audio-video form (hereinafter also referred to as an audio-video file).
The user's input information for the multimedia file indicates the multimedia content that the user actually wants to share. Although this may be the entire multimedia content, in many cases the user actually wants to share only part of it. For example, for an audio file with a duration of 10 minutes, the user may want to share only its most exciting segment, from the 5th to the 8th minute. The user can therefore indicate the portion to be shared through the input information.
It should be noted that the input information may take any suitable form, including but not limited to: entering information through an input box, such as a start time point and an end time point of the content to be shared; or displaying the data frames or the playback waveform of the multimedia file and selecting the portion to be shared by operating on those data frames or that waveform.
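As an illustration only (the patent does not prescribe any data model), input information combining the two forms above might be represented as follows; all class and field names are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShareInput:
    """User input designating the portion of a multimedia file to share.

    Either a time range (in seconds) or a frame range may be given.
    Names here are illustrative, not taken from the patent.
    """
    file_id: str
    start_seconds: Optional[float] = None
    end_seconds: Optional[float] = None
    start_frame: Optional[int] = None
    end_frame: Optional[int] = None

    def validate(self) -> None:
        has_time = self.start_seconds is not None and self.end_seconds is not None
        has_frames = self.start_frame is not None and self.end_frame is not None
        if not (has_time or has_frames):
            raise ValueError("either a time range or a frame range is required")
        if has_time and self.end_seconds <= self.start_seconds:
            raise ValueError("end time must be after start time")
        if has_frames and self.end_frame <= self.start_frame:
            raise ValueError("end frame must be after start frame")
```

For the 10-minute audio example above, the user's selection of the 5th to 8th minute would correspond to `ShareInput(file_id=..., start_seconds=300, end_seconds=480)`.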
Step S104: and according to the input information, selecting a content part corresponding to the input information from the multimedia archive as the designated multimedia content.
As described above, the corresponding portion can be selected from the multimedia file as the designated multimedia content to be shared according to the user's input information.
Continuing the foregoing example, for an audio file with a duration of 10 minutes, if the user's input information indicates a start time point at the 5th minute and an end time point at the 8th minute, the audio clip from the 5th to the 8th minute can be intercepted from the audio file as the clip to be shared, that is, the designated multimedia content.
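The time-based interception in this example amounts to simple index arithmetic over the audio samples. The sketch below is an illustrative assumption (a real implementation would typically cut compressed frames or container packets rather than raw PCM):

```python
def clip_samples(samples, sample_rate, start_s, end_s):
    """Return the sub-range of PCM samples covering [start_s, end_s) seconds.

    `samples` is any indexable sequence of audio samples at `sample_rate` Hz.
    This only illustrates the time-to-index arithmetic behind interception.
    """
    start_idx = int(start_s * sample_rate)
    end_idx = int(end_s * sample_rate)
    return samples[start_idx:end_idx]

# For the 10-minute file at 16 kHz, the 5th-8th minute selection maps to
# sample indices 300 * 16000 = 4_800_000 through 480 * 16000 = 7_680_000.
```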
Step S106: and acquiring text information corresponding to the specified multimedia content.
In the embodiment of the present application, after the designated multimedia content is determined, the corresponding text information is acquired; this text information effectively describes the content of the designated multimedia content in text form. For example, speech recognition may be performed on the audio segment from the 5th to the 8th minute, and the corresponding text information obtained from the recognition result. Taking a song as an example, the text information may be the lyrics; for a podcast, the text content of the podcast clip; for a broadcast, the text content of the broadcast segment; for a movie or television program, the lines of the corresponding segment; for a conference, the text of what the participants said in the conference segment. The possibilities are not limited to these: in practical applications, those skilled in the art may further process the obtained original text information so that the resulting text information characterizes the designated multimedia content more effectively.
Step S108: and acquiring sharing information of the multimedia file generated according to the text information, and sharing the multimedia file through the sharing information.
In this embodiment, the generated sharing information includes the designated multimedia content and the text information corresponding to it. Therefore, after the sharing user shares the multimedia file using the sharing information, on the one hand, the shared user can obtain the key information of the corresponding multimedia content through the text information; on the other hand, if the designated multimedia content is only part of the multimedia file, the shared user only needs to play that part rather than the entire file, which enables interaction between the sharing user and the shared user while saving the shared user time.
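The sharing information described in this step can be pictured as a small payload combining the clip and its recognized text. The structure and field names below are purely illustrative assumptions, not taken from the patent:

```python
def build_sharing_info(file_id, clip_uri, text):
    """Assemble a sharing payload containing both the designated clip and
    its recognized text, so a recipient can read the text without playback.

    All field names are illustrative.
    """
    return {
        "file": file_id,
        "clip": clip_uri,      # where the designated multimedia content lives
        "text": text,          # text information recognized from the clip
        "preview": text[:80],  # short excerpt, e.g. for a chat message card
    }
```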
The above process is illustrated below with a specific scenario example, as shown in fig. 1B.

In fig. 1B, it is assumed that the multimedia file X is an audio-video file with a duration of 10 minutes, and the sharing user inputs, through the provided interface, the part of the file content to be shared, such as the content from the 5th to the 8th minute. The content of the 5th to 8th minutes of the audio-video file is then intercepted; it is schematically indicated as the designated multimedia content X' in fig. 1B. Speech recognition is then performed on the designated multimedia content X', for example based on the audio data in the segment, to obtain the text information P corresponding to the segment. Based on the text information P and the designated multimedia content X', the sharing information of the multimedia file X can be generated. The sharing user can then perform the corresponding sharing operation based on this sharing information, thereby sharing the text information P and the designated multimedia content X' at the same time. The shared user receiving the sharing information can obtain the key content of the designated multimedia content X' through the text information P without playing X', or can play X' to obtain the essential content of the multimedia file X without playing it in full. The efficiency and pertinence of multimedia content sharing are thus greatly improved.
Through this embodiment, when a multimedia file is shared, first, the portion of content that the sharing user actually wants to share, that is, the designated multimedia content, can be determined according to the sharing user's input information for the multimedia file, so that the sharing is more targeted. Second, the sharing information generated by this scheme is generated according to the text information corresponding to the designated multimedia content; that is, the sharing information contains not only the designated multimedia content to be shared, but also its corresponding text information. Therefore, even in a scenario where it is inconvenient to play multimedia, for example during a meeting or while exercising, the shared user can learn the shared multimedia content through the text information in the sharing information. In addition, with either the text information or the shared designated multimedia content, the shared user can quickly obtain the key information without playing the complete multimedia content. More efficient multimedia content sharing is thus achieved.
Example two
Referring to fig. 2A, a flowchart illustrating steps of a multimedia content processing method according to a second embodiment of the present application is shown.
The present embodiment takes the implementation of multimedia content processing at a client as an example, and explains the multimedia content processing scheme provided in the present embodiment.
The multimedia content processing method of the embodiment comprises the following steps:
step S202: and receiving input information of a user aiming at the multimedia file.
Generally, the user's input information for the multimedia file can be received through a provided user interaction interface, which can be designed appropriately by those skilled in the art according to actual needs.
The input information entered by the user may include at least one of: start time information and end time information to be shared for the multimedia file, and start-frame position information and end-frame position information to be shared in the multimedia file. Fig. 2B shows a user interaction interface for entering this input information; fig. 2B shows both manners of input, but in practical applications only one may be provided. Of course, other interface forms for entering the input information are also suitable. If the input information takes the form of start and end time information, the setting is simple, input is easy for the user, and operation is convenient. If the input information takes the form of start-frame and end-frame position information, selection accuracy can be greatly improved, realizing more precise sharing settings.
In one possible manner, the input information further includes identification information of the multimedia file. The identification information may be any suitable information that uniquely identifies the multimedia file, including but not limited to its name or identification number. In this case, the user can enter the input information without playing the multimedia file, or enter input information for another multimedia file while one is playing. This improves the flexibility of operation and, in turn, the flexibility of sharing.
Furthermore, when the multimedia file is an audio file or an audio-video file, in one possible manner, receiving the user's input information for the multimedia file from the user interaction interface may be implemented as: displaying the waveform and/or spectrum corresponding to the audio data in the multimedia file in the user interaction interface; receiving the user's interception operation on the displayed waveform and/or spectrum; and determining the input information according to the interception operation. Because audio data can be presented in the form of a waveform or a spectrum, in this embodiment the audio data in the multimedia file is presented in that form. On the one hand, the display is clearer and more attractive; on the other hand, the user can perform an accurate audio-frame interception operation based on it, so that the content to be shared can be determined precisely.
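One way a client might render the waveform used for this interception operation is to downsample the audio into per-bucket peak values. A minimal illustrative sketch (the function name and approach are assumptions, not from the patent):

```python
def waveform_envelope(samples, buckets):
    """Reduce raw audio samples to `buckets` peak values, one per bucket,
    suitable for drawing a waveform bar in a user interaction interface."""
    if not samples or buckets <= 0:
        return []
    size = max(1, len(samples) // buckets)
    return [max(abs(v) for v in samples[i:i + size])
            for i in range(0, size * buckets, size)]
```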
Similarly, a video file can be displayed in the form of video data frames, so that the user can perform an accurate interception operation.
Step S204: and according to the input information, selecting a content part corresponding to the input information from the multimedia archive as the designated multimedia content.
After the input information is entered in the manner described above, a corresponding content portion can be selected from the multimedia file as the designated multimedia content to be shared. For example, if a start time point of the 5th minute and an end time point of the 8th minute are entered, the content between the 5th and 8th minutes is intercepted from the multimedia file as the designated multimedia content. If the 10th frame as the start frame and the 50th frame as the end frame are entered, the content between the 10th and 50th frames is intercepted from the multimedia file as the designated multimedia content.
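When the selection is frame-based, the client must map frame indices to the time range to intercept. A minimal sketch of that arithmetic, assuming a known constant frame rate (function names are illustrative):

```python
def frame_to_seconds(frame_index, fps):
    """Convert a frame index to a timestamp, given the file's frame rate."""
    return frame_index / fps

def frames_to_time_range(start_frame, end_frame, fps):
    """Map a frame-based selection to the (start, end) seconds to intercept."""
    if end_frame <= start_frame:
        raise ValueError("end frame must be after start frame")
    return frame_to_seconds(start_frame, fps), frame_to_seconds(end_frame, fps)
```

For the example above, frames 10 through 50 at 25 frames per second would correspond to the interval from 0.4 s to 2.0 s.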
It should be noted that, for some multimedia files, the sharing user may want to share the entire file, for example a video with a duration of 3 minutes. In this case, selection of all of the video's content can be realized through the input information, and the designated multimedia content is then the entire content of the multimedia file.
Step S206: and locally acquiring text information corresponding to the specified multimedia content.
The designated multimedia content comes from a multimedia file, and as previously described the multimedia file may take a variety of forms, such as an audio file, a video file, or an audio-video file.
When the multimedia file is an audio file or an audio-video file, acquiring the text information corresponding to the designated multimedia content includes: acquiring the audio information in the designated multimedia content; and performing speech recognition on the audio information to obtain the text information corresponding to the designated multimedia content.
An audio/video file carries data on two tracks, an audio stream and a video stream. The audio-stream data can therefore be obtained from the audio/video file; the portion of that data corresponding to the designated multimedia content is the audio information in the designated multimedia content, on which the subsequent speech recognition is performed.
In a feasible manner, in order to speed up acquisition of the text information, local speech recognition may be performed on the audio information to obtain the text information corresponding to the designated multimedia content. For example, the audio information may be processed by a speech recognition algorithm, such as an ASR algorithm, embedded in the application where the multimedia archive is located, so as to obtain the text information corresponding to the designated multimedia content. Alternatively, the application where the multimedia archive is located can call a locally stored speech recognition algorithm to perform speech recognition on the audio information and obtain the text information corresponding to the designated multimedia content.
That is, the speech recognition algorithm may be embedded in the application where the multimedia file is located, or it may be installed locally on the client and invoked by that application through a call interface. The embedded mode integrates the speech recognition algorithm with the application, so speech recognition can be performed more efficiently and quickly; the invocation mode makes effective use of external resources and reduces the design complexity and implementation cost of the application where the multimedia file is located.
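The two local modes, embedded versus invoked, can be sketched as a simple dispatch. The recognizers here are stand-in callables, since the patent does not name a concrete ASR engine:

```python
from typing import Callable, Optional

def recognize(audio: bytes,
              embedded_asr: Optional[Callable[[bytes], str]] = None,
              installed_asr: Optional[Callable[[bytes], str]] = None) -> str:
    # Prefer the recognizer embedded in the application; otherwise fall back
    # to a locally installed algorithm invoked through a call interface.
    if embedded_asr is not None:
        return embedded_asr(audio)
    if installed_asr is not None:
        return installed_asr(audio)
    raise RuntimeError("no local speech recognition algorithm available")
```

Either path yields the text information; only the ownership of the ASR resource differs.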
Step S208: and acquiring the sharing information of the multimedia file generated according to the text information.
After the designated multimedia content and the corresponding text information are obtained, the sharing information of the multimedia file can be generated according to the designated multimedia content and the corresponding text information.
In this embodiment, a manner of locally generating the sharing information is adopted. For example, the sharing information may be generated by the application where the multimedia file is located, according to the designated multimedia content and its corresponding text information.
In a feasible manner, the sharing information of the multimedia file can be generated according to a preset format and the text information. The preset format may indicate the presentation form of the generated sharing information, for example, how the interface is arranged, how the layout formats of the designated multimedia content and the text information are set, how the designated multimedia content carried in the sharing information is presented, and so on. On this basis, sharing information with different styles and different display modes can be produced.
During specific implementation, optionally, the text information and the designated multimedia content are arranged together according to a preset layout format, and the corresponding sharing information is then generated, where the sharing information includes at least one of the following: a picture, audio, video, or audio/video for sharing. Since the designated multimedia content itself may be in the form of audio, video or audio/video, the sharing information may be presented in the corresponding form. It is not limited thereto, however; the designated multimedia content may also be represented in the form of a picture, such as a picture corresponding to audio (e.g. a cover picture representing the audio), a picture corresponding to video (e.g. a cover picture representing the video or a frame from the video), or a picture corresponding to audio/video (e.g. a cover picture representing the audio/video or a frame from it). An example of sharing information generated after such mixed arrangement is shown in fig. 2C, which shows the sharing information of a podcast named "2035, where do you go?". The designated multimedia content is the portion of the podcast from 02:06 to 04:00, and the corresponding text information is the text portion marked in the lower dashed box in fig. 2C. The sharing information also includes a cover picture corresponding to the podcast (shown in the upper left corner of fig. 2C); the text below the picture is the identification information of the podcast, including its name, its anchor, the album to which it belongs, and so on. In addition, the waveform shown in the middle of fig. 2C is an audio indication of the designated multimedia content, which can be played by clicking the triangular play button on the left side of the waveform.
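The mixed arrangement of fig. 2C can be sketched as assembling a sharing-card structure from the preset layout's parts. The field names and the layout string are assumptions made for this illustration, not defined by the patent:

```python
def build_sharing_card(name, anchor, album, transcript, clip_start, clip_end, cover):
    # Mix the text information and the clip's metadata according to one preset
    # layout: cover and identification on top, an audio indication (waveform
    # plus clip range) in the middle, and the transcript text at the bottom.
    return {
        "layout": "cover/identification/waveform/transcript",
        "cover": cover,
        "identification": {"name": name, "anchor": anchor, "album": album},
        "audio_indication": {"start": clip_start, "end": clip_end},
        "transcript": transcript,
    }
```

Different preset formats would simply emit different layout strings and orderings, which is how sharing information with different styles can be produced.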
Step S210: and sharing the multimedia file through the sharing information.
For example, a sharing operation is performed using the sharing information, sharing it to a corresponding target application or social network, thereby sharing the multimedia file.
The above process is exemplified below in the form of a simple scenario example, as shown in fig. 2D.
In fig. 2D, suppose that an audio file is played in an application and that the corresponding spectrum data is displayed in the playing interface of the audio file, again taking the aforementioned podcast "2035, where are you going?" as an example. While playing the podcast, the user can also perform audio interception through the playing interface. Then, if the user clicks a "share" button in the interface, local speech recognition is performed on the intercepted audio clip to obtain the corresponding text information, and the corresponding sharing information is generated locally based on the text information and the intercepted audio clip. Optionally, the cover picture of the podcast may also be combined with the text information and the intercepted audio clip when generating the sharing information locally. The user can then use this sharing information to share the part of the podcast that he or she wants to share.
It can be seen that, according to this embodiment, when a multimedia file is shared, the part of the content that the sharing user actually wants to share, namely the designated multimedia content, is first determined according to the sharing user's input information for the multimedia file, so that the sharing is more targeted. Second, the sharing information generated by the scheme of this embodiment is generated according to the text information corresponding to the designated multimedia content; that is, the sharing information carries not only the designated multimedia content to be shared but also its corresponding text information. Therefore, even in a scene where it is inconvenient to play multimedia, for example in a meeting or while exercising, the receiving user can learn about the shared multimedia content through the text information in the sharing information. In addition, through either the text information or the shared designated multimedia content, the receiving user can quickly grasp the key information without playing the complete multimedia content. More efficient multimedia content sharing is thus achieved.
Example Three
Referring to fig. 3A, a flowchart illustrating steps of a multimedia content processing method according to a third embodiment of the present application is shown.
In this embodiment, the multimedia content processing scheme provided in the embodiments of the present application is described by taking as an example a client that implements multimedia content processing in combination with a cloud.
The multimedia content processing method of the embodiment comprises the following steps:
step S302: input information of a user for the multimedia file is received.
Step S304: and according to the input information, selecting a content part corresponding to the input information from the multimedia archive as the designated multimedia content.
For the specific implementation of steps S302-S304, reference may be made to the description of steps S202-S204 in the second embodiment, which is not repeated here.
Step S306: and acquiring text information corresponding to the specified multimedia content through a cloud terminal.
Different from the second embodiment, in which the text information corresponding to the designated multimedia content is obtained locally, in this embodiment the text information is obtained through a cloud.
In one feasible manner, the designated multimedia content may be uploaded to the cloud, the cloud obtains the audio information in the designated multimedia content, and speech recognition is performed on the audio information at the cloud to obtain the text information corresponding to the designated multimedia content.
In another feasible manner, the audio information in the designated multimedia content may be obtained at the client and then sent to the cloud; the client then receives the text information that the cloud returns after performing speech recognition on the audio information.
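The second manner amounts to a small request/response exchange. The message shape below is an assumption for illustration; the patent does not specify a wire format:

```python
import json

def make_asr_request(audio_b64, archive_id, clip_start, clip_end):
    # Client side: package the extracted audio information (base64-encoded
    # here) together with the clip's identity for upload to the cloud.
    return json.dumps({"archive_id": archive_id,
                       "clip": [clip_start, clip_end],
                       "audio": audio_b64})

def parse_asr_response(body):
    # Client side: extract the text information the cloud returns after
    # performing speech recognition on the uploaded audio.
    return json.loads(body)["text"]
```

The transport (HTTP, RPC, etc.) is orthogonal; only the division of labor matters, with recognition running cloud-side.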
In this manner, the corresponding speech recognition algorithm is deployed at the cloud, which greatly reduces the speech recognition burden on the client, lowers the software and hardware requirements on the client device, and reduces the implementation cost of the client. Moreover, the speech recognition algorithm resources can be shared through the cloud.
In addition, if the multimedia file is a video file, it can also be processed by the cloud in this step to obtain corresponding text information, such as subtitle (dialogue line) information. In one mode, text recognition can be performed at the cloud on the subtitle area of the video frame images to obtain the subtitle information corresponding to that area; in another mode, if the video file carries independent subtitle data, the cloud can parse that data to obtain the corresponding subtitle information. Of course, other methods of processing the video frame data to obtain the corresponding text information are also applicable to the embodiments of the present application.
Step S308: and acquiring sharing information of the multimedia file generated according to the text information.
Step S310: and sharing the multimedia file through the sharing information.
The above steps S308 to S310 are still executed locally at the client, and the specific implementation thereof can refer to the description of the corresponding steps in the second embodiment, which is not described herein again.
That is, in this embodiment the audio recognition part is handed over to the cloud, while the other parts are still completed locally at the client, and the corresponding sharing information is finally generated and shared.
The above process is exemplified below in the form of a scenario example, as shown in fig. 3B.
In fig. 3B, suppose that an audio file is played in an application and that settings enabling sharing, such as a sharing button, are displayed in the playing interface of the audio file, again taking the aforementioned podcast "2035, where are you going?" as an example. While playing the podcast, the user can also perform audio interception through the playing interface. Then, if the user clicks the sharing button in the interface, the intercepted audio clip is uploaded to the cloud, and the cloud performs the speech recognition operation on the audio clip to obtain the corresponding text information. The cloud then returns the text information to the client, and the client locally generates the corresponding sharing information based on the text information and the intercepted audio clip. Optionally, the cover picture of the podcast may also be combined with the text information and the intercepted audio clip when generating the sharing information. The user can then use this sharing information to share the part of the podcast that he or she wants to share.
It can be seen that, according to this embodiment, when a multimedia file is shared, the part of the content that the sharing user actually wants to share, namely the designated multimedia content, is first determined according to the sharing user's input information for the multimedia file, so that the sharing is more targeted. Second, the sharing information generated by the scheme of this embodiment is generated according to the text information corresponding to the designated multimedia content; that is, the sharing information carries not only the designated multimedia content to be shared but also its corresponding text information. Therefore, even in a scene where it is inconvenient to play multimedia, for example in a meeting or while exercising, the receiving user can learn about the shared multimedia content through the text information in the sharing information. In addition, through either the text information or the shared designated multimedia content, the receiving user can quickly grasp the key information without playing the complete multimedia content. More efficient multimedia content sharing is thus achieved.
Example Four
Referring to fig. 4A, a flowchart illustrating steps of a multimedia content processing method according to a fourth embodiment of the present application is shown.
In this embodiment, the multimedia content processing scheme provided in the embodiment of the present application is still described by taking an example in which a client side is combined with a cloud side to implement multimedia content processing.
The multimedia content processing method of the embodiment comprises the following steps:
step S402: and receiving input information of a user aiming at the multimedia file.
Step S404: and according to the input information, selecting a content part corresponding to the input information from the multimedia archive as the designated multimedia content.
For the specific implementation of steps S402-S404, reference may be made to the description of steps S202-S204 in the second embodiment, which is not repeated here.
Step S406: and acquiring text information corresponding to the specified multimedia content through a cloud.
For the specific implementation of this step, reference may be made to the related description of step S306 in the third embodiment, which is not described herein again.
Step S408: and generating sharing information of the multimedia file according to the text information through the cloud.
Different from the foregoing embodiments, in this embodiment the generation of the sharing information is completed by the cloud. That is, after the text information corresponding to the designated multimedia content is obtained through speech recognition, the cloud can generate the corresponding sharing information according to the text information.
Similar to the client, the cloud can also generate the sharing information of the multimedia file according to a preset format and the text information. For example, after arranging the text information and the designated multimedia content together according to a preset layout format, the corresponding sharing information is generated, where the sharing information includes at least one of the following: a picture, audio, video, or audio/video for sharing.
Step S410: and receiving sharing information returned by the cloud end, and sharing the multimedia file through the sharing information.
After the cloud generates the sharing information, the cloud sends the sharing information back to the client, and the client shares the sharing information.
That is, in this embodiment both the audio recognition and the generation of the sharing information are completed by the cloud, the other parts are completed locally at the client, and the client finally performs the sharing operation. This further reduces the data processing burden on the client, further lowers the software and hardware requirements on the client device, and reduces the implementation cost of the client. Moreover, resources such as the speech recognition algorithm can be shared through the cloud.
The above process is exemplified below in the form of a scenario example, as shown in fig. 4B.
In fig. 4B, suppose that an audio file is played in an application and that settings enabling sharing, such as a sharing button, are displayed in the playing interface of the audio file, again taking the aforementioned podcast "2035, where are you going?" as an example. While playing the podcast, the user can also perform audio interception through the playing interface. Then, if the user clicks the sharing button in the interface, the intercepted audio clip is uploaded to the cloud, and the cloud performs the speech recognition operation on the audio clip to obtain the corresponding text information. The cloud further generates the corresponding sharing information based on the text information and the intercepted audio clip; optionally, the cover picture of the podcast may also be combined in when generating the sharing information. The cloud then returns the sharing information to the client, and the user of the client uses it to share the part of the podcast that he or she wants to share.
It can be seen that, according to this embodiment, when a multimedia file is shared, the part of the content that the sharing user actually wants to share, namely the designated multimedia content, is first determined according to the sharing user's input information for the multimedia file, so that the sharing is more targeted. Then, the sharing information generated by the scheme of this embodiment is generated according to the text information corresponding to the designated multimedia content; that is, the sharing information carries not only the designated multimedia content to be shared but also its corresponding text information. Therefore, even in a scene where it is inconvenient to play multimedia, for example in a meeting or while exercising, the receiving user can learn about the shared multimedia content through the text information in the sharing information. In addition, through either the text information or the shared designated multimedia content, the receiving user can quickly grasp the key information without playing the complete multimedia content. More efficient multimedia content sharing is thus achieved.
Example Five
Referring to fig. 5A, a flowchart illustrating steps of a multimedia content processing method according to a fifth embodiment of the present application is shown.
In the foregoing embodiments, the operation on a single multimedia file is taken as an example, and in this embodiment, the operation on multiple multimedia files is taken as an example, so as to describe the multimedia content processing method in the embodiment of the present application.
The multimedia content processing method of the embodiment comprises the following steps:
step S502: input information is received from a user for each of a plurality of multimedia files.
In this step, the input information received from the user is input information for a plurality of multimedia files. In specific implementation, corresponding input information is entered for each of the plurality of multimedia files; the input information for different multimedia files may be the same or different. The input may take a multi-interface form, in which each interface corresponds to one multimedia file and may be implemented as shown in the foregoing fig. 2B. Alternatively, the input may take a single-interface list form, in which a setting for entering the input information is provided for each multimedia file on the same interface. It should be apparent to those skilled in the art that other forms of entering the input information are also applicable to the embodiments of the present application.
Step S504: and selecting a content part corresponding to the input information from the multimedia archive as the designated multimedia content according to the input information.
In one feasible manner, a plurality of content parts may be obtained, one from each multimedia archive according to the input information corresponding to that archive; the plurality of content parts are then synthesized, and the synthesized content is taken as the designated multimedia content.
In this way, after the input information for each multimedia file is determined, the corresponding plurality of multimedia content parts can be obtained and synthesized; for example, the designated multimedia content can be generated by splicing them in a preset order or in the order in which they were acquired. The designated multimedia content thus includes the content the user wants to share from each multimedia file. This method combines the multimedia content parts into a whole, which facilitates centralized processing and improves processing efficiency.
In another feasible manner, the content part of each multimedia archive corresponding to the input information may be treated as a separate designated multimedia content. That is, each multimedia file corresponds to one designated multimedia content, forming a plurality of designated multimedia contents. This method does not require merging the obtained multimedia contents, which speeds up determination of the designated multimedia content.
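The splicing in the first manner can be sketched as concatenating the intercepted parts, optionally reordered by a preset order. The representation of a clip as a plain sequence is an assumption for illustration:

```python
def splice_clips(clips, preset_order=None):
    # Splice a plurality of content parts into one designated multimedia
    # content, either in a preset order or in acquisition order.
    if preset_order is not None:
        clips = [clips[i] for i in preset_order]
    merged = []
    for clip in clips:
        merged.extend(clip)
    return merged
```

The second manner simply skips this step and keeps the list of clips as-is, one designated multimedia content per file.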
Step S506: and acquiring text information corresponding to the specified multimedia content.
If the designated multimedia content is the synthesized multimedia content, the corresponding text information can be obtained for the designated multimedia content as a whole, achieving higher efficiency in obtaining the text information.
If there are a plurality of designated multimedia contents, each needs to be processed separately to obtain its own corresponding text information, so that each piece of text information obtained is more accurate.
The specific manner of obtaining the text information corresponding to the multimedia content may refer to the description of the relevant parts in the foregoing embodiments, and is not described herein again.
Step S508: and acquiring sharing information of the multimedia file generated according to the text information, and sharing the multimedia file through the sharing information.
If the designated multimedia content is the synthesized multimedia content, then after the corresponding text information is obtained, the sharing information may be generated from the synthesized designated multimedia content and the corresponding text information with reference to the manner of generating sharing information in the foregoing embodiments, and sharing is then performed based on the sharing information.
If there are a plurality of designated multimedia contents, and the text information corresponding to each has been obtained, then in this step the plurality of designated multimedia contents can be synthesized to obtain the synthesized multimedia content, and the sharing information is generated according to the synthesized multimedia content and the text information corresponding to each designated multimedia content. That is, the merging of the multimedia contents happens in this step; since each piece of text information describes its own designated multimedia content, the text information in sharing information generated this way is relatively more accurate.
The above process is exemplified below in the form of a scenario example, as shown in fig. 5B.
In fig. 5B, suppose that an audio file X is played in an application and that a "share" button is displayed in its playing interface. If the user clicks the "share" button while audio file X is playing, a sharing interface is displayed. In addition to a setting for entering the input information corresponding to audio file X, the sharing interface provides a setting for adding the sharing of other multimedia files, shown as a plus sign in fig. 5B. Clicking the plus sign triggers presentation of multiple audio files the user has played, illustrated as audio files Y and Z in fig. 5B. If one of these audio files is selected, it is additionally displayed in the sharing interface together with the corresponding input-information setting. In this example, both audio files Y and Z are selected, and the input information for them is as shown in fig. 5B.
In fig. 5B, the input information is set as: the 5th-8th minutes of audio file X, the 3rd-5th minutes of audio file Y, and the 4th-6th minutes of audio file Z. When the "confirm" button in the sharing interface is clicked, the 5th-8th minutes of audio file X are intercepted and recorded as clip 1; the 3rd-5th minutes of audio file Y are intercepted and recorded as clip 2; and the 4th-6th minutes of audio file Z are intercepted and recorded as clip 3. Clips 1, 2 and 3 are synthesized to generate the designated multimedia content. The text information corresponding to the designated multimedia content is then obtained, and sharing information is generated based on the text information and the synthesized multimedia content. The user can share audio files X, Y and Z by publishing this sharing information.
It can be seen that, with this embodiment, in addition to the effects achievable by the foregoing embodiments, the sharing user can share different contents from multiple multimedia files at the same time. It should be noted that when the multiple multimedia files are related, for example the same song sung by multiple performers, the sharing user can also present a singing comparison in this way while sharing. For another example, if multiple multimedia files feature the same actor, such as multiple episodes of a TV series, or multiple movies or TV series in which the same actor performs, then different parts featuring that actor can be edited together in this way, which makes it convenient for the receiving users to view with focus.
Example Six
Referring to fig. 6, a flowchart illustrating steps of a multimedia content processing method according to a sixth embodiment of the present application is shown.
In this embodiment, the cloud is set to participate in the multimedia content processing, so that the multimedia content processing method provided in the embodiment of the present application is described from the perspective of the cloud.
The multimedia content processing method of the embodiment comprises the following steps:
step S602: the specified multimedia content is acquired.
The designated multimedia content is part or all of the content in a multimedia archive. It can be uploaded to the cloud after being acquired by the client, or acquired by the cloud according to information sent by the client.
In the case where the cloud acquires the content according to information sent by the client, for example, the cloud may obtain start position information and end position information indicating the designated multimedia content in a multimedia archive, and intercept the multimedia archive according to the start position information and end position information to obtain the designated multimedia content. Alternatively, the cloud obtains start time information and end time information indicating the designated multimedia content in a multimedia archive, and intercepts the multimedia archive according to the start time information and end time information to obtain the designated multimedia content.
In the above manner, the cloud needs to store the corresponding multimedia file, and the client only needs to upload the corresponding input information, such as the time information or position information; the cloud then intercepts the content of the multimedia file to obtain the designated multimedia content to be shared. Acquisition by the cloud in this manner can greatly reduce the data processing and data transmission burden on the client.
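Cloud-side interception can be sketched as a lookup in the cloud's own copy of the archive followed by a cut over the uploaded range. The storage layout is an assumption for this sketch:

```python
CLOUD_ARCHIVES = {}  # archive id -> stored media samples (cloud-side copy)

def intercept_at_cloud(archive_id, start, end):
    # The client uploads only the start/end information; the cloud cuts the
    # designated multimedia content out of its own copy of the archive, so
    # the media itself never has to travel from the client.
    archive = CLOUD_ARCHIVES[archive_id]
    return archive[start:end]
```

Whether `start`/`end` are positions or times only changes how the indices are computed before the cut.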
Step S604: and carrying out voice recognition on the appointed multimedia content to obtain corresponding text information, and generating text information corresponding to the appointed multimedia content according to the text information.
In specific implementation, the audio data may be parsed out of the designated multimedia content; speech recognition is then performed on the audio data to obtain the corresponding character information, and the text information corresponding to the designated multimedia content is generated according to the character information. In actual speech recognition, the recognition result is usually character information. On one hand, this step can combine the character information to obtain the text information; on the other hand, the text information can be obtained after further processing on the basis of the character information, such as sensitive-word screening or replacement. The text information obtained in this way better matches actual service requirements.
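The step from raw character information to final text information can be sketched as combination plus optional replacement. The sensitive-word map is an assumed input; the patent does not fix a screening mechanism:

```python
def to_text_information(character_info, sensitive_words=None):
    # Combine the recognizer's raw character information into text, then
    # apply optional sensitive-word replacement before it becomes the
    # text information carried in the sharing information.
    if isinstance(character_info, list):
        text = " ".join(character_info)
    else:
        text = character_info
    for word, replacement in (sensitive_words or {}).items():
        text = text.replace(word, replacement)
    return text
```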
Step S606: and generating and sending the sharing information of the multimedia file according to the text information.
For the specific implementation of this step, reference may be made to the related implementation in the foregoing embodiments; for example, the text information and the designated multimedia content are arranged together according to a preset layout format to generate the sharing information. Optionally, the sharing information may include at least one of the following: a picture, audio, video, or audio/video for sharing.
Step S608: and sending the generated sharing information to the client.
That is, the generated sharing information is sent to the client, so that the client can share the corresponding multimedia file based on the sharing information.
With this embodiment, the cloud participates in multimedia content sharing. In addition to making the sharing user's sharing more targeted, and allowing the shared multimedia content to be understood through the sharing information even in scenes where playing multimedia is inconvenient, thereby achieving more efficient multimedia content sharing, this also greatly reduces the data processing and data transmission burden on the client and lowers the software and hardware requirements on the client.
Example Seven
Referring to fig. 7A, a flowchart illustrating steps of a multimedia content processing method according to a seventh embodiment of the present application is shown.
In this embodiment, the multimedia content processing method provided in the embodiments of the present application is still described from the perspective of the client; the difference from the foregoing client-side embodiments is that this embodiment focuses on an exemplary description of the interface provided by the client.
The multimedia content processing method of the embodiment comprises the following steps:
step S702: a multimedia sharing interface for sharing multimedia content is provided.
The multimedia sharing interface includes at least a sharing setting option, which is used for setting the content to be shared.
In one possible manner, the sharing setting option includes: a content interception option based on the waveform and/or spectrum corresponding to the audio data in the multimedia archive. An exemplary interface is shown in fig. 7B.
In another possible manner, the sharing setting option includes: an interception start time information option and an interception end time information option for indicating the content to be intercepted from the multimedia archive; an exemplary interface is shown in fig. 7C. Alternatively, the sharing setting option includes: an interception start position information option and an interception end position information option for indicating the content to be intercepted from the multimedia archive; an exemplary interface is shown in fig. 7D.
In addition, optionally, in the interfaces shown in fig. 7C and 7D, an identifier input option for the multimedia archive may be added to select the multimedia archive, and more multimedia archives may be added for setting through their corresponding plus signs. In fig. 7C and 7D, the multimedia archive identifier takes the form of a name, but it will be apparent to those skilled in the art that other identifier forms are equally applicable.
Step S704: and receiving input information of the user corresponding to the sharing setting option.
Based on the multimedia sharing interface, the user can input the input information for the multimedia archive.
Step S706: and selecting at least part of contents from the corresponding multimedia files as appointed multimedia contents according to the input information.
For example, the content corresponding to the input information is intercepted from the corresponding multimedia archive as the designated multimedia content.
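For uncompressed audio, the interception above reduces to slicing a byte range computed from the start and end of the selection. A minimal sketch; the sample rate, sample width, and channel count are assumed example values, and real compressed containers would need demuxing first.

```python
def clip_pcm(pcm: bytes, start_s: float, end_s: float,
             sample_rate: int = 16000, sample_width: int = 2, channels: int = 1) -> bytes:
    """Return the PCM bytes between start_s and end_s seconds."""
    frame_size = sample_width * channels              # bytes per audio frame
    start = int(start_s * sample_rate) * frame_size
    end = int(end_s * sample_rate) * frame_size
    return pcm[start:end]

# One second of 16 kHz mono 16-bit audio occupies 32000 bytes
audio = bytes(32000)
print(len(clip_pcm(audio, 0.25, 0.75)))  # 16000
```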
Step S708: and acquiring text information corresponding to the specified multimedia content.
For example, audio information in the specified multimedia content may be obtained; and performing voice recognition on the audio information to obtain text information corresponding to the specified multimedia content.
When the audio information is subjected to voice recognition to obtain the text information corresponding to the specified multimedia content, in a feasible manner, the audio information may be subjected to local voice recognition to obtain the text information corresponding to the specified multimedia content. For example, a speech recognition algorithm embedded in an application of the multimedia archive may be used to perform speech recognition on the audio information to obtain text information corresponding to the specified multimedia content; or calling a locally stored voice recognition algorithm through the application of the multimedia archive, and performing voice recognition on the audio information to obtain text information corresponding to the specified multimedia content.
In another possible way, the audio information may be sent to a cloud, and text information corresponding to the specified multimedia content and returned by the cloud after performing voice recognition on the audio information is received.
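The choice between local and cloud recognition described in the two manners above can be expressed as a small dispatcher. The recognizer callables here are stand-ins, since the application does not name a concrete engine or cloud API.

```python
from typing import Callable

def recognize(audio: bytes,
              local_engine: Callable[[bytes], str],
              cloud_engine: Callable[[bytes], str],
              use_local: bool) -> str:
    """Route audio either to a locally embedded recognizer or to a cloud service."""
    engine = local_engine if use_local else cloud_engine
    return engine(audio)

# Stub recognizers standing in for a real embedded algorithm / cloud endpoint
local_stub = lambda a: "local:%d" % len(a)
cloud_stub = lambda a: "cloud:%d" % len(a)
print(recognize(b"\x00" * 4, local_stub, cloud_stub, use_local=True))   # local:4
print(recognize(b"\x00" * 4, local_stub, cloud_stub, use_local=False))  # cloud:4
```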
Step S710: and acquiring sharing information of the multimedia file generated according to the text information, and sharing the multimedia file by using the sharing information.
For example, the sharing information of the multimedia archive generated according to a preset format and the text information may be acquired. In a specific implementation, the sharing information generated after the text information and the specified multimedia content are arranged together according to a preset layout format may be acquired, where the sharing information includes at least one of the following for sharing: a picture, audio, video, or audio-video.
In a feasible manner, the sharing information of the multimedia file can be locally generated at the client according to a preset format and the text information.
In another possible way, the client may receive the sharing information of the multimedia file, which is generated and returned by the cloud according to a preset format and the text information.
It should be noted that the process descriptions in this embodiment are relatively brief; their specific implementations can be realized with reference to the descriptions of the relevant parts in the foregoing embodiments and are not detailed here.
Through this embodiment, a multimedia sharing interface is provided so that the user can conveniently configure sharing through the interface, and subsequent multimedia content processing can be performed based on that configuration. The sharing user's shares thus become more targeted, the shared multimedia content can be understood through the sharing information in scenarios where playing the multimedia is inconvenient, and more efficient multimedia content sharing is achieved.
Example eight
Referring to fig. 8, a schematic structural diagram of an electronic device according to an eighth embodiment of the present application is shown, and the specific embodiment of the present application does not limit a specific implementation of the electronic device.
As shown in fig. 8, the electronic device may include: a display 800, an input device 801, a processor 802, a communication Interface 804, a memory 806, and a communication bus 808.
Wherein:
the display 800, input device 801, processor 802, communication interface 804, and memory 806 communicate with each other via a communication bus 808. The display 800 and the input device 801 may be provided separately or may be combined, for example, as a touch input screen.
A communication interface 804 for communicating with other electronic devices or servers.
The display 800 is configured to display a multimedia sharing interface for sharing multimedia content, where the multimedia sharing interface at least includes a sharing setting option.
An input device 801, configured to receive information input by the user corresponding to the sharing setting option.
The processor 802 is configured to execute the program 810, and may specifically execute the relevant steps in the above-described multimedia content processing method embodiment of the client.
In particular, the program 810 may include program code comprising computer operating instructions.
The processor 802 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 806 stores a program 810. The memory 806 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 810 may be specifically configured to cause the processor 802 to perform the following operations: receiving input information of a user aiming at the multimedia file; according to the input information, selecting a content part corresponding to the input information from the multimedia archive as designated multimedia content; acquiring text information corresponding to the designated multimedia content; and acquiring sharing information of the multimedia file generated according to the text information, and sharing the multimedia file through the sharing information.
In an optional embodiment, the sharing setting option includes: and carrying out content interception options based on the waveform and/or the frequency spectrum corresponding to the audio data in the multimedia archive.
In an optional embodiment, the sharing setting option further includes: an identification input option for the multimedia archive.
In an alternative embodiment, the program 810 is further configured to cause the processor 802, when obtaining text information corresponding to the specified multimedia content: acquiring audio information in the designated multimedia content; and performing voice recognition on the audio information to obtain text information corresponding to the specified multimedia content.
In an alternative embodiment, the program 810 is further configured to cause the processor 802, when performing speech recognition on the audio information to obtain text information corresponding to the specified multimedia content: performing local voice recognition on the audio information to obtain text information corresponding to the specified multimedia content; or sending the audio information to a cloud end, and receiving text information corresponding to the specified multimedia content returned by the cloud end after the audio information is subjected to voice recognition.
In an alternative embodiment, the program 810 is further configured to enable the processor 802, when performing local speech recognition on the audio information to obtain text information corresponding to the specified multimedia content: performing voice recognition on the audio information by using a voice recognition algorithm embedded in an application of the multimedia file to obtain text information corresponding to the specified multimedia content; or calling a locally stored voice recognition algorithm through the application of the multimedia archive, and performing voice recognition on the audio information to obtain text information corresponding to the specified multimedia content.
In an alternative embodiment, the program 810 is further configured to enable the processor 802, when obtaining the sharing information of the multimedia archive generated according to the text information: and acquiring the sharing information of the multimedia file generated according to a preset format and the text information.
In an alternative embodiment, the program 810 is further configured to enable the processor 802, when obtaining the sharing information of the multimedia archive generated according to the preset format and the text information, to: obtain sharing information generated after the text information and the specified multimedia content are arranged together according to a preset layout format, where the sharing information includes at least one of the following for sharing: a picture, audio, video, or audio-video.
In an alternative embodiment, the program 810 is further configured to enable the processor 802, when obtaining the sharing information of the multimedia archive generated according to the preset format and the text information: locally generating sharing information of the multimedia file according to a preset format and the text information; or receiving the sharing information of the multimedia file, which is generated and returned by the cloud according to a preset format and the text information.
In an alternative embodiment, the program 810 is further configured to cause the processor 802, upon receiving user input information for a multimedia archive, to: receive input information of a user for the multimedia archive from a user interaction interface, the input information including one of: to-be-shared start time information and to-be-shared end time information for the multimedia archive, or to-be-shared start frame position information and to-be-shared end frame position information in the multimedia archive.
In an optional embodiment, the input information further comprises: identification information of the multimedia archive.
In an alternative embodiment, the program 810 is further operative to cause the processor 802, upon receiving user input information for the multimedia archive from the user interaction interface: displaying a waveform and/or a frequency spectrum corresponding to the audio data in the multimedia archive in a user interaction interface; receiving interception operation of the waveform and/or the frequency spectrum displayed by the user; and determining the input information according to the intercepting operation.
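Determining the input information from an interception operation on the displayed waveform, as described above, amounts to converting the selected pixel positions into times. A sketch under assumed values for the view width and clip duration:

```python
def selection_to_times(px_start: int, px_end: int,
                       view_width_px: int, duration_s: float):
    """Map a pixel-range selection on the waveform view to (start, end) seconds."""
    scale = duration_s / view_width_px
    start, end = sorted((px_start, px_end))  # tolerate right-to-left drags
    return (start * scale, end * scale)

# A 200-px-wide waveform view showing a 50-second clip
print(selection_to_times(40, 120, 200, 50.0))  # (10.0, 30.0)
```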
In an alternative embodiment, the program 810 is further configured to cause the processor 802, upon receiving user input information for a multimedia archive: receiving input information of a user for each of a plurality of multimedia files; the program 810 is further for causing the processor 802, when selecting a content portion corresponding to the input information from the multimedia archive as the specified multimedia content based on the input information: obtaining a plurality of content parts according to the content part corresponding to the input information in each multimedia file; and synthesizing the plurality of content parts, and taking the synthesized content as the designated multimedia content.
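Synthesizing the content parts taken from several multimedia archives, as in the embodiment above, can for same-format uncompressed audio be as simple as concatenating the parts in order; real container formats would need remuxing, which this sketch deliberately omits.

```python
def synthesize_parts(parts: list) -> bytes:
    """Concatenate clipped same-format PCM parts into one specified content."""
    if not parts:
        raise ValueError("at least one content part is required")
    return b"".join(parts)

combined = synthesize_parts([b"\x01\x02", b"\x03", b"\x04\x05"])
print(len(combined))  # 5
```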
In an alternative embodiment, the program 810 is further operative to cause the processor 802, upon receiving user input information for the multimedia archive: receiving input information of a user for each multimedia file in a plurality of multimedia files; the program 810 is further for causing the processor 802, when selecting a content portion from the multimedia archive corresponding to the input information as the designated multimedia content based on the input information, to: taking the content part corresponding to the input information in each multimedia file as a plurality of designated multimedia contents; the program 810 is further configured to cause the processor 802, when obtaining the sharing information of the multimedia archive generated according to the text information: synthesizing the plurality of designated multimedia contents to obtain synthesized multimedia contents; and acquiring the shared information generated according to the synthesized multimedia content and the text information corresponding to each appointed multimedia content.
For specific implementation of each step in the program 810, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing embodiment of the method for processing multimedia content at the client, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
Through the electronic device of this embodiment, the sharing of sharing users becomes more targeted, the shared multimedia content can be understood through the sharing information in scenarios where playing the multimedia is inconvenient, and more efficient multimedia content sharing is achieved.
Example nine
Referring to fig. 9, a schematic structural diagram of an electronic device according to a ninth embodiment of the present application is shown, and the specific embodiment of the present application does not limit a specific implementation of the electronic device.
As shown in fig. 9, the electronic device may include: a processor (processor)902, a communication Interface 904, a memory 906, and a communication bus 908.
Wherein:
the processor 902, communication interface 904, and memory 906 communicate with one another via a communication bus 908.
A communication interface 904 for communicating with other electronic devices or servers.
The processor 902 is configured to execute the program 910, which may specifically execute the relevant steps in the foregoing cloud-based multimedia content processing method embodiment.
In particular, the program 910 may include program code that includes computer operating instructions.
The processor 902 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
A memory 906 for storing a program 910. The memory 906 may include high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 910 may specifically be configured to cause the processor 902 to perform the following operations: acquiring specified multimedia content, wherein the specified multimedia content is part or all of the content in a multimedia file; performing voice recognition on the specified multimedia content to obtain corresponding character information, and generating text information corresponding to the specified multimedia content according to the character information; and generating and sending the sharing information of the multimedia file according to the text information.
In an alternative embodiment, the program 910 further causes the processor 902, in obtaining the specified multimedia content, to: acquiring starting position information and ending position information for indicating the designated multimedia content in a multimedia archive, and intercepting the multimedia archive according to the starting position information and the ending position information to acquire the designated multimedia content; or, acquiring start time information and end time information for indicating the designated multimedia content in a multimedia archive, and intercepting the multimedia archive according to the start time information and the end time information to acquire the designated multimedia content.
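The two indication styles above (start/end positions vs. start/end times) can be normalized into a single frame range before interception. The frame rate below is an assumed example value:

```python
def to_frame_range(start, end, by_time: bool, fps: float = 25.0):
    """Normalize (start_frame, end_frame) or (start_s, end_s) into frame indices."""
    if by_time:
        return (int(start * fps), int(end * fps))
    return (int(start), int(end))

print(to_frame_range(2.0, 4.0, by_time=True))   # (50, 100)
print(to_frame_range(50, 100, by_time=False))   # (50, 100)
```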
In an optional implementation manner, the program 910 is further configured to, when the processor 902 performs speech recognition on the specified multimedia content, obtains corresponding text information, and generates text information corresponding to the specified multimedia content according to the text information: analyzing audio data from the designated multimedia content; and carrying out voice recognition on the audio data to obtain corresponding character information, and generating text information corresponding to the specified multimedia content according to the character information.
In an alternative embodiment, the program 910 is further configured to enable the processor 902, when generating and sending the sharing information of the multimedia archive according to the text information, to: arrange the text information and the specified multimedia content together according to a preset layout format to generate the sharing information, where the sharing information includes at least one of the following for sharing: a picture, audio, video, or audio-video.
For specific implementation of each step in the program 910, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing embodiment of the method for processing multimedia content at the cloud, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
Through the electronic device of this embodiment, the cloud participates in multimedia content sharing, making the sharing of sharing users more targeted; the shared multimedia content can be understood through the sharing information in scenarios where playing the multimedia is inconvenient, more efficient multimedia content sharing is achieved, the data-processing and data-transmission burden on the client is greatly reduced, and the software and hardware requirements on the client are lowered.
It should be noted that, according to implementation needs, each component/step described in the embodiment of the present application may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present application.
The above-described methods according to the embodiments of the present application may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the multimedia content processing methods described herein. Further, when a general-purpose computer accesses code for implementing the multimedia content processing methods shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for executing those methods.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The above embodiments are only used for illustrating the embodiments of the present application, and not for limiting the embodiments of the present application, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present application, so that all equivalent technical solutions also belong to the scope of the embodiments of the present application, and the scope of the patent protection of the embodiments of the present application should be defined by the claims.

Claims (30)

1. A multimedia content processing method, comprising:
receiving input information of a user for a multimedia archive;
according to the input information, selecting a content part corresponding to the input information from the multimedia archive as designated multimedia content;
acquiring text information corresponding to the designated multimedia content;
and acquiring sharing information of the multimedia file generated according to the text information, and sharing the multimedia file through the sharing information.
2. The method of claim 1, wherein the obtaining of the text information corresponding to the specified multimedia content comprises:
acquiring audio information in the designated multimedia content;
and performing voice recognition on the audio information to obtain text information corresponding to the specified multimedia content.
3. The method of claim 2, wherein the performing speech recognition on the audio information to obtain text information corresponding to the specified multimedia content comprises:
performing local voice recognition on the audio information to obtain text information corresponding to the specified multimedia content;
or,
and sending the audio information to a cloud end, and receiving text information corresponding to the specified multimedia content returned by the cloud end after carrying out voice recognition on the audio information.
4. The method of claim 3, wherein the performing local speech recognition on the audio information to obtain text information corresponding to the specified multimedia content comprises:
performing voice recognition on the audio information by using a voice recognition algorithm embedded in an application of the multimedia file to obtain text information corresponding to the specified multimedia content;
or,
and calling a locally stored voice recognition algorithm through the application of the multimedia archive, and performing voice recognition on the audio information to obtain text information corresponding to the specified multimedia content.
5. The method of claim 1, wherein the obtaining of the shared information of the multimedia file generated according to the text information comprises:
and acquiring the sharing information of the multimedia file generated according to a preset format and the text information.
6. The method of claim 5, wherein the obtaining of the shared information of the multimedia file generated according to a preset format and the text information comprises:
obtaining sharing information generated after the text information and the specified multimedia content are arranged together according to a preset layout format, wherein the sharing information comprises at least one of the following for sharing: a picture, audio, video, or audio-video.
7. The method of claim 5, wherein the obtaining of the shared information of the multimedia file generated according to a preset format and the text information comprises:
locally generating sharing information of the multimedia file according to a preset format and the text information;
or,
and receiving the sharing information of the multimedia file, which is generated and returned by the cloud according to a preset format and the text information.
8. The method of claim 1, wherein the receiving user input information for a multimedia archive comprises:
receiving input information of a user for the multimedia archive from a user interaction interface, the input information comprising one of: to-be-shared start time information and to-be-shared end time information for the multimedia archive, or to-be-shared start frame position information and to-be-shared end frame position information in the multimedia archive.
9. The method of claim 8, wherein the input information further comprises: identification information of the multimedia archive.
10. The method of claim 8 or 9, wherein the receiving user input information for the multimedia archive from the user interaction interface comprises:
displaying a waveform and/or a frequency spectrum corresponding to the audio data in the multimedia archive in a user interaction interface;
receiving interception operation of the waveform and/or the frequency spectrum displayed by the user;
and determining the input information according to the intercepting operation.
11. The method of claim 1, wherein,
the receiving of the input information of the user for the multimedia archive comprises: receiving input information of a user for each multimedia file in a plurality of multimedia files;
the selecting a content part corresponding to the input information from the multimedia archive as the designated multimedia content according to the input information comprises: obtaining a plurality of content parts according to the content part corresponding to the input information in each multimedia file; and synthesizing the plurality of content parts, and taking the synthesized content as the designated multimedia content.
12. The method of claim 1, wherein,
the receiving of the input information of the user for the multimedia archive comprises: receiving input information of a user for each of a plurality of multimedia files;
the selecting a content part corresponding to the input information from the multimedia archive as the designated multimedia content according to the input information comprises: taking the content part corresponding to the input information in each multimedia file as a plurality of designated multimedia contents;
the acquiring the sharing information of the multimedia file generated according to the text information includes: synthesizing the plurality of designated multimedia contents to obtain synthesized multimedia contents; and acquiring sharing information generated according to the synthesized multimedia content and the text information corresponding to each appointed multimedia content.
13. A multimedia content processing method, comprising:
acquiring specified multimedia content, wherein the specified multimedia content is part or all of the content in a multimedia archive;
performing voice recognition on the specified multimedia content to obtain corresponding character information, and generating text information corresponding to the specified multimedia content according to the character information;
and generating and sending the sharing information of the multimedia file according to the text information.
14. The method of claim 13, wherein the obtaining specified multimedia content comprises:
acquiring starting position information and ending position information for indicating the designated multimedia content in a multimedia archive, and intercepting the multimedia archive according to the starting position information and the ending position information to acquire the designated multimedia content;
or,
and acquiring starting time information and ending time information for indicating the appointed multimedia content in a multimedia archive, and intercepting the multimedia archive according to the starting time information and the ending time information to acquire the appointed multimedia content.
15. The method according to claim 13 or 14, wherein the performing voice recognition on the specified multimedia content to obtain corresponding character information, and generating text information corresponding to the specified multimedia content according to the character information comprises:
analyzing audio data from the designated multimedia content;
and carrying out voice recognition on the audio data to obtain corresponding character information, and generating text information corresponding to the specified multimedia content according to the character information.
16. The method according to claim 13 or 14, wherein the generating and sending the shared information of the multimedia archive according to the text information comprises:
and arranging the text information and the specified multimedia content together according to a preset layout format to generate the sharing information, wherein the sharing information comprises at least one of the following for sharing: a picture, audio, video, or audio-video.
17. A multimedia content processing method, comprising:
providing a multimedia sharing interface for sharing multimedia content, wherein the multimedia sharing interface comprises at least a sharing setting option;
receiving input information from a user corresponding to the sharing setting option;
selecting at least part of the content from a corresponding multimedia archive as designated multimedia content according to the input information;
acquiring text information corresponding to the designated multimedia content;
and acquiring sharing information of the multimedia archive generated according to the text information, and sharing the multimedia archive using the sharing information.
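Taken together, the steps of claim 17 could be sketched end to end as follows. Every name and the shape of the input information are illustrative assumptions; the text step stands in for the recognition of claims 22-24:

```python
def process_share_request(archive: list, user_input: dict) -> dict:
    """Select part of the archive per the user's sharing-setting input,
    derive its text information, and build the sharing information."""
    start, end = user_input["start"], user_input["end"]
    designated = archive[start:end]   # at least part of the content
    # Stand-in for speech recognition: join the segments' captions.
    text = " ".join(seg["caption"] for seg in designated)
    return {"text": text, "clip": designated}   # sharing information

archive = [{"caption": w} for w in ("good", "morning", "world")]
share = process_share_request(archive, {"start": 0, "end": 2})
```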
18. The method of claim 17, wherein the sharing setting option comprises: an interception start time information option and an interception end time information option for indicating the content to be intercepted from the multimedia archive, or an interception start position information option and an interception end position information option for indicating the content to be intercepted from the multimedia archive.
19. The method of claim 17, wherein the sharing setting option comprises: a content interception option based on the waveform and/or the spectrum corresponding to the audio data in the multimedia archive.
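A waveform-based interception option as in claim 19 could, for illustration, select the span of samples above an amplitude threshold, much as a UI might when the user drags over a rendered waveform. The threshold rule is an assumption; the claim only requires that interception be based on the waveform and/or spectrum:

```python
def active_span(samples: list[float], threshold: float) -> tuple[int, int]:
    """Return (start, end) sample indices bounding the region whose
    absolute amplitude exceeds the threshold; (0, 0) if none does."""
    idx = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not idx:
        return (0, 0)
    return (idx[0], idx[-1] + 1)

wave = [0.0, 0.01, 0.5, 0.8, 0.6, 0.02, 0.0]
start, end = active_span(wave, threshold=0.1)
```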
20. The method of claim 18 or 19, wherein the sharing setting option further comprises: an identification input option for the multimedia archive.
21. The method of claim 18, wherein said selecting at least a portion of content from a corresponding multimedia archive as designated multimedia content based on said input information comprises:
intercepting the content corresponding to the input information from the corresponding multimedia archive as the designated multimedia content.
22. The method of claim 17, wherein the obtaining of the text information corresponding to the specified multimedia content comprises:
acquiring audio information in the specified multimedia content;
and performing speech recognition on the audio information to obtain the text information corresponding to the specified multimedia content.
23. The method of claim 22, wherein the performing speech recognition on the audio information to obtain the text information corresponding to the specified multimedia content comprises:
performing local speech recognition on the audio information to obtain the text information corresponding to the specified multimedia content;
or,
sending the audio information to a cloud, and receiving the text information corresponding to the specified multimedia content returned by the cloud after it performs speech recognition on the audio information.
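The local-or-cloud dispatch of claim 23 might be sketched as follows. Both recognizers are hypothetical stubs; the cloud path here merely marks its output so the two branches are distinguishable in the example:

```python
def local_recognize(audio: bytes) -> str:
    # Stub for an on-device speech recognition algorithm.
    return audio.decode("utf-8").strip()

def cloud_recognize(audio: bytes) -> str:
    # Stub for: send audio to the cloud, receive the recognized text back.
    return audio.decode("utf-8").strip().upper()

def recognize(audio: bytes, use_cloud: bool = False) -> str:
    """Dispatch between local and cloud recognition, per claim 23."""
    return cloud_recognize(audio) if use_cloud else local_recognize(audio)

local_text = recognize(b" hi ")
cloud_text = recognize(b" hi ", use_cloud=True)
```

In practice the choice could also be a fallback (cloud when no local algorithm is available), which claim 24 refines into an algorithm embedded in, or invoked by, the archive's application.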
24. The method of claim 23, wherein the performing local speech recognition on the audio information to obtain the text information corresponding to the specified multimedia content comprises:
performing speech recognition on the audio information using a speech recognition algorithm embedded in an application for the multimedia archive, to obtain the text information corresponding to the specified multimedia content;
or,
invoking a locally stored speech recognition algorithm through the application for the multimedia archive, and performing speech recognition on the audio information to obtain the text information corresponding to the specified multimedia content.
25. The method of claim 17, wherein the acquiring sharing information of the multimedia archive generated according to the text information comprises:
acquiring the sharing information of the multimedia archive generated according to a preset format and the text information.
26. The method of claim 25, wherein the acquiring the sharing information of the multimedia archive generated according to a preset format and the text information comprises:
acquiring sharing information generated after the text information and the specified multimedia content are mixed and arranged according to a preset typesetting format, wherein the sharing information comprises at least one of the following for sharing: a picture, audio, video, and audio-video.
27. The method of claim 25, wherein the acquiring the sharing information of the multimedia archive generated according to a preset format and the text information comprises:
generating the sharing information of the multimedia archive locally according to a preset format and the text information;
or,
receiving the sharing information of the multimedia archive generated and returned by a cloud according to a preset format and the text information.
28. An electronic device, comprising: a display, an input device, a processor, a memory, a communication interface and a communication bus, wherein the display, the input device, the processor, the memory and the communication interface communicate with one another through the communication bus;
the display is configured to display a multimedia sharing interface for sharing multimedia content, wherein the multimedia sharing interface comprises at least a sharing setting option;
the input device is configured to receive information input by a user corresponding to the sharing setting option;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the multimedia content processing method according to any one of claims 1-12.
29. An electronic device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the multimedia content processing method according to any one of claims 13-16.
30. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the multimedia content processing method according to any one of claims 1-12, 13-16, or 17-27.
CN202110285818.2A 2021-03-17 2021-03-17 Multimedia content processing method, electronic device and computer storage medium Pending CN115119069A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110285818.2A CN115119069A (en) 2021-03-17 2021-03-17 Multimedia content processing method, electronic device and computer storage medium


Publications (1)

Publication Number Publication Date
CN115119069A true CN115119069A (en) 2022-09-27

Family

ID=83323193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110285818.2A Pending CN115119069A (en) 2021-03-17 2021-03-17 Multimedia content processing method, electronic device and computer storage medium

Country Status (1)

Country Link
CN (1) CN115119069A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103686250A (en) * 2013-12-24 2014-03-26 深圳市龙视传媒有限公司 Method for sharing fragment of media file and relevant devices
WO2016112841A1 (en) * 2015-01-12 2016-07-21 腾讯科技(深圳)有限公司 Information processing method and client, and computer storage medium
WO2017032146A1 (en) * 2015-08-27 2017-03-02 中兴通讯股份有限公司 File sharing method and apparatus
WO2017198023A1 (en) * 2016-05-20 2017-11-23 深圳市九洲电器有限公司 Method and system for sharing audio/video fragment, and electronic device
US9961380B1 (en) * 2017-01-19 2018-05-01 International Business Machines Corporation Video segment manager
US9966112B1 (en) * 2013-04-18 2018-05-08 Gracenote, Inc. Systems and methods to associate multimedia tags with user comments and generate user modifiable snippets around a tag time for efficient storage and sharing of tagged items
CN108063722A (en) * 2017-12-20 2018-05-22 北京时代脉搏信息技术有限公司 Video data generating method, computer readable storage medium and electronic equipment
CN110147467A (en) * 2019-04-11 2019-08-20 北京达佳互联信息技术有限公司 A kind of generation method, device, mobile terminal and the storage medium of text description
CN111158924A (en) * 2019-12-02 2020-05-15 百度在线网络技术(北京)有限公司 Content sharing method and device, electronic equipment and readable storage medium
CN111241043A (en) * 2020-01-07 2020-06-05 苏州思必驰信息科技有限公司 Multimedia file sharing method, terminal and storage medium
CN111600931A (en) * 2020-04-13 2020-08-28 维沃移动通信有限公司 Information sharing method and electronic equipment
CN111651617A (en) * 2020-05-29 2020-09-11 腾讯科技(深圳)有限公司 Multimedia information sharing method, device, equipment and storage medium
CN111901695A (en) * 2020-07-09 2020-11-06 腾讯科技(深圳)有限公司 Video content interception method, device and equipment and computer storage medium


Similar Documents

Publication Publication Date Title
US10299004B2 (en) Method and system for sourcing and editing live video
US20190253474A1 (en) Media production system with location-based feature
CN109963162B (en) Cloud directing system and live broadcast processing method and device
CN110708589B (en) Information sharing method and device, storage medium and electronic device
US9055193B2 (en) System and method of a remote conference
JP2005051703A (en) Live streaming broadcasting method, live streaming broadcasting apparatus, live streaming broadcasting system, program, recording medium, broadcasting method, and broadcasting apparatus
CN112261416A (en) Cloud-based video processing method and device, storage medium and electronic equipment
CN112188307B (en) Video resource synthesis method and device, storage medium and electronic device
EP3024223B1 (en) Videoconference terminal, secondary-stream data accessing method, and computer storage medium
WO2017000751A1 (en) Program recording method and device, and set top box
CN113206975A (en) Video conference picture display method, device, equipment and storage medium
KR101490506B1 (en) Method and apparatus for editing moving picture contents
US10153003B2 (en) Method, system, and apparatus for generating video content
CN114095755A (en) Video processing method, device and system, electronic equipment and storage medium
JP2010268103A (en) Client terminal and computer program for moving picture distribution service
CN108713313B (en) Multimedia data processing method and device, and equipment/terminal/server
JP2006331512A (en) Folder icon display controller, method and program
WO2023182937A2 (en) Special effect video determination method and apparatus, electronic device and storage medium
CN115119069A (en) Multimedia content processing method, electronic device and computer storage medium
KR102403263B1 (en) Method, system, and computer readable record medium to implement fast switching mode between channels in multiple live transmission environment
US9961275B2 (en) Method, system, and apparatus for operating a kinetic typography service
CN112188269B (en) Video playing method and device and video generating method and device
US20180077460A1 (en) Method, System, and Apparatus for Providing Video Content Recommendations
WO2016184193A1 (en) Method and apparatus for generating media files
CN108614656B (en) Information processing method, medium, device and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240311

Address after: 51 Belarusian Pasha Road, Singapore, Lai Zan Da Building 1 # 03-06, Postal Code 189554

Applicant after: Alibaba Innovation Co.

Country or region after: Singapore

Address before: Room 01, 45th Floor, AXA Building, 8 Shanton Road, Singapore

Applicant before: Alibaba Singapore Holdings Ltd.

Country or region before: Singapore