CN110719518A - Multimedia data processing method, device and equipment - Google Patents


Info

Publication number
CN110719518A
CN110719518A (application CN201810766023.1A)
Authority
CN
China
Prior art keywords
keywords
multimedia data
subtitle
keyword
playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810766023.1A
Other languages
Chinese (zh)
Inventor
康琳 (Kang Lin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Banma Zhixing Network Hongkong Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201810766023.1A priority Critical patent/CN110719518A/en
Publication of CN110719518A publication Critical patent/CN110719518A/en
Pending legal-status Critical Current


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/431: Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4312: Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 21/439: Processing of audio elementary streams
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/47: End-user applications
    • H04N 21/472: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47217: End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N 21/488: Data services, e.g. news ticker
    • H04N 21/4884: Data services for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the invention provide a multimedia data processing method, device, and equipment. The method includes: acquiring a subtitle text corresponding to multimedia data, where the subtitle text comprises a plurality of subtitle entries each associated with a playing time; identifying a plurality of keywords contained in the subtitle text according to a keyword database; and marking the keywords on the playing progress control of the multimedia data according to the playing times of the subtitle entries in which they appear. That is, the time point on the progress control corresponding to each keyword's subtitle entry is marked to indicate that content related to the keyword plays at that time, so that a user who wants to browse the content for a given keyword can conveniently locate it from the marks.

Description

Multimedia data processing method, device and equipment
Technical Field
The present invention relates to the field of internet technologies, and in particular, to a method, an apparatus, and a device for processing multimedia data.
Background
The rapid development of the mobile internet has changed how people live, study, and entertain themselves: all kinds of multimedia data, such as audio and video, can now be obtained over the network. For example, when a company holds a launch event for a newly developed product, users can learn about the product by watching a recording of the event without attending in person. As another example, an online education provider may publish teaching audio and video on the network, and users learn from it by watching.
To let a user know what a given piece of multimedia data, such as an audio or video file, contains, the conventional approach, used for entertainment audio/video such as movies and TV shows, is to manually curate key information for the multimedia data, associate it with the data, and display it alongside the data's link or thumbnail, for example key plot points or key image frames. This manual curation, however, is inefficient.
Disclosure of Invention
In view of this, embodiments of the present invention provide a multimedia data processing method, apparatus, and device in which keywords of multimedia data are obtained automatically and marked on the play progress control. This improves the efficiency of obtaining key information about the multimedia data, and lets a user conveniently browse the content of interest based on the marks on the progress control.
In a first aspect, an embodiment of the present invention provides a multimedia data processing method, including:
acquiring a subtitle text corresponding to multimedia data, wherein the subtitle text comprises a plurality of subtitle items associated with playing time;
identifying a plurality of keywords contained in the subtitle text according to a keyword database;
and marking the plurality of keywords on the playing progress control of the multimedia data according to the playing time of the caption items corresponding to the plurality of keywords respectively.
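The three steps of the first aspect can be restated as a short illustrative sketch. This is not the patented implementation; the data shapes (a list of (playing time, text) subtitle entries, a list of keywords, a known duration, and substring matching) are assumptions made for illustration:

```python
# Illustrative sketch of the three claimed steps; data shapes are assumed.

def process_media(subtitle_entries, keyword_db, duration_s):
    """subtitle_entries: list of (play_time_s, text); returns progress-bar marks."""
    marks = []
    for play_time, text in subtitle_entries:
        # Step 2: identify keywords contained in the subtitle text.
        for kw in keyword_db:
            if kw in text:
                # Step 3: convert the entry's playing time into a position
                # (fraction of total duration) on the progress control.
                marks.append((kw, play_time / duration_s))
    return marks

entries = [(30.0, "the camera uses a new sensor"), (90.0, "appearance is slim")]
print(process_media(entries, ["camera", "appearance"], 120.0))
```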
In a second aspect, an embodiment of the present invention provides a multimedia data processing apparatus, including:
an acquisition module, configured to acquire a subtitle text corresponding to multimedia data, wherein the subtitle text comprises a plurality of subtitle entries associated with playing time;
the identification module is used for identifying a plurality of keywords contained in the subtitle text according to a keyword database;
and the marking module is used for marking the keywords on the playing progress control of the multimedia data according to the playing time of the caption item corresponding to each of the keywords.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is used to store one or more computer instructions, and when executed by the processor, the one or more computer instructions implement the multimedia data processing method in the first aspect. The electronic device may also include a communication interface for communicating with other devices or a communication network.
An embodiment of the present invention provides a computer storage medium for storing a computer program, where the computer program is used to enable a computer to implement the multimedia data processing method in the first aspect when executed.
In the multimedia data processing method provided by the embodiment of the invention, the multimedia data may be audio or video. A keyword database is preset, containing keywords likely to appear in multimedia data of the same type. To automatically extract and mark keywords for a piece of multimedia data, a subtitle text corresponding to the data is obtained first; the subtitle text includes a plurality of subtitle entries associated with playing times. The keywords contained in the subtitle text are then identified against the keyword database, which realizes automatic extraction of the keywords contained in the multimedia data. Finally, the keywords are marked on the playing progress control of the multimedia data according to the playing times of the subtitle entries in which they appear: the time point on the progress control corresponding to a subtitle entry containing a keyword is marked to indicate that content related to that keyword plays at that time. A user who wants to browse the content for a given keyword can then conveniently locate it from the marks.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and persons skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a multimedia data processing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a keyword tagging effect according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating another keyword tagging effect according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating another keyword tagging effect according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating another keyword tagging effect according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating another keyword tagging effect according to an embodiment of the present invention;
FIG. 7 is a flowchart of another multimedia data processing method according to an embodiment of the present invention;
FIG. 8 is a schematic interface diagram illustrating a keyword adding process according to the embodiment shown in FIG. 7;
FIG. 9 is a block diagram of a multimedia data processing apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device corresponding to the multimedia data processing apparatus provided in the embodiment shown in fig. 9.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and "a plurality of" generally means at least two, without excluding the case of at least one, unless the context clearly dictates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
The word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)", depending on the context.
It is also noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that an article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such article or system. Without further limitation, an element preceded by "comprising a(n)" does not exclude the presence of other like elements in the article or system that comprises the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
Fig. 1 is a flowchart of a multimedia data processing method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
101. Acquire a subtitle text corresponding to the multimedia data, wherein the subtitle text comprises a plurality of subtitle entries associated with playing time.
The multimedia data processing method in this embodiment may be executed by the terminal device that plays the multimedia data, or by the server that provides it. When executed by the server, execution may be triggered when the server receives the multimedia data, although this is not limiting. When executed by the terminal device, the method may run when the user starts playing the multimedia data, or it may be triggered, for any multimedia data not yet marked with keywords, when the user opens the client that carries the multimedia data.
The multimedia data in this embodiment may be audio data or video data, and it may be either pre-recorded or broadcast live in real time. In practical applications, the multimedia data may be, for example, video of a product launch event or teaching audio and video.
Pre-recorded video data often comprises two components, a video stream and an audio stream. A server can separate the audio from the video data and then convert the audio into text; the converted text is the subtitle text, which includes a plurality of subtitle entries associated with playing times, where the playing time associated with each subtitle entry may be the time at which that entry starts to play. One subtitle entry can be understood as one line of subtitles. For audio data, the subtitle text can be obtained by converting the audio directly into text. After obtaining the subtitle text, the server may store the correspondence between the subtitle text and the multimedia data, so that a terminal device executing the multimedia data processing method provided by the embodiment of the present invention can obtain the subtitle text through that correspondence.
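As a hedged illustration of obtaining subtitle entries with playing times from a pre-recorded file, the following sketch parses SRT-formatted subtitles into (start time in seconds, text) pairs. The patent does not name a subtitle format; SRT is an assumption made for illustration:

```python
import re

# Hypothetical sketch: parse SRT-style subtitles into (start_seconds, text)
# entries, one entry per subtitle block.

def parse_srt(srt_text):
    entries = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        # Extract the start timestamp "HH:MM:SS,mmm" from the timing line.
        m = re.search(r"(\d+):(\d+):(\d+)[,.](\d+)", lines[1])
        h, mnt, s, ms = map(int, m.groups())
        start = h * 3600 + mnt * 60 + s + ms / 1000.0
        entries.append((start, " ".join(lines[2:])))
    return entries

srt = "1\n00:00:05,000 --> 00:00:07,000\nhello world\n\n2\n00:01:00,500 --> 00:01:02,000\nnext line"
print(parse_srt(srt))
```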
For live video data, the audio, that is, the sound signal produced during playback, can be collected in real time and converted into text, yielding a subtitle text that starts from the beginning of the live broadcast. Note, however, that unlike the pre-recorded case, such a subtitle text has no predefined subtitle entries or associated playing times; for live data, after the audio is converted into subtitle text, the subtitle entries and the playing time of each entry must still be determined. Specifically, the subtitle text can be divided into sentences by combining the pause characteristics of the audio with semantic analysis of the text, each sentence being one subtitle entry. When the audio is collected, the collection start moment serves as the initial playing time, and the collected audio can be timestamped, so that the timestamp of the audio corresponding to the first character of each subtitle entry can serve as that entry's playing time.
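The live-data segmentation just described might be sketched as below. Two simplifying assumptions are made: the speech-to-text engine provides a per-character timestamp list, and sentence-final punctuation stands in for the pause/semantic segmentation:

```python
# Illustrative only: split a live transcript into subtitle entries and assign
# each entry the timestamp of its first non-space character. Per-character
# timestamps from the ASR engine are assumed to be available.

def segment_live(transcript, char_times, delimiters="。？！.?!"):
    entries, start_idx = [], 0
    for i, ch in enumerate(transcript):
        if ch in delimiters:
            seg = transcript[start_idx:i + 1]
            lead = len(seg) - len(seg.lstrip())  # skip leading spaces
            sentence = seg.strip()
            if sentence:
                entries.append((char_times[start_idx + lead], sentence))
            start_idx = i + 1
    return entries

text = "hello world. second sentence!"
times = [round(0.1 * i, 1) for i in range(len(text))]  # fake 0.1 s per char
print(segment_live(text, times))
```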
102. A plurality of keywords included in the subtitle text are identified from a keyword database.
In the embodiment of the invention, a keyword database can be set for each type of multimedia data, containing keywords relevant to that type. For example, for educational multimedia data, the keywords in the database may be knowledge points from different subjects and grades, such as the Pythagorean theorem, arithmetic progressions, or Sanwei Shuwu (三味书屋). As another example, for multimedia data introducing a mobile phone product, the keywords may be product attributes that users commonly care about, such as the camera, appearance, and security.
The purpose of the keyword database is to enable automatic recognition of the keywords contained in multimedia data. After the subtitle text is obtained, word segmentation may be performed on it to obtain the words it contains. The keyword database corresponding to the type of the current multimedia data is then determined, and the words from the subtitle text are compared against the keywords in that database to identify the keywords the subtitle text contains.
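A minimal sketch of this comparison step, assuming the subtitle text and keyword database share a language, and with a trivial whitespace tokenizer standing in for a real word-segmentation tool (for Chinese text, a library such as jieba would be needed):

```python
# Sketch of the matching step: segment each subtitle entry into words and
# compare them against the keyword database, recording the playing times at
# which each keyword occurs.

def identify_keywords(subtitle_entries, keyword_db):
    found = {}
    for play_time, text in subtitle_entries:
        for word in text.lower().split():
            word = word.strip(".,!?")
            if word in keyword_db:
                found.setdefault(word, []).append(play_time)  # keyword -> times
    return found

entries = [(10.0, "The camera is great."), (42.0, "Security and camera again")]
print(identify_keywords(entries, {"camera", "security"}))
```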
It should be noted that the language of the multimedia data's audio may differ from the language of the keyword database, in which case the language of the subtitle text also differs from that of the keyword database. After word segmentation of the subtitle text, the matching degree between each word in the subtitle text and each keyword in the keyword database can therefore be computed, and the keywords contained in the subtitle text identified from the matching degrees.
In practical applications, the order of the language-conversion step and the word-segmentation step is not strictly limited. Optionally, however, whether to convert the subtitle text into the second language or the keyword database into the first language may be decided by comparing the number of words in the subtitle text with the number of keywords in the database; in that case, word segmentation is performed first. For example, if the subtitle text contains more words than the database contains keywords, each keyword in the database may be translated into the first language; conversely, if it contains fewer words, each word in the subtitle text may be translated into the second language. In other words, the smaller collection is the one translated.
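That decision rule, translate whichever collection is smaller, can be sketched in a few lines; the function and return-value names are invented for illustration, and a real machine-translation call would follow the decision:

```python
# Illustrative decision rule only: translate whichever collection is smaller,
# so less text passes through the (hypothetical) machine-translation step.

def choose_translation_side(subtitle_words, keywords):
    if len(subtitle_words) > len(keywords):
        return "translate_keywords_to_subtitle_language"
    return "translate_subtitle_words_to_keyword_language"

print(choose_translation_side(["screen", "camera", "battery"], ["camera"]))
```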
In computing the matching degree between each word in the subtitle text and each keyword in the keyword database, the simplest measure is exact agreement: if a word from the subtitle text exists in the keyword database, the word is a keyword. In practice, however, synonyms are common. A word in the subtitle text may be absent from the keyword database yet semantically close to one of its keywords; such a word may also be treated as a keyword, in which case the matching degree reflects the similarity between the word and the keyword.
Optionally, the similarity between two words may be determined with a recognition model obtained by machine learning, for example a word vector model based on deep learning: the words in the subtitle text are encoded as word vectors, the keywords in the keyword database are likewise encoded as word vectors, and similarity is determined from the distance between the vectors.
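A sketch of vector-distance matching using cosine similarity. The 3-dimensional vectors and the 0.9 threshold are invented for illustration; a real system would use embeddings produced by a trained model:

```python
import math

# Hedged sketch of synonym matching via word vectors: words whose vectors are
# close enough to a keyword's vector are treated as that keyword.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def match_synonyms(word_vecs, keyword_vecs, threshold=0.9):
    matches = []
    for w, wv in word_vecs.items():
        for k, kv in keyword_vecs.items():
            if cosine(wv, kv) >= threshold:
                matches.append((w, k))
    return matches

words = {"photo": [1.0, 0.1, 0.0], "price": [0.0, 0.0, 1.0]}  # made-up vectors
keywords = {"camera": [0.9, 0.2, 0.0]}
print(match_synonyms(words, keywords))
```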
Optionally, if the recognition result is that the subtitle text contains no keyword, or only one, processing ends: marking multimedia data that contains at most one keyword is of little value. Keyword marking is therefore performed only when the multimedia data contains a plurality of keywords, that is, at least two.
At the other extreme, the subtitle text may contain a large number of keywords. To keep the interface friendly and avoid the clutter of too many marks, a preset number greater than 1 may optionally be set. If the number of recognized keywords is at most the preset number, the marking of step 103 is performed. If it exceeds the preset number, the preset number of keywords may optionally be selected at random from those recognized, and step 103 is performed for the selected keywords.
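The selection rule just described, skip marking below two keywords and randomly sample above a preset cap, might look like the following; the default cap of 5 and the seeded random generator are assumptions made so the sketch is deterministic:

```python
import random

# Sketch of the selection rule: no marking for fewer than two keywords;
# random sampling down to a preset maximum when there are too many.

def select_keywords(keywords, preset_max=5, rng=None):
    if len(keywords) < 2:
        return []                        # marking a single keyword is not useful
    if len(keywords) <= preset_max:
        return list(keywords)
    rng = rng or random.Random(0)        # seeded only for reproducibility here
    return rng.sample(list(keywords), preset_max)

print(select_keywords(["camera"]))                  # too few: nothing marked
print(len(select_keywords(list("abcdefgh"), 3)))    # capped at 3
```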
103. Mark the plurality of keywords on the playing progress control of the multimedia data according to the playing times of the subtitle entries corresponding to the keywords.
After the keywords contained in the subtitle text are identified, the subtitle entries containing each keyword can be located; the same keyword may correspond to several subtitle entries. For each subtitle entry containing a keyword, the corresponding time node is located on the playing progress control of the multimedia data according to the entry's playing time, and the keyword is marked at that node. The playing progress control may be an ordinary playing progress bar.
As shown in fig. 2, taking video data as an example for ease of understanding, suppose the identified keywords are AA, BB, and CC; two subtitle entries contain AA, with playing times T1 and T2; one entry contains BB, with playing time T3; and one entry contains CC, with playing time T4. Optionally, a tag containing the corresponding keyword may then be marked on the play progress control at the playing time of each entry: tags containing AA at the time nodes for T1 and T2, a tag containing BB at T3, and a tag containing CC at T4. In practice, the tags for different keywords may have the same shape or different shapes. From the tags, the user can see at a glance where the content for each keyword plays, and can drag the progress slider to the desired position as needed.
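The fig. 2 example can be restated as data: each keyword is marked at the offset of its subtitle entry's playing time on the progress bar. The concrete times, the 600 s duration, and the 1000 px bar width are invented for illustration:

```python
# Map each keyword's playing times to pixel offsets on a progress bar.
# Times (T1..T4), duration, and bar width are made-up example values.

def mark_positions(keyword_times, duration_s, bar_width_px):
    marks = {}
    for kw, times in keyword_times.items():
        marks[kw] = [round(t / duration_s * bar_width_px) for t in times]
    return marks

keyword_times = {"AA": [60, 180], "BB": [300], "CC": [420]}
print(mark_positions(keyword_times, duration_s=600, bar_width_px=1000))
```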
In addition, optionally, as shown in fig. 3, when the multimedia data is played on the terminal device, the subtitle text may be displayed in the current playing interface besides marking the keywords on the play progress control, so that the keywords may also be marked within the subtitle text. The display position of the subtitle text in the playing interface may be preset, for example the area to the right of the multimedia playing window. Marking the keywords in the subtitle text may mean highlighting them, such as displaying them in distinct colors or a single color, where the same keyword is rendered in the same color and different keywords may be rendered in different colors. Fig. 3 shows the effect of italicizing and underlining the keywords.
In addition, the embodiment of the present invention further provides operation functions for the keywords, including for example hiding, showing, and deleting. To make these operations convenient for the user, the keywords may optionally also be displayed separately in the playing interface of the multimedia data. "Separately" here is in contrast to marking the keywords within the subtitle text: the keywords are displayed in a preset target interface area of the playing interface, which may lie below the playing window, inside the playing window, and so on.
At this time, optionally, according to the playing time of the subtitle entry corresponding to each of the plurality of keywords, marking the plurality of keywords on the playing progress control of the multimedia data may also be implemented as:
distributing display styles for the plurality of keywords respectively;
displaying a plurality of keywords in a target interface area according to display styles corresponding to the keywords respectively;
and marking identifiers corresponding to the keywords on the playing progress control according to the playing time of the subtitle items corresponding to the keywords respectively and the display styles corresponding to the keywords respectively.
In order to ensure that the plurality of keywords have a certain degree of distinction, different display styles may be allocated to different keywords, and the display styles may include icon shapes and/or rendering colors, that is, different keywords may be borne on icons of different shapes, or different keywords may be displayed in different rendering colors, or different keywords have different rendering colors and have different icon shapes.
Fig. 4 illustrates the case where the keywords are AA, BB, and CC: the keywords are displayed in the area below the playing window with the same default icon shape, a rectangle, but with different rendering colors (the display style here being the rendering color), for example red for AA, yellow for BB, and blue for CC. On this basis, when marking the corresponding time nodes on the playing progress bar, continuing the example of fig. 2, an identifier in AA's display style, a red dot, may be marked at the time nodes T1 and T2; similarly, a yellow dot at T3 and a blue dot at T4 (specific colors are not shown in the figure).
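The style assignment and matching progress-control identifiers described above might be sketched as follows; the color palette and the (time, color) state representation are assumptions made for illustration:

```python
# Assign each keyword a display style (rendering color here), then emit
# progress-control identifiers that reuse the same style, as in fig. 4.

PALETTE = ["red", "yellow", "blue", "green"]  # assumed palette

def assign_styles(keywords):
    return {kw: PALETTE[i % len(PALETTE)] for i, kw in enumerate(keywords)}

def build_identifiers(keyword_times, styles):
    # One (time, color) dot per subtitle entry containing the keyword.
    return [(t, styles[kw]) for kw, times in keyword_times.items() for t in times]

styles = assign_styles(["AA", "BB", "CC"])
print(styles)
print(build_identifiers({"AA": [60, 180], "BB": [300]}, styles))
```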
Keeping the display style used when a keyword is displayed in the target interface area consistent with the display style used when its identifier is marked on the playing progress control gives the user a better visual experience.
Optionally, as shown in fig. 4, subtitle text may be displayed in the playing interface while the keywords are being marked, in which case the keywords may also be marked in the subtitle text. To maintain consistency, each keyword may be marked in the subtitle text in its corresponding display style; for example, the keyword AA is highlighted in red.
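The style allocation described above can be sketched as follows; the helper name, the color palette, and the data shapes are illustrative assumptions, not part of the patent:

```python
# Hypothetical sketch: allocate a distinct display style (icon shape and/or
# rendering color) to each extracted keyword, cycling through a palette so
# that keywords remain visually distinguishable from one another.
from itertools import cycle

def assign_display_styles(keywords,
                          colors=("red", "yellow", "blue", "green"),
                          shapes=("rectangle",)):
    """Return {keyword: {"color": ..., "shape": ...}} with distinct colors."""
    color_iter = cycle(colors)
    shape_iter = cycle(shapes)
    return {kw: {"color": next(color_iter), "shape": next(shape_iter)}
            for kw in keywords}

# Mirrors the fig. 4 example: AA red, BB yellow, CC blue, all rectangular.
styles = assign_display_styles(["AA", "BB", "CC"])
```

The same style record would then be used both when rendering the keyword in the target interface area and when drawing its dot identifier on the progress bar, which is what keeps the two displays consistent.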
After the plurality of keywords are displayed in the target interface area of the playing interface, the user may perform operations on them, such as hiding or deleting.
In an optional embodiment, in response to a hiding operation triggered by a user on a target keyword among the plurality of keywords, the identifier corresponding to the target keyword on the playing progress control is hidden. In practical applications, the hiding operation may be triggered in various ways. For example, the user may click the target keyword with a mouse to pop up operation function items, such as a hide item and a delete item; selecting the hide item triggers the hiding operation for the target keyword. Alternatively, a single click on the target keyword may itself be treated as triggering the hiding operation. As another alternative, the user may trigger the hiding operation with a voice instruction, such as speaking "hide AA". Fig. 5 illustrates a case where the user clicks the keyword AA to trigger the operation of hiding AA. If AA is highlighted in red, then based on the hiding operation, the rendering color of AA may optionally be updated to a default rendering color, such as white or no color. As shown in fig. 5, based on the hiding operation for AA, the red identifiers at time nodes T1 and T2 on the playing progress control are hidden; that is, AA is no longer marked on the playing progress control, and optionally the marking of AA in the subtitle text may also be cancelled at this time.
It will be appreciated that, for a target keyword on which a hiding operation has been performed, the user may subsequently perform a restore-display operation to restore the marking of the target keyword. The restore-display operation may be triggered, for example, by clicking the target keyword again.
In another optional embodiment, in response to a deletion operation triggered by a user on a target keyword among the plurality of keywords, the target keyword is deleted from the target interface area and the identifier corresponding to the target keyword on the playing progress control is deleted. In practical applications, similar to the hiding operation, the deletion operation may be triggered in various ways. This embodiment takes the implementation shown in fig. 6 as an example to describe the process of deleting a target keyword. A deletion control may be provided in the target interface area where the plurality of keywords are displayed. When the user clicks the deletion control, all of the keywords enter a to-be-deleted state; for example, the circle with an "x" pattern shown at the upper right corner of each of the keywords AA, BB, and CC in fig. 6 indicates that the keywords are in the to-be-deleted state. For a target keyword the user wants to delete, say AA, the user clicks the circle with the "x" pattern on AA to trigger the deletion of AA. AA is then deleted from the target interface area, the identifier corresponding to AA on the playing progress control is deleted, and the marking of AA in the subtitle text may also be removed.
In summary, in the embodiment of the present invention, by presetting a keyword database containing keywords that multimedia data of the same type may contain, and by obtaining the subtitle text of the multimedia data, the keywords of the multimedia data can be automatically extracted and marked on the playing progress control based on the subtitle text and the keyword database. This improves the efficiency of obtaining the key information of the multimedia data, and when the user wants to browse the content related to a certain keyword, the user can conveniently and quickly locate the corresponding content according to the marking result.
On the basis of the foregoing embodiments, the embodiment of the present invention further provides a function by which a user can customize keywords, that is, manually add keywords.
In an alternative embodiment, the function of manually adding keywords can be implemented as:
in response to a search operation triggered by a user, determining, in the subtitle text, a subtitle entry containing a search keyword input by the user;
and in response to a keyword adding operation triggered by the user on the search keyword contained in the subtitle entry, marking the search keyword in the subtitle text, and marking the search keyword on the playing progress control according to the playing time of the subtitle entry.
Specifically, a search control may be displayed on the playing interface of the multimedia data. Since the search operation searches the subtitle text for a keyword the user needs, the search control may optionally be displayed in association with the subtitle text display area, for example in the area directly above it.
When the user clicks the search control, a search input box may be displayed, in which the user may enter a search keyword. The search keyword is then retrieved from the subtitle text, and each subtitle entry containing the search keyword is located.
To help the user find the search keyword in the subtitle text, the retrieved search keyword may be highlighted, for example in a certain color. It is worth noting that this highlighting may be understood as temporary highlighting: to ensure the reliability of keyword addition, it should not be directly assumed that the search keyword input by the user is necessarily a keyword to be added. Optionally, only when the user triggers a keyword adding operation on the retrieved search keyword is the user considered to indeed intend to add the search keyword as a keyword. The keyword adding operation may be triggered, for example, by the user clicking the retrieved search keyword to display an option "add as a keyword" and selecting it. After the user triggers the adding operation, the search keyword is marked in the subtitle text, and is marked on the playing progress control according to the playing time of the corresponding subtitle entry. In the marking process, a display style, such as a rendering color, may be assigned to the search keyword, so that the search keyword is highlighted in the subtitle text in that rendering color, replacing the initial temporary highlighting.
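The search step described above, locating every subtitle entry that contains the user's search keyword together with its playing time, can be sketched as follows; the function name and the entry data shape are assumptions for illustration:

```python
# Illustrative sketch: find every subtitle entry whose text contains the
# search keyword, and collect the playing times at which the progress
# control should be marked. Entry fields are assumed, not from the patent.
def find_entries(subtitle_entries, search_keyword):
    """subtitle_entries: list of {"time": seconds, "text": str}."""
    return [entry for entry in subtitle_entries
            if search_keyword in entry["text"]]

entries = [
    {"time": 12.0, "text": "today we introduce DD"},
    {"time": 95.5, "text": "recall that DD relates to EE"},
    {"time": 130.0, "text": "summary of the lesson"},
]
hits = find_entries(entries, "DD")
mark_times = [e["time"] for e in hits]  # times at which to mark the bar
```

A production implementation would likely normalize case and use the word-segmentation and matching-degree calculation described earlier rather than plain substring containment.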
In the above manner of manually adding a keyword, the search keyword is determined entirely by the user. The embodiment of the present invention further provides another optional implementation, as shown in fig. 7.
Fig. 7 is a flowchart of another multimedia data processing method according to an embodiment of the present invention. As shown in fig. 7, the method may include the following steps:
701. Acquiring a subtitle text corresponding to the multimedia data, wherein the subtitle text includes a plurality of subtitle entries associated with playing time.
702. Identifying a plurality of keywords contained in the subtitle text according to a keyword database.
703. If the number of the plurality of keywords is greater than a preset number, screening out a preset number of keywords from the plurality of keywords according to their respective importance level information and/or their respective numbers of occurrences in the subtitle text.
In this embodiment, when comparing the words contained in the subtitle text with the keywords in the keyword database shows that the subtitle text contains many keywords, exceeding a preset number, the most critical ones may be selected; that is, the obtained keywords are ranked, and a preset number of them are selected.
When the keyword database is built, importance level information can be set for each keyword according to its degree of importance. Taking teaching-video multimedia data as an example, the keywords may be knowledge points, and the importance level of each knowledge point may be preset.
Assuming that the number of the identified keywords is N, the preset number is M, and N is greater than M.
Thus, in an optional embodiment, the N keywords may be sorted from high to low according to their respective importance levels, and the top M keywords selected.
In another optional embodiment, the number of occurrences of each of the N keywords in the subtitle text may be counted, the N keywords sorted from high to low by occurrence count, and the top M keywords selected.
In yet another optional embodiment, a weighted formula combining the importance level information and the occurrence count may be preset, so that scores corresponding to the N keywords are calculated based on the formula, the N keywords are sorted from high to low by score, and the top M keywords are selected.
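The weighted screening of step 703 can be sketched as follows; the weight values, field names, and sample data are illustrative assumptions, not values prescribed by the patent:

```python
# Hedged sketch of step 703: score each keyword by a weighted combination of
# its preset importance level and its occurrence count in the subtitle text,
# then keep the top M. The 0.7/0.3 weights are arbitrary for illustration.
def screen_keywords(keywords, importance, counts, m, w_imp=0.7, w_cnt=0.3):
    scores = {kw: w_imp * importance.get(kw, 0) + w_cnt * counts.get(kw, 0)
              for kw in keywords}
    ranked = sorted(keywords, key=lambda kw: scores[kw], reverse=True)
    return ranked[:m], ranked[m:]  # (keywords to mark, remaining keywords)

kws = ["AA", "BB", "CC", "DD", "EE"]          # N = 5 identified keywords
imp = {"AA": 5, "BB": 4, "CC": 4, "DD": 2, "EE": 1}
cnt = {"AA": 9, "BB": 3, "CC": 6, "DD": 7, "EE": 2}
top, rest = screen_keywords(kws, imp, cnt, m=3)  # preset number M = 3
```

The second return value corresponds to the "remaining keywords" of step 705, which are later offered to the user as search suggestions.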
704. Marking the screened-out keywords on the playing progress control of the multimedia data according to the playing time of the subtitle entry corresponding to each screened-out keyword.
The process of keyword marking may refer to the descriptions in the foregoing embodiments and is not repeated here.
705. In response to a search operation triggered by the user, displaying the remaining keywords for the user to select, where the remaining keywords are those left after the preset number of keywords are removed from the plurality of keywords.
706. Taking the keyword selected by the user from the remaining keywords as a search keyword, and determining, in the subtitle text, the subtitle entry containing the search keyword.
707. In response to a keyword adding operation triggered by the user on the search keyword contained in the subtitle entry, marking the search keyword in the subtitle text, and marking the search keyword on the playing progress control according to the playing time of the subtitle entry.
In this embodiment, given that the number N of keywords identified in the subtitle text is greater than the preset number M, when the user triggers a search operation, all or some of the remaining (N - M) keywords may be recommended to the user so that the user may select a search keyword from them. As shown in fig. 8, assuming the remaining keywords include DD and EE, a search input box and the remaining keywords are displayed after the user clicks the search control. If the user selects a keyword, such as DD, from the remaining keywords, DD is automatically added to the search input box as the search keyword, triggering a search for it. Of course, if none of the remaining keywords is the one the user needs, the user may directly type the required search keyword into the search input box. After DD is retrieved, DD is marked in the subtitle text and marked on the playing progress control according to the playing time T5 of the subtitle entry containing DD, as shown in fig. 8.
Through this embodiment, the user can customize keywords according to his or her own needs and mark the customized keywords on the playing progress control of the multimedia data, making it convenient to browse the corresponding data content.
Based on the keyword marking processing of the foregoing embodiments, keyword marking may be performed on a plurality of multimedia data in a multimedia database. After the multimedia data in the multimedia database have been marked with their respective keywords, then, for the multimedia data currently being played by a user, at least one multimedia data matching the plurality of keywords marked on it may optionally be screened out from the multimedia database during playing, and the screened-out multimedia data may be displayed in association with the multimedia data being played. In this way, other multimedia data matching the data the user is currently viewing is recommended, where matching is expressed as keyword matching.
For example, assume that the multimedia data currently viewed by the user is multimedia data A, that the multimedia data screened out for it are multimedia data B and multimedia data C, and that the keywords marked on multimedia data A include aa and bb. Keyword matching may then mean that multimedia data B and multimedia data C each contain both keywords aa and bb, or that multimedia data B contains aa or bb and multimedia data C contains aa or bb.
In practical applications, an upper limit, for example 5, may be set on the number of screened-out multimedia data. When the number of multimedia data containing all or some of the keywords marked on multimedia data A exceeds this upper limit, the candidates may be sorted according to the degree of matching between their own marked keywords and the keywords marked on multimedia data A, and the upper-limit number of candidates with the highest matching degree selected. The matching degree may be measured by the number of keywords marked on multimedia data A that a candidate contains.
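The matching-degree ranking described above can be sketched as follows; the function name, data shapes, and sample keyword sets are assumptions for illustration:

```python
# Illustrative sketch: rank candidate multimedia items by how many of the
# current item's marked keywords they share, and keep at most `limit` items.
def recommend(current_keywords, candidates, limit=5):
    """candidates: {name: set of keywords marked on that item}."""
    scored = [(name, len(set(current_keywords) & kws))
              for name, kws in candidates.items()]
    matched = [(name, s) for name, s in scored if s > 0]   # drop non-matches
    matched.sort(key=lambda pair: pair[1], reverse=True)   # highest overlap first
    return [name for name, _ in matched[:limit]]

# Mirrors the example: A is marked with aa and bb; B shares both, C shares one.
current = {"aa", "bb"}
pool = {"B": {"aa", "bb"}, "C": {"aa"}, "D": {"zz"}}
picks = recommend(current, pool, limit=2)
```
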
The process of screening out the at least one multimedia data may be performed automatically, that is, triggered by the keyword marking result of the multimedia data. In another optional embodiment, however, it may be performed based on a user operation. Specifically, in response to a selection operation by the user on a target keyword among the plurality of keywords marked on multimedia data A, at least one multimedia data marked with the target keyword is screened out from the multimedia database. Since selecting the target keyword indicates that the user pays particular attention to content related to it, more multimedia data related to the target keyword can be recommended to the user based on this selection.
In addition, when the screened-out at least one multimedia data is presented in association with the multimedia data currently viewed by the user, a playing link of the at least one multimedia data may optionally be presented in the playing interface, and/or the target subtitle entries corresponding to the at least one multimedia data may be presented in the playing interface, where the target subtitle entries are the subtitle entries corresponding to the plurality of keywords marked on the currently viewed multimedia data. For example, assuming that keywords aa and bb are marked on the currently viewed multimedia data A, and a screened-out multimedia data B contains the keyword aa, the subtitle entry corresponding to aa in multimedia data B may be displayed in the playing interface of multimedia data A as a target subtitle entry.
In an optional embodiment, based on the marking result of the plurality of keywords of the multimedia data, the category to which the multimedia data belongs may also be determined according to those keywords. That is, the keyword marking result can also serve as a basis for classifying multimedia data.
Specifically, taking teaching-related multimedia data as an example, first-level categories of teaching multimedia data may be set by discipline, and under the first-level category of each discipline, hit keywords may serve as second-level categories. If a keyword marked on a certain multimedia data matches a certain second-level category, that multimedia data can be determined to belong to that second-level category. The match between a marked keyword and a second-level category may be measured by the distance between their word vectors; when the distance is smaller than a certain threshold, the two words are considered to match. The hit keywords may be determined by counting, for each discipline, the total number of occurrences of each keyword in the subtitle texts of the corresponding multimedia data.
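The word-vector matching described above can be sketched as follows; the toy vectors, the Euclidean distance metric, and the threshold value are illustrative assumptions (any embedding model could supply real vectors):

```python
# Minimal sketch of the category-attribution check: a marked keyword matches
# a second-level category when the distance between their word vectors falls
# below a threshold. Vectors here are tiny toy examples, not real embeddings.
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def matches_category(keyword_vec, category_vec, threshold=0.5):
    return euclidean(keyword_vec, category_vec) < threshold

kw_vec = (0.9, 0.1, 0.0)   # hypothetical vector for a marked keyword
cat_vec = (1.0, 0.0, 0.0)  # hypothetical vector for a second-level category
close = matches_category(kw_vec, cat_vec)  # distance ~0.141, below threshold
```

Cosine distance would work equally well here; the patent only requires some distance measure compared against a threshold.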
In practical applications, the result of classifying multimedia data by keywords makes it convenient for a user to retrieve the multimedia data under a keyword of interest. For example, in a teaching scenario, the keywords may be knowledge points, and based on the classification result the user can conveniently obtain multiple teaching audios and videos corresponding to the same knowledge point for study.
The multimedia data processing apparatus of one or more embodiments of the present invention is described in detail below. Those skilled in the art will appreciate that these apparatuses can be constructed from commercially available hardware components configured through the steps taught in this solution.
Fig. 9 is a schematic structural diagram of a multimedia data processing apparatus according to an embodiment of the present invention. As shown in fig. 9, the apparatus includes an obtaining module 11, an identification module 12, and a marking module 13.
The obtaining module 11 is configured to obtain a subtitle text corresponding to the multimedia data, where the subtitle text includes a plurality of subtitle entries associated with playing time.
The identification module 12 is configured to identify a plurality of keywords contained in the subtitle text according to a keyword database.
The marking module 13 is configured to mark the plurality of keywords on the playing progress control of the multimedia data according to the playing time of the subtitle entry corresponding to each keyword.
Optionally, the multimedia data is a live video, and the obtaining module 11 may be configured to: collecting audio data of the live video; converting the audio data into the subtitle text; and determining each subtitle item contained in the subtitle text and associating the playing time for each subtitle item.
Optionally, the identification module 12 may be configured to: if the first language corresponding to the subtitle text is not consistent with the second language corresponding to the keyword database, converting the subtitle text into the second language or converting the keyword database into the first language; performing word segmentation processing on the subtitle text; and calculating the matching degree between each word in the caption text and each keyword in the keyword database so as to identify a plurality of keywords contained in the caption text according to the matching degree.
Optionally, the apparatus may further include: the multimedia data screening processing module is used for screening at least one piece of multimedia data matched with the keywords from a multimedia database; and displaying the at least one multimedia data in association with the multimedia data.
Optionally, the multimedia data filtering processing module may be specifically configured to: and displaying the playing link of the at least one multimedia data in the playing interface of the multimedia data, and/or displaying target subtitle items corresponding to the at least one multimedia data in the playing interface of the multimedia data, wherein the target subtitle items are subtitle items corresponding to the plurality of keywords.
Optionally, the multimedia data filtering processing module may be specifically configured to: and responding to the selection operation of a user on a target keyword in the plurality of keywords, and screening out at least one piece of multimedia data marked with the target keyword from the multimedia database.
Optionally, the apparatus may further include: and the category determining module is used for determining the category of the multimedia data according to the keywords.
Optionally, the marking module 13 may be further configured to: and marking the plurality of keywords in the subtitle text.
Optionally, the apparatus may further include: and the distribution module is used for distributing display styles for the keywords respectively. Accordingly, the marking module 13 may be specifically configured to: displaying the plurality of keywords in a target interface area according to display styles corresponding to the plurality of keywords respectively; and marking identifiers corresponding to the keywords on the playing progress control according to the playing time of the subtitle items corresponding to the keywords respectively and in the display styles corresponding to the keywords respectively. Wherein the display style includes an icon shape and/or a rendering color.
Thus, optionally, the apparatus may further comprise: and the operation processing module is used for responding to a hiding operation triggered by a user on a target keyword in the keywords and hiding an identifier corresponding to the target keyword on the playing progress control.
Optionally, the operation processing module may be further configured to: and responding to a deleting operation triggered by a user on a target keyword in the keywords, deleting the target keyword in the target interface area and deleting an identifier corresponding to the target keyword on the playing progress control.
Optionally, the marking module 13 may be further specifically configured to: and marking a label containing the corresponding keyword on the playing progress control of the multimedia data according to the playing time of the caption item corresponding to each of the plurality of keywords.
Optionally, the marking module 13 may be specifically configured to: if the number of the keywords is less than or equal to a preset number, marking the keywords on a playing progress control of the multimedia data according to the playing time of the caption items corresponding to the keywords, wherein the preset number is greater than 1.
Optionally, the apparatus may further include: and the screening module is used for screening the keywords with the preset number from the keywords according to the respective corresponding importance level information of the keywords and/or the occurrence times of the keywords in the caption text respectively if the number of the keywords is larger than the preset number. Thus, the marking module 13 is specifically configured to: and marking the screened keywords on the playing progress control of the multimedia data according to the playing time of the caption items corresponding to the screened keywords, wherein the preset number is more than 1.
Optionally, the operation processing module may be further configured to: and in response to a search operation triggered by a user, determining a subtitle item containing the search keyword input by the user in the subtitle text. Thus, the marking module 13 may also be configured to: and marking the search keywords in the subtitle text in response to the keyword adding operation triggered by the search keywords contained in the subtitle entries by the user, and marking the search keywords on the playing progress control according to the playing time of the subtitle entries.
Optionally, the operation processing module may be specifically configured to: responding to search operation triggered by a user, and displaying remaining keywords for the user to select, wherein the remaining keywords are the keywords remaining after the preset number of keywords are removed from the plurality of keywords; and taking the keyword selected from the residual keywords by the user as the search keyword, and determining the subtitle item containing the search keyword input by the user in the subtitle text.
The apparatus shown in fig. 9 can perform the methods of the embodiments shown in fig. 1 to 8; for details, as well as for the implementation process and technical effects of the technical solution, reference may be made to the related descriptions of those embodiments, which are not repeated here.
The internal functions and structure of the multimedia data processing apparatus having been described, in one possible design the apparatus may be implemented as an electronic device, such as a terminal device capable of playing multimedia data or a server providing multimedia data. As shown in fig. 10, the electronic device may include a processor 21 and a memory 22, where the memory 22 stores a program that enables the electronic device to execute the multimedia data processing method provided in the embodiments shown in fig. 1 to 8, and the processor 21 is configured to execute the program stored in the memory 22.
The program includes one or more computer instructions which, when executed by the processor 21, can implement the following steps:
acquiring a subtitle text corresponding to multimedia data, wherein the subtitle text comprises a plurality of subtitle items associated with playing time;
identifying a plurality of keywords contained in the subtitle text according to a keyword database;
and marking the plurality of keywords on the playing progress control of the multimedia data according to the playing time of the caption items corresponding to the plurality of keywords respectively.
Optionally, the processor 21 is further configured to perform all or part of the steps in the embodiments shown in fig. 1 to 8.
The electronic device may further include a communication interface 23 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the multimedia data processing method in the method embodiments shown in fig. 1 to 8.
The above-described apparatus embodiments are merely illustrative. Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by adding a necessary general hardware platform, or of course by a combination of hardware and software. Based on this understanding, the part of the above technical solutions that in essence contributes over the prior art may be embodied in the form of a computer program product, which may be carried on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable multimedia data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable multimedia data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable multimedia data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable multimedia data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (19)

1. A method for processing multimedia data, comprising:
acquiring a subtitle text corresponding to multimedia data, wherein the subtitle text comprises a plurality of subtitle items associated with playing time;
identifying a plurality of keywords contained in the subtitle text according to a keyword database;
and marking the plurality of keywords on the playing progress control of the multimedia data according to the playing time of the caption items corresponding to the plurality of keywords respectively.
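The method of claim 1 can be pictured as a small sketch. The subtitle format, the keyword database, and the idea of representing a progress-control mark as a fraction of the total duration are illustrative assumptions, not details fixed by the claim:

```python
# Sketch of claim 1: find keywords in timed subtitle items and place a
# mark for each on the play-progress control at the subtitle's play time.

def mark_keywords(subtitle_items, keyword_db, total_duration):
    """Return progress-bar marks as (keyword, play_time, fraction) tuples.

    subtitle_items -- list of (play_time_seconds, text) pairs
    keyword_db     -- iterable of known keywords
    total_duration -- total play time of the multimedia data, in seconds
    """
    marks = []
    for play_time, text in subtitle_items:
        for keyword in keyword_db:
            if keyword in text:
                # The mark's position on the progress control is the play
                # time of the subtitle item as a fraction of the duration.
                marks.append((keyword, play_time, play_time / total_duration))
    return marks

subtitles = [(30, "opening remarks"), (300, "neural network training"),
             (540, "closing summary")]
marks = mark_keywords(subtitles, {"neural network", "summary"}, 600)
# e.g. ("neural network", 300, 0.5) is a mark at the middle of the bar
```

A real player would translate each fraction into a pixel offset on the rendered progress bar; the claim leaves that rendering detail open.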
2. The method of claim 1, wherein the multimedia data is a live video, and the obtaining of the subtitle text corresponding to the multimedia data comprises:
collecting audio data of the live video;
converting the audio data into the subtitle text;
and determining each subtitle item contained in the subtitle text and associating corresponding playing time for each subtitle item.
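For live video (claim 2), one plausible shape of the pipeline is shown below. The `transcribe` callable is a stand-in for whatever speech-recognition service an implementation uses; segmenting the stream and attaching play times to each subtitle item is shown concretely:

```python
# Sketch of claim 2: collect timed audio segments of a live video,
# convert each to text, and keep the segment start time as the subtitle
# item's associated play time.

def build_subtitle_items(audio_segments, transcribe):
    """audio_segments: list of (start_time_seconds, audio_chunk) pairs;
    transcribe: callable mapping an audio chunk to text.
    Returns subtitle items as (play_time, text) pairs."""
    items = []
    for start_time, chunk in audio_segments:
        text = transcribe(chunk)
        if text:                      # skip silent or empty segments
            items.append((start_time, text))
    return items

# A fake transcriber for demonstration; a real one would call an ASR model.
fake_asr = {"chunk-a": "hello everyone", "chunk-b": ""}.get
items = build_subtitle_items([(0, "chunk-a"), (5, "chunk-b")], fake_asr)
# items == [(0, "hello everyone")]
```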
3. The method of claim 1, wherein identifying the plurality of keywords contained in the caption text according to a keyword database comprises:
if the first language corresponding to the subtitle text is not consistent with the second language corresponding to the keyword database, converting the subtitle text into the second language or converting the keyword database into the first language;
performing word segmentation processing on the subtitle text;
and calculating the matching degree between each word in the caption text and each keyword in the keyword database so as to identify a plurality of keywords contained in the caption text according to the matching degree.
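The segmentation-and-matching step of claim 3 might look like the sketch below. Whitespace tokenization stands in for real word segmentation (Chinese text would need a segmenter such as jieba), `difflib`'s ratio stands in for whatever "matching degree" measure an implementation chooses, and the language-conversion step of the claim is omitted; all of these are illustrative assumptions:

```python
import difflib

def identify_keywords(subtitle_text, keyword_db, threshold=0.8):
    """Return the keywords from keyword_db whose matching degree against
    some segmented word of the subtitle text reaches the threshold."""
    words = subtitle_text.lower().split()          # word segmentation
    found = set()
    for word in words:
        for keyword in keyword_db:
            # Matching degree between the segmented word and the keyword.
            degree = difflib.SequenceMatcher(None, word, keyword.lower()).ratio()
            if degree >= threshold:
                found.add(keyword)
    return found

kws = identify_keywords("the network trains a deep networks model",
                        {"network", "model"})
# "networks" vs "network" scores 14/15, so the plural form still matches
```

Using a similarity score rather than exact equality is what lets near-forms of a keyword (inflections, recognition errors from the ASR step) still be identified.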
4. The method of claim 1, further comprising:
screening at least one multimedia data matched with the keywords from a multimedia database;
and displaying the at least one multimedia data in association with the multimedia data.
5. The method of claim 4, wherein said presenting the at least one multimedia data in association with the multimedia data comprises:
displaying the playing link of the at least one multimedia data in the playing interface of the multimedia data, and/or,
and displaying target subtitle items corresponding to the at least one piece of multimedia data in a playing interface of the multimedia data, wherein the target subtitle items are subtitle items corresponding to the keywords.
6. The method of claim 4, wherein the filtering out at least one multimedia data from the multimedia database that matches the plurality of keywords comprises:
and responding to the selection operation of a user on a target keyword in the plurality of keywords, and screening out at least one piece of multimedia data marked with the target keyword from the multimedia database.
7. The method of claim 1, further comprising:
and determining the attributive category of the multimedia data according to the plurality of keywords.
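Claim 7 determines the category to which the multimedia data belongs from its keywords. One simple realization, shown purely as an assumption, is to pick the category whose vocabulary overlaps the identified keyword set most; the category table below is invented for illustration:

```python
# Hypothetical category vocabularies; not specified by the claim.
CATEGORY_VOCAB = {
    "sports": {"goal", "foul", "penalty", "match"},
    "finance": {"stock", "market", "rate"},
}

def classify(keywords):
    """Return the category with the largest keyword overlap, or None."""
    keywords = set(keywords)
    best = max(CATEGORY_VOCAB, key=lambda c: len(CATEGORY_VOCAB[c] & keywords))
    return best if CATEGORY_VOCAB[best] & keywords else None

category = classify(["goal", "match", "rate"])
# "sports" overlaps on two keywords, "finance" on only one
```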
8. The method of claim 1, further comprising:
and marking the plurality of keywords in the subtitle text.
9. The method of claim 1, wherein the marking the plurality of keywords on the playing progress control of the multimedia data according to the playing time of the subtitle entry corresponding to each of the plurality of keywords comprises:
distributing display styles for the keywords respectively;
displaying the plurality of keywords in a target interface area according to display styles corresponding to the plurality of keywords respectively;
and marking identifiers corresponding to the keywords on the playing progress control according to the playing time of the subtitle items corresponding to the keywords respectively and in the display styles corresponding to the keywords respectively.
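Claims 9 and 10 assign each keyword its own display style (icon shape and/or rendering color) and reuse that style for the keyword's identifiers on the progress control. A minimal sketch of the assignment step, with an invented palette:

```python
import itertools

# Example palette of (icon shape, rendering color) styles; the claims
# require only that styles be assigned, not what they look like.
PALETTE = [("circle", "red"), ("square", "blue"), ("triangle", "green")]

def assign_styles(keywords):
    """Map each keyword to a (shape, color) style, cycling the palette."""
    return dict(zip(keywords, itertools.cycle(PALETTE)))

styles = assign_styles(["goal", "foul", "penalty", "corner"])
# The fourth keyword wraps around and reuses the first style.
```

The same mapping would then drive both the keyword list in the target interface area and the marks on the progress control, so a viewer can tell at a glance which mark belongs to which keyword.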
10. The method of claim 9, wherein the display style comprises an icon shape and/or a rendering color.
11. The method of claim 9, further comprising:
and hiding identifiers corresponding to the target keywords on the playing progress control in response to a hiding operation triggered by a user on the target keywords in the plurality of keywords.
12. The method of claim 9, further comprising:
and responding to a deleting operation triggered by a user on a target keyword in the keywords, deleting the target keyword in the target interface area and deleting an identifier corresponding to the target keyword on the playing progress control.
13. The method of claim 1, wherein the marking the plurality of keywords on the playing progress control of the multimedia data according to the playing time of the subtitle entry corresponding to each of the plurality of keywords comprises:
and marking a label containing the corresponding keyword on the playing progress control of the multimedia data according to the playing time of the caption item corresponding to each of the plurality of keywords.
14. The method according to any one of claims 1 to 13, wherein the marking the plurality of keywords on the play progress control of the multimedia data according to the play time of the subtitle entry corresponding to each of the plurality of keywords comprises:
if the number of the keywords is less than or equal to a preset number, marking the keywords on a playing progress control of the multimedia data according to the playing time of the caption items corresponding to the keywords, wherein the preset number is greater than 1.
15. The method according to any one of claims 1 to 13, wherein the marking the plurality of keywords on the play progress control of the multimedia data according to the play time of the subtitle entry corresponding to each of the plurality of keywords comprises:
if the number of the keywords is larger than the preset number, screening the keywords with the preset number from the keywords according to the respective corresponding importance level information of the keywords and/or the occurrence times of the keywords in the caption text;
and marking the screened keywords on the playing progress control of the multimedia data according to the playing time of the caption items corresponding to the screened keywords, wherein the preset number is more than 1.
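The screening rule of claims 14 and 15 can be sketched as follows. Scoring keywords by importance first and occurrence count second is one reasonable reading of the claim language; the claim itself allows either criterion alone or both:

```python
def screen_keywords(keywords, importance, subtitle_text, preset_number):
    """Keep at most preset_number keywords, ranked by importance level
    and then by how often each keyword occurs in the subtitle text."""
    if len(keywords) <= preset_number:
        return list(keywords)          # claim 14: mark them all
    counts = {kw: subtitle_text.count(kw) for kw in keywords}
    ranked = sorted(keywords,
                    key=lambda kw: (importance.get(kw, 0), counts[kw]),
                    reverse=True)
    return ranked[:preset_number]      # claim 15: keep only the top ones

text = "goal goal foul corner goal foul"
kept = screen_keywords(["goal", "foul", "corner"],
                       {"goal": 2, "foul": 2, "corner": 1}, text, 2)
# "goal" (importance 2, count 3) and "foul" (importance 2, count 2) survive
```

The screened-out "remaining keywords" are exactly what claims 16 and 17 later surface to the user through the search operation.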
16. The method of claim 15, further comprising:
in response to a search operation triggered by a user, determining a subtitle entry containing a search keyword input by the user in the subtitle text;
and marking the search keywords in the subtitle text in response to the keyword adding operation triggered by the search keywords contained in the subtitle entries by the user, and marking the search keywords on the playing progress control according to the playing time of the subtitle entries.
17. The method of claim 16, wherein the determining, in response to a user-triggered search operation, a caption entry in the caption text that contains the user-entered search keyword comprises:
responding to search operation triggered by a user, and displaying remaining keywords for the user to select, wherein the remaining keywords are the keywords remaining after the preset number of keywords are removed from the plurality of keywords;
and taking the keyword selected from the residual keywords by the user as the search keyword, and determining the subtitle item containing the search keyword input by the user in the subtitle text.
18. A multimedia data processing apparatus, comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a caption text corresponding to multimedia data, and the caption text comprises a plurality of caption items associated with playing time;
the identification module is used for identifying a plurality of keywords contained in the subtitle text according to a keyword database;
and the marking module is used for marking the keywords on the playing progress control of the multimedia data according to the playing time of the caption item corresponding to each of the keywords.
19. An electronic device, comprising: a memory and a processor; wherein
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, implement the multimedia data processing method of any of claims 1 to 17.
CN201810766023.1A 2018-07-12 2018-07-12 Multimedia data processing method, device and equipment Pending CN110719518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810766023.1A CN110719518A (en) 2018-07-12 2018-07-12 Multimedia data processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810766023.1A CN110719518A (en) 2018-07-12 2018-07-12 Multimedia data processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN110719518A true CN110719518A (en) 2020-01-21

Family

ID=69209172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810766023.1A Pending CN110719518A (en) 2018-07-12 2018-07-12 Multimedia data processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN110719518A (en)


Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003006208A (en) * 2001-06-20 2003-01-10 Daikin Ind Ltd Voice recording apparatus and method, voice reproducing apparatus and method, voice recording and reproducing system and method
US20050117884A1 (en) * 2003-10-31 2005-06-02 Samsung Electronics Co., Ltd. Storage medium storing meta information for enhanced search and subtitle information, and reproducing apparatus
CN101286351A (en) * 2008-05-23 2008-10-15 广州视源电子科技有限公司 Method and system for creating stream media value added description file and cut-broadcasting multimedia information
CN101329867A (en) * 2007-06-21 2008-12-24 西门子(中国)有限公司 Method and device for playing speech on demand
US20100005493A1 (en) * 2007-03-22 2010-01-07 Huawei Technologies Co., Ltd. Iptv system, media server, and iptv program search and location method
CN105049637A (en) * 2015-08-25 2015-11-11 努比亚技术有限公司 Device and method for controlling instant communication
CN106096050A (en) * 2016-06-29 2016-11-09 乐视控股(北京)有限公司 A kind of method and apparatus of video contents search
CN106128460A (en) * 2016-08-04 2016-11-16 周奇 A kind of record labels method and device
CN106488300A (en) * 2016-10-27 2017-03-08 广东小天才科技有限公司 A kind of video content inspection method and device
CN106571137A (en) * 2016-10-28 2017-04-19 努比亚技术有限公司 Terminal voice dotting control device and method
CN106899879A (en) * 2015-12-18 2017-06-27 北京奇虎科技有限公司 A kind for the treatment of method and apparatus of multi-medium data
CN107180055A (en) * 2016-03-11 2017-09-19 阿里巴巴集团控股有限公司 The methods of exhibiting and device of business object
CN107798143A (en) * 2017-11-24 2018-03-13 珠海市魅族科技有限公司 A kind of information search method, device, terminal and readable storage medium storing program for executing


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yuan Wen: "Research on Keyword Extraction and Retrieval Technology for Online Video Subtitles", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111399796A (en) * 2020-03-06 2020-07-10 北京达佳互联信息技术有限公司 Voice message aggregation method and device, electronic equipment and storage medium
CN111405217A (en) * 2020-03-06 2020-07-10 浙江大华技术股份有限公司 Image information display method and device, storage medium and electronic device
CN111414220A (en) * 2020-03-18 2020-07-14 深圳传音控股股份有限公司 Information marking method, terminal device and storage medium
CN111541859A (en) * 2020-04-02 2020-08-14 视联动力信息技术股份有限公司 Video conference processing method and device, electronic equipment and storage medium
CN111405374A (en) * 2020-04-16 2020-07-10 广东小天才科技有限公司 Video progress node generation method, device, equipment and storage medium
CN111405374B (en) * 2020-04-16 2022-07-22 广东小天才科技有限公司 Video progress node generation method, device, equipment and storage medium
CN111510765A (en) * 2020-04-30 2020-08-07 浙江蓝鸽科技有限公司 Audio label intelligent labeling method and device based on teaching video
CN112052315A (en) * 2020-08-11 2020-12-08 北京城市网邻信息技术有限公司 Information processing method and device
CN112073738A (en) * 2020-08-11 2020-12-11 北京城市网邻信息技术有限公司 Information processing method and device
CN112068750A (en) * 2020-08-20 2020-12-11 北京五八信息技术有限公司 House resource processing method and device
WO2022042763A1 (en) * 2020-08-28 2022-03-03 荣耀终端有限公司 Video playback method, and device
CN112163102A (en) * 2020-09-29 2021-01-01 北京字跳网络技术有限公司 Search content matching method and device, electronic equipment and storage medium
WO2022105709A1 (en) * 2020-11-18 2022-05-27 北京字跳网络技术有限公司 Multimedia interaction method and apparatus, information interaction method and apparatus, and device and medium
CN113886612A (en) * 2020-11-18 2022-01-04 北京字跳网络技术有限公司 Multimedia browsing method, device, equipment and medium
WO2022105760A1 (en) * 2020-11-18 2022-05-27 北京字跳网络技术有限公司 Multimedia browsing method and apparatus, device and medium
CN113010698A (en) * 2020-11-18 2021-06-22 北京字跳网络技术有限公司 Multimedia interaction method, information interaction method, device, equipment and medium
CN113010698B (en) * 2020-11-18 2023-03-10 北京字跳网络技术有限公司 Multimedia interaction method, information interaction method, device, equipment and medium
CN114765703A (en) * 2021-01-13 2022-07-19 北京中关村科金技术有限公司 Method and device for dyeing subtitles corresponding to TTS (text to speech) and storage medium
CN114765703B (en) * 2021-01-13 2023-07-07 北京中关村科金技术有限公司 Method and device for dyeing TTS voice corresponding subtitle and storage medium
CN112765397A (en) * 2021-01-29 2021-05-07 北京字节跳动网络技术有限公司 Audio conversion method, audio playing method and device
CN112765397B (en) * 2021-01-29 2023-04-21 抖音视界有限公司 Audio conversion method, audio playing method and device
CN113101652A (en) * 2021-05-10 2021-07-13 网易(杭州)网络有限公司 Information display method and device, computer equipment and storage medium
CN113395586A (en) * 2021-05-25 2021-09-14 深圳市趣推科技有限公司 Tag-based video editing method, device, equipment and storage medium
CN114501042A (en) * 2021-12-20 2022-05-13 阿里巴巴(中国)有限公司 Cross-border live broadcast processing method and electronic equipment

Similar Documents

Publication Publication Date Title
CN110719518A (en) Multimedia data processing method, device and equipment
US10642892B2 (en) Video search method and apparatus
US9438850B2 (en) Determining importance of scenes based upon closed captioning data
KR101816113B1 (en) Estimating and displaying social interest in time-based media
US9177407B2 (en) Method and system for assembling animated media based on keyword and string input
CN108520046B (en) Method and device for searching chat records
JP2017138985A (en) Method and device for artificial intelligence-based mobile search
US20070294295A1 (en) Highly meaningful multimedia metadata creation and associations
CN109558513B (en) Content recommendation method, device, terminal and storage medium
CN106020448B (en) Man-machine interaction method and system based on intelligent terminal
KR102040309B1 (en) Apparatus and method for recognization of olfactory information related to multimedia contents, apparatus and method for generation of label information
CN111259173B (en) Search information recommendation method and device
Tran et al. Exploiting character networks for movie summarization
CN109275047B (en) Video information processing method and device, electronic equipment and storage medium
US10769196B2 (en) Method and apparatus for displaying electronic photo, and mobile device
CN109408672B (en) Article generation method, article generation device, server and storage medium
CN110287375B (en) Method and device for determining video tag and server
CN113779381B (en) Resource recommendation method, device, electronic equipment and storage medium
CN111125314B (en) Display method of book query page, electronic device and computer storage medium
CN111935529B (en) Education audio and video resource playing method, equipment and storage medium
CN111368138A (en) Method and device for sorting video category labels, electronic equipment and storage medium
CN111078915B (en) Click-to-read content acquisition method in click-to-read mode and electronic equipment
CN114845149B (en) Video clip method, video recommendation method, device, equipment and medium
CN111522992A (en) Method, device and equipment for putting questions into storage and storage medium
CN109145261B (en) Method and device for generating label

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201218

Address after: Room 603, 6 / F, Roche Plaza, 788 Cheung Sha Wan Road, Kowloon, China

Applicant after: Zebra smart travel network (Hong Kong) Ltd.

Address before: Fourth Floor, P.O. Box 847, Capital Building, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200121