CN115484503A - Bullet screen generation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115484503A
CN115484503A
Authority
CN
China
Prior art keywords
key
audio
target
clip
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110599762.8A
Other languages
Chinese (zh)
Other versions
CN115484503B (English)
Inventor
张怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hode Information Technology Co Ltd
Original Assignee
Shanghai Hode Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hode Information Technology Co Ltd
Priority to CN202110599762.8A
Publication of CN115484503A
Application granted
Publication of CN115484503B
Legal status: Active (current)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8126Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
    • H04N21/8133Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a bullet screen generation method and apparatus, an electronic device, and a storage medium, relating to the technical field of multimedia files. The scheme is implemented as follows: acquiring an audio signal in a multimedia file; acquiring a frequency domain map corresponding to the audio signal; acquiring one or more key audio clips based on the frequency domain map, wherein each key audio clip corresponds to a key multimedia clip in the multimedia file that contains a target plot; and, for each of the one or more key audio clips, acquiring a target bullet screen matching the target plot of the corresponding key multimedia clip based on the key audio clip. In this way, bullet screens related to the plot of a multimedia file can be added to the file automatically.

Description

Bullet screen generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of multimedia file technology, and in particular, to a bullet screen generation method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
With the development of the internet and multimedia technology, multimedia files (such as videos) have become part of everyday life and entertainment. While watching a multimedia file, users express their reactions by publishing text comments to the display interface, so that these comments appear on screen as bullet screens during playback. Bullet screens give viewers a sense of real-time interaction and liven the atmosphere of watching. They also help a multimedia file attract more viewers and increase its popularity.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The disclosure provides a bullet screen generating method and device, electronic equipment, a computer readable storage medium and a computer program product.
According to an aspect of the present disclosure, there is provided a bullet screen generation method, including: acquiring an audio signal in a multimedia file; acquiring a frequency domain map corresponding to the audio signal; acquiring one or more key audio clips based on the frequency domain map, wherein each key audio clip corresponds to a key multimedia clip in the multimedia file that contains a target plot; and, for each of the one or more key audio clips, acquiring a target bullet screen matching the target plot of the corresponding key multimedia clip based on the key audio clip.
According to another aspect of the present disclosure, there is also provided a bullet screen generation apparatus, including: a first obtaining unit configured to obtain an audio signal in a multimedia file; a second obtaining unit configured to obtain a frequency domain map corresponding to the audio signal; a third obtaining unit configured to obtain one or more key audio clips based on the frequency domain map, wherein each key audio clip corresponds to a key multimedia clip in the multimedia file that contains a target plot; and a fourth obtaining unit configured to obtain, for each of the one or more key audio clips, a target bullet screen matching the target plot of the corresponding key multimedia clip based on the key audio clip.
According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program which, when executed by the at least one processor, implements a method according to the above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to the above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program realizes the method according to the above when executed by a processor.
According to one or more embodiments of the present disclosure, a frequency domain map is obtained from the audio signal of a multimedia file, and the target plots in the file, together with the multimedia clips containing them, are determined from the sound-source-complexity information carried by the map. By analyzing the audio signal, bullet screens corresponding to the target plots are obtained, so bullet screens can be generated for the multimedia file automatically; because the generated bullet screens are related to the plot of the file, they appear lifelike and preserve interactivity.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
Fig. 1 shows a schematic flow diagram of a bullet screen generation method according to some embodiments of the present disclosure;
fig. 2A illustrates a frequency diagram of an audio signal in a bullet screen generating method according to some embodiments of the present disclosure;
fig. 2B illustrates a frequency domain diagram of an audio signal in a bullet screen generation method according to some embodiments of the present disclosure;
fig. 3 shows a schematic flow diagram of a method of obtaining a frequency domain diagram of an audio signal in a bullet screen generating method according to some embodiments of the present disclosure;
fig. 4 shows a schematic flow diagram of a method of obtaining one or more key audio clips in a bullet screen generation method according to some embodiments of the present disclosure;
fig. 5 shows a schematic flow diagram of a method of obtaining a target bullet screen based on a key audio clip in a bullet screen generating method according to some embodiments of the present disclosure;
fig. 6 illustrates an exemplary flow diagram of a method of obtaining a text bullet screen based on a target text in a bullet screen generation method according to some embodiments of the present disclosure;
fig. 7 illustrates a schematic diagram of a target text and a text bullet screen displayed in a multimedia file in a bullet screen generating method according to some embodiments of the present disclosure;
fig. 8 illustrates an exemplary flow diagram of a method of determining a target bullet screen from matching text bullet screens in a bullet screen generation method according to some embodiments of the present disclosure;
fig. 9 shows a schematic flow diagram of a method of setting an add-on form of a text bullet screen in a bullet screen generation method according to some embodiments of the present disclosure;
fig. 10 shows a schematic block diagram of a bullet screen generating device according to some embodiments of the present disclosure; and
FIG. 11 illustrates a block diagram of an exemplary electronic device that can be used to implement some embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, it will be recognized by those of ordinary skill in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in some cases they may refer to different instances based on the context of the description.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the element may be one or a plurality of. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
According to an aspect of the present disclosure, a bullet screen generation method is provided. Referring to fig. 1, a bullet screen generation method according to some embodiments of the present disclosure is schematically illustrated. The method comprises the following steps:
step S110: acquiring an audio signal in a multimedia file;
step S120: acquiring a frequency domain map corresponding to the audio signal;
step S130: acquiring one or more key audio clips based on the frequency domain map, wherein each key audio clip corresponds to a key multimedia clip in the multimedia file that contains a target plot; and
step S140: for each of the one or more key audio clips, acquiring a target bullet screen matching the target plot of the corresponding key multimedia clip based on the key audio clip.
According to the disclosed method, a frequency domain map is obtained from the audio signal of the multimedia file, and the target plots in the file, together with the multimedia clips containing them, are determined from the sound-source-complexity information carried by the map. By analyzing the audio signal, bullet screens corresponding to the target plots are obtained, so bullet screens can be generated automatically, especially for newly released multimedia files; the generated bullet screens correlate with the plot of the file, which keeps the bullet screen effect lifelike and maintains the interactivity of the multimedia file.
In step S110, the multimedia file may be, without limitation, an image, audio, video, or text file in any playable format that contains an audio signal and has a plot.
In some embodiments, in step S110, the multimedia file may include a video. The video, when played by the video player, can be displayed on a display, and the displayed content has a plot. The video includes an audio signal corresponding to the video for causing the video player to play back matching background music or character conversations, etc. when the video is displayed on the display.
According to some embodiments, the multimedia file may be, for example, a video with an episode, such as a comedy, a movie, a television show, and the like, without limitation.
In some embodiments, in step S110, the multimedia file may include a text file corresponding to the audio signal in the multimedia file, wherein the text file is used for matching display on a display on which the multimedia file is displayed when the multimedia file is played. The text file may be, for example, subtitles, lyrics, etc., without limitation.
In step S120, a frequency domain map corresponding to the audio signal is obtained.
The inventors found that for multimedia files with plots, such as stage plays and film or television dramas, the audio signal at a target plot typically consists of character dialogue and background music. The target plot of the multimedia file can therefore be located by processing the audio signal, and bullet screens related to it determined, enabling bullet screens to be added to the multimedia file automatically.
A frequency domain map of an audio signal (also called a soundtrack frequency domain map) describes the sound signal in terms of frequency, and these frequency characteristics differ from one sound source to another. By analyzing the frequency domain map of the audio signal, information about sound-source complexity can be obtained; for example, a large frequency-domain span indicates high complexity. In a multimedia file with a plot, different characters and background music appear at different points, and at a target plot both are especially rich. This richness is reflected in the audio signal as high sound-source complexity. Therefore, the target plot of a multimedia file can be identified from the frequency domain map of its audio signal, which reflects the sound-source information. The target plot may be, for example, a climax, a scene with many characters, or a busy scene, without limitation.
In some embodiments, the audio signal may be processed, without limitation, using a Fourier transform to obtain the corresponding frequency domain map. Specifically, the Fourier transform can be performed using formula (1):

X(f) = ∫ x(t) e^(-j2πft) dt        (1)

where x(t) is the audio signal in the time domain and X(f) is its representation in the frequency domain.
referring to fig. 2A and 2B, fig. 2A is a frequency plot of an audio signal according to some embodiments of the present disclosure, with the abscissa shown as frequency (Hz) and the ordinate shown as amplitude (dB) of the signal; fig. 2B is a frequency domain plot of an audio signal, where the abscissa is frequency (Hz) and the ordinate is the magnitude of the frequency variation, according to some embodiments of the present disclosure.
As shown in fig. 2A and 2B, processing the audio signal yields a corresponding frequency domain map. The map characterizes the sound-source information of the audio signal, which is related to the span of the frequency domain. For example, at the peaks (peak a and peak b) in the frequency domain diagram of fig. 2B, the frequency variation (the ordinate) exceeds a preset value, indicating that the frequency-domain span of the audio signal at those peaks is large, i.e., the sound-source complexity there is high. The more peaks there are, the more complex the sound sources in the audio signal. Accordingly, based on the number of peaks (or their distribution density), audio clips with high sound-source complexity can be identified. High sound-source complexity in an audio clip means that the corresponding multimedia clip is rich in characters or background music, i.e., it contains a target plot. Therefore, based on the frequency domain map, one or more key audio clips can be obtained from the audio signal, where each corresponds to a key multimedia clip in the multimedia file containing a target plot; the specific method is described further below.
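As a rough illustration of how such a frequency domain map might be computed (the helper name and toy signal below are invented for illustration and are not part of the patent), a magnitude spectrum can be obtained with a discrete Fourier transform:

```python
import numpy as np

def frequency_domain_map(segment, sample_rate):
    """Magnitude spectrum of an audio segment via the discrete Fourier transform."""
    spectrum = np.abs(np.fft.rfft(segment))                     # magnitude per frequency bin
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)  # bin centre frequencies (Hz)
    return freqs, spectrum

# Toy one-second signal: two sine tones standing in for dialogue and background music.
sr = 8000
t = np.arange(sr) / sr
segment = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
freqs, spectrum = frequency_domain_map(segment, sr)
dominant_hz = freqs[np.argmax(spectrum)]  # strongest component: the 440 Hz tone
```

In a real pipeline the segment would come from the decoded audio track of the multimedia file rather than from synthetic sine tones.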
According to some embodiments, in step S120, the frequency domain map may be obtained for the audio signal as a whole, without slicing it; one or more audio clips corresponding to the key multimedia clips containing target plots are then cut from the audio signal based on the map, and these cut clips are determined to be the one or more key audio clips.
For example, in step S120, a frequency domain map may be obtained for the audio signal of step S110 as a whole. One or more sections of the map where the peaks are most densely distributed, or where the peak density exceeds a preset value, are then cut out, and the corresponding audio clips are cut from the audio signal. These are the one or more key audio clips.
Because the cut sections are those with the densest peak distribution, or a peak density above the preset value, the sound sources in the corresponding audio clips are more complex than in other sections, or have reached a preset degree of complexity. Cutting audio clips on the basis of these sections therefore yields the clips of the multimedia file with the highest sound-source complexity, i.e., the key audio clips.
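A minimal sketch of this density-based cutting, under the assumption that peak positions (in seconds) have already been extracted from the frequency domain map (function and parameter names are illustrative, not from the patent):

```python
def dense_windows(peak_times, duration, window, min_peaks):
    """Return start times of fixed-size windows whose peak count reaches the preset density."""
    starts = []
    t = 0.0
    while t + window <= duration:
        count = sum(1 for p in peak_times if t <= p < t + window)
        if count >= min_peaks:
            starts.append(t)   # this window would be cut out as a key section
        t += window
    return starts

# Peaks clustered around 60-70 s suggest a plot-heavy clip there.
hits = dense_windows([61, 62, 63, 65, 68, 200], duration=300, window=30, min_peaks=3)
# hits == [60.0]
```

The returned window start times would then be used to cut the matching spans out of the audio signal itself.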
According to other embodiments, in step S120, the audio signal may be segmented into a plurality of audio segments, a frequency domain map is obtained for each segment, and the one or more key audio segments corresponding to the key multimedia clips containing target plots are obtained from those maps.
Fig. 3 illustrates an exemplary flow diagram of a method of obtaining a frequency domain map corresponding to an audio signal according to some embodiments of the present disclosure. Referring to fig. 3, a method for obtaining a frequency domain map corresponding to the audio signal in step 120 will be described as an example.
As shown in fig. 3, in some embodiments, the obtaining of the frequency domain map corresponding to the audio signal in step S120 may include:
step S310: segmenting the audio signal into a plurality of audio segments; and
step S320: for each of the plurality of audio segments, a frequency domain map corresponding to the audio segment is obtained.
In step S310, the audio signal is sliced into audio segments, and in step S320 the audio signal of each sliced segment may be processed, for example by Fourier transform as described above, to obtain the frequency domain map of that segment. In the subsequent step of obtaining key audio clips, the one or more key audio clips corresponding to the multimedia clips containing target plots are determined from the frequency domain maps of the individual segments, which makes the determined key audio clips more accurate.
Specifically, the inventors found that if the audio signal is not sliced, the frequency-domain span (and hence the peak amplitude) of its overall frequency domain map is tied to the overall amplitude distribution of the signal. When that distribution is wide, peaks are easily obscured, making them hard to locate accurately, which in turn makes key audio clips hard to identify. After slicing, each segment's frequency domain map depends only on that segment, narrowing the amplitude range around each peak; the peaks therefore stand out, can be located accurately, and the resulting key audio clips are more accurate.
According to some embodiments, in step S310, the audio signal is sliced into equal parts: for example, into n equal segments, where n ≥ 2. Illustratively, the slicing follows playback duration; the signal may be cut every minute, or every five minutes, and so on, without limitation. With equal slicing, the segments whose frequency domain maps contain the most peaks can be identified as key audio clips simply by comparing peak counts. A large peak count indicates high sound-source complexity, i.e., the clip corresponds to a key multimedia clip containing a target plot. Equal slicing thus quantizes the peak-based selection of key audio clips, keeping the acquisition process simple and accurate.
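The equal slicing described above can be sketched as follows (the helper name is illustrative; a real implementation would slice by playback duration using the sample rate):

```python
import numpy as np

def split_equal(signal, n):
    """Slice an audio signal into n equal-length segments (any remainder is dropped)."""
    seg_len = len(signal) // n
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n)]

audio = np.zeros(10 * 60 * 8000)   # mock 10-minute track sampled at 8 kHz
segments = split_equal(audio, 10)  # one segment per minute of playback
```

Each resulting segment would then be passed through step S320 to obtain its own frequency domain map.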
The following describes an exemplary process of step S130 to further describe the process of obtaining a key audio clip corresponding to a multimedia clip containing a target episode in the multimedia file based on the frequency domain plot obtained in step S120.
According to some embodiments, in step S120, the audio signal is not segmented, and the overall frequency domain map of the audio signal is directly obtained, and in step S130, one or more audio clips corresponding to the key multimedia clips including the target episode in the multimedia file are cut from the audio signal based on the frequency domain map, and the one or more cut audio clips are one or more key audio clips.
According to other embodiments, in step S120, the audio signal is segmented to obtain a plurality of audio segments, and a frequency domain map corresponding to each of the plurality of audio segments is obtained. In this case, in step S130, one or more audio clips corresponding to the multimedia clip containing the target episode in the multimedia file are determined from the plurality of audio clips based on the corresponding frequency domain map, and the one or more audio clips are one or more key audio clips.
Fig. 4 illustrates an exemplary flow diagram of a method of obtaining one or more key audio segments based on a corresponding frequency domain plot of an audio signal according to some embodiments of the present disclosure. Referring to fig. 4, an exemplary description is provided below in step S130 of the bullet screen generating method 100 according to some embodiments of the present disclosure.
As shown in fig. 4, according to some embodiments, the step S130 of obtaining one or more key audio pieces based on the frequency domain map may include:
step S410: for each of the plurality of audio segments, obtaining a peak in the frequency domain map corresponding to the audio segment, wherein a frequency variation of the frequency domain map at the peak is greater than a preset value; and
step S420: determining the one or more key audio clips from the plurality of audio clips based on corresponding peaks in the frequency domain plot.
For example, in fig. 2B, the frequency domain map obtained in step S410 contains peak a and peak b, at which the frequency variation (the ordinate) exceeds a preset value. This indicates that the frequency-domain span of the audio signal at these peaks is large, and hence the sound-source complexity there is high. From a frequency domain map containing such peaks, the corresponding audio clip can be identified: its high sound-source complexity means that the corresponding multimedia clip contains a target plot.
According to some embodiments, in step S420, the one or more key audio clips may be determined from the peak distribution density in the frequency domain map, i.e., the number of peaks per unit time in the map corresponding to an audio clip. A high peak density indicates high sound-source complexity in the clip, meaning the corresponding multimedia clip contains a target plot.
According to further embodiments, where the audio signal is split into n equal parts (n ≥ 2, n a positive integer), step S420 may sort the plurality of audio segments by the number of peaks in their frequency domain maps, arranging them in descending order of peak count, and select the one or more top-ranked segments as the one or more key audio clips.
Because the top-ranked segments have more peaks in their frequency domain maps than the remaining segments, their sound-source complexity is higher, and they are therefore the key audio clips corresponding to the multimedia clips containing target plots.
According to other embodiments, the number of peaks of each of the plurality of audio clips is compared with a preset value, and an audio clip whose number of peaks is greater than the preset value is determined to be a key audio clip.
A peak count greater than the preset value indicates that the sound source complexity in the audio clip, and hence in the corresponding multimedia clip, is sufficiently high, which in turn indicates that the multimedia clip contains the target episode. The audio clip can therefore be determined to be a key audio clip corresponding to a multimedia clip containing the target episode.
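The two selection strategies described above (ranking clips by peak count and taking the top of the sorted sequence, or comparing each count against a preset value) can be sketched as follows; the function name and parameters are illustrative assumptions:

```python
def select_key_segments(peak_counts, top_k=None, preset_count=None):
    """Select key audio clips from per-clip peak counts.

    Two strategies, mirroring the embodiments above:
    - top_k: sort clips by peak count in descending order and keep
      the first top_k (the sorted-sequence embodiment);
    - preset_count: keep every clip whose peak count exceeds the
      preset value (the threshold embodiment).
    Returns the indices of the selected clips.
    """
    if top_k is not None:
        order = sorted(range(len(peak_counts)),
                       key=lambda i: peak_counts[i], reverse=True)
        return order[:top_k]
    if preset_count is not None:
        return [i for i, c in enumerate(peak_counts) if c > preset_count]
    raise ValueError("specify either top_k or preset_count")
```

For example, with per-clip counts `[3, 9, 1, 7]`, both `top_k=2` and `preset_count=5` select the second and fourth clips.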
In step S140, for each of the one or more key audio clips acquired in step S130, a target barrage matching the target episode of the corresponding multimedia clip is acquired based on the key audio clip. Because the target barrage is obtained from the key audio clip, it is related to the target episode of the multimedia clip corresponding to that key audio clip; the target barrage added to the key multimedia clip therefore has a high degree of correlation with the target episode, producing a vivid bullet screen effect.
Fig. 5 illustrates an exemplary flow diagram of a method of obtaining a target bullet screen based on a key audio clip according to some embodiments of the present disclosure. Referring to fig. 5, step S140 in the bullet screen generating method 100 according to some embodiments of the present disclosure is exemplarily described below.
As shown in fig. 5, according to some embodiments, the step S140 of obtaining the target barrage based on the key audio clip may include:
step S510: acquiring a target text corresponding to the key audio clip; and
step S520: acquiring the target barrage matched with the target episode of the corresponding multimedia clip based on the target text.
The target text corresponding to the key audio clip often contains the dialog text, prompt text, or lyric information at the target episode in the multimedia file, and such text is usually closely related to the target episode. Obtaining the target barrage based on the dialog text therefore relates the barrage to the lines, prompt information, or lyrics of the target episode, which further improves the correlation between the target barrage and the target episode and makes the bullet screen effect more vivid.
According to some embodiments, in step S510, based on the key audio piece, a speech recognition technology is used to obtain a target text corresponding to the key audio piece.
According to other embodiments, the multimedia file may include an audio file, a video file corresponding to the audio file, and a text file corresponding to the audio file. The text file is displayed in synchronization with the video file on the display on which the video file is played, and may include, for example, character dialog text and background music lyrics. In step S510, the target text corresponding to the key audio clip may be obtained based on the text file.
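Where the text file carries timed lines, as subtitle files commonly do, the target text for a key audio clip can be gathered by time overlap. The `(start, end, text)` tuple layout is an assumed parsing of the text file, not a format specified by the disclosure:

```python
def target_text_for_clip(subtitles, clip_start, clip_end):
    """Gather the target text for a key audio clip from a timed text file.

    subtitles is assumed to be a list of (start_s, end_s, text) tuples
    parsed from the text file accompanying the video; every line whose
    time range overlaps the clip is kept, in order.
    """
    lines = [text for start, end, text in subtitles
             if start < clip_end and end > clip_start]
    return " ".join(lines)
```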
According to some embodiments, the target barrage may comprise a text barrage. According to other embodiments, the target bullet screen may also include an expression bullet screen, and the like, which is not limited herein.
FIG. 6 illustrates an exemplary flow diagram of a method of obtaining a text bullet screen based on target text according to some embodiments of the present disclosure; fig. 7 illustrates a schematic diagram of target text and a text bullet shown in a multimedia file according to some embodiments of the present disclosure. Referring to fig. 6 and 7, step S520 according to some embodiments of the present disclosure is exemplarily described below.
As shown in fig. 6, according to some embodiments, the step S520 of obtaining the text bullet screen based on the target text may include:
step S610: acquiring at least one keyword of the target text;
step S620: for each keyword in the at least one keyword, acquiring a matched text bullet screen matched with the keyword from a preset bullet screen database; and
step S630: determining the target bullet screen from the acquired at least one matched text bullet screen.
Acquiring the target barrage based on keywords in the target text corresponding to the key audio clip associates the target barrage with the target text. In particular, when the target text is character dialog text, the target barrage is highly associated with the target episode corresponding to the key audio clip, which improves the correlation between the acquired target barrage and the target episode as well as the authenticity of the target barrage.
According to some embodiments, in step S610, the target text may be split into segmented words, and the keywords in the target text are obtained from the segmented words. In some embodiments, the keywords are determined based on the part of speech of each segmented word: for example, when a segmented word is an adjective, a noun, or an adverb, the segmented word is determined to be a keyword.
According to other embodiments, in step S610, keywords are extracted from the target text based on a preset keyword database. In some embodiments, the preset keyword database stores a plurality of segmented words designated as keywords; each segmented word of the target text is retrieved from the database, and when the database contains the segmented word, the segmented word is determined to be a keyword.
According to some embodiments, in step S620, a search is performed in a preset bullet screen database for each keyword to obtain the matching text bullet screens corresponding to that keyword. The preset bullet screen database may classify the text bullet screens by label, or may store a preset mapping between text bullet screens and keywords.
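Steps S610 and S620 can be sketched as two dictionary lookups; representing the keyword database as a set of words and the bullet screen database as a keyword-to-barrage mapping are assumptions of this sketch:

```python
def extract_keywords(segmented_words, keyword_db):
    """Step S610 sketch: keep only the segmented words found in the
    preset keyword database (here assumed to be a set of words)."""
    return [w for w in segmented_words if w in keyword_db]

def match_barrages(keywords, barrage_db):
    """Step S620 sketch: for each keyword, look up its matching text
    barrages in the preset bullet screen database (here assumed to be
    a keyword -> list-of-barrages mapping)."""
    return {kw: list(barrage_db.get(kw, [])) for kw in keywords}
```

The English sample words below are hypothetical stand-ins for the translated example in fig. 7.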
Referring to fig. 7, a method for obtaining a text bullet screen based on a target text according to some embodiments is described below.
For example, in step S610, taking the target text "a terrifying smell that makes one shiver even deep in the bone marrow" as an example, the target text is split to obtain the segmented words "bone marrow", "shiver", and "terror"; these segmented words are searched in the preset keyword database, and "shiver" and "terror" are determined to be keywords. Further, in step S620, the preset bullet screen database is retrieved based on the two keywords "shiver" and "terror". The matching text bullet screens corresponding to the keyword "shiver" include "spinal cord shivering", "frightened the ghost", "even ghosts fear me", "scares the children", and "scared to death"; the matching text bullet screens corresponding to the keyword "terror" include "Wudeyaya", "Diobulan", and "three-blade stream!".
After step S620 is completed, step S630 is performed to determine the target bullet screen from the acquired at least one matching text bullet screen. According to some embodiments, in step S630, the acquired at least one matching text bullet screen is determined to be the target barrage to be added to the multimedia clip corresponding to the target episode in the multimedia file. Because the matching text bullet screen is obtained based on the key audio clip corresponding to the multimedia clip containing the target episode, it is related to the target episode of the multimedia file; the barrage added to the multimedia file thus resembles a barrage posted manually by viewers after watching, making the bullet screen content vivid.
According to other embodiments, as shown in fig. 6, the target bullet screen determined in step S630 has a higher correlation with the target episode than other matching text bullet screens in the at least one matching text bullet screen. Because the relevance of the target barrage and the target plot is higher, the barrage effect is more vivid.
In some embodiments, the multimedia file includes a video, the key multimedia clip is a key video clip, and in step S630 the target barrage is determined from the acquired at least one matching text bullet screen based on the key video clip.
Fig. 8 illustrates a schematic flow chart of a method for determining a target bullet screen based on a key video clip according to some embodiments of the present disclosure. Referring to fig. 8, the process of determining the target barrage based on the key video clip in step S630 according to some embodiments is exemplarily described below.
As shown in fig. 8, according to some embodiments, determining a target bullet screen from the matching text bullet screens based on the key audio pieces comprises:
step S810: acquiring the key video clip corresponding to the key audio clip;
step S820: acquiring related image information based on the video frames in the key video clips; and
step S830: determining the target bullet screen from the acquired at least one matching text bullet screen based on the image information.
In step S810, the key video clip corresponding to the key audio clip is obtained based on the key audio clip. According to some embodiments, the key video clip occupying the same playback time range in the video as the key audio clip is obtained based on the playback time point corresponding to the key audio clip.
In step S820, image information is acquired based on the video frames in the key video snippets. According to some embodiments, the image information is obtained using a method of image analysis. For example, a face recognition method is adopted to identify whether a video frame in a key video clip includes image information of a portrait.
In step S830, the target bullet screen is determined from the acquired at least one matching text bullet screen based on the image information. Still taking fig. 7 as an example, when the video frames in the key video clip identified in step S810 include the image information of a portrait, the matching text bullet screens obtained in step S620 for the keyword "shiver" ("spinal cord shivering", "frightened the ghost", "even ghosts fear me", and "scares the children") are screened, and the matching text bullet screens that describe a character are selected as the target barrage. The obtained target barrage is thus based not only on the dialog text but also on the image information of the key video clip, so that it is related to both the dialog text and the image information at the target episode in the key video clip. The correlation between the target barrage and the target episode is accordingly higher, and the barrage more closely resembles the realistic effect of barrages added by viewers after watching the video.
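The screening in step S830 can be sketched as a filter over the candidate barrages driven by the image information. The `person_terms` heuristic for "barrages that describe a character" is an assumption of this sketch; in practice the disclosure's face recognition result would drive the decision:

```python
def filter_by_image_info(candidates, has_face,
                         person_terms=("ghost", "child", "me")):
    """Step S830 sketch: when face recognition reports a portrait in
    the key video clip, keep the candidate barrages that mention a
    person or character (person_terms is an assumed keyword
    heuristic); fall back to all candidates if none match."""
    if not has_face:
        return list(candidates)
    kept = [c for c in candidates
            if any(term in c for term in person_terms)]
    return kept or list(candidates)
```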
According to other embodiments, in step S610 a plurality of keywords of the target text are obtained, and in step S620, for each of the plurality of keywords, a matching text bullet screen is obtained from the preset database, so that a plurality of matching text bullet screens are obtained. In this case, determining the target bullet screen in step S630 includes screening the target bullet screen from the obtained plurality of matching text bullet screens based on the plurality of keywords.
Screening the target barrage from the obtained plurality of matching text bullet screens based on the keywords yields a target barrage whose correlation with the target episode is higher than that of the other matching text bullet screens, giving it a vivid effect closer to barrages added manually after watching the video. Moreover, because the matching text bullet screens are themselves obtained from the keywords, no new information needs to be introduced for the screening: it can be performed with the existing keyword information. The amount of information processed when determining the target barrage from the plurality of matching text bullet screens is therefore small, which simplifies the processing and saves computation.
In some embodiments, the matching text bullet screens are screened according to the number of keywords they match; for example, a matching text bullet screen that simultaneously matches a preset number of keywords is selected from the plurality of matching text bullet screens as the target barrage. In other embodiments, the target barrage is selected according to the speech intensity of the audio clip corresponding to each keyword in the audio signal; for example, the matching text bullet screen of the keyword whose corresponding audio clip has the strongest speech intensity is selected as the target barrage.
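The first screening strategy, keeping barrages matched by at least a preset number of distinct keywords, can be sketched as follows (the `min_matches` parameter name is an assumption):

```python
from collections import Counter

def screen_by_keyword_count(matches, min_matches=2):
    """From a keyword -> [barrage, ...] mapping, keep the barrages
    matched by at least min_matches distinct keywords (min_matches
    stands in for the 'preset number' of keywords)."""
    counts = Counter(b for barrages in matches.values()
                     for b in set(barrages))
    return [b for b, n in counts.items() if n >= min_matches]
```

With two keyword lists that share one candidate barrage, only that shared candidate survives the screening.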
According to some embodiments, the method 100 further comprises setting, for each of the one or more key audio clips, an addition form of the corresponding target barrage based on the key audio clip.
According to some embodiments, the target barrage comprises a text barrage. According to other embodiments, the target barrage comprises an expression barrage or the like. Based on the key audio piece, the adding form of the target bullet screen is set, for example, different suffixes are set for text bullet screens, the number of repetitions is set for expression bullet screens, and the like.
By setting an addition form for the target barrage, the target barrage appears on the display interface of the multimedia file in different display forms when added, which enriches the bullet screen forms and improves the bullet screen effect. In the embodiments of the present disclosure, the addition form of the target barrage is set based on the key audio clip, so that the addition form is related to the target episode. This further improves the relevance of the bullet screen effect to the target episode and gives the target barrage a realistic effect closer to bullet screens added by users after watching a video.
Fig. 9 illustrates a schematic flow diagram of setting an addition form of a text bullet screen according to some embodiments of the present disclosure. Referring to fig. 9, the process of setting an addition form of a text bullet screen in method 100 according to some embodiments is exemplarily described below.
As shown in fig. 9, according to some embodiments, setting an added form of a text bullet screen includes:
step S910: obtaining audio information of the keywords corresponding to the text bullet screen based on the key audio clips; and
step S920: determining the addition form of the text bullet screen based on the audio information.
According to some embodiments, when the matching text bullet screen is obtained based on keywords split from the dialog text, the audio signal of the dialog text is split accordingly, and in step S910 the audio information of the keyword corresponding to the text bullet screen is obtained based on that keyword.
According to other embodiments, in step S910, the portion of the key audio clip corresponding to the keyword is directly extracted based on the keyword corresponding to the text bullet screen, so as to obtain the audio information of the keyword.
According to some embodiments, the audio information may include audio intensity and/or speech duration, etc., without limitation.
According to some embodiments, in step S920 the addition form of the text bullet screen is determined according to the audio information; the addition form may include the number of repeated additions, addition in a highlighted manner, and the like, which is not limited herein. Setting different addition forms causes bullet screens to be displayed on the key multimedia clip in different forms, enriching the bullet screen forms and improving the bullet screen effect.
In some embodiments, the number of times the text bullet screen is repeatedly added is set according to the speech duration of the keyword. For example, when the keyword corresponding to the text bullet screen has a long speech duration, the text bullet screen is repeatedly added multiple times. A keyword with a longer speech duration corresponds to more key information and content in the target text, so repeatedly adding the corresponding text bullet screen to the key multimedia clip brings the bullet screen closer to the episode of the key multimedia clip and makes the bullet screen effect more vivid.
In other embodiments, the text bullet screen is added in a highlighted manner according to the speech intensity of its corresponding keyword. Similarly, a keyword with higher speech intensity corresponds to more emphasized information and content in the target text, so adding the corresponding text bullet screen to the key multimedia clip in a highlighted manner brings the bullet screen closer to the episode of the key multimedia clip and makes the bullet screen effect more vivid.
According to some embodiments, the highlighting may include, for example, adding a suffix to the text bullet screen or changing the font size, color, or the like of the text bullet screen, and is not limited herein. For example, when the speech intensity of the keyword corresponding to the text bullet screen is high, a suffix is appended to the text bullet screen before it is added. The suffix may be, for example, "-", "!", or "_", or a combination thereof.
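Steps S910 and S920 can be sketched as a small mapping from a keyword's audio information to an addition form; the repetition rate and loudness threshold below are illustrative assumptions, not values specified by the disclosure:

```python
def addition_form(duration_s, intensity_db,
                  repeats_per_second=2, loud_threshold_db=-10):
    """Derive a text bullet screen's addition form from its keyword's
    audio information: a longer speech duration yields more repeated
    additions, and a speech intensity above the threshold yields a
    highlighting suffix. Both parameters are illustrative assumptions.
    """
    repeats = max(1, round(duration_s * repeats_per_second))
    suffix = "!" if intensity_db > loud_threshold_db else ""
    return {"repeats": repeats, "suffix": suffix}
```

A loud 1.5-second keyword would thus be added three times with a "!" suffix, while a quiet short keyword is added once, unhighlighted.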
According to another aspect of the present disclosure, a bullet screen generating device is also provided. As shown in fig. 10, the apparatus 1000 may include: a first obtaining unit 1010 configured to obtain an audio signal in a multimedia file; a second obtaining unit 1020 configured to obtain a frequency domain map corresponding to the audio signal; a third obtaining unit 1030 configured to obtain one or more key audio clips based on the frequency domain map, wherein each key audio clip corresponds to a key multimedia clip containing a target episode in the multimedia file; and a fourth obtaining unit 1040 configured to, for each key audio clip of the one or more key audio clips, obtain a target barrage matching a target episode of the corresponding key multimedia clip based on the key audio clip.
According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program which, when executed by the at least one processor, implements a method according to the above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to the above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program realizes the method according to the above when executed by a processor.
Referring to fig. 11, a block diagram of an electronic device 1100, which may be a server or a client of the present disclosure and which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices may be different types of computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the electronic device 1100 may include at least one processor 1110, a working memory 1120, an input unit 1140, a display unit 1150, a speaker 1160, a storage unit 1170, a communication unit 1180, and other output units 1190, which may be capable of communicating with each other via a system bus 1130.
Processor 1110 may be a single processing unit or multiple processing units, all of which may include single or multiple computing units or multiple cores. Processor 1110 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitry, and/or any devices that manipulate signals based on operational instructions. The processor 1110 may be configured to retrieve and execute computer-readable instructions stored in the working memory 1120, the storage unit 1170, or other computer-readable medium, such as program code for an operating system 1120a, program code for an application program 1120b, and so forth.
Working memory 1120 and storage 1170 are examples of computer-readable storage media for storing instructions that are executed by processor 1110 to perform the various functions described above. The working memory 1120 may include both volatile and non-volatile memory (e.g., RAM, ROM, etc.). Further, storage unit 1170 may include a hard disk drive, solid state drive, removable media, including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CD, DVD), storage arrays, network attached storage, storage area networks, and so forth. Both working memory 1120 and storage unit 1170 may be collectively referred to herein as memory or computer-readable storage medium, and may be a non-transitory medium capable of storing computer-readable, processor-executable program instructions as computer program code, which may be executed by processor 1110 as a particular machine configured to implement the operations and functions described in the examples herein.
The input unit 1140 may be any type of device capable of inputting information to the electronic device 1100, and the input unit 1140 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. Output units may be any type of device capable of presenting information and may include, but are not limited to, a display unit 1150, speakers 1160, and other output units 1190, other output units 1190 may include, but are not limited to, video/audio output terminals, vibrators, and/or printers. The communication unit 1180 allows the electronic device 1100 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as bluetooth (TM) devices, 1302.11 devices, wiFi devices, wiMax devices, cellular communication devices, and/or the like.
The application 1120b in the working memory 1120 may be loaded to perform the various methods and processes described above, such as steps S110-S130 in fig. 1. For example, in some embodiments, the bullet screen generation method can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1170. In some embodiments, some or all of the computer program may be loaded and/or installed on the electronic device 1100 via the storage unit 1170 and/or the communication unit 1180. When the computer program is loaded and executed by the processor 1110, one or more steps of the bullet screen generation method described above may be performed. Alternatively, in other embodiments, the processor 1110 may be configured to perform the bullet screen generation method by any other suitable means (e.g., by way of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
While embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatus are merely illustrative embodiments or examples, and that the scope of the present disclosure is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced with equivalents. Further, the steps may be performed in an order different from that described in the present disclosure, and the various elements in the embodiments or examples may be combined in various ways. As technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims (16)

1. A bullet screen generation method comprises the following steps:
acquiring an audio signal in a multimedia file;
acquiring a frequency domain graph corresponding to the audio signal;
acquiring one or more key audio clips based on the frequency domain graph, wherein each key audio clip corresponds to a key multimedia clip containing a target episode in the multimedia file; and
for each key audio clip of the one or more key audio clips, obtaining a target barrage matching a target episode of the corresponding key multimedia clip based on the key audio clip.
2. The method of claim 1, wherein obtaining the frequency domain map corresponding to the audio signal comprises:
segmenting the audio signal into a plurality of audio segments; and
for each of the plurality of audio segments, obtaining a frequency domain map corresponding to the audio segment,
wherein obtaining one or more key audio snippets based on the frequency domain map comprises:
for each of the plurality of audio segments, obtaining a peak in the frequency domain map corresponding to the audio segment, wherein a frequency variation of the frequency domain map at the peak is greater than a preset value, and
determining the one or more key audio segments from the plurality of audio segments based on corresponding peaks in the frequency domain plot.
3. The method of claim 2, wherein the plurality of audio segments are obtained by equally dividing the audio signal,
and the number of peaks in the frequency domain graph corresponding to each key audio clip is greater than the number of peaks in the frequency domain graphs corresponding to other audio clips in the plurality of audio clips.
4. The method of any one of claims 1-3, wherein acquiring, based on the key audio clip, a target bullet screen matching a target episode of the corresponding key multimedia clip comprises:
acquiring a target text corresponding to the key audio clip; and
acquiring, based on the target text, the target bullet screen matching the target episode of the corresponding key multimedia clip.
5. The method of claim 4, wherein the target bullet screen comprises a text bullet screen.
6. The method of claim 5, wherein acquiring, based on the target text, the target bullet screen matching the target episode of the corresponding key multimedia clip comprises:
acquiring at least one keyword of the target text;
for each keyword of the at least one keyword, acquiring, from a preset bullet screen database, a matching text bullet screen that matches the keyword; and
determining the target bullet screen from the acquired at least one matching text bullet screen.
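An illustrative sketch of claim 6: extract keywords from the target text, look each one up in a preset bullet screen database, and pick the candidate hitting the most keywords (in the spirit of the relevance criteria of claims 7 and 9-10). The database contents and the scoring rule are assumptions for demonstration.

```python
# Hypothetical keyword-to-bullet-screen matching; not the patent's
# actual database schema or relevance measure.

BULLET_DB = {
    "goal":   ["what a goal!", "goal of the season"],
    "save":   ["incredible save"],
    "keeper": ["goal of the season", "the keeper is on fire"],
}

def extract_keywords(text, vocabulary):
    # Keep only the words the database knows about.
    return [w for w in text.lower().split() if w in vocabulary]

def pick_target_bullet_screen(text):
    keywords = extract_keywords(text, BULLET_DB)
    scores = {}
    for kw in keywords:
        for candidate in BULLET_DB[kw]:
            scores[candidate] = scores.get(candidate, 0) + 1
    # Most keyword hits stands in for "most relevant to the episode".
    return max(scores, key=scores.get) if scores else None

print(pick_target_bullet_screen("the keeper denies the goal"))
```

Here "goal of the season" wins because it matches both recognized keywords, which mirrors claim 9's screening across a plurality of keywords.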
7. The method of claim 6, wherein the target bullet screen is more relevant to the target episode than the other matching text bullet screens in the at least one matching text bullet screen.
8. The method of claim 6 or 7, wherein the multimedia file comprises a video and the key multimedia clip is a key video clip, and wherein determining the target bullet screen from the acquired at least one matching text bullet screen comprises:
acquiring the key video clip corresponding to the key audio clip;
acquiring image information based on video frames in the key video clip; and
determining, based on the image information, the target bullet screen from the acquired at least one matching text bullet screen.
9. The method of claim 6 or 7, wherein the at least one keyword comprises a plurality of keywords, and wherein determining the target bullet screen from the acquired at least one matching text bullet screen comprises:
screening the target bullet screen from the acquired plurality of matching text bullet screens based on the plurality of keywords.
10. The method of claim 9, wherein the target bullet screen comprises the matching text bullet screen that matches each of a preset number of the keywords.
11. The method of any of claims 1-10, further comprising:
for each key audio clip of the one or more key audio clips, setting, based on the key audio clip, an addition form of the corresponding target bullet screen.
12. The method of claim 11, wherein the target bullet screen comprises a text bullet screen obtained based on a keyword in the target text corresponding to the key audio clip, and wherein
setting the addition form of the corresponding target bullet screen comprises:
acquiring, based on the key audio clip, audio information of the keyword corresponding to the text bullet screen; and
determining the addition form of the text bullet screen based on the audio information.
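A toy sketch of claims 11-12: derive the "addition form" (display style) of a text bullet screen from audio information of its keyword in the key audio clip. The loudness thresholds and style fields below are illustrative assumptions, not values from the patent.

```python
# Hypothetical mapping from keyword loudness (dBFS) to display style.

def addition_form(keyword_loudness_db):
    # Louder keyword audio -> larger, faster, more prominent text.
    if keyword_loudness_db > -10:
        return {"font_size": 32, "speed": "fast", "color": "#ff4444"}
    if keyword_loudness_db > -30:
        return {"font_size": 24, "speed": "normal", "color": "#ffffff"}
    return {"font_size": 18, "speed": "slow", "color": "#aaaaaa"}

print(addition_form(-5))   # a loud keyword gets the most prominent style
```

Any audio feature exposed by step 12's "audio information" (pitch, tempo, loudness) could drive the same lookup; loudness is used here only because it is the simplest to illustrate.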
13. A bullet screen generating device comprising:
a first obtaining unit configured to obtain an audio signal in a multimedia file;
a second obtaining unit, configured to obtain a frequency domain map corresponding to the audio signal;
a third obtaining unit configured to obtain one or more key audio clips based on the frequency domain map, wherein each key audio clip corresponds to a key multimedia clip containing a target episode in the multimedia file; and
a fourth obtaining unit configured to, for each key audio clip of the one or more key audio clips, obtain, based on the key audio clip, a target bullet screen matching a target episode of the corresponding key multimedia clip.
14. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor,
wherein the memory stores a computer program that, when executed by the at least one processor, implements the method of any one of claims 1-12.
15. A non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-12.
16. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-12.
CN202110599762.8A 2021-05-31 2021-05-31 Bullet screen generation method and device, electronic equipment and storage medium Active CN115484503B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110599762.8A CN115484503B (en) 2021-05-31 2021-05-31 Bullet screen generation method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115484503A true CN115484503A (en) 2022-12-16
CN115484503B CN115484503B (en) 2024-03-08

Family

ID=84420441


Country Status (1)

Country Link
CN (1) CN115484503B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105228013A (en) * 2015-09-28 2016-01-06 百度在线网络技术(北京)有限公司 Barrage information processing method, device and barrage video player
CN107277594A (en) * 2017-07-06 2017-10-20 广州华多网络科技有限公司 A kind of video and audio and barrage synchronous method and device
CN108495184A (en) * 2018-02-06 2018-09-04 北京奇虎科技有限公司 A kind of method and apparatus for adding barrage for video
CN109462768A (en) * 2018-10-25 2019-03-12 维沃移动通信有限公司 A kind of caption presentation method and terminal device
CN110267052A (en) * 2019-06-19 2019-09-20 云南大学 A kind of intelligent barrage robot based on real-time emotion feedback
CN110740387A (en) * 2019-10-30 2020-01-31 深圳Tcl数字技术有限公司 bullet screen editing method, intelligent terminal and storage medium
US20200099988A1 (en) * 2018-09-20 2020-03-26 Boe Technology Group Co., Ltd. Method, device and system for processing bullet screen
CN111182347A (en) * 2020-01-07 2020-05-19 腾讯科技(深圳)有限公司 Video clip cutting method, device, computer equipment and storage medium
CN111464827A (en) * 2020-04-20 2020-07-28 玉环智寻信息技术有限公司 Data processing method and device, computing equipment and storage medium
CN111614986A (en) * 2020-04-03 2020-09-01 威比网络科技(上海)有限公司 Bullet screen generation method, system, equipment and storage medium based on online education
CN111836111A (en) * 2019-04-17 2020-10-27 微软技术许可有限责任公司 Technique for generating barrage
CN112804582A (en) * 2020-03-02 2021-05-14 腾讯科技(深圳)有限公司 Bullet screen processing method and device, electronic equipment and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XU YANG ET AL.: "Natural language processing in "Bullet Screen" application", 2017 International Conference on Service Systems and Service Management, pages 1-6 *
LAN Rongheng et al.: "Automated identification of highlight clips in crowdsourced live streaming", Computer Systems & Applications, no. 9, pages 219-224 *
LYU Guangyi: "Deep semantic representation techniques and applications for user-generated data", China Doctoral Dissertations Full-text Database (Information Science and Technology), no. 8 *


Similar Documents

Publication Publication Date Title
US10587920B2 (en) Cognitive digital video filtering based on user preferences
US20180213284A1 (en) Recommending content based on group collaboration
US10909174B1 (en) State detection of live feed
US11593422B2 (en) System and method for automatic synchronization of video with music, and gaming applications related thereto
KR102314645B1 (en) A method and device of various-type media resource recommendation
US11910060B2 (en) System and method for automatic detection of periods of heightened audience interest in broadcast electronic media
CN113035199B (en) Audio processing method, device, equipment and readable storage medium
US20240061899A1 (en) Conference information query method and apparatus, storage medium, terminal device, and server
CN112287168A (en) Method and apparatus for generating video
CN111031373A (en) Video playing method and device, electronic equipment and computer readable storage medium
US9786274B2 (en) Analysis of professional-client interactions
CN110245334B (en) Method and device for outputting information
CN111767259A (en) Content sharing method and device, readable medium and electronic equipment
CN116567351A (en) Video processing method, device, equipment and medium
CN116881412A (en) Chinese character multidimensional information matching training method and device, electronic equipment and storage medium
CN115484503B (en) Bullet screen generation method and device, electronic equipment and storage medium
CN112562733A (en) Media data processing method and device, storage medium and computer equipment
CN113407779A (en) Video detection method, video detection equipment and computer readable storage medium
CN111027332A (en) Method and device for generating translation model
CN109815408B (en) Method and device for pushing information
CN113241061B (en) Method and device for processing voice recognition result, electronic equipment and storage medium
JP7230085B2 (en) Method and device, electronic device, storage medium and computer program for processing sound
US10536729B2 (en) Methods, systems, and media for transforming fingerprints to detect unauthorized media content items
US10652623B1 (en) Display timing determination device, display timing determination method, and program
Nguyen et al. Multi-Scale Auralization for Multimedia Analytical Feature Interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant