CN115209211A - Subtitle display method, subtitle display apparatus, electronic device, storage medium, and program product - Google Patents


Info

Publication number
CN115209211A
CN115209211A
Authority
CN
China
Prior art keywords: audio, target, word, subtitle, caption
Prior art date
Legal status (the legal status is an assumption and is not a legal conclusion): Pending
Application number
CN202211109657.2A
Other languages
Chinese (zh)
Inventor
洪嘉慧 (Hong Jiahui)
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202211109657.2A
Publication of CN115209211A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/4312: Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/439: Processing of audio elementary streams
    • H04N21/47: End-user applications
    • H04N21/4884: Data services, e.g. news ticker, for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure provides a subtitle display method, a subtitle display apparatus, an electronic device, a storage medium, and a program product, relating to the field of computer technology. The method includes: displaying a multimedia page, wherein the multimedia page includes an audio insertion control and an audio playing control; in response to a trigger operation on the audio insertion control, determining audio information of a target audio, wherein the audio information of the target audio includes a target subtitle corresponding to the target audio and word-by-word timing information of the target subtitle; and, in response to a trigger operation on the audio playing control, playing the target audio and displaying the target subtitle word by word on the multimedia page according to the word-by-word timing information of the target subtitle. The method displays the subtitle corresponding to the audio word by word on the multimedia page as the audio plays, resolving the mismatch between the sound and the word-by-word animation displayed on screen.

Description

Subtitle display method, subtitle display apparatus, electronic device, storage medium, and program product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for displaying subtitles, an electronic device, a storage medium, and a program product.
Background
With the rapid development of internet technology and intelligent terminal devices, video editing technology is widely used. In the field of video editing, users want to make videos in which a character-by-character animation is displayed on screen in step with the sound. In the related art, the animation of each character is distributed uniformly across the duration corresponding to the subtitle information. Because sound sources are diverse, the subtitle information may be incomplete, which easily causes the sound and the character-by-character animation displayed on screen to fall out of step.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The disclosure provides a subtitle display method, a subtitle display apparatus, an electronic device, and a computer-readable storage medium, which at least solve the problem that the sound and the character-by-character animation displayed on screen do not match.
According to an aspect of the embodiments of the present disclosure, there is provided a subtitle display method including: displaying a multimedia page, wherein the multimedia page includes an audio insertion control and an audio playing control; in response to a trigger operation on the audio insertion control, determining audio information of a target audio, wherein the audio information of the target audio includes a target subtitle corresponding to the target audio and word-by-word timing information of the target subtitle; and, in response to a trigger operation on the audio playing control, playing the target audio and displaying the target subtitle word by word on the multimedia page according to the word-by-word timing information of the target subtitle.
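The claimed flow can be illustrated with a small sketch (not part of the disclosure; all names here are hypothetical): given per-word timing information, a renderer reveals each word when the playback clock reaches that word's start time.

```python
import time
from dataclasses import dataclass

@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the beginning of the target audio
    end: float

def display_word_by_word(timings, render, clock=time.monotonic, sleep=time.sleep):
    """Reveal each word of the target subtitle at its start time,
    so the on-screen animation stays in step with the audio."""
    t0 = clock()
    for wt in sorted(timings, key=lambda w: w.start):
        delay = wt.start - (clock() - t0)
        if delay > 0:
            sleep(delay)  # wait until the audio reaches this word
        render(wt.word)

# Usage: timings would come from the looked-up or recognized audio info.
shown = []
display_word_by_word(
    [WordTiming("He", 0.00, 0.01), WordTiming("llo", 0.01, 0.02)],
    shown.append,
)
```

The clock and sleep functions are injected so a real player could substitute its own media clock instead of wall time.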
In some embodiments of the present disclosure, the multimedia page includes an animation effect control. Before the target subtitle is displayed word by word on the multimedia page, the method further includes: determining a target animation effect in response to a selection instruction for the animation effect control. Displaying the target subtitle word by word on the multimedia page then includes: displaying the target subtitle word by word on the multimedia page with the target animation effect.
In some embodiments of the present disclosure, the target subtitle includes one or more subtitle segments, and the method further includes: in response to a calibration operation on a subtitle segment, adjusting the display progress of the subtitle segment so that the subtitle segment and its corresponding audio play synchronously.
In some embodiments of the present disclosure, adjusting the display progress of the subtitle segment in response to the calibration operation on the subtitle segment includes: in response to the calibration operation on the subtitle segment, playing the audio corresponding to the subtitle segment in a loop; and, while that audio plays in a loop, adjusting the display progress of the subtitle segment in response to a display progress adjustment operation on the subtitle segment.
In some embodiments of the present disclosure, after adjusting the display progress of the subtitle segment, the method further includes: stopping the loop playback of the audio corresponding to the subtitle segment in response to completion of the calibration operation on the subtitle segment.
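The calibration flow described above can be sketched as follows (illustrative only; the segment structure and field names are assumptions, not terms from the disclosure): while the segment's audio loops, each progress-adjustment operation shifts the segment's display offset, and loop playback stops once calibration completes.

```python
def calibrate_segment(segment, adjustments):
    """Apply user display-progress adjustments to one subtitle segment.

    segment:     dict with an 'offset' (seconds) applied to the segment's
                 word timings so subtitle and audio play synchronously.
    adjustments: the drag deltas the user performs while the segment's
                 audio plays in a loop.
    Returns the updated segment and whether the loop is still playing.
    """
    looping = True  # calibration started: segment audio plays in a loop
    for delta in adjustments:
        segment["offset"] = segment.get("offset", 0.0) + delta
    looping = False  # calibration complete: stop the loop playback
    return segment, looping

# Usage: the user drags the subtitle 0.2 s later, then 0.05 s earlier.
seg, still_looping = calibrate_segment({"text": "...", "offset": 0.0}, [0.2, -0.05])
```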
In some embodiments of the present disclosure, determining the audio information of the target audio in response to the trigger operation on the audio insertion control includes: in response to the trigger operation on the audio insertion control, displaying an audio text input box on the multimedia page; in response to an operation instruction for the audio text input box, searching for the audio information of the target audio; and, if the search fails, determining the audio information of the target audio through speech recognition.
In some embodiments of the present disclosure, a failed search includes: the found audio information of the target audio is incomplete, or no audio information of the target audio is found.
In some embodiments of the present disclosure, determining the audio information of the target audio in response to the trigger operation on the audio insertion control includes: in response to the trigger operation on the audio insertion control, displaying an audio recording control on the multimedia page; and, in response to a trigger operation on the audio recording control, recording the target audio and determining its audio information through speech recognition.
According to another aspect of the embodiments of the present disclosure, there is provided a subtitle display apparatus including: a page display module configured to display a multimedia page, the multimedia page including an audio insertion control and an audio playing control; an audio information determination module configured to determine audio information of a target audio in response to a trigger operation on the audio insertion control, the audio information including a target subtitle corresponding to the target audio and word-by-word timing information of the target subtitle; an audio playing module configured to play the target audio in response to a trigger operation on the audio playing control; and a subtitle display module configured to display the target subtitle word by word on the multimedia page according to the word-by-word timing information of the target subtitle.
In some embodiments of the present disclosure, the multimedia page includes an animation effect control, and the apparatus further includes an animation effect selection module configured to determine a target animation effect in response to a selection instruction for the animation effect control; the subtitle display module is further configured to display the target subtitle word by word on the multimedia page with the target animation effect.
In some embodiments of the present disclosure, the target subtitle includes one or more subtitle segments, and the apparatus further includes a subtitle calibration module configured to adjust the display progress of a subtitle segment in response to a calibration operation on the subtitle segment, so that the subtitle segment and its corresponding audio play synchronously.
In some embodiments of the present disclosure, the subtitle calibration module is further configured to: play the audio corresponding to the subtitle segment in a loop in response to the calibration operation on the subtitle segment; and, while that audio plays in a loop, adjust the display progress of the subtitle segment in response to a display progress adjustment operation on the subtitle segment.
In some embodiments of the present disclosure, the subtitle calibration module is further configured to stop the loop playback of the audio corresponding to the subtitle segment in response to completion of the calibration operation on the subtitle segment.
In some embodiments of the present disclosure, the audio information determination module is further configured to: display an audio text input box on the multimedia page in response to the trigger operation on the audio insertion control; search for the audio information of the target audio in response to an operation instruction for the audio text input box; and, if the search fails, determine the audio information of the target audio through speech recognition.
In some embodiments of the present disclosure, a failed search includes: the found audio information of the target audio is incomplete, or no audio information of the target audio is found.
In some embodiments of the present disclosure, the audio information determination module is further configured to: display an audio recording control on the multimedia page in response to the trigger operation on the audio insertion control; and, in response to a trigger operation on the audio recording control, record the target audio and determine its audio information through speech recognition.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the subtitle display method described above.
According to still another aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having instructions therein, which when executed by a processor of an electronic device, enable the electronic device to perform the subtitle display method described above.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer program product including a computer program that, when executed by a processor, implements the subtitle display method described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects. On one hand, when a user wants to insert a target audio, the audio information of the target audio can be determined, that is, the target subtitle contained in the target audio and the word-by-word timing information of that subtitle; the timing information can include the start time, end time, and duration of each word, so the audio information of the target audio is refined down to the timing of each individual word. On the other hand, when the target audio is played, each word of the target subtitle is displayed on the multimedia page word by word according to that timing information, which ensures that the played audio matches the word-by-word animation displayed on the page.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram of an exemplary system architecture for a subtitle display method according to an exemplary embodiment;
FIG. 2 is a flowchart of a subtitle display method according to an exemplary embodiment;
FIG. 3 is a schematic diagram of a multimedia page according to an exemplary embodiment;
FIG. 4 is a schematic diagram of an interaction process page according to an exemplary embodiment;
FIG. 5 is a schematic diagram of an interaction process page according to yet another exemplary embodiment;
FIG. 6 is a schematic diagram of an interaction process page according to yet another exemplary embodiment;
FIG. 7 is a schematic diagram of a multimedia page according to yet another exemplary embodiment;
FIG. 8 is a schematic diagram illustrating calibration of subtitle segments according to an exemplary embodiment;
FIG. 9 is a block diagram of a subtitle display apparatus according to an exemplary embodiment;
FIG. 10 is a block diagram of the structure of an electronic device according to an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
The described features, structures, or characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the embodiments of the disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The drawings are merely schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in at least one hardware module or integrated circuit, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and steps, nor do they necessarily have to be performed in the order described. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In this specification, the terms "a", "an", "the", "said" and "at least one" are used to indicate the presence of at least one element/component/etc.; the terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. other than the listed elements/components/etc.; the terms "first," "second," and "third," etc. are used merely as labels, and are not limiting on the number of their objects.
It should be noted that the user information referred to in the present disclosure, including but not limited to user device information, user personal information, etc., is information authorized by the user or sufficiently authorized by each party.
Fig. 1 is a schematic diagram of an exemplary system architecture of a subtitle display method shown according to an exemplary embodiment. As shown in fig. 1, the system architecture may include a server 101, a network 102, and a client 103. Network 102 serves as a medium for providing communication links between clients 103 and server 101. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
In an exemplary embodiment, the client 103 performing data transmission with the server 101 may include, but is not limited to, a smartphone, a tablet computer, a laptop or desktop computer, a smart speaker, a digital assistant, an AR (Augmented Reality) device, a VR (Virtual Reality) device, a smart wearable device, or other types of electronic devices. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
The server 101 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), or a big data and artificial intelligence platform. In some practical applications, the server 101 may also be a server of a network platform, such as a transaction platform, a live broadcast platform, a social platform, or a music platform, which is not limited in this disclosure; the specific architecture of the server is likewise not limited.
In an exemplary embodiment, the subtitle display method may be implemented by the client 103 alone. For example, the client 103 displays a multimedia page that includes an audio insertion control and an audio playing control; in response to a trigger operation on the audio insertion control, the client 103 determines the audio information of the target audio, including the target subtitle corresponding to the target audio and the word-by-word timing information of the target subtitle; and, in response to a trigger operation on the audio playing control, the client 103 plays the target audio and displays the target subtitle word by word on the multimedia page according to the word-by-word timing information of the target subtitle.
In an exemplary embodiment, the server 101 and the client 103 may also implement the subtitle display method jointly. For example, the client 103 displays a multimedia page including an audio insertion control and an audio playing control; upon receiving a trigger operation on the audio insertion control, the client 103 sends the server 101 a request to determine the audio information of the target audio; the server 101 determines the audio information, which includes the target subtitle and its word-by-word timing information, and returns it to the client 103; upon receiving a trigger operation on the audio playing control, the client 103 then plays the target audio and displays the target subtitle word by word on the multimedia page according to the word-by-word timing information. Note that the method provided by the present disclosure may be executed by any electronic device, and this embodiment is not limited in that regard.
In addition, it should be noted that fig. 1 illustrates only one application environment of the subtitle display method provided by the present disclosure. The number of clients, networks and servers in fig. 1 is merely illustrative, and there may be any number of clients, networks and servers, as desired.
In order to make those skilled in the art better understand the technical solution of the present disclosure, the following describes each step of the subtitle display method in the exemplary embodiment of the present disclosure in more detail with reference to the drawings and the embodiment.
Fig. 2 is a flowchart of a subtitle display method according to an exemplary embodiment. The method provided in the embodiment of fig. 2 may be executed by any electronic device, for example, the client 103 in the embodiment of fig. 1, although the disclosure is not limited thereto. As shown in fig. 2, the subtitle display method may include the following steps S201 to S203.
Step S201, displaying a multimedia page, wherein the multimedia page includes an audio insertion control and an audio playing control.
In the disclosed embodiment, the multimedia page may be a display interface of a multimedia application installed on the client 103. The multimedia application may provide a video clip-making function; its specific type is not limited. For example, it may be a video editing application, a short video application, a live broadcast application, or another type of application, and a user may register an account in the multimedia application and log in.
FIG. 3 is a schematic diagram of a multimedia page according to an exemplary embodiment. As shown in fig. 3, the multimedia page may include a video preview area 301 and an editing area 302. The video preview area 301 displays the current result of the video being edited, and the editing area 302 is used to edit the video being produced. Note that editing and creation may be performed on videos, pictures, animations, and the like; the embodiment of the present disclosure is not limited in this respect. The editing area 302 may include an audio insertion control 3021 and an audio playing control 3022.
The audio insertion control 3021 may be used to insert audio information. If the user clicks the audio insertion control 3021, the user may select the desired audio information to insert into the video, picture, or animation being edited. Of course, audio insertion may also be implemented in other ways; for example, the user may right-click in the editing area 302 to pop up an audio insertion option and insert audio through that option.
The audio playing control 3022 may be used to control audio playback, such as starting and pausing. If the user clicks the audio playing control 3022, the audio inserted by the user is played. Of course, audio playback may also be implemented in other ways; for example, after inserting audio, the user may right-click in the editing area 302 to pop up an audio playing option and control playback through that option. In the embodiment of the present disclosure, the playing order and the playing speed of the audio may also be controlled through the audio playing option: for example, if a user inserts multiple pieces of audio, a playing order such as loop, sequential, or random playback may be selected through the audio playing option, as may a playing speed such as 2x or 0.5x.
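The playing-order and playing-speed options can be sketched as follows (illustrative only; the mode names and track fields are assumptions, not terms from the disclosure):

```python
import random

def plan_playback(tracks, order="sequential", speed=1.0, rng=None):
    """Return (play order, per-track playing time) for the inserted audio.

    order: 'sequential', 'random', or 'loop' ('loop' repeats the
           sequential order indefinitely, so only one pass is returned).
    speed: 2.0 halves the playing time of each track, 0.5 doubles it.
    """
    titles = [t["title"] for t in tracks]
    if order == "random":
        (rng or random).shuffle(titles)
    elif order not in ("sequential", "loop"):
        raise ValueError(f"unknown playing order: {order}")
    durations = {t["title"]: t["duration"] / speed for t in tracks}
    return titles, durations

# Usage: two inserted tracks played in order at double speed.
tracks = [{"title": "a", "duration": 30.0}, {"title": "b", "duration": 60.0}]
order, durations = plan_playback(tracks, order="sequential", speed=2.0)
```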
In fig. 3, the editing area 302 may further include an animation effect control 3023, which may be used to select the animation effect of the subtitle. The animation effect control 3023 may include a first child control 30231, a second child control 30232, and a third child control 30233. The first child control 30231 indicates that the animation effect of the subtitle is feather writing: after the user clicks it, each character of the target audio's subtitle is displayed as if written with a feather. For example, if the character "A" is displayed, the animation shows "A" being written with a feather. The second child control 30232 indicates that the animation effect is a karaoke display: after the user clicks it, each character of the subtitle is displayed with a karaoke effect. For example, if the character "A" is displayed, its rendering changes from gray to bright. The third child control 30233 indicates that the animation effect is a jumping-heart display: after the user clicks it, each character of the subtitle is displayed with a jumping-heart effect. For example, if the character "A" is displayed, a heart appears above it. Of course, other animation effects may also be provided, such as writing with a rose, writing with a maple leaf, a jumping note, a jumping snowflake, or a jumping star, which is not limited in this disclosure.
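Dispatching on the selected animation effect can be sketched minimally as below; the effect identifiers and rendering descriptions are hypothetical placeholders for the child controls 30231 through 30233, not part of the disclosure.

```python
# Each effect maps a character to a description of how it is rendered.
EFFECTS = {
    "feather_writing": lambda ch: f"write '{ch}' with a feather stroke",
    "karaoke":         lambda ch: f"fade '{ch}' from gray to bright",
    "jumping_heart":   lambda ch: f"show '{ch}' with a heart above it",
}

def render_character(ch, effect_id=None):
    """Render one subtitle character with the selected animation effect,
    falling back to a plain display when no effect is chosen."""
    effect = EFFECTS.get(effect_id)
    return effect(ch) if effect else f"show '{ch}'"
```

A real renderer would animate frames rather than return strings; the table-lookup structure is the point, since new effects (rose, maple leaf, note) become one more entry.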
Step S202, in response to a trigger operation on the audio insertion control, determining the audio information of the target audio, wherein the audio information of the target audio includes the target subtitle corresponding to the target audio and the word-by-word timing information of the target subtitle.
The word-by-word timing information of the target subtitle may include the start time, end time, and duration of each word in the target subtitle, so the timing information indicates when each word begins, when it ends, and how long it lasts. If the target audio is a song that the user wants to insert, the target subtitle corresponding to the target audio is the song's lyrics, and the word-by-word timing information is the start time, end time, and duration of each word in the lyrics.
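The timing information described above can be modeled as (word, start, end) records, from which each duration follows directly; a small sketch (the tuple layout and millisecond units are assumptions):

```python
def word_durations(timings):
    """timings: list of (word, start_ms, end_ms) for the target subtitle.
    Returns (word, duration_ms) pairs, checking that the times are
    consistent: each word ends after it starts, and words do not overlap."""
    out, prev_end = [], 0
    for word, start, end in timings:
        assert prev_end <= start <= end, f"inconsistent timing for {word!r}"
        out.append((word, end - start))
        prev_end = end
    return out

# e.g. two words of a lyric line, times in milliseconds
lyric = [("shine", 0, 400), ("on", 450, 700)]
```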
In the embodiment of the present disclosure, in response to a triggering operation for the audio insertion control, determining the audio information of the target audio may include: responding to the triggering operation aiming at the audio insertion control, and displaying an audio text input box on the multimedia page; in response to an operation instruction aiming at the audio text input box, searching audio information of the target audio; and if the search fails, determining the audio information of the target audio through voice recognition.
A failed search may mean that the found audio information of the target audio is incomplete, for example, the word-by-word time information of the target caption was not found; it may also mean that no audio information of the target audio was found at all.
In the embodiment of the present disclosure, the trigger operation may include, but is not limited to, a click operation, a slide operation, a long press operation, a double-click operation, a voice trigger operation, and the like.
When the user wants to insert a target audio, the user clicks the audio insertion control on the multimedia page, and an audio text input box is then displayed on the multimedia page. The user may then enter the name of the target audio to be inserted, or a link to it, in the audio text input box, and the client searches an audio database for the audio information of the target audio. If the audio information of the target audio is not found, or the found audio information is incomplete, the client may obtain the audio information of the target audio through a speech recognition technology. Specifically, if the search fails, a text box pops up on the multimedia page displaying "Search for the target audio failed; please play the target audio to be inserted". The user then plays the target audio to be inserted, so that the client can recognize the target caption corresponding to the target audio and the word-by-word time information of the target caption through speech recognition.
In the embodiment of the present disclosure, the server may also search the audio information of the target audio in the audio database, and then return the audio information of the target audio to the client. If the server fails to search, the server can return search failure information to the client, and then the client can recognize the audio information of the target audio through a voice recognition technology. Of course, in the embodiment of the present disclosure, the server may also identify the audio information of the target audio by using a speech recognition technology, which is not limited in the embodiment of the present disclosure.
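The lookup-then-recognize flow described above can be sketched as follows: query an audio database first, and fall back to speech recognition when the result is missing or lacks word-by-word timing. The `db_lookup` and `recognize` callables are assumptions standing in for the client's (or server's) real database query and speech-recognition APIs; they are not interfaces defined in the patent.

```python
def get_audio_info(name, db_lookup, recognize):
    """Return (audio_info, source) for the named target audio.

    Falls back to speech recognition when the database search fails
    or the found record has no word-by-word time information.
    """
    info = db_lookup(name)
    if info is not None and info.get("word_timings"):
        return info, "database"  # complete result found in the audio database
    # Search failed or result incomplete: identify via speech recognition.
    return recognize(name), "speech_recognition"

# Toy audio database with one complete entry:
db = {"XXX": {"caption": "AB", "word_timings": [(0, 400), (400, 900)]}}
```

The same sketch covers the server-side variant: the server runs `db_lookup`, and on failure either the server or the client performs the recognition step.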
FIG. 4 is a diagram illustrating an interaction procedure page in accordance with an exemplary embodiment. As shown in fig. 4, after the user clicks the audio insertion control 3021, an audio text input box 401 pops up on the multimedia page. The user may enter the target audio to be inserted, such as the name "XXX" of the target audio, in the audio text input box 401, so that the client can look up the audio information of the target audio in the audio database. If the search fails, a prompt of "Audio information search failed; confirm identifying the audio information through speech recognition?" pops up on the multimedia page. If the user confirms, the user plays the target audio (or the client triggers its playback), so that the client can recognize the audio information of the target audio through speech recognition. After the audio information of the target audio is obtained, a prompt of "audio information acquired successfully" pops up on the multimedia page.
In the embodiment of the disclosure, the audio information of the target audio is determined by combining the audio database query with speech recognition; that is, the target caption corresponding to the target audio and the word-by-word time information of the target caption can both be obtained. The target caption can therefore be displayed word by word on the multimedia page according to the word-by-word time information while the target audio plays, avoiding the problem of the displayed word-by-word animation being out of sync with the sound.
In the embodiment of the present disclosure, in response to a triggering operation for an audio insertion control, determining audio information of a target audio may include: responding to the triggering operation aiming at the audio insertion control, and displaying an audio recording control on the multimedia page; and responding to the triggering operation of the audio recording control, recording the target audio, and determining the audio information of the target audio through voice recognition.
When the target audio the user wants to insert is to be recorded, the user can click the audio insertion control on the multimedia page, and an audio recording control is then displayed on the multimedia page. The user may then click the audio recording control, and the client starts recording. While recording, the client can identify the audio information of the audio being recorded through a speech recognition technology; that is, the client can recognize the caption corresponding to the audio being recorded and the word-by-word time information of that caption.
FIG. 5 is a schematic diagram illustrating an interaction procedure page in accordance with yet another exemplary embodiment. As shown in fig. 5, after the user clicks the audio insertion control 3021, an audio recording control 501 pops up on the multimedia page; after the user clicks the audio recording control 501, the client starts recording the audio, and when recording the audio, the client can identify the audio information of the audio being recorded through a voice recognition technology; after the audio information of the audio being recorded is obtained, a prompt of 'audio information acquisition success' is popped up on the multimedia page.
In the embodiment of the disclosure, the audio recording is performed through the audio recording control, and meanwhile, the audio information of the recorded audio can be identified through the voice recognition technology, that is, the caption corresponding to the recorded audio and the word-by-word time information of the caption can be obtained, so that the recorded audio can be played subsequently, and the caption of the recorded audio can be displayed word-by-word on the multimedia page when the recorded audio is played.
Step S203, responding to the trigger operation aiming at the audio playing control, playing the target audio, and displaying the target caption word by word on the multimedia page according to the word by word time information of the target caption.
The word-by-word time information of the target caption may include the start time, end time, and duration of each word of the target caption. After the user clicks the audio playing control, the target audio is played, and each word is simultaneously displayed on the multimedia page according to its start time, end time, and duration in the target caption.
In embodiments of the present disclosure, the target caption may include one or more caption segments. For example, if the target audio is a song, the target caption is the lines of the song's lyrics. As another example, if the target audio is a segment of audio recorded by the user, the target caption consists of the multiple caption segments recognized from that audio. When the target audio is played, the currently played caption segment can be displayed on the multimedia page according to the audio playing progress.
As already explained above, the multimedia page may include an animation effect control 3023, the animation effect control 3023 being used to select an animation effect of the subtitle. In this embodiment of the present disclosure, before displaying the target subtitles on the multimedia page word by word, the subtitle displaying method may further include: responding to a selected instruction aiming at the animation effect control, and determining a target animation effect; and displaying the target caption word by word on the multimedia page, including: and displaying the target subtitles word by word on the multimedia page according to the target animation effect.
After determining the audio information for the target audio, the user may select a child control of the animation effect controls to determine the target animation effect. Thus, on the multimedia page, each word in the target subtitle can be displayed word by word according to the target animation effect. The specific implementation may be that after determining a target subtitle corresponding to the target audio and the word-by-word time information of the target subtitle, the word-by-word time information of the target subtitle is put into a key frame of an animation effect, so that each word in the target subtitle may be displayed on the multimedia page in the form of the animation effect according to the start time, the end time, and the duration of each word in the target subtitle.
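The idea above of putting the word-by-word time information into key frames of the animation effect might be sketched as follows. The keyframe format (a dict with `time_ms`, `word`, and `progress` fields) is an assumption for illustration; the patent does not specify one:

```python
def timings_to_keyframes(word_timings):
    """Convert (word, start_ms, end_ms) tuples into animation keyframes.

    Each word contributes a keyframe at its start time (effect begins,
    progress 0.0) and one at its end time (effect completes, progress
    1.0), so the renderer can drive the per-word animation directly
    from the word-by-word time information.
    """
    keyframes = []
    for word, start_ms, end_ms in word_timings:
        keyframes.append({"time_ms": start_ms, "word": word, "progress": 0.0})
        keyframes.append({"time_ms": end_ms, "word": word, "progress": 1.0})
    # Sort by time so the animation engine can consume them in order.
    return sorted(keyframes, key=lambda k: k["time_ms"])
```

Any of the selectable effects (quill writing, karaoke, jumping hearts) could then interpret `progress` in its own way, for example as the bright fraction in the karaoke display.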
FIG. 6 is a schematic diagram illustrating an interaction process page in accordance with yet another illustrative embodiment. As shown in fig. 6, after determining the audio information of the target audio, the user clicks the second sub-control 30232, i.e., the user selects the animation effect as a karaoke display; then, after the user clicks the audio play control 3022, the target audio is played, and the subtitle segments included in the target subtitle are displayed word by word on the multimedia page in accordance with the karaoke animation effect.
In the embodiment of the present disclosure, the subtitle display method may further include: and responding to the calibration operation aiming at the subtitle segment, and adjusting the display progress of the subtitle segment so as to enable the audio corresponding to the subtitle segment and the subtitle segment to be played synchronously.
While a caption segment of the target caption is displayed on the multimedia page, the displayed caption segment can be calibrated and its display progress adjusted, ensuring that the displayed caption segment and its corresponding audio play synchronously. Synchronous playing means that when a given word is heard in the audio, that word is simultaneously displayed on the multimedia page with the animation effect.
Further, in response to the calibration operation for the subtitle segment, adjusting the display progress of the subtitle segment may include: responding to the calibration operation aiming at the caption segment, and circularly playing the audio corresponding to the caption segment; and in the audio cycle playing process corresponding to the subtitle segment, responding to the display progress adjustment operation aiming at the subtitle segment, and adjusting the display progress of the subtitle segment.
After the user selects a caption segment for calibration, the caption segment needs to be adjusted word by word; in this case, the audio corresponding to the caption segment is played in a loop. During the looped playback, the user can adjust the display progress of the caption segment according to its audio playing progress.
Fig. 7 is a schematic diagram of a multimedia page shown in accordance with yet another embodiment. As shown in fig. 7, the multimedia page includes: a play progress 701, a subtitle segment 702, and a calibration operation control 703. The playing progress 701 is used for displaying the playing progress of the target audio, the caption segment 702 is used for representing the caption segment displayed on the multimedia page, and the calibration operation control 703 is used for performing calibration operation on the displayed caption segment.
When the user wants to perform a calibration operation on the displayed caption segment 702, the user clicks the calibration operation control 703, and the audio corresponding to the displayed caption segment 702 is then played in a loop. The user can then adjust the display progress of the displayed caption segment 702 according to the looping audio.
FIG. 8 is a diagram illustrating calibration of caption segments according to an exemplary embodiment. As shown in fig. 8, the subtitle segment that needs to be adjusted word by word is "ABCDEFG", and the audio playback progress of the subtitle segment is F, but the display progress is D, so the display progress needs to be adjusted to F.
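The adjustment in FIG. 8, where the audio is at "F" while the display is at "D", amounts to computing the index shift between the two positions. A minimal sketch, assuming each word occurs only once in the caption segment:

```python
def calibrate_offset(segment, audio_word, display_word):
    """Return the index shift needed so the displayed word matches the
    word currently heard in the audio.

    Positive result: the display lags and must advance; negative: the
    display is ahead and must be pulled back. A hypothetical helper
    illustrating the word-by-word calibration, not code from the patent.
    """
    return segment.index(audio_word) - segment.index(display_word)

segment = list("ABCDEFG")
shift = calibrate_offset(segment, "F", "D")  # display must advance by 2 words
```

During looped playback, the client would apply `shift` to the display progress and repeat until the user confirms the calibration is complete.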
In addition, after the display progress of the subtitle segment is adjusted, the subtitle display method may further include: and stopping circularly playing the audio corresponding to the subtitle segment in response to the completion of the calibration operation for the subtitle segment.
After the subtitle segment is calibrated, the user can confirm that the calibration is completed, and thus, the audio corresponding to the subtitle segment can be stopped from being played circularly. For example, the user may click the calibration operation control 703 again to confirm that the calibration is complete. The target audio may then continue to be played and the user may perform calibration operations on other caption segments.
In the embodiment of the disclosure, when a user wants to insert a target audio, the audio information of the target audio, obtained through the audio database and speech recognition, includes the target caption and the word-by-word time information of the target caption; the word-by-word time information may include the start time, end time, and duration of each word, so the audio information of the target audio can be refined down to the time progress of each word. An animation display effect for the caption can also be selected, and when the target audio is played, the target caption is displayed word by word on the multimedia page with the selected animation effect, according to the word-by-word time information of the target caption. In addition, to guard against inaccurate recognition, a calibration operation can be performed on each caption segment contained in the target caption, ensuring that the played sound matches the word-by-word animation displayed on the page.
Examples of the subtitle display method provided by the present disclosure are described above in detail. It will be appreciated that the computer device, in order to implement the above-described functions, comprises corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed in hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 9 is a block diagram of a subtitle display apparatus according to an exemplary embodiment. Referring to fig. 9, the apparatus 900 includes: a page display module 901, an audio information determination module 902, an audio play module 903, and a subtitle display module 904.
The page display module 901 may be used to: and displaying the multimedia page. The multimedia page comprises an audio inserting control and an audio playing control.
The audio information determination module 902 may be operable to: and in response to the triggering operation aiming at the audio insertion control, determining the audio information of the target audio. The audio information of the target audio comprises a target subtitle corresponding to the target audio and the character-by-character time information of the target subtitle.
The audio playback module 903 may be configured to: and responding to the triggering operation aiming at the audio playing control, and playing the target audio.
The subtitle display module 904 may be operable to: and displaying the target caption word by word on the multimedia page according to the word by word time information of the target caption.
In some embodiments of the present disclosure, the multimedia page includes an animation effect control. The subtitle display apparatus 900 further includes an animation effect selection module 905 configured to: and determining a target animation effect in response to the selected instruction aiming at the animation effect control. And, the subtitle display module 904 may also be configured to: and displaying the target subtitles word by word on the multimedia page according to the target animation effect.
In some embodiments of the present disclosure, the target subtitle comprises one or more subtitle segments. The subtitle display apparatus 900 further includes a subtitle calibration module 906, configured to: and responding to the calibration operation aiming at the subtitle segment, and adjusting the display progress of the subtitle segment so as to enable the audio corresponding to the subtitle segment and the subtitle segment to be played synchronously.
In some embodiments of the present disclosure, the caption calibration module 906 may also be configured to: responding to the calibration operation aiming at the caption segment, and circularly playing the audio corresponding to the caption segment; and in the audio cycle playing process corresponding to the subtitle segment, responding to the display progress adjusting operation aiming at the subtitle segment, and adjusting the display progress of the subtitle segment.
In some embodiments of the present disclosure, the caption calibration module 906 may also be configured to: and stopping circularly playing the audio corresponding to the subtitle segment in response to the completion of the calibration operation for the subtitle segment.
In some embodiments of the present disclosure, the audio information determination module 902 is further operable to: responding to the triggering operation aiming at the audio insertion control, and displaying an audio text input box on the multimedia page; in response to an operation instruction aiming at the audio text input box, searching audio information of the target audio; and if the search fails, determining the audio information of the target audio through voice recognition.
In some embodiments of the present disclosure, the failure to find includes: the found audio information of the target audio is incomplete, or the audio information of the target audio is not found.
In some embodiments of the present disclosure, the audio information determination module 902 is further operable to: responding to the triggering operation aiming at the audio insertion control, and displaying an audio recording control on the multimedia page; and responding to the triggering operation of the audio recording control, recording the target audio, and determining the audio information of the target audio through voice recognition.
It is noted that the block diagrams shown in the above figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor terminal devices and/or microcontroller terminal devices.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 10 is a block diagram illustrating the structure of an electronic device according to an example embodiment. An electronic device 1000 according to such an embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, a bus 1030 connecting different system components (including the memory unit 1020 and the processing unit 1010), and a display unit 1040.
Where the storage unit stores program code that may be executed by the processing unit 1010 to cause the processing unit 1010 to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above in this specification. For example, the processing unit 1010 may perform various steps as shown in fig. 2.
The memory unit 1020 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 1021 and/or a cache memory unit 1022, and may further include a read only memory unit (ROM) 1023.
Storage unit 1020 may also include a program/utility 1024 having a set (at least one) of program modules 1025, such program modules 1025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may be any one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, and a local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 1070 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of an apparatus to perform the above-described method is also provided. Alternatively, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the subtitle display method in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A subtitle display method, comprising:
displaying a multimedia page, wherein the multimedia page comprises an audio inserting control and an audio playing control;
responding to a trigger operation aiming at the audio insertion control, and determining audio information of a target audio, wherein the audio information of the target audio comprises a target subtitle corresponding to the target audio and word-by-word time information of the target subtitle;
and responding to the triggering operation aiming at the audio playing control, playing the target audio, and displaying the target caption on the multimedia page word by word according to the word by word time information of the target caption.
2. The method of claim 1, wherein the multimedia page includes an animation effect control;
wherein, before the target caption is displayed on the multimedia page word by word, the method further comprises: determining a target animation effect in response to a selected instruction for the animation effect control; and
displaying the target caption word by word on the multimedia page, including: and displaying the target subtitles word by word on the multimedia page according to the target animation effect.
3. The method of claim 1, wherein the target caption comprises one or more caption segments; wherein the method further comprises:
and responding to the calibration operation aiming at the subtitle segment, and adjusting the display progress of the subtitle segment so as to enable the audio corresponding to the subtitle segment and the subtitle segment to be played synchronously.
4. The method of claim 3, wherein adjusting the display progress of the caption segment in response to the calibration operation for the caption segment comprises:
responding to the calibration operation aiming at the caption segment, and circularly playing the audio corresponding to the caption segment;
and in the audio cycle playing process corresponding to the subtitle segment, responding to the display progress adjusting operation aiming at the subtitle segment, and adjusting the display progress of the subtitle segment.
5. The method of claim 4, wherein after adjusting the display progress of the caption segment, the method further comprises:
and stopping circularly playing the audio corresponding to the subtitle segment in response to the completion of the calibration operation aiming at the subtitle segment.
6. The method of claim 1, wherein determining audio information of a target audio in response to a triggering operation for the audio insertion control comprises:
in response to the triggering operation of the audio insertion control, displaying an audio text input box on the multimedia page;
in response to an operation instruction aiming at the audio text input box, searching audio information of the target audio;
and if the search fails, determining the audio information of the target audio through voice recognition.
7. The method of claim 6, wherein the failure to find comprises: the found audio information of the target audio is incomplete, or the audio information of the target audio is not found.
8. The method of claim 1, wherein determining audio information of a target audio in response to a triggering operation for the audio insertion control comprises:
responding to the triggering operation aiming at the audio insertion control, and displaying an audio recording control on the multimedia page;
and responding to the triggering operation of the audio recording control, recording the target audio, and determining the audio information of the target audio through voice recognition.
9. A subtitle display apparatus, comprising:
the page display module is used for displaying a multimedia page, and the multimedia page comprises an audio insertion control and an audio playing control;
the audio information determination module is used for responding to the triggering operation aiming at the audio insertion control, and determining the audio information of the target audio, wherein the audio information of the target audio comprises a target subtitle corresponding to the target audio and the word-by-word time information of the target subtitle;
the audio playing module is used for responding to the triggering operation aiming at the audio playing control and playing the target audio;
and the subtitle display module is used for displaying the target subtitle on the multimedia page word by word according to the word by word time information of the target subtitle.
10. An electronic device, comprising: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the subtitle display method according to any one of claims 1 to 8.
11. A computer-readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the subtitle display method of any one of claims 1-8.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the subtitle display method according to any one of claims 1 to 8.
CN202211109657.2A 2022-09-13 2022-09-13 Subtitle display method, subtitle display apparatus, electronic device, storage medium, and program product Pending CN115209211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211109657.2A CN115209211A (en) 2022-09-13 2022-09-13 Subtitle display method, subtitle display apparatus, electronic device, storage medium, and program product


Publications (1)

Publication Number Publication Date
CN115209211A true CN115209211A (en) 2022-10-18


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024056022A1 (en) * 2022-09-14 2024-03-21 北京字跳网络技术有限公司 Subtitle processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106792071A (en) * 2016-12-19 2017-05-31 北京小米移动软件有限公司 Method for processing caption and device
CN109246472A (en) * 2018-08-01 2019-01-18 平安科技(深圳)有限公司 Video broadcasting method, device, terminal device and storage medium
CN112616062A (en) * 2020-12-11 2021-04-06 北京有竹居网络技术有限公司 Subtitle display method and device, electronic equipment and storage medium
CN114501159A (en) * 2022-01-24 2022-05-13 传神联合(北京)信息技术有限公司 Subtitle editing method and device, electronic equipment and storage medium
US20220262339A1 (en) * 2020-06-29 2022-08-18 Tencent Technology (Shenzhen) Company Limited Audio processing method, apparatus, and device, and storage medium

Similar Documents

Publication Publication Date Title
CN107832434B (en) Method and device for generating multimedia play list based on voice interaction
CN107871500B (en) Method and device for playing multimedia
US9438850B2 (en) Determining importance of scenes based upon closed captioning data
US9190052B2 (en) Systems and methods for providing information discovery and retrieval
US20160240195A1 (en) Information processing method and electronic device
US8346540B2 (en) Deep tag cloud associated with streaming media
US10805111B2 (en) Simultaneously rendering an image stream of static graphic images and a corresponding audio stream
US20140164371A1 (en) Extraction of media portions in association with correlated input
AU2014235107A1 (en) Language learning environment
US11511200B2 (en) Game playing method and system based on a multimedia file
EP3400569A1 (en) Generating video content items using object assets
CN108491178B (en) Information browsing method, browser and server
KR20120099814A (en) Augmented reality contents service system and apparatus and method
CN115209211A (en) Subtitle display method, subtitle display apparatus, electronic device, storage medium, and program product
CN108847066A (en) A kind of content of courses reminding method, device, server and storage medium
CN114040216B (en) Live broadcast room recommendation method, medium, device and computing equipment
CN111723235B (en) Music content identification method, device and equipment
US11775070B2 (en) Vibration control method and system for computer device
US11582522B1 (en) Interactive entertainment content
CN115391709A (en) Content creation based on text-to-image generation
JP6811811B1 (en) Metadata generation system, video content management system and programs
CN108875080B (en) Image searching method, device, server and storage medium
CN114513682A (en) Multimedia resource display method, sending method, device, equipment and medium
CN117440191A (en) Video processing method, device, electronic equipment and storage medium
CN115776578A (en) Video generation method and device and audio playing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221018