CN112530472B - Audio and text synchronization method and device, readable medium and electronic equipment - Google Patents

Audio and text synchronization method and device, readable medium and electronic equipment

Info

Publication number
CN112530472B
Authority
CN
China
Prior art keywords
content unit
playing progress
text
audio
target area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011355874.0A
Other languages
Chinese (zh)
Other versions
CN112530472A (en)
Inventor
张玮维
陈冬禹
杨华鹏
郑佳锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202011355874.0A
Publication of CN112530472A
Application granted
Publication of CN112530472B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/44 - Browsing; Visualisation therefor
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 - Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 - Digital recording or reproducing
    • G11B2020/10935 - Digital recording or reproducing wherein a time constraint must be met
    • G11B2020/10953 - Concurrent recording or playback of different streams or files

Abstract

The present disclosure relates to a method, an apparatus, a readable medium and an electronic device for synchronizing audio and text, and relates to the technical field of electronic information. The method includes: determining, according to a first playing progress of an audio, a first content unit in a text that corresponds to the first playing progress, wherein the audio and the text have an association relationship; determining whether a preset number of elements at the end of the first content unit are displayed in a target area of a text display interface; if the preset number of elements at the end of the first content unit are not displayed in the target area, determining a first proportion between the elements of the first content unit displayed in the target area and the elements included in the first content unit; and determining whether to perform a page turning operation according to the first playing progress and the first proportion. Because the corresponding content unit in the text is determined according to the playing progress of the audio, and whether to perform the page turning operation is decided according to where that content unit is displayed on the text display interface, the audio and the text can be kept synchronized and the accuracy of page turning is improved.

Description

Audio and text synchronization method and device, readable medium and electronic equipment
Technical Field
The present disclosure relates to the field of electronic information technologies, and in particular, to a method and an apparatus for synchronizing audio and text, a readable medium, and an electronic device.
Background
In the field of electronic information technology, as intelligent terminals become increasingly widespread in daily life, users can read anytime and anywhere through reading software on an intelligent terminal. In many usage scenarios, however, reading alone cannot meet the user's needs: while the user reads a novel, the audio corresponding to the novel can be played synchronously, so that the user can read and listen at the same time, obtain the information of the novel through both the visual and the auditory dimensions, and thus enjoy a better reading experience. How to keep the audio synchronized with the novel is therefore key to improving the user's reading experience.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a method for synchronizing audio and text, the method comprising:
determining a first content unit corresponding to a first playing progress in a text according to the first playing progress of the audio; wherein the audio has an association relationship with the text;
determining whether a preset number of elements at the tail end in the first content unit are displayed in a target area of a text display interface;
if the last preset number of elements in the first content unit are not displayed in the target area, determining a first proportion of the elements displayed in the target area in the first content unit to the elements included in the first content unit;
and determining whether to execute page turning operation according to the first playing progress and the first proportion.
In a second aspect, the present disclosure provides an apparatus for synchronizing audio and text, the apparatus comprising:
the progress determining module is used for determining a first content unit corresponding to a first playing progress in the text according to the first playing progress of the audio; wherein the audio has an association relationship with the text;
the display determining module is used for determining whether a preset number of elements at the tail end in the first content unit are displayed in a target area of a text display interface or not;
a ratio determining module, configured to determine a first ratio between an element, which is shown in the target area, in the first content unit and an element included in the first content unit if a preset number of elements at the end of the first content unit are not shown in the target area;
and the control module is used for determining whether to execute page turning operation according to the first playing progress and the first proportion.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the method of the first aspect of the present disclosure.
According to the technical scheme, the method comprises the steps of firstly determining a first content unit corresponding to a text with an association relation with the audio according to a first playing progress of the audio, then determining whether a preset number of elements at the tail end in the first content unit are displayed in a target area of a text display interface, determining a first proportion between the elements displayed in the target area in the first content unit and the elements included in the first content unit under the condition that the preset number of elements at the tail end in the first content unit are not displayed in the target area, and finally determining whether page turning operation is executed according to the first playing progress and the first proportion. According to the method and the device, the corresponding content unit in the text is determined according to the playing progress of the audio, whether page turning operation is executed or not is determined according to the position of the content unit displayed on the text display interface, the audio and the text can be kept synchronous, and the page turning accuracy is improved.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow diagram illustrating a method for synchronizing audio and text in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram of a text presentation interface shown in accordance with an exemplary embodiment;
FIG. 3 is a flow chart illustrating another method of audio and text synchronization in accordance with an exemplary embodiment;
FIG. 4 is a flow diagram illustrating another method of synchronizing audio and text in accordance with an exemplary embodiment;
FIG. 5 is a flow diagram illustrating another method of audio and text synchronization in accordance with an exemplary embodiment;
FIG. 6 is a flow diagram illustrating another method of audio and text synchronization in accordance with an exemplary embodiment;
FIG. 7 is a flow chart illustrating another method of audio and text synchronization in accordance with an exemplary embodiment;
FIG. 8 is a schematic diagram of a text presentation interface shown in accordance with an exemplary embodiment;
FIG. 9 is a block diagram illustrating an apparatus for audio and text synchronization in accordance with an exemplary embodiment;
FIG. 10 is a block diagram illustrating another audio and text synchronization apparatus in accordance with an exemplary embodiment;
FIG. 11 is a block diagram illustrating another audio and text synchronization apparatus in accordance with an exemplary embodiment;
FIG. 12 is a block diagram illustrating another audio and text synchronization apparatus in accordance with an exemplary embodiment;
FIG. 13 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flow chart illustrating a method for synchronizing audio and text, as shown in fig. 1, according to an exemplary embodiment, which may include the steps of:
step 101, determining a first content unit corresponding to a first playing progress in a text according to the first playing progress of the audio. Wherein the audio and the text have an association relationship.
For example, a user can read a text through reading software installed on a terminal device. The reading software can display the content of the text on a text display interface (e.g., a display screen) of the terminal device, and can play the audio associated with the text through a playing device (e.g., a speaker) of the terminal device. It can be understood that the reading software includes two parts, a reader and a player: the reader controls the content displayed on the text display interface, and the player controls the playing device to play the audio. The text may be an electronic book (e.g., a novel), or a chapter or a paragraph in an electronic book, and may be a file of any format, such as .txt, .chm (Compiled HTML Help), .pdf (Portable Document Format), .epub (Electronic Publication), .mobi, and the like; the disclosure is not limited thereto. Accordingly, the audio may be, for example, an audio file obtained by converting the text through a TTS (Text To Speech) service, or an audio file recorded according to the text. The audio may be a file of any format, such as .MP3, .WAV, .WMA (Windows Media Audio), .AMR (Adaptive Multi-Rate), and the like; the disclosure is not limited thereto.
The mapping relationship between the playing progress of the audio and the content unit in the text can be pre-established, a plurality of mapping records are stored in the mapping relationship, and each mapping record comprises the corresponding relationship between the content unit and the playing progress range. The playing progress range is used to indicate one or more audio frames to which audio is played, and the playing progress range may be, for example, a frame number range or a playing time range. A content unit may be understood as one or more sentences in the text, or may be one or more paragraphs, or one or more chapters. Therefore, the terminal equipment can determine which content unit in the text corresponds to the playing progress of the audio at a certain moment according to the mapping relation.
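For illustration only, the following is a minimal sketch of such a mapping relationship and the lookup it supports; the record fields, the identifier values and the use of seconds for the playing progress range are assumptions made for the example and are not definitions given in this disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MappingRecord:
    content_unit_id: str  # sentence or paragraph identifier, e.g. "0x57AD"
    start: float          # start of the playing progress range, in seconds
    end: float            # end of the playing progress range, in seconds

def lookup_content_unit(mapping: List[MappingRecord], progress: float) -> Optional[str]:
    """Return the content unit whose playing progress range contains `progress`."""
    for record in mapping:
        if record.start <= progress <= record.end:
            return record.content_unit_id
    return None

# Two illustrative records and a lookup at 12 s of playback.
mapping = [MappingRecord("0x57AD", 5.0, 20.0), MappingRecord("0x57AE", 20.0, 35.0)]
assert lookup_content_unit(mapping, 12.0) == "0x57AD"
```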
When the terminal device plays the audio, the content of the text is displayed on the text display interface so that the user can read and listen at the same time. At the current moment, the playing progress of the audio is a first playing progress and the text display interface displays first content. The first content can be understood as a fragment of the text that includes one or more content units; for example, the first content may be the 102nd to the 105th paragraphs of the text. The first playing progress can then be determined to correspond to the first content unit in the text according to the mapping relationship. For example, the first content displayed on the text display interface is as shown in (a) of FIG. 2, and the first content unit corresponding to the first playing progress is the sentence "thank you!".
Step 102, determining whether a preset number of elements at the end of the first content unit are displayed in a target area of the text display interface.
For example, after the first content unit corresponding to the first playing progress is determined, it may be determined whether the first content unit is completely displayed in the target area of the text display interface. When the content units of the text are displayed on the text display interface, a content unit often crosses pages, that is, a content unit cannot be completely displayed in the target area of the text display interface at one time. In that case, simply determining the corresponding content unit according to the playing progress of the audio makes it difficult to turn the page accurately and to keep the playing progress of the audio synchronized with the content units of the text. Therefore, it is necessary to determine whether the first content unit is completely displayed in the target area of the text display interface at the current moment. Each content unit may comprise one or more elements, an element being understood as the smallest unit that constitutes a content unit; if the content unit is a sentence or a paragraph, an element is a character (or a word). The target area may be, for example, the area where the last content unit in the text display interface is located, or a pre-specified area (e.g., the middle area, the upper area, or the lower area of the text display interface).
That the first content unit is completely displayed in the target area of the text display interface means that, at the current moment, all elements forming the first content unit are displayed in the target area. That the first content unit is not completely displayed in the target area can be divided into three scenarios. Scenario one: the first preset number of elements in the first content unit are not shown in the target area. Scenario two: the last preset number of elements in the first content unit are not shown in the target area. Scenario three: neither the first preset number of elements nor the last preset number of elements in the first content unit are shown in the target area. The preset number may be, for example, 1 or 2, or another number, which is not specifically limited in the present disclosure. In order to keep the playing progress of the audio synchronized with the content units of the text, the page turning operation needs to be performed at a suitable time in scenario two and scenario three. Therefore, it may be determined whether a preset number of elements at the end of the first content unit are displayed in the target area of the text display interface.
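As a concrete illustration of this check, the sketch below models a content unit as a sequence of elements and the visible portion of it as a run of those elements; the function name and the default preset number of 1 are assumptions made for the example.

```python
from typing import Sequence

def tail_elements_shown(unit_elements: Sequence[str],
                        shown_elements: Sequence[str],
                        preset_count: int = 1) -> bool:
    """True if the last `preset_count` elements of the content unit are visible."""
    tail = list(unit_elements[-preset_count:])
    shown = list(shown_elements)
    # The tail is displayed only when the visible run of elements ends with it.
    return len(shown) >= len(tail) and shown[-len(tail):] == tail

# Scenario two above: the end of the unit is cut off, so the check returns False.
assert not tail_elements_shown(list("this is a sample text"), list("this is a sampl"))
```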
Step 103, if the last predetermined number of elements in the first content unit are not displayed in the target area, determining a first ratio between the elements displayed in the target area in the first content unit and the elements included in the first content unit.
And step 104, determining whether to execute page turning operation according to the first playing progress and the first proportion.
For example, if a predetermined number of elements at the end of the first content unit are not displayed in the target area, indicating that the first content unit is not completely displayed in the target area of the text display interface, a ratio of the elements displayed in the target area in the first content unit to the elements included in the first content unit, i.e., a first ratio, may be determined. The first ratio may be understood as a ratio of the number of elements shown in the target area to the total number of all elements constituting the first content unit, may also be understood as a ratio of an area occupied by the elements shown in the target area to the total area of all elements of the first content unit, and may also be understood as a ratio of the number of bytes of the elements shown in the target area to the total number of bytes of all elements of the first content unit.
For example, as shown in (b) of FIG. 2, the first content unit corresponding to the first playing progress is the sentence "this is a sample text", and only "this is a sample" of the first content unit is displayed in the text display interface; that is, the last elements of the first content unit, "sample text", are not displayed in the target area. At this point, the number of elements of the first content unit shown in the target area, as a proportion of the total number of elements in the first content unit, is the first proportion, which is 5/8 in this example. Finally, whether to perform a page turning operation may be determined according to the first playing progress and the first proportion. Specifically, the sound corresponding to each element in the audio may be considered to be played at the same speed, so which element of the first content unit the audio has been played to can be determined according to the first playing progress. If the element to which the audio has been played, determined according to the first playing progress, corresponds to the last element of the first content unit displayed in the target area, a page turning operation may be performed; if it does not, the page turning operation is not performed and the audio continues to be played.
Taking (b) of FIG. 2 as an example, if the element to which the audio has been played, determined according to the first playing progress, corresponds to "sample", the page turning operation is performed; if it corresponds to "piece", the page turning operation is not performed and the audio continues to be played.
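A minimal sketch of this first-proportion computation follows, taking the element counts as given (5 of 8 elements visible, as in the FIG. 2(b) example); the function name is illustrative.

```python
def first_proportion(shown_element_count: int, total_element_count: int) -> float:
    """Proportion of the first content unit's elements shown in the target area."""
    if total_element_count <= 0:
        raise ValueError("the content unit must contain at least one element")
    return shown_element_count / total_element_count

# The example of FIG. 2(b): 5 of the unit's 8 elements are visible.
assert abs(first_proportion(5, 8) - 5 / 8) < 1e-9
```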
In summary, the present disclosure first determines, according to a first playing progress of an audio, a first content unit corresponding to a text having an association relationship with the audio, then determines whether a preset number of elements at the end of the first content unit are displayed in a target area of a text display interface, determines, under a condition that the preset number of elements at the end of the first content unit are not displayed in the target area, a first ratio between the elements displayed in the target area in the first content unit and the elements included in the first content unit, and finally determines whether to perform a page turning operation according to the first playing progress and the first ratio. According to the method and the device, the corresponding content unit in the text is determined according to the playing progress of the audio, whether page turning operation is executed or not is determined according to the position of the content unit displayed on the text display interface, the audio and the text can be kept synchronous, and the page turning accuracy is improved.
Fig. 3 is a flowchart illustrating another audio and text synchronization method according to an exemplary embodiment, as shown in fig. 3, before step 102, the method may further include:
step 105, determining whether an element in the first content unit is displayed in the text display interface.
Accordingly, the implementation manner of step 102 may be:
if any element in the first content unit is displayed in the text display interface, whether a preset number of elements at the tail end in the first content unit are displayed in the target area or not is determined.
In one implementation scenario, after the first content unit is determined according to the first playing progress, it may be further determined whether the content displayed in the current text display interface includes the first content unit, that is, whether the content displayed in the text display interface is synchronized with the first playing progress. The audio and the content displayed on the text display interface may become unsynchronized because the user may manually turn pages while reading the text. For example, the current audio playing progress corresponds to the sentence with sentence identifier 0x330 in the text, and the content displayed on the text display interface includes that sentence; the user then manually turns the page forward twice, so that the content displayed on the text display interface no longer includes the sentence with sentence identifier 0x330, and the content displayed on the text display interface is no longer synchronized with the first playing progress.
Specifically, it may be determined whether any element of the first content unit is displayed in the text display interface. If none of the elements in the first content unit are displayed in the text display interface, the audio and the text are not synchronized at this time, and the audio can be controlled to continue playing and the original content can continue to be displayed on the text display interface (i.e. no page turning operation is performed). If there are one or more elements in the first content unit that are presented in the text presentation interface, indicating that the audio is now synchronized with the text, it may be further determined whether a last predetermined number of elements in the first content unit are presented in the target area.
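A hedged sketch of this synchronization check follows; it models both the displayed page and the first content unit as ranges of element indices within the text, which is an assumption made only for the example.

```python
def unit_visible_on_page(unit_range: range, page_range: range) -> bool:
    """True if at least one element of the content unit lies on the displayed page."""
    return max(unit_range.start, page_range.start) < min(unit_range.stop, page_range.stop)

# The unit spans elements 120-150 of the text, but after the user manually turned
# back the page only shows elements 0-100, so audio and text are out of sync.
assert not unit_visible_on_page(range(120, 150), range(0, 100))
```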
Fig. 4 is a flowchart illustrating another audio and text synchronization method according to an exemplary embodiment, and as shown in fig. 4, step 101 may be implemented by:
step 1011, determining a first mapping record matched with the first playing progress in a preset mapping relation, where the first mapping relation includes at least one mapping record, the mapping record includes a corresponding relation between the content unit and an audio playing progress range, and the first playing progress belongs to the playing progress range in the first mapping record.
Step 1012, the content unit in the first mapping record is taken as the first content unit.
For example, to determine the first content unit corresponding to the first playing progress, the mapping relationship between the audio and the text may first be obtained. A plurality of mapping records are stored in the mapping relationship, and each mapping record includes a correspondence between a content unit and a playing progress range. A content unit may be understood as one or more sentences in the text, or one or more paragraphs, and can be represented by a sentence identifier or a paragraph identifier. The playing progress range indicates the start and end of the audio corresponding to the content unit, and may be, for example, a frame number range or a time range. For example, if a mapping record includes the playing progress range 5 s-20 s and the corresponding sentence identifier 0x57AD, the mapping record indicates that when the audio is played from the 5th to the 20th second, the corresponding content is the sentence identified as 0x57AD in the text. Similarly, if another mapping record includes the playing progress range of frame 35 to frame 80 and the corresponding paragraph identifier 0106, it indicates that when the audio is played from the 35th to the 80th frame, the corresponding content is the paragraph identified as 0106 in the text. Then, the first mapping record whose playing progress range includes the first playing progress (i.e., the first playing progress belongs to the playing progress range in the first mapping record) is searched for in the mapping relationship, and the content unit in the first mapping record is taken as the first content unit.
It should be noted that the mapping relationship between the audio and the text may be pre-established and stored in the server, and when the terminal device plays the audio, the mapping relationship may be obtained from the server. The mapping relationship may be obtained when the text is converted by using a TTS service to obtain an audio, or may be obtained according to a correspondence between sound content included in each audio frame in the audio and text content included in each content unit in the text.
Fig. 5 is a flowchart illustrating another audio and text synchronization method according to an exemplary embodiment, which may further include, as shown in fig. 5:
and 106, if a preset number of elements at the tail of the first content unit are displayed in the target area, determining a second playing progress, wherein the second playing progress is the playing progress ending in the playing progress range in the first mapping record.
And step 107, under the condition that the audio playing progress is detected to be the second playing progress, executing page turning operation to enable the text display interface to display the second content unit. The second content unit is a content unit which is adjacent to the first content unit and is positioned behind the first content unit in the text.
For example, if a preset number of elements at the end of the first content unit are displayed in the target area, the last content unit in the text display interface is the first content unit, and the last element of the first content unit is the last element in the text display interface. The page turning operation may then be performed when the audio has finished playing the first content unit (i.e., at the second playing progress). Specifically, the content unit that follows the first content unit in the text, i.e., the second content unit, may be found. Then, according to the first mapping record determined in step 1011, the playing progress at which the playing progress range included in the first mapping record ends is determined as the second playing progress. For example, if the playing progress range included in the first mapping record is 100 s-200 s, the second playing progress is 200 s, and a page turning operation may be performed when the audio is played to 200 s, so that the text display interface displays the second content unit.
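Under the same assumption that the playing progress is measured in seconds, this case can be sketched as follows: once the playing progress reaches the end of the first mapping record's range (the second playing progress), the page is turned to show the second content unit. The function and callback names are hypothetical.

```python
from typing import Callable

def turn_page_at_unit_end(current_progress: float,
                          second_playing_progress: float,
                          turn_page: Callable[[], None]) -> bool:
    """Perform the page turning operation once the audio reaches the second playing
    progress, i.e. the end of the first mapping record's playing progress range."""
    if current_progress >= second_playing_progress:
        turn_page()  # the text display interface now shows the second content unit
        return True
    return False

# Example: the record covers 100 s - 200 s, so the page turns when playback hits 200 s.
assert turn_page_at_unit_end(200.0, 200.0, lambda: None)
assert not turn_page_at_unit_end(180.0, 200.0, lambda: None)
```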
Fig. 6 is a flowchart illustrating another audio and text synchronization method according to an exemplary embodiment, and as shown in fig. 6, the implementation of step 104 may include:
step 1041, determining a playing duration corresponding to the playing progress range in the first mapping record, and a third playing progress, where the third playing progress is an initial playing progress in the playing progress range in the first mapping record.
And 1042, determining whether to execute a page turning operation according to the first playing progress, the third playing progress, the playing time length and the first ratio.
For example, the specific implementation manner of determining whether to execute the page turning operation may first determine the playing duration corresponding to the playing progress range in the first mapping record and the initial playing progress (i.e., the third playing progress) in the playing progress range in the first mapping record, and then determine whether to execute the page turning operation according to the first playing progress, the third playing progress, the playing duration and the first ratio. Specifically, step 1042 can be implemented by the following steps:
step 1) determining the difference between the first playing progress and the third playing progress, which accounts for a second proportion of the playing time length.
And 2) if the second proportion is larger than or equal to the first proportion, executing page turning operation.
And 3) if the second proportion is smaller than the first proportion, controlling the audio to continue playing until the playing progress of the audio reaches a fourth playing progress, and then performing the page turning operation, wherein the ratio of the difference between the fourth playing progress and the third playing progress to the playing duration is equal to the first proportion.
Illustratively, denote the first playing progress by T1, the third playing progress by T3, the playing duration by Tsum, the number of elements of the first content unit shown in the target area by N1, and the total number of elements constituting the first content unit by Nsum. Then the first proportion is R1 = N1/Nsum and the second proportion is R2 = (T1 - T3)/Tsum. If R2 is greater than or equal to R1, the element to which the audio has been played, determined according to the first playing progress, corresponds to the last element of the first content unit displayed in the target area, and the page turning operation may be performed. If R2 is smaller than R1, the audio is controlled to continue playing until its playing progress reaches a fourth playing progress T4, at which point the page turning operation is performed, where the ratio of the difference between the fourth playing progress and the third playing progress to the playing duration is equal to the first proportion, i.e., (T4 - T3)/Tsum = R1; that is, the element to which the audio has been played, determined according to the fourth playing progress, corresponds to the last element of the first content unit displayed in the target area.
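The comparison of the two proportions can be sketched as follows, with progress values and the playing duration in seconds; the function either signals an immediate page turn or returns the fourth playing progress at which the page should be turned. All names are illustrative, and the uniform playing speed per element is the assumption stated above.

```python
from typing import Optional, Tuple

def page_turn_decision(first_progress: float,    # T1
                       third_progress: float,    # T3, start of the playing progress range
                       playing_duration: float,  # Tsum
                       first_proportion: float   # R1 = N1 / Nsum
                       ) -> Tuple[bool, Optional[float]]:
    """If R2 >= R1, turn the page now; otherwise return the fourth playing progress T4
    at which (T4 - T3) / Tsum equals R1, and turn the page when playback reaches it."""
    second_proportion = (first_progress - third_progress) / playing_duration  # R2
    if second_proportion >= first_proportion:
        return True, None
    fourth_progress = third_progress + first_proportion * playing_duration    # T4
    return False, fourth_progress

# Example: the range starts at 100 s and lasts 80 s, and 5/8 of the unit is visible,
# so playback continues until 150 s and the page is turned then.
turn_now, t4 = page_turn_decision(130.0, 100.0, 80.0, 5 / 8)
assert not turn_now and abs(t4 - 150.0) < 1e-9
```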
FIG. 7 is a flowchart illustrating another audio and text synchronization method according to an exemplary embodiment, where, as shown in FIG. 7, the page turning operation includes: a page turn operation in a lateral direction, or a page turn operation in a scrolling direction. Prior to step 102, the method may further comprise:
and step 108, if the page turning operation is a transverse page turning operation, taking the area where the last content unit in the text display interface is located as a target area. And if the page turning operation is a rolling page turning operation, determining a target area according to the visual center of the user.
It should be noted that the page turning operation in the above embodiment may include: a page turn operation in a lateral direction, or a page turn operation in a scrolling direction.
For example, the horizontal page turning operation may be understood as switching a current content to a next content to be displayed on the text display interface, and in the switching process, various preset page turning effects may be added, for example: page flip animation, overlay animation, pan animation, etc. The page turning animation can be understood as animation capable of showing the actual page turning effect of the paper, the covering animation can be understood as animation capable of showing the effect that the paper covers the other paper, and the translation animation can be understood as animation capable of showing the effect that the paper is removed from the other paper. The preset page turning effect may also be other animations, which are not specifically limited by this disclosure. The scrolling page turning operation may be understood as gradually scrolling the current content on the text display interface to the next content to be displayed, and during the scrolling process, the current content may be scrolled downwards for a certain time (e.g., 5s) at a preset speed (e.g., 3 lines/second), or the current content may be scrolled downwards to a specified position at a preset speed.
If the page turning operation is a horizontal page turning operation, the area where the last content unit in the text display interface is located can be used as the target area. That is, when the playing progress of the audio corresponds to the last element in the text presentation interface, the content to be presented next is switched. If the page turning operation is a scrolling page turning operation, the target region may be determined according to a pre-specified region, and the target region may be a middle region, an upper region, a lower region, and the like of the text presentation interface. Further, the target area may also be determined based on the visual center of the user. For example, the user's sight is usually focused on the middle area of the text presentation interface (which may be understood as the visual center of the user), so that the middle area can be used as the target area, and thus, when the audio playing progress corresponds to the last element in the middle area, the current content is gradually scrolled, so that the target area presents the content unit to be presented next (which may still be the first content unit or the second content unit).
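One way to express this choice of target area, assuming the text display interface is divided into rows of elements, is sketched below; the mode names and the use of a band of middle rows as the region around the user's visual center are assumptions made for the example.

```python
from enum import Enum

class PageTurnMode(Enum):
    LATERAL = "lateral"      # lateral (horizontal) page turning
    SCROLLING = "scrolling"  # scrolling page turning

def target_area_rows(mode: PageTurnMode, total_rows: int) -> range:
    """Rows of the text display interface that form the target area."""
    if mode is PageTurnMode.LATERAL:
        # Lateral page turning: the area of the last content unit, here the last row.
        return range(total_rows - 1, total_rows)
    # Scrolling page turning: a band of rows around the user's visual center.
    middle = total_rows // 2
    return range(max(0, middle - 1), min(total_rows, middle + 2))

# A 20-row interface: scrolling mode uses the middle rows 9-11 as the target area.
assert list(target_area_rows(PageTurnMode.SCROLLING, 20)) == [9, 10, 11]
```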
It should be noted that, while the first content unit is displayed on the text display interface, the first content unit may also be marked according to a preset display mode, so that the user can intuitively see which content unit in the text corresponds to the current first playing progress of the audio. The preset display mode includes at least one of highlighting, underlining, and bold display. For example, in the content displayed in the text display interface, the first content unit, i.e., the sentence "this is a sample text", may be marked by highlighting, with the effect shown in FIG. 8.
In summary, the present disclosure first determines, according to a first playing progress of an audio, a first content unit corresponding to a text having an association relationship with the audio, then determines whether a preset number of elements at the end of the first content unit are displayed in a target area of a text display interface, determines, under a condition that the preset number of elements at the end of the first content unit are not displayed in the target area, a first ratio between the elements displayed in the target area in the first content unit and the elements included in the first content unit, and finally determines whether to perform a page turning operation according to the first playing progress and the first ratio. According to the method and the device, the corresponding content unit in the text is determined according to the playing progress of the audio, whether page turning operation is executed or not is determined according to the position of the content unit displayed on the text display interface, the audio and the text can be kept synchronous, and the page turning accuracy is improved.
Fig. 9 is a block diagram illustrating an apparatus for synchronizing audio and text according to an exemplary embodiment, and as shown in fig. 9, the apparatus 200 includes:
the progress determining module 201 is configured to determine, according to a first playing progress of the audio, a first content unit corresponding to the first playing progress in the file. Wherein the audio and the text have an association relationship.
A display determining module 202, configured to determine whether a preset number of elements at the end of the first content unit are displayed in a target area of the text display interface.
The proportion determining module 203 is configured to determine a first proportion between an element, which is shown in the target area, in the first content unit and an element included in the first content unit if a preset number of elements at the end of the first content unit are not shown in the target area.
The control module 204 is configured to determine whether to execute a page turning operation according to the first playing progress and the first ratio.
In one application scenario, the control module 204 is further configured to:
before determining whether the last preset number of elements in the first content unit are displayed in the target area of the text display interface, determining whether the elements in the first content unit are displayed in the text display interface.
Accordingly, the presentation determination module 202 is configured to:
if any element in the first content unit is displayed in the text display interface, whether a preset number of elements at the tail end in the first content unit are displayed in the target area or not is determined.
Fig. 10 is a block diagram illustrating another audio and text synchronization apparatus according to an exemplary embodiment, and as shown in fig. 10, the progress determination module 201 includes:
the first determining sub-module 2011 is configured to determine, in a preset mapping relationship, a first mapping record matched with the first play progress, where the first mapping relationship includes at least one mapping record, the mapping record includes a corresponding relationship between a content unit and an audio play progress range, and the first play progress belongs to the play progress range in the first mapping record.
The second determining sub-module 2012 is used for determining the content unit in the first mapping record as the first content unit.
In another application scenario, the control module 204 is further configured to:
and if a preset number of elements at the tail of the first content unit are displayed in the target area, determining a second playing progress, wherein the second playing progress is the playing progress ending in the playing progress range in the first mapping record.
And under the condition that the audio playing progress is detected to be the second playing progress, executing page turning operation to enable the text display interface to display the second content unit. The second content unit is a content unit which is adjacent to the first content unit and is positioned behind the first content unit in the text.
Fig. 11 is a block diagram illustrating another audio and text synchronization apparatus according to an exemplary embodiment, and as shown in fig. 11, the control module 204 may include:
the third determining submodule 2041 is configured to determine a playing duration corresponding to the playing progress range in the first mapping record, and a third playing progress, where the third playing progress is an initial playing progress in the playing progress range in the first mapping record.
The control submodule 2042 is configured to determine whether to execute a page turning operation according to the first playing progress, the third playing progress, the playing time length, and the first ratio.
In one application scenario, the control sub-module 2042 may be configured to perform the following steps:
step 1) determining the difference between the first playing progress and the third playing progress, which accounts for a second proportion of the playing time length.
And step 2) if the second proportion is larger than or equal to the first proportion, executing page turning operation.
And 3) if the second proportion is smaller than the first proportion, controlling the audio to continue playing until the playing progress of the audio reaches a fourth playing progress, and then performing the page turning operation, wherein the ratio of the difference between the fourth playing progress and the third playing progress to the playing duration is equal to the first proportion.
Fig. 12 is a block diagram illustrating another audio and text synchronization apparatus according to an exemplary embodiment, where, as shown in fig. 12, a page turning operation includes: a page turn operation in a lateral direction, or a page turn operation in a scrolling direction. The device also includes:
the area determining module 205 is configured to, before determining whether a preset number of elements at the end of the first content unit are displayed in a target area of the text display interface, take an area where a last content unit in the text display interface is located as the target area if the page turning operation is a horizontal page turning operation. And if the page turning operation is a rolling page turning operation, determining a target area according to the visual center of the user.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In summary, the present disclosure first determines, according to a first playing progress of an audio, a first content unit corresponding to a text having an association relationship with the audio, then determines whether a preset number of elements at the end of the first content unit are displayed in a target area of a text display interface, determines, under a condition that the preset number of elements at the end of the first content unit are not displayed in the target area, a first ratio between the elements displayed in the target area in the first content unit and the elements included in the first content unit, and finally determines whether to perform a page turning operation according to the first playing progress and the first ratio. According to the method and the device, the corresponding content unit in the text is determined according to the playing progress of the audio, whether page turning operation is executed or not is determined according to the position of the content unit displayed on the text display interface, the audio and the text can be kept synchronous, and the page turning accuracy is improved.
Referring now to fig. 13, a schematic structural diagram of an electronic device (e.g., the execution body of the above-described audio and text synchronization method) 300 suitable for implementing an embodiment of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 13 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 13, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 13 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 309, or installed from the storage means 308, or installed from the ROM 302. The computer program, when executed by the processing device 301, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the terminal devices and servers may communicate using any currently known or future developed network Protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determining a first content unit corresponding to a first playing progress in a text according to the first playing progress of the audio; wherein the audio has an association relationship with the text; determining whether a preset number of elements at the tail end in the first content unit are displayed in a target area of a text display interface; if the last preset number of elements in the first content unit are not displayed in the target area, determining a first proportion of the elements displayed in the target area in the first content unit to the elements included in the first content unit; and determining whether to execute page turning operation according to the first playing progress and the first proportion.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of the module does not in some cases constitute a limitation on the module itself, for example, the progress determination module may also be described as a "module for acquiring a first content unit corresponding to a first playing progress".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides a method of synchronizing audio and text, according to one or more embodiments of the present disclosure, including: determining a first content unit corresponding to a first playing progress in a text according to the first playing progress of the audio, wherein the audio has an association relationship with the text; determining whether a preset number of elements at the tail end of the first content unit are displayed in a target area of a text display interface; if the last preset number of elements in the first content unit are not displayed in the target area, determining a first proportion of the elements of the first content unit that are displayed in the target area to all elements included in the first content unit; and determining whether to execute a page turning operation according to the first playing progress and the first proportion.
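Purely for illustration, the decision flow of example 1 can be sketched in Python as follows. The element-index model of the target area, the class and function names, and the preset value of 5 are assumptions introduced here rather than the disclosed implementation, and the comparison used at the end anticipates the proportion rule detailed in example 6.

```python
# Illustrative sketch only: the index-window model of the target area, all names,
# and the preset value of 5 are assumptions, not the disclosed implementation.
from dataclasses import dataclass


@dataclass
class ContentUnit:
    first_index: int   # index of the unit's first element within the whole text
    last_index: int    # index one past the unit's last element
    start: float       # start of the associated audio playing-progress range (seconds)
    end: float         # end of that playing-progress range (seconds)


@dataclass
class TargetArea:
    first_visible: int   # index of the first text element shown in the target area
    last_visible: int    # index one past the last element shown


def tail_shown(unit: ContentUnit, area: TargetArea, preset: int = 5) -> bool:
    """True if the last `preset` elements of the unit lie inside the target area."""
    tail_start = max(unit.first_index, unit.last_index - preset)
    return area.first_visible <= tail_start and area.last_visible >= unit.last_index


def first_proportion(unit: ContentUnit, area: TargetArea) -> float:
    """Proportion of the unit's elements that are shown in the target area."""
    shown = max(0, min(unit.last_index, area.last_visible)
                - max(unit.first_index, area.first_visible))
    return shown / (unit.last_index - unit.first_index)


def should_turn_page(progress: float, unit: ContentUnit, area: TargetArea) -> bool:
    """Decision flow of example 1; the final comparison follows example 6."""
    if tail_shown(unit, area):
        return False                         # the unit does not span pages
    ratio = first_proportion(unit, area)
    duration = unit.end - unit.start         # playing duration of the unit's range
    played_share = (progress - unit.start) / duration
    return played_share >= ratio             # turn once the played share catches up


if __name__ == "__main__":
    unit = ContentUnit(first_index=100, last_index=200, start=30.0, end=50.0)
    area = TargetArea(first_visible=0, last_visible=160)   # 60% of the unit is visible
    print(should_turn_page(45.0, unit, area))              # True: 75% played >= 60% shown
```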
Example 2 provides the method of example 1, wherein before the determining whether the last preset number of elements in the first content unit are displayed within the target area of the text display interface, the method further comprises: determining whether any element in the first content unit is shown within the text display interface; and the determining whether a preset number of elements at the end of the first content unit are displayed in the target area of the text display interface includes: if any element in the first content unit is displayed in the text display interface, determining whether the preset number of elements at the tail end of the first content unit are displayed in the target area.
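As a small illustration of the pre-check in example 2, the test for whether any element of the unit is shown in the display interface reduces to an interval-overlap check under an assumed element-index model; the names and the half-open index convention below are hypothetical.

```python
# Illustrative sketch of example 2's pre-check; index ranges are assumed to be
# half-open [first, last) intervals over the elements of the text.

def any_element_shown(unit_first: int, unit_last: int,
                      shown_first: int, shown_last: int) -> bool:
    """True if at least one element of the content unit lies inside the interface."""
    return max(unit_first, shown_first) < min(unit_last, shown_last)


if __name__ == "__main__":
    # The unit covers elements 100-199; the interface currently shows elements 150-399.
    print(any_element_shown(100, 200, 150, 400))   # True: run the tail check next
    print(any_element_shown(100, 200, 250, 400))   # False: skip the tail check
```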
Example 3 provides the method of example 1, wherein the determining, according to the first playing progress of the audio, the first content unit corresponding to the first playing progress in the text includes: determining a first mapping record matched with the first playing progress in a preset mapping relation, wherein the mapping relation comprises at least one mapping record, each mapping record comprises a correspondence between a content unit and a playing progress range of the audio, and the first playing progress belongs to the playing progress range in the first mapping record; and taking the content unit in the first mapping record as the first content unit.
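A minimal sketch of the lookup in example 3, assuming the mapping relation is held as a list of non-overlapping records sorted by the start of their playing progress ranges; the record fields and the bisect-based search are choices made here, not the disclosed implementation.

```python
# Illustrative sketch of example 3: names and the list representation are assumed.
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class MappingRecord:
    range_start: float   # start of the audio playing-progress range (seconds)
    range_end: float     # end of the range (seconds)
    content_unit: str    # identifier of the content unit (e.g. a paragraph id)


def find_first_mapping_record(records: list, progress: float) -> MappingRecord:
    """Return the record whose playing-progress range contains `progress`.
    `records` must be sorted by range_start with non-overlapping ranges."""
    starts = [r.range_start for r in records]
    i = bisect_right(starts, progress) - 1
    if i < 0 or progress > records[i].range_end:
        raise ValueError("no mapping record matches this playing progress")
    return records[i]


if __name__ == "__main__":
    mapping = [MappingRecord(0.0, 12.5, "para-1"),
               MappingRecord(12.5, 30.0, "para-2"),
               MappingRecord(30.0, 50.0, "para-3")]
    print(find_first_mapping_record(mapping, 17.0).content_unit)   # para-2
```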
Example 4 provides the method of example 3, further comprising, in accordance with one or more embodiments of the present disclosure: if a preset number of elements at the tail of the first content unit are displayed in the target area, determining a second playing progress, wherein the second playing progress is the end playing progress of the playing progress range included in the first mapping record; and in a case where it is detected that the audio is played to the second playing progress, performing a page turning operation so that a second content unit is displayed on the text display interface, the second content unit being a content unit which is adjacent to the first content unit and located after the first content unit in the text.
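The behavior of example 4 reduces to a simple progress check once the unit is known not to span pages; the following sketch uses assumed names and a boolean flag in place of the tail-visibility test.

```python
# Illustrative sketch of example 4: turn the page only when the audio reaches the
# end of the current unit's playing-progress range. All names are assumed.

def page_turn_due(progress: float, range_start: float, range_end: float,
                  tail_is_shown: bool) -> bool:
    """When the last elements of the unit are already in the target area, the unit
    does not span pages; the page is turned once playback reaches the second
    playing progress, i.e. the end of the unit's range."""
    return tail_is_shown and progress >= range_end


if __name__ == "__main__":
    # The unit occupies the range 12.5 s - 30.0 s and is fully visible.
    print(page_turn_due(29.0, 12.5, 30.0, tail_is_shown=True))   # False: keep reading
    print(page_turn_due(30.0, 12.5, 30.0, tail_is_shown=True))   # True: show next unit
```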
Example 5 provides the method of example 3, wherein the determining whether to perform a page turning operation according to the first playing progress and the first proportion includes: determining a playing duration corresponding to the playing progress range in the first mapping record and a third playing progress, wherein the third playing progress is the starting playing progress of the playing progress range in the first mapping record; and determining whether to perform the page turning operation according to the first playing progress, the third playing progress, the playing duration and the first proportion.
Example 6 provides the method of example 5, wherein the determining whether to perform a page turning operation according to the first playing progress, the third playing progress, the playing duration and the first proportion includes: determining a second proportion, which is the ratio of the difference between the first playing progress and the third playing progress to the playing duration; if the second proportion is greater than or equal to the first proportion, performing the page turning operation; and if the second proportion is smaller than the first proportion, controlling the audio to continue playing until the playing progress of the audio reaches a fourth playing progress and then performing the page turning operation, wherein the ratio of the difference between the fourth playing progress and the third playing progress to the playing duration is equal to the first proportion.
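A minimal sketch of this rule, assuming the playing duration and the third (starting) playing progress from example 5 are already known; the function name and the convention of returning the progress at which to turn the page are illustrative choices, not part of the disclosure.

```python
# Illustrative sketch of examples 5 and 6: names and the return convention are assumed.

def page_turn_time(first_progress: float, third_progress: float,
                   playing_duration: float, first_proportion: float) -> float:
    """Return the playing progress at which the page should be turned.

    second_proportion is the share of the unit's audio that has already been played;
    first_proportion is the share of the unit's text visible in the target area.
    """
    second_proportion = (first_progress - third_progress) / playing_duration
    if second_proportion >= first_proportion:
        return first_progress                    # turn the page immediately
    # Otherwise keep playing until the fourth playing progress, chosen so that the
    # played share of the range equals the visible share of the unit.
    fourth_progress = third_progress + first_proportion * playing_duration
    return fourth_progress


if __name__ == "__main__":
    # 60% of the unit is visible; its range starts at 30 s and lasts 20 s.
    print(page_turn_time(36.0, 30.0, 20.0, 0.6))   # 42.0: keep playing, then turn
    print(page_turn_time(45.0, 30.0, 20.0, 0.6))   # 45.0: turn the page now
```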
Example 7 provides the method of any one of examples 1-6, wherein the page turning operation includes a transverse page turning operation or a rolling page turning operation; before the determining whether the last preset number of elements in the first content unit are displayed in the target area of the text display interface, the method further includes: if the page turning operation is a transverse page turning operation, taking the area where the last content unit in the text display interface is located as the target area; and if the page turning operation is a rolling page turning operation, determining the target area according to the visual center of the user.
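A sketch of how the target area might be chosen per example 7; the rectangle model, the mode strings, and the "middle third of the screen as the visual center" heuristic are assumptions made here for illustration only.

```python
# Illustrative sketch of example 7: the rectangle model and the visual-center
# heuristic for rolling mode are assumptions, not the disclosed implementation.
from dataclasses import dataclass


@dataclass
class Rect:
    top: int      # pixel row of the upper edge within the display interface
    bottom: int   # pixel row of the lower edge, top < bottom


def target_area(page_turn_mode: str, last_unit_rect: Rect, screen: Rect) -> Rect:
    """Pick the target area used for the tail-visibility test."""
    if page_turn_mode == "transverse":
        # Transverse page turning: use the area of the last content unit on the page.
        return last_unit_rect
    if page_turn_mode == "rolling":
        # Rolling page turning: use a band around the reader's visual center
        # (assumed here to be the middle third of the screen).
        height = screen.bottom - screen.top
        return Rect(screen.top + height // 3, screen.top + 2 * height // 3)
    raise ValueError("unknown page turning mode")


if __name__ == "__main__":
    screen = Rect(0, 1800)
    last_unit = Rect(1500, 1750)
    print(target_area("transverse", last_unit, screen))   # Rect(top=1500, bottom=1750)
    print(target_area("rolling", last_unit, screen))      # Rect(top=600, bottom=1200)
```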
Example 8 provides an apparatus for synchronizing audio and text, according to one or more embodiments of the present disclosure, comprising: a progress determining module, used for determining a first content unit corresponding to a first playing progress in the text according to the first playing progress of the audio, wherein the audio has an association relationship with the text; a display determining module, used for determining whether a preset number of elements at the tail end of the first content unit are displayed in a target area of a text display interface; a ratio determining module, used for determining, if the preset number of elements at the end of the first content unit are not shown in the target area, a first proportion of the elements of the first content unit shown in the target area to the elements included in the first content unit; and a control module, used for determining whether to perform a page turning operation according to the first playing progress and the first proportion.
Example 9 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the methods of examples 1-7, in accordance with one or more embodiments of the present disclosure.
Example 10 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to implement the steps of the methods of examples 1-7.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to the particular combination of features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (10)

1. A method for synchronizing audio and text, the method comprising:
determining a first content unit corresponding to a first playing progress in a text according to the first playing progress of the audio; wherein the audio has an association relationship with the text;
determining whether a preset number of elements at the tail end in the first content unit are displayed in a target area of a text display interface;
if a preset number of elements at the tail of the first content unit are displayed in the target area, indicating that the first content unit does not span pages;
if a preset number of elements at the end of the first content unit are not displayed in the target area, determining a first ratio of the elements displayed in the target area in the first content unit to the elements included in the first content unit;
and determining whether to execute page turning operation according to the first playing progress and the first proportion.
2. The method of claim 1, wherein before the determining whether the last preset number of elements in the first content unit are displayed in the target area of the text display interface, the method further comprises:
determining whether an element in the first content unit is displayed within the text display interface;
the determining whether a preset number of elements at the end of the first content unit are displayed in a target area of a text display interface includes:
if any element in the first content unit is displayed in the text display interface, determining whether a preset number of elements at the tail end in the first content unit are displayed in the target area.
3. The method of claim 1, wherein the determining, according to the first playing progress of the audio, the first content unit corresponding to the first playing progress in the text comprises:
determining a first mapping record matched with the first playing progress in a preset mapping relation, wherein the mapping relation comprises at least one mapping record, the mapping record comprises a corresponding relation between a content unit and the audio playing progress range, and the first playing progress belongs to the playing progress range in the first mapping record;
and taking the content unit in the first mapping record as the first content unit.
4. The method of claim 3, further comprising:
if a preset number of elements at the tail of the first content unit are displayed in the target area, determining a second playing progress, wherein the second playing progress is the end playing progress of the playing progress range in the first mapping record;
in a case where it is detected that the audio is played to the second playing progress, performing a page turning operation so that a second content unit is displayed on the text display interface, wherein the second content unit is a content unit which is adjacent to the first content unit and located after the first content unit in the text.
5. The method according to claim 3, wherein the determining whether to perform a page turning operation according to the first playing progress and the first ratio comprises:
determining a playing time length corresponding to the playing progress range in the first mapping record and a third playing progress, wherein the third playing progress is an initial playing progress in the playing progress range in the first mapping record;
and determining whether to perform the page turning operation according to the first playing progress, the third playing progress, the playing time length and the first proportion.
6. The method according to claim 5, wherein the determining whether to perform a page turning operation according to the first playing progress, the third playing progress, the playing time length and the first ratio comprises:
determining a second proportion, which is the ratio of the difference between the first playing progress and the third playing progress to the playing time length;
if the second proportion is larger than or equal to the first proportion, page turning operation is executed;
and if the second proportion is smaller than the first proportion, controlling the audio to continue playing until the playing progress of the audio reaches a fourth playing progress and then performing the page turning operation, wherein the ratio of the difference between the fourth playing progress and the third playing progress to the playing time length is equal to the first proportion.
7. The method according to any one of claims 1-6, wherein the page turning operation comprises: a transverse page turning operation or a rolling page turning operation; before the determining whether a last preset number of elements in the first content unit are displayed in a target area of a text display interface, the method further includes:
if the page turning operation is a transverse page turning operation, taking an area where the last content unit in the text display interface is located as the target area;
and if the page turning operation is a rolling page turning operation, determining the target area according to the visual center of the user.
8. An apparatus for synchronizing audio and text, the apparatus comprising:
the progress determining module is used for determining a first content unit corresponding to a first playing progress in the text according to the first playing progress of the audio; wherein the audio has an association relationship with the text;
the display determining module is used for determining whether a preset number of elements at the tail end in the first content unit are displayed in a target area of a text display interface or not; if a preset number of elements at the tail of the first content unit are displayed in the target area, indicating that the first content unit does not span pages;
a ratio determining module, configured to determine a first ratio between an element, which is shown in the target area, in the first content unit and an element included in the first content unit if a preset number of elements at the end of the first content unit are not shown in the target area;
and the control module is used for determining whether to execute page turning operation according to the first playing progress and the first proportion.
9. A computer-readable medium, on which a computer program is stored which, when being executed by a processing means, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 7.
CN202011355874.0A 2020-11-26 2020-11-26 Audio and text synchronization method and device, readable medium and electronic equipment Active CN112530472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011355874.0A CN112530472B (en) 2020-11-26 2020-11-26 Audio and text synchronization method and device, readable medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011355874.0A CN112530472B (en) 2020-11-26 2020-11-26 Audio and text synchronization method and device, readable medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112530472A CN112530472A (en) 2021-03-19
CN112530472B true CN112530472B (en) 2022-06-21

Family

ID=74994122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011355874.0A Active CN112530472B (en) 2020-11-26 2020-11-26 Audio and text synchronization method and device, readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112530472B (en)

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6156964A (en) * 1999-06-03 2000-12-05 Sahai; Anil Apparatus and method of displaying music
JP4538618B2 (en) * 2001-05-17 2010-09-08 独立行政法人情報通信研究機構 Automatic generation method of display unit caption text in caption program production system
JP3916436B2 (en) * 2001-10-29 2007-05-16 シャープ株式会社 Document display device, word arrangement method, and document display method
US20080005656A1 (en) * 2006-06-28 2008-01-03 Shu Fan Stephen Pang Apparatus, method, and file format for text with synchronized audio
US20120310642A1 (en) * 2011-06-03 2012-12-06 Apple Inc. Automatically creating a mapping between text data and audio data
TWI488174B (en) * 2011-06-03 2015-06-11 Apple Inc Automatically creating a mapping between text data and audio data
US9099089B2 (en) * 2012-08-02 2015-08-04 Audible, Inc. Identifying corresponding regions of content
TWM469557U (en) * 2013-05-30 2014-01-01 Univ Chien Hsin Sci & Tech Teleprompter with intelligent page turning
US9558159B1 (en) * 2015-05-15 2017-01-31 Amazon Technologies, Inc. Context-based dynamic rendering of digital content
CN106847315B (en) * 2017-01-24 2020-01-10 广州朗锐数字传媒科技有限公司 Method for synchronously displaying audio books sentence by sentence
CN108762877B (en) * 2018-05-31 2021-08-24 维沃移动通信有限公司 Control method of mobile terminal interface and mobile terminal
CN109634700A (en) * 2018-11-26 2019-04-16 维沃移动通信有限公司 A kind of the content of text display methods and terminal device of audio
CN110347309A (en) * 2019-06-26 2019-10-18 腾讯科技(深圳)有限公司 Read aloud control method, device and equipment and computer storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9927957B1 (en) * 2014-12-11 2018-03-27 Audible, Inc. Rotary navigation of synchronized content

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on the Evaluation Index System of Book Apps from the Users' Perspective; Huang Pengyue, Yuan Qinjian; 《图书馆》 (Library); 2019-12-18; full text *

Also Published As

Publication number Publication date
CN112530472A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN112397104B (en) Audio and text synchronization method and device, readable medium and electronic equipment
CN111970577A (en) Subtitle editing method and device and electronic equipment
CN111510760A (en) Video information display method and device, storage medium and electronic equipment
CN110764671A (en) Information display method and device, electronic equipment and computer readable medium
CN111753558B (en) Video translation method and device, storage medium and electronic equipment
CN113259740A (en) Multimedia processing method, device, equipment and medium
WO2014154097A1 (en) Automatic page content reading-aloud method and device thereof
CN111986655B (en) Audio content identification method, device, equipment and computer readable medium
CN110958481A (en) Video page display method and device, electronic equipment and computer readable medium
US20240007718A1 (en) Multimedia browsing method and apparatus, device and mediuim
EP4346218A1 (en) Audio processing method and apparatus, and electronic device and storage medium
CN112380365A (en) Multimedia subtitle interaction method, device, equipment and medium
CN113507637A (en) Media file processing method, device, equipment, readable storage medium and product
WO2023035835A1 (en) Information display method and apparatus, and device and medium
CN110379406B (en) Voice comment conversion method, system, medium and electronic device
CN112954453B (en) Video dubbing method and device, storage medium and electronic equipment
CN113207044A (en) Video processing method and device, electronic equipment and storage medium
CN112530472B (en) Audio and text synchronization method and device, readable medium and electronic equipment
CN111709342B (en) Subtitle segmentation method, device, equipment and storage medium
US20140297285A1 (en) Automatic page content reading-aloud method and device thereof
CN115269920A (en) Interaction method, interaction device, electronic equipment and storage medium
CN113761865A (en) Sound and text realignment and information presentation method and device, electronic equipment and storage medium
CN113885741A (en) Multimedia processing method, device, equipment and medium
CN113221514A (en) Text processing method and device, electronic equipment and storage medium
CN112699687A (en) Content cataloging method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant