CN114501106A

CN114501106A - Manuscript display control method and device, electronic equipment and storage medium

Info

Publication number: CN114501106A
Application number: CN202210119024.3A
Authority: CN
Inventors: 刘同和; 易锌波; 韦添元; 詹德超; 罗焱
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-08-04
Filing date: 2020-08-04
Publication date: 2022-05-13
Also published as: CN111970257B; CN111970257A

Abstract

The application relates to the technical field of artificial intelligence, and in particular to a manuscript display control method, a manuscript display control device, an electronic device and a storage medium, which provide a way of displaying the manuscript of audio program content so that a user can read the corresponding text while listening to the audio. The method comprises: in response to a play operation for playing target audio program content, displaying a play control page and playing the audio content of the target audio program content, wherein the play control page comprises a play control area and a manuscript display area for the audio content; and, according to the playing progress of the target audio program content, scroll-displaying, in units of sentences, the manuscript content corresponding to the current playing progress in the manuscript display area. In the method and the device, the manuscript content corresponding to the audio content is obtained through speech recognition and displayed in the manuscript display area of the play control page, so that a user can view the corresponding manuscript on the play control page in synchronization with the audio program being listened to.

Description

Manuscript display control method and device, electronic equipment and storage medium

This application is a divisional application of the original application No. 202010774616.X, filed on 4 August 2020 and entitled 'Manuscript display control method, device, electronic device and storage medium', the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a manuscript display control method and apparatus, an electronic device, and a storage medium.

Background

With the rapid development of internet technology, a wide variety of social software has emerged, and audio program content sharing platforms in particular are attracting growing attention and more users. Podcast platforms are a very common type of audio program content sharing platform: many internet users record and share audio programs through them, or listen to audio programs shared by other users, such as audiobooks, commentaries, spoken-word programs and talk shows.

However, on the audio program content sharing platforms in the related art, storing additional information about audio program content is costly, so only the corresponding audio data is played for the user. When the user cannot hear a passage clearly, the user has to listen to it repeatedly to confirm it, and each replay requires repeatedly adjusting the progress bar, which is cumbersome and makes listening inefficient.

Disclosure of Invention

The embodiments of the present application provide a manuscript display control method and apparatus, an electronic device and a storage medium, which provide a way of displaying the manuscript of audio program content so that a user can browse the corresponding manuscript content while listening to the audio, thereby improving the efficiency of listening to audio program content.

A first method for controlling manuscript display of audio program content provided in an embodiment of the present application includes:

in response to a play operation for playing target audio program content, displaying a play control page and playing the audio content of the target audio program content, wherein the play control page comprises a play control area and a manuscript display area for the audio content, and the play control area is used for controlling playback of the audio program content; and

according to the playing progress of the target audio program content, scroll-displaying, in units of sentences, the manuscript content corresponding to the current playing progress in the manuscript display area, wherein the manuscript content is obtained by performing speech recognition on the target audio program content.
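The sentence-level synchronization in the step above can be pictured with a minimal Python sketch. It assumes, hypothetically, that each recognized sentence carries a start time in milliseconds (the names `Sentence`, `start_ms` and `current_sentence_index` are illustrative and not taken from the application) and that the player reports its position periodically; the sentence to scroll to is then found by binary search over the sorted start times.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    start_ms: int  # hypothetical: playback time at which the sentence begins
    text: str


def current_sentence_index(sentences: list[Sentence], position_ms: int) -> int:
    """Return the index of the sentence to scroll to at position_ms.

    Sentences are assumed to be sorted by start time. Before the first
    sentence starts, index 0 is returned so the display area is never empty.
    """
    starts = [s.start_ms for s in sentences]
    # bisect_right counts how many sentences start at or before position_ms.
    return max(0, bisect_right(starts, position_ms) - 1)


if __name__ == "__main__":
    script = [Sentence(0, "Hello."), Sentence(2100, "Welcome to the show."),
              Sentence(5200, "Today we talk about podcasts.")]
    for pos in (0, 3000, 9000):
        print(pos, "->", script[current_sentence_index(script, pos)].text)
```

Choosing the last sentence that has already started, rather than one whose end time has not yet passed, keeps the previous sentence in view during short silences between sentences, which avoids the display jumping back and forth.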

Optionally, the play control area includes a first selection control for confirming that the audio is to be played and the manuscript displayed, and playing the audio content of the target audio program content specifically includes:

in response to a trigger operation on the first selection control, playing the audio content of the target audio program content.

Optionally, the play control area includes a second selection control for confirming that the manuscript is to be displayed without playing the audio, and the method further includes:

in response to a trigger operation on the second selection control, displaying the manuscript content in the manuscript display area without playing the audio content.

Optionally, the play control page further includes a video play area, and the target audio program content further includes video content corresponding to the audio content; the method further includes:

playing the video content in the video play area.

A second method for controlling manuscript display of audio program content provided in an embodiment of the present application includes:

performing speech recognition on target audio program content to obtain manuscript content corresponding to the audio content in the target audio program content, and adding corresponding timestamps to the manuscript content according to the audio content; and

after receiving a play request for the target audio program content sent by a client, sending the manuscript content corresponding to the audio content to the client, so that the client plays the audio content and, according to the playing progress of the target audio program content, scroll-displays, in units of sentences, the manuscript content corresponding to the current playing progress in a manuscript display area of a play control page, wherein the client sends the play request in response to a play operation for playing the target audio program content, the play control page further comprises a play control area for the audio content, and the play control area is used for controlling playback of the audio program content.
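One possible shape for the timestamped manuscript that the server sends in the step above is sketched below in Python. The JSON schema, the field names (`program_id`, `paragraphs`, `start_ms`, `end_ms`, `text`) and the function name are hypothetical; the application only states that the manuscript carries timestamps derived from the audio content.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TimedSentence:
    start_ms: int  # hypothetical timestamp fields added during transcription
    end_ms: int
    text: str


def build_manuscript_payload(program_id: str,
                             paragraphs: list[list[TimedSentence]]) -> str:
    """Serialize a timestamped manuscript into a hypothetical JSON response body."""
    body = {
        "program_id": program_id,
        "paragraphs": [[asdict(s) for s in para] for para in paragraphs],
    }
    # ensure_ascii=False keeps Chinese text readable instead of escape sequences.
    return json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    payload = build_manuscript_payload(
        "ep-1", [[TimedSentence(0, 900, "你好。"), TimedSentence(1000, 1800, "欢迎收听。")]])
    print(payload)
```

Keeping sentences grouped by paragraph in the payload lets the client scroll by sentence and also apply paragraph-level display modes without recomputing the grouping.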

Optionally, performing speech recognition on the target audio program content to obtain the manuscript content corresponding to the audio content in the target audio program content specifically includes:

dividing the text obtained by speech recognition of the target audio program content into sentences based on punctuation marks; and

dividing the sentences into paragraphs according to the playing interval between sentences, wherein, for every two adjacent paragraphs, the playing interval in the corresponding audio content between the last sentence of the preceding paragraph and the first sentence of the following paragraph is greater than a preset time threshold, and the playing interval between every two adjacent sentences within the same paragraph is not greater than the preset time threshold.
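The two steps above (sentence splitting on punctuation, then paragraph splitting on a silence threshold) can be sketched in Python under stated assumptions: the speech recognizer is assumed, hypothetically, to return word-level tokens as `(token, start_ms, end_ms)` tuples with punctuation attached to the preceding token; sentences end at the marks listed in `SENTENCE_END`; and the threshold is a free parameter. Tokens are joined with spaces, which suits English; for Chinese text `sep=""` would be used.

```python
from dataclasses import dataclass

# Assumed sentence-terminating punctuation (Chinese and English).
SENTENCE_END = ("。", "！", "？", ".", "!", "?")


@dataclass
class Sentence:
    text: str
    start_ms: int
    end_ms: int


def _make_sentence(tokens, sep):
    # A sentence spans from its first token's start to its last token's end.
    return Sentence(sep.join(t for t, _, _ in tokens), tokens[0][1], tokens[-1][2])


def split_sentences(tokens, sep=" "):
    """Group (token, start_ms, end_ms) tuples into sentences at end punctuation."""
    sentences, buf = [], []
    for tok in tokens:
        buf.append(tok)
        if tok[0].endswith(SENTENCE_END):
            sentences.append(_make_sentence(buf, sep))
            buf = []
    if buf:  # trailing tokens without closing punctuation form a last sentence
        sentences.append(_make_sentence(buf, sep))
    return sentences


def split_paragraphs(sentences, threshold_ms):
    """Start a new paragraph when the gap before a sentence exceeds threshold_ms."""
    paragraphs = []
    for s in sentences:
        if paragraphs and s.start_ms - paragraphs[-1][-1].end_ms <= threshold_ms:
            paragraphs[-1].append(s)
        else:
            paragraphs.append([s])
    return paragraphs
```

With a 1000 ms threshold, a 2.2 s pause between two sentences starts a new paragraph, while a 0.1 s pause keeps them together, matching the rule that only intervals greater than the threshold separate paragraphs.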

A first apparatus for controlling manuscript display of audio program content provided in an embodiment of the present application includes:

a first response unit, configured to, in response to a play operation for playing target audio program content, display a play control page and play the audio content of the target audio program content, wherein the play control page comprises a play control area and a manuscript display area for the audio content, and the play control area is used for controlling playback of the audio program content; and

Optionally, the first response unit is specifically configured to:

The manuscript content comprises the sentences divided during speech recognition and the punctuation marks added between the sentences, wherein the sentences are obtained by dividing, based on the punctuation marks, the text obtained by speech recognition of the target audio program content.

Optionally, the play control area further includes a play/pause control; the apparatus further includes:

a second response unit, configured to, in response to a trigger operation on the play/pause control, pause playback of the target audio program content; and

pause the scrolling display of the manuscript content portion corresponding to the current playing progress in the manuscript display area.
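A minimal Python sketch of how a client might tie scrolling to the play/pause state described above. The class and method names are hypothetical; the sketch only assumes that the player calls `on_progress` on each progress tick and that sentence start times are sorted.

```python
from bisect import bisect_right


class ManuscriptScroller:
    """Hypothetical client-side helper that links manuscript scrolling to playback."""

    def __init__(self, sentence_starts_ms):
        self.starts = list(sentence_starts_ms)  # sorted sentence start times
        self.playing = False
        self.highlight = 0  # index of the sentence currently scrolled into view

    def play(self):
        self.playing = True

    def pause(self):
        # Pausing playback also freezes scrolling at the current sentence.
        self.playing = False

    def on_progress(self, position_ms):
        """Called on each player progress tick; returns the sentence to show."""
        if self.playing:
            self.highlight = max(0, bisect_right(self.starts, position_ms) - 1)
        return self.highlight
```

Guarding the update with the playing flag means a late progress tick that arrives after the pause operation cannot move the manuscript, so the display stays on the sentence at which the user paused.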

Optionally, the manuscript display area further includes a manuscript display control, and the apparatus further includes:

a third response unit, configured to, in response to a trigger operation on the manuscript display control, display a manuscript presentation page, where the manuscript presentation page displays at least one line of the manuscript content portion corresponding to the current playing progress, together with at least one of the preceding M lines adjacent to that portion and the following N lines adjacent to that portion, where M and N are positive integers; or

in response to a trigger operation on the manuscript display control, display, in a manuscript overview area of the play control page, at least one line of the manuscript content portion corresponding to the current playing progress, together with at least one of the preceding M lines adjacent to that portion and the following N lines adjacent to that portion.
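Selecting the lines to show in the manuscript presentation page or overview area amounts to a clipped window around the current portion. The Python sketch below is illustrative only: the function name is hypothetical, and it always shows both the preceding M and the following N lines, whereas the application also allows showing only one side.

```python
def overview_window(lines, cur_first, cur_last, m, n):
    """Return the lines to display around the current portion.

    lines[cur_first..cur_last] (inclusive) is the manuscript portion for the
    current playing progress; up to m preceding and n following lines are added,
    clipped at the start and end of the manuscript.
    """
    if m < 1 or n < 1:
        raise ValueError("M and N must be positive integers")
    if not 0 <= cur_first <= cur_last < len(lines):
        raise ValueError("current portion out of range")
    lo = max(0, cur_first - m)
    hi = min(len(lines), cur_last + 1 + n)
    return lines[lo:hi]
```

For a 10-line manuscript whose current portion is lines 4 and 5, with M = 2 and N = 3, the window is lines 2 through 8; near the start or end of the manuscript the window simply shrinks.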

Optionally, when the manuscript content includes at least two paragraphs, for every two adjacent paragraphs the playing interval in the corresponding audio content between the last sentence of the preceding paragraph and the first sentence of the following paragraph is greater than a preset time threshold, and the playing interval between every two adjacent sentences within the same paragraph is not greater than the preset time threshold.

Optionally, the third response unit is specifically configured to:

display, in the manuscript presentation page or the manuscript overview area, the target paragraph containing the at least one line of the manuscript content portion in a first display mode, and display the contents of the other paragraphs in a second display mode.
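The two display modes above can be sketched in Python as tagging each line with the mode of its paragraph. The mode labels `"first"` and `"second"` are placeholders for whatever styling a client chooses (for example highlighted versus dimmed text), and the function name is hypothetical.

```python
def assign_display_modes(paragraphs, current_line):
    """Tag each line with a display mode.

    paragraphs: list of paragraphs, each a list of line strings, in order.
    current_line: global index of a line in the current manuscript portion.
    Lines of the paragraph containing current_line get "first" (e.g. highlighted);
    all other lines get "second" (e.g. dimmed). Mode names are hypothetical.
    """
    tagged, offset = [], 0
    for para in paragraphs:
        in_target = offset <= current_line < offset + len(para)
        mode = "first" if in_target else "second"
        tagged.extend((line, mode) for line in para)
        offset += len(para)
    return tagged
```

Because paragraphs are delimited by the silence threshold, the highlighted block tends to correspond to one continuous stretch of speech, which gives the user a larger reading context than a single highlighted sentence.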

Optionally, the play control page includes a play progress control for adjusting the playing progress of the audio content, and the apparatus further includes:

a fourth response unit, configured to, in response to a manuscript dragging operation triggered on the manuscript presentation page or the manuscript overview area, drag the displayed manuscript content according to the dragging operation; and

update the playing progress shown by the play progress control, adjust the playing progress of the target audio program content to the time node corresponding to the manuscript content, and start playing the target audio program content from that time node.
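Mapping a drag to a time node requires some notion of which line the user has dragged into place. The Python sketch below assumes, hypothetically, a fixed anchor position in the viewport (such as a highlight bar) and known vertical offsets for each line; none of these parameters are specified in the application.

```python
from bisect import bisect_right


def seek_after_drag(line_tops_px, line_start_ms, scroll_top_px, anchor_y_px,
                    duration_ms):
    """Map the manuscript position left by a drag to a playback time node.

    line_tops_px:  vertical offset of each manuscript line in the content (sorted).
    line_start_ms: playback timestamp of each line, from transcription.
    scroll_top_px: how far the content has been scrolled after the drag.
    anchor_y_px:   hypothetical anchor within the viewport (e.g. a highlight bar).
    Returns (time_ms, progress) where progress in [0, 1] drives the progress control.
    """
    content_y = scroll_top_px + anchor_y_px
    i = max(0, bisect_right(line_tops_px, content_y) - 1)
    time_ms = line_start_ms[i]
    return time_ms, min(1.0, time_ms / duration_ms)
```

For example, with 40 px lines, a 70 px scroll and an anchor 20 px below the top of the viewport, the anchor sits on the third line, so playback would resume from that line's timestamp and the progress control would be updated to the matching fraction.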

Optionally, the apparatus further comprises:

a fifth response unit, configured to, in response to an operation on the play progress control, update the playing progress of the target audio program content and start playing the target audio program content from the time node corresponding to the updated playing progress; and

switch the manuscript content displayed in the manuscript presentation page or the manuscript overview area to the manuscript content corresponding to the point from which the target audio program content starts playing at that time node.
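The reverse direction, from the progress control to the manuscript, can be sketched in Python as converting the selected fraction to a time node and then looking up the sentence that starts at or before it. The function name and the clamping behavior are assumptions for illustration.

```python
from bisect import bisect_right


def jump_to_progress(progress, duration_ms, sentence_starts_ms):
    """Handle a progress-control change.

    progress is the fraction selected on the control (clamped to [0, 1]).
    Returns (time_ms, sentence_index): the time node to resume playback from
    and the manuscript sentence the display should jump to.
    """
    progress = min(1.0, max(0.0, progress))
    time_ms = round(progress * duration_ms)
    index = max(0, bisect_right(sentence_starts_ms, time_ms) - 1)
    return time_ms, index
```

Here playback resumes at the exact selected time, while the manuscript jumps to the sentence already in progress at that time, so the user sees the whole sentence being spoken rather than text starting mid-sentence.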

Optionally, a switching control for switching manuscript content is displayed at a designated position of the manuscript presentation page or the manuscript overview area; the apparatus further includes:

a sixth response unit, configured to, in response to a switching operation triggered on the switching control, display, in the manuscript display area or the manuscript overview area, the manuscript content of the audio program content that follows the target audio program content in the target play queue; and

switch the target audio program content currently playing in the play control page to that next audio program content and play it.
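A minimal Python sketch of the queue switch described above. `PlayQueue` and `switch_to_next` are hypothetical names, and `load_manuscript` stands in for however the client obtains the next program's manuscript (for example, a request to the server); the application does not specify this interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PlayQueue:
    program_ids: list  # ordered ids in the target play queue
    current: int = 0   # index of the target audio program content


def switch_to_next(queue: PlayQueue,
                   load_manuscript: Callable[[str], list]) -> Optional[tuple]:
    """Advance to the next program and fetch its manuscript.

    load_manuscript is a hypothetical callback (e.g. a request to the server).
    Returns (program_id, manuscript), or None when the queue is exhausted.
    """
    if queue.current + 1 >= len(queue.program_ids):
        return None
    queue.current += 1
    program_id = queue.program_ids[queue.current]
    return program_id, load_manuscript(program_id)
```

Returning `None` at the end of the queue leaves the current program and its manuscript unchanged; a client could instead hide or disable the switching control in that case.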

Optionally, the play control area includes a first selection control for confirming that the audio is to be played and the manuscript displayed, and the first response unit is specifically configured to:

Optionally, the play control area includes a second selection control for confirming that the manuscript is to be displayed without playing the audio, and the first response unit is further configured to:

Optionally, the play control page further includes a video play area, and the target audio program content further includes video content corresponding to the audio content; the apparatus further includes:

a playing unit, configured to play the video content in the video play area.

A second apparatus for controlling manuscript display of audio program content provided in an embodiment of the present application includes:

a voice transcription unit, configured to perform speech recognition on target audio program content to obtain manuscript content corresponding to the audio content in the target audio program content, and add corresponding timestamps to the manuscript content according to the audio content; and

a transmission unit, configured to, after receiving a play request for the target audio program content sent by a client, send the manuscript content corresponding to the audio content to the client, so that the client plays the audio content and, according to the playing progress of the target audio program content, scroll-displays, in units of sentences, the manuscript content corresponding to the current playing progress in a manuscript display area of a play control page, where the client sends the play request in response to a play operation for playing the target audio program content, the play control page further includes a play control area for the audio content, and the play control area is used for controlling playback of the audio program content.

Optionally, the voice transcription unit is specifically configured to:

An electronic device provided by an embodiment of the present application includes a processor and a memory, where the memory stores program code which, when executed by the processor, causes the processor to perform the steps of any one of the above methods for controlling manuscript display of audio program content.

An embodiment of the present application provides a computer-readable storage medium including program code which, when run on an electronic device, causes the electronic device to perform the steps of any one of the above methods for controlling manuscript display of audio program content.

An embodiment of the present application provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of any one of the above methods for controlling manuscript display of audio program content.

The beneficial effects of the present application are as follows:

The embodiments of the present application provide a manuscript display control method and apparatus for audio program content, an electronic device and a storage medium. Because speech-to-text technology is applied to the audio program content, and the play control page of the audio program content includes a manuscript display area that scroll-displays, in units of sentences, the manuscript content corresponding to the current playing progress, the user can view the corresponding text while the audio program content plays. When the user cannot hear a passage clearly, the user can read the corresponding manuscript content directly instead of listening repeatedly to confirm it, which is more convenient, saves time and improves listening efficiency.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

Fig. 1 is a schematic diagram of a play control page in the related art;

Fig. 2 is a schematic diagram of an optional application scenario in an embodiment of the present application;

Fig. 3 is a schematic flowchart of a first method for controlling manuscript display of audio program content according to an embodiment of the present application;

Fig. 4A is a schematic diagram of a first play control page in an embodiment of the present application;

Fig. 4B is a schematic diagram of a second play control page in an embodiment of the present application;

Fig. 4C is a schematic diagram of a play control area in an embodiment of the present application;

Fig. 4D is a schematic diagram of a third play control page in an embodiment of the present application;

Fig. 5A is a schematic diagram of a first manuscript presentation page in an embodiment of the present application;

Fig. 5B is a schematic diagram of a first manuscript overview area in an embodiment of the present application;

Fig. 5C is a schematic diagram of a second manuscript overview area in an embodiment of the present application;

Fig. 6A is a schematic diagram of a second manuscript presentation page in an embodiment of the present application;

Fig. 6B is a schematic diagram of a third manuscript presentation page in an embodiment of the present application;

Fig. 6C is a schematic diagram of a fourth manuscript presentation page in an embodiment of the present application;

Fig. 7 is a schematic diagram of a manuscript display method for audio program content according to an embodiment of the present application;

Fig. 8 is a flowchart of a second method for controlling manuscript display of audio program content according to an embodiment of the present application;

Fig. 9 is a flowchart of a complete method for controlling playback of audio program content according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of a first apparatus for controlling playback of audio program content in an embodiment of the present application;

Fig. 11 is a schematic structural diagram of a second apparatus for controlling playback of audio program content in an embodiment of the present application;

Fig. 12 is a schematic structural diagram of an electronic device in an embodiment of the present application;

Fig. 13 is a schematic diagram of the hardware structure of a terminal device in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.

Some concepts related to the embodiments of the present application are described below.

Audio program content: in the embodiments of the present application, audio program content refers to audio programs shared on instant messaging software or podcast platforms, such as audiobooks, meetings, commentaries, talk shows and radio programs. An audiobook here refers to an ordinary audio program content file; when it is played, the playing speed can be adjusted and the position at which playback stopped can be remembered automatically so that listening can be resumed later. The audio program content in the embodiments of the present application may be audio content containing speech (that is, content from which text can be obtained through speech recognition, as opposed to pure music), or audio-video content that also contains corresponding video data; for example, the audio program content may be an audio file of a book recorded by a reader.

Audio program content sharing platform and podcast: a podcast is a form of digital broadcasting used to publish recordings of internet broadcasts or similar online audio programs. Users can download these programs to their own players and listen to them at any time and place, without having to sit in front of a computer or listen live. Users can also produce their own audio programs and upload them to the internet through a podcast platform to share them with other internet users. An audio program content sharing platform can be understood as a client that plays audio program content or video, and it is used in many applications, such as podcasting.

Application operation page: the medium for interaction and information exchange between an application system and a user. It converts information between its internal form and a form that humans can accept, so that the user can operate the application conveniently and efficiently, interact with it bidirectionally and complete the work the application is intended for. In the embodiments of the present application, application operation pages include human-computer interaction interfaces and graphical user interfaces; specific application operation pages include the play control page, the manuscript presentation page and the like. Different application operation pages display different content to the user and support different information exchanges between the user and the application.

Play control page of the audio program content sharing platform: one or more play control pages may be provided on the platform as needed, and navigation between pages follows preset logic. In the embodiments of the present application, the play control page mainly refers to a page for controlling playback of audio program content and includes a play control area and a manuscript display area. The play control area is mainly used for controlling playback of the audio program content, including the playing speed, the playing progress and the like; the manuscript display area is mainly used for displaying the manuscript content corresponding to the audio content of the audio program content currently being played. The play control page may further include a video play area, a manuscript overview area and the like.

Manuscript presentation page: a user-facing page added on top of the play control page for displaying the manuscript content corresponding to the audio being played. In addition to at least one line of the manuscript content portion corresponding to the current playing progress, the manuscript presentation page may display at least one of the preceding M lines and the following N lines adjacent to that portion, so that the user can get an overview of the manuscript content and quickly find and locate a specific segment.

Client: a program that corresponds to a server and provides local services to users. Apart from some applications that run only locally, applications are generally installed on ordinary user terminals and need to work together with a server. With the development of the internet, common clients include web browsers used on the World Wide Web, email clients for receiving and sending email, and instant messaging client software. For such applications, a corresponding server and service program, such as a database service or an email service, must be available on the network to provide the corresponding services, so a specific communication connection must be established between the client and the server to ensure that the application runs normally.

Artificial intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that machines have the capabilities of perception, reasoning and decision-making.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, with technologies at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning and the like.

With the research and progress of artificial intelligence technology, it has been developed and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned and autonomous driving, unmanned aerial vehicles, robots, smart healthcare and smart customer service.

The solution provided by the embodiments of the present application involves artificial intelligence speech recognition and machine learning technologies. Key speech technologies include automatic speech recognition (ASR), speech synthesis (text-to-speech, TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the future direction of human-computer interaction, and speech is expected to become one of the most promising modes of human-computer interaction. Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures, so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.

In the embodiments of the present application, speech transcription of audio program content can be achieved through speech recognition and machine learning technologies, converting the audio program content data into text data. The speech transcription may be based on a machine learning model such as a deep full-sequence convolutional neural network, which can convert long audio program content (up to 5 hours) into text data.

The following briefly introduces the design concept of the embodiments of the present application:

With the rapid development of science and technology, today's audio programs increasingly rely on networks and smart electronic devices, and podcast platforms have emerged that allow users to produce their own audio programs and upload them to the internet to share them with other internet users.

Fig. 1 is a schematic diagram of a play control page in the related art. The page is the play control page of a podcast platform on which users can listen to audio programs such as radio programs, audiobooks and commentaries. As shown in Fig. 1, the audio program currently playing is an audiobook; the current playing progress is 1 minute 23 seconds and the total duration is 20 minutes 32 seconds. The picture above the progress bar is the cover of the novel, and below the progress bar are the synopsis and opening of the novel, the play count and other information.

However, current podcast platforms do not provide a text manuscript for audio programs and cannot play a text manuscript in synchronization with the audio program content. As a result, a user listening to a program such as a lecture cannot review the program content and cannot click on specific paragraphs or sentences. Where a passage is hard to hear, the user cannot quickly read it and has to listen to it repeatedly to confirm it. In addition, because the position of a specific segment cannot be located quickly, a user looking for a specific segment has to keep playing back the audio, which makes searching audio content cumbersome and time-consuming and greatly limits usability.

In view of this, embodiments of the present application provide a manuscript display control method and apparatus for audio program content, an electronic device and a storage medium. In the embodiments, speech-to-text technology is applied to audio program content, and the play control page of the audio program content includes a manuscript display area that scroll-displays, in units of sentences, the manuscript content corresponding to the current playing progress. The user can therefore view the corresponding text while the audio program content plays, and can also click to select a specific paragraph or sentence to play, quickly locating a specific segment. This improves the user experience, makes searching audio content more efficient and reduces search cost.

Preferred embodiments of the present application are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein merely illustrate and explain the present application and are not intended to limit it, and that the embodiments and their features may be combined with each other where they do not conflict.

Fig. 2 is a schematic diagram of an application scenario according to an embodiment of the present application. The scenario includes two terminal devices 210 and a server 230; a user can log in to the related application operation page 220 on a terminal device 210. The terminal devices 210 and the server 230 communicate with each other through a communication network.

In an alternative embodiment, the communication network is a wired or wireless network. The terminal devices 210 and the server 230 may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.

In this embodiment, the terminal device 210 is an electronic device used by a user. It may be any computer device with some computing capability that runs instant messaging or social software and websites, such as a personal computer, a mobile phone, a tablet computer, a notebook computer or an e-book reader. Each terminal device 210 is connected to the server 230 through a wireless network. The server 230 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms.

In the embodiment of the present application, a user may log in, through the terminal device 210, to the application operation page 220 of the client, and the terminal device 210 responds to the various operations the user triggers on that page. The application operation page 220 may be a play control page, a manuscript display page, or the like. For example, when user A performs a play operation on the target audio program content, the terminal device 210 responds by displaying a play control page and playing the audio content of the target audio program content.

In this embodiment of the application, the client may be social software such as instant messaging software or podcast software, or an applet, a web page and the like, which is not limited herein. The client is installed on the terminal device, and the server is the server corresponding to that client, whether the client is software, a web page, an applet or the like.

A user can search for and play favorite audio program content directly in podcast software, listen to audio program content shared by friends in software such as instant messaging software, or search for and listen to audio program content in official accounts, applets and the like. It should be noted that audio program content in the embodiments of the present application refers to audio, or audio with video, recorded by a user. For example, a user reads each chapter of a novel aloud, records a corresponding audio file and shares it on a podcast platform for others to listen to, i.e. an audiobook. In this scenario, the audio program content is the audio recorded by the user, and a listener can use the method of the embodiments to view, in the play control page, the manuscript of the part of the novel currently playing; the manuscript content is obtained by the client or the server performing speech recognition on the uploaded audio file. The audio program content may also be other speech recorded by a user, such as commentary or reviews, which is not specifically limited herein.

Referring to Fig. 3, which is a flowchart of a method for controlling manuscript display of audio program content according to an embodiment of the present application, the method is applied to a client and proceeds as follows:

S31: in response to a play operation for the target audio program content, display a play control page and play the audio content of the target audio program content, where the play control page includes a play control area, used to control playback of the audio program content, and a manuscript display area for the audio content;

S32: according to the playing progress of the target audio program content, scroll-display, sentence by sentence, the manuscript content corresponding to the current playing progress in the manuscript display area, where the manuscript content is obtained by performing speech recognition on the target audio program content.

In this embodiment of the present application, speech recognition of the target audio program content may be performed in real time: while the target audio program content plays, the audio currently being played is recognized, and the manuscript content recognized up to the current moment is displayed in the manuscript display area.

Alternatively, the target audio program content may be recognized in advance by a background server before playback, in which case the server can pre-recognize the entire manuscript. Taking an audiobook as an example, the server can recognize in advance the full manuscript of the whole book as read by the narrator, store it in the background, and send it directly to the client on request.

After obtaining the manuscript content through recognition, the server can divide it into sentences and paragraphs as follows:

When the target audio program content is converted to text by speech recognition, the recognition also yields punctuation marks inserted between sentences. Sentence breaks are made at these punctuation marks, dividing the recognized text into individual sentences. For ease of display, each sentence can be limited to at most 18 characters, and any sentence from the punctuation-based split that exceeds 18 characters is split again.
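
A minimal Python sketch of this splitting rule, assuming the recognizer returns plain text with punctuation. The set of break characters and the choice to cut an over-long sentence into consecutive 18-character chunks (punctuation included in the count) are assumptions; the description only fixes the 18-character limit.

```python
# Clause-ending punctuation, full-width and ASCII; the exact set is an assumption.
SENTENCE_BREAKS = set("，。！？；,.!?;")
MAX_SENTENCE_CHARS = 18  # display limit given in the description


def _chunk(sentence: str, max_chars: int) -> list[str]:
    """Cut an over-long sentence into consecutive pieces of at most max_chars."""
    return [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)]


def split_sentences(text: str, max_chars: int = MAX_SENTENCE_CHARS) -> list[str]:
    """Break recognized text at punctuation, then re-split sentences longer than max_chars."""
    sentences: list[str] = []
    current = ""
    for ch in text:
        current += ch
        if ch in SENTENCE_BREAKS:  # punctuation stays attached to the sentence it ends
            sentences.extend(_chunk(current.strip(), max_chars))
            current = ""
    if current.strip():  # trailing text without closing punctuation
        sentences.extend(_chunk(current.strip(), max_chars))
    return sentences
```

For example, `split_sentences("Hello, world. This is a test!")` yields `["Hello,", "world.", "This is a test!"]`. A real system would probably prefer breaking long sentences at word or phrase boundaries rather than at a fixed character count.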

Paragraphs are then formed according to the playing interval between sentences. In the embodiment of the present application, this is done as follows:

With 50 ms (milliseconds) as the preset time threshold, if the playing interval between two adjacent sentences, i.e. the pause between them, exceeds 50 ms, a new paragraph is started. As a result, for any two adjacent paragraphs, the playing interval in the audio between the last sentence of the earlier paragraph and the first sentence of the later paragraph is greater than 50 ms, while the playing interval between any two adjacent sentences within the same paragraph is not greater than 50 ms.
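
The same rule as a short Python sketch. It assumes each recognized sentence carries start and end times in milliseconds and takes the "playing interval" to be the gap between one sentence's end and the next sentence's start; both are assumptions, since the description does not define the interval precisely. Each paragraph's timestamp is derived from its first and last sentences.

```python
from dataclasses import dataclass, field

PARAGRAPH_GAP_MS = 50  # preset threshold from the description


@dataclass
class Sentence:
    text: str
    start_ms: int  # when the sentence starts in the audio
    end_ms: int    # when it ends


@dataclass
class Paragraph:
    sentences: list[Sentence] = field(default_factory=list)

    @property
    def start_ms(self) -> int:  # paragraph timestamp = start of its first sentence
        return self.sentences[0].start_ms

    @property
    def end_ms(self) -> int:  # end of its last sentence
        return self.sentences[-1].end_ms


def group_paragraphs(sentences: list[Sentence], gap_ms: int = PARAGRAPH_GAP_MS) -> list[Paragraph]:
    """Start a new paragraph whenever the pause before a sentence exceeds gap_ms."""
    paragraphs: list[Paragraph] = []
    for s in sentences:
        if paragraphs and s.start_ms - paragraphs[-1].end_ms <= gap_ms:
            paragraphs[-1].sentences.append(s)  # pause not greater than gap_ms: same paragraph
        else:
            paragraphs.append(Paragraph([s]))
    return paragraphs
```

Note that a pause of exactly 50 ms keeps the sentences together, matching the "not greater than 50 ms" condition for sentences in the same paragraph.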

In the above embodiment, the text corresponding to the audio content of the target audio program content is recognized through speech recognition and then, following the sentence and paragraph rules above, converted into manuscript content with timestamps: each sentence has a corresponding timestamp, as does each paragraph. This keeps the playing time of the manuscript content aligned with that of the audio, so the audio can be played and the manuscript displayed in sync, allowing the user to skim the program content through the manuscript.

It should be noted that, after the text is recognized, misrecognized words and the like can be corrected in the background.

In the following, the target audio program content is mainly illustrated as an audio novel.

When the manuscript content corresponding to the audio content is displayed in the manuscript display area and the audio is short, the manuscript is correspondingly short. In that case the whole manuscript can be displayed in the manuscript display area at once, with the sentence corresponding to the current playing progress highlighted.

Optionally, the manuscript content includes the sentences divided during speech recognition and the punctuation marks added between them, and can be scroll-displayed sentence by sentence in the manuscript display area. Specifically, according to the playing progress of the target audio program content, the part of the manuscript corresponding to the current playing progress is shown in the manuscript display area one sentence at a time, and the displayed sentence scrolls as the playing progress changes.
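
One way to pick the sentence to show is a binary search over sentence start times. The Python sketch below assumes the start times are sorted and in milliseconds; the function names are illustrative. For short audio, the same index can instead select which sentence to highlight in the full manuscript.

```python
import bisect


def current_sentence_index(start_times_ms: list[int], position_ms: int) -> int:
    """Index of the last sentence whose start time is <= position_ms (0 before the first)."""
    i = bisect.bisect_right(start_times_ms, position_ms) - 1
    return max(i, 0)


def sentence_to_display(sentences: list[str], start_times_ms: list[int], position_ms: int) -> str:
    """The single sentence shown in the manuscript display area at this playing progress."""
    return sentences[current_sentence_index(start_times_ms, position_ms)]
```

Calling this on every progress update is O(log n) per call, so it stays cheap even for a manuscript covering several hours of audio.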

Fig. 4A is a schematic diagram of a play control page for audio program content in the embodiment of the present application. Two dashed boxes are marked in Fig. 4A: box S41 is the play control area of the play control page, and box S42 is the manuscript display area, i.e. the "AI speech-to-manuscript" section in Fig. 4A. The manuscript display area shows the sentence corresponding to the audio currently playing. At the current playing time of 1 minute 32 seconds, the displayed sentence is "In the spring of the fourth year of Qingli"; one minute later, at 2 minutes 32 seconds, the manuscript content corresponding to the playing progress is "asked me to write an essay to record it", and that sentence is displayed in the manuscript display area, as shown in Fig. 4B. Switching from one sentence to the next may use scrolling or a similar transition.

Optionally, the play control area further includes at least one selection control, which lets the user choose between displaying only the manuscript content corresponding to the audio content and playing the audio content while displaying its manuscript. Fig. 4C is a schematic diagram of an alternative play control area in the embodiment of the present application; it includes a first selection control S1 for choosing to play the audio and display the manuscript, and a second selection control S2 for choosing to display the manuscript without playing the audio.

If the user clicks the first selection control, the user has chosen to play the audio and display the manuscript, and the corresponding manuscript content is displayed in the manuscript display area while the audio plays.

Alternatively, the user may click the second selection control; in response to this trigger operation, the client displays the manuscript content in the manuscript display area and does not play the audio content.

It should be noted that the user may click the second selection control at any time. If the user clicks it at the start, the audio content is not played from the beginning of the audio program content; if the user clicks it during playback, the audio stops after the click, but the corresponding manuscript content is still displayed in the manuscript display area.

In addition, the play control area further includes a play progress control for controlling the playing progress of the audio content, buttons for switching audio content, a pause/play control, and the like.

As shown in Fig. 4C, S3 is the pause/play control. If the user taps pause on this control, the client responds by pausing the audio content and, at the same time, pausing the scrolling of the manuscript content in the manuscript display area. If the target audio program content also includes video content corresponding to the audio content, playback of the video in the video play area is paused as well. The progress bar in Fig. 4C is the play progress control S4; by dragging it, the user updates the playing progress of the target audio program content. For example, before the user drags the progress bar, as shown in Fig. 4A, the manuscript display area shows "In the spring of the fourth year of Qingli"; after the user drags it to the position shown in Fig. 4B, playback starts from 2 minutes 23 seconds and the manuscript display area shows "asked me to write an essay to record it".
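
A minimal model of this synchronization as a Python sketch. The class name, the millisecond clock and the tick-driven update are hypothetical; a real client would read the position from its media player. The idea shown is that pause and seek change only the playing position, and the displayed sentence is always derived from it, so the manuscript cannot drift from the audio.

```python
import bisect


class PlaybackSync:
    """Illustrative client-side model keeping the manuscript in step with play, pause and seek."""

    def __init__(self, sentence_starts_ms: list[int], duration_ms: int) -> None:
        self.starts = sentence_starts_ms  # sorted start time of each sentence
        self.duration_ms = duration_ms
        self.position_ms = 0
        self.playing = False

    def play(self) -> None:
        self.playing = True

    def pause(self) -> None:
        # Audio, any accompanying video and manuscript scrolling all stop together.
        self.playing = False

    def tick(self, elapsed_ms: int) -> None:
        """Advance the clock; while paused, the position and displayed sentence stay put."""
        if self.playing:
            self.position_ms = min(self.position_ms + elapsed_ms, self.duration_ms)

    def seek(self, position_ms: int) -> None:
        """Progress-bar drag: clamp to the audio; the displayed sentence follows automatically."""
        self.position_ms = max(0, min(position_ms, self.duration_ms))

    @property
    def sentence_index(self) -> int:
        return max(bisect.bisect_right(self.starts, self.position_ms) - 1, 0)
```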

Optionally, when the target audio program content also includes video content corresponding to the audio content, the play control page may further include a video play area in which the video content is played in sync, as shown in Fig. 4D, where the video play area is the part marked by dashed box S43.

It should be noted that, unlike the related art, the embodiment of the present application plays the audio content and video content of the target audio program content in sync with the manuscript content corresponding to the audio.

Optionally, in addition to the sentence corresponding to the current playing progress, the manuscript display area may include a manuscript display control. After the user clicks it, the client responds to this trigger operation by displaying at least one line of the manuscript content corresponding to the current playing progress, together with the M preceding lines adjacent to it, the N following lines adjacent to it, or both, where M and N are positive integers that may be equal or different.

In other words, in response to a trigger operation on the manuscript display control, the client can show more manuscript content, in one of the following three cases:

first, displaying at least one line of the manuscript content corresponding to the current playing progress and the M adjacent preceding lines;

second, displaying at least one line of the manuscript content corresponding to the current playing progress and the N adjacent following lines;

third, displaying at least one line of the manuscript content corresponding to the current playing progress together with both the M adjacent preceding lines and the N adjacent following lines.
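
The three cases differ only in how many neighbouring lines are included. A Python sketch, assuming the manuscript has already been laid out into display lines and that the index of the current line is known (for example from a timestamp lookup). Using 0 for m or n to mean "omit that side" is an illustrative convention; the description itself requires M and N to be positive when they are used.

```python
def context_window(lines: list[str], current: int, m: int = 0, n: int = 0) -> tuple[int, list[str]]:
    """Return (index of first visible line, visible lines): the current line plus up to
    m lines before it and n lines after it, clipped at the ends of the manuscript.
    m > 0, n == 0 is the first case; m == 0, n > 0 the second; both > 0 the third."""
    if not 0 <= current < len(lines):
        raise IndexError("current line is outside the manuscript")
    first = max(0, current - m)
    last = min(len(lines), current + n + 1)
    return first, lines[first:last]
```

Returning the index of the first visible line lets the caller know where the current line sits within the window, which is what the highlighting step needs.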

The manuscript content can be displayed in several ways, including on a new page or in a new area of the play control page, as follows:

the client responds to the trigger operation on the manuscript display control by displaying a manuscript display page, which shows at least one line of the manuscript content corresponding to the current playing progress together with the M adjacent preceding lines, the N adjacent following lines, or both, where M and N are positive integers; or the client responds to the trigger operation on the manuscript display control by displaying, in a manuscript overview area of the play control page, at least one line of the manuscript content corresponding to the current playing progress together with the M adjacent preceding lines, the N adjacent following lines, or both.

Referring to Fig. 4A, S420 is the manuscript display control in this embodiment. After the user clicks it, a new page, i.e. the manuscript display page, can be displayed, showing multiple lines of manuscript content; alternatively, the multiple lines can be displayed in the manuscript overview area of the play control page. Fig. 4A is only an example: the user may also open the manuscript display page or the manuscript overview area by clicking anywhere in the manuscript display area.

The manuscript display page is newly proposed in the embodiment of the present application. It may be presented as a floating layer, or as a new page, a pop-up window, an expanded section of the play control page and the like, which is not specifically limited herein. Clicking the manuscript display control in the manuscript display area expands the floating layer; clicking the icon in its upper-left corner collapses it again, and it can also be collapsed in other ways, such as swiping right. In the embodiment, playback of the target audio program content is not interrupted while the manuscript display page is expanded or collapsed.

Optionally, if at least two paragraphs of manuscript content are displayed in the manuscript display page or the manuscript overview area, the target paragraph containing the at least one line corresponding to the current playing progress can be displayed in a first display mode and the other paragraphs in a second display mode. The first display mode may be an emphasis mode such as highlighting, bold or underlining; the second display mode differs from the first, for example normal display without bold or highlighting, so that the user can quickly locate the part currently playing.
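
Choosing the target paragraph is the same timestamp lookup applied to paragraph start times. A Python sketch in which the enum names and the `emphasize` flag are illustrative; with `emphasize=False` every paragraph gets the second mode, covering the variant in which the two display modes are the same.

```python
import bisect
from enum import Enum


class DisplayMode(Enum):
    FIRST = "highlighted"  # e.g. highlight, bold or underline
    SECOND = "normal"


def paragraph_modes(paragraph_starts_ms: list[int], position_ms: int,
                    emphasize: bool = True) -> list[DisplayMode]:
    """Display mode for each shown paragraph; only the paragraph being played gets FIRST."""
    target = max(bisect.bisect_right(paragraph_starts_ms, position_ms) - 1, 0)
    return [DisplayMode.FIRST if emphasize and i == target else DisplayMode.SECOND
            for i in range(len(paragraph_starts_ms))]
```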

Fig. 5A is a schematic diagram of a manuscript display page in the embodiment of the present application. The manuscript display area of the play control page is currently showing the sentence "In the spring of the fourth year of Qingli". After the user clicks the manuscript display control in that area, the manuscript display page of Fig. 5A is displayed as a floating layer over the play control page. Four paragraphs are currently shown: the line of manuscript content corresponding to the current playing progress is "In the spring of the fourth year of Qingli, Teng Zijing was demoted to govern Baling Commandery. The following year", and it is followed by the 19 adjacent subsequent lines. The paragraph containing this line, i.e. the target paragraph, is highlighted, and the following three paragraphs are displayed normally.

Fig. 5B is a schematic diagram of a manuscript overview area in an embodiment of the present application; the part inside the dashed box is the manuscript overview area. This area may be located above the play control area, in which case the cover of the target audio program content need not be displayed. As Fig. 5B shows, the overview area currently displays six lines, including the line corresponding to the current playing progress, "In the spring of the fourth year of Qingli, Teng Zijing was demoted to govern Baling Commandery. The following year". The target paragraph containing this line can be highlighted, and the remaining paragraphs are displayed normally.

Alternatively, the paragraph being played need not be highlighted in the manuscript display page or the manuscript overview area, i.e. the first and second display modes may be the same. In addition, the manuscript content in the manuscript display page or the manuscript overview area may scroll automatically, or turn pages automatically, as playback progresses.

Considering that the target audio program content may also include video content corresponding to the audio content, the embodiment of the present application provides, besides the form shown in Fig. 5B, another manuscript overview area, shown in Fig. 5C, which is located below the play control area. Alternatively, the manuscript overview area and the video play area may sit side by side above the play control area, and so on, which is not limited herein.

Optionally, the user may also manually scroll the manuscript content currently displayed in the manuscript display page or the manuscript overview area to view content before or after the displayed part. Specifically, in response to a manuscript drag operation triggered on the manuscript display page or the manuscript overview area, the client moves the displayed manuscript content according to the drag. A jump control can then be displayed at a specified position in the manuscript display page for the user to click. Alternatively, without a jump control, when the user drags a line of content to the reference line position in the manuscript display page or manuscript overview area, playback jumps directly to that line, or to the paragraph containing it, and starts playing. In either case, the progress shown by the play progress control must be updated accordingly. The examples below use a jump control, but jumping directly is also possible; the choice can be made according to the desired user experience.

Fig. 6A is a schematic diagram of another manuscript display page in the embodiment of the present application, in which the user can view the manuscript content by dragging it up and down. When the user triggers a manuscript drag operation, a jump control is displayed at a specified position in the manuscript display page; the dashed box S61 in Fig. 6A is one such jump control. Using the jump control, the user can jump to a different point in the playback and thereby locate a specific segment quickly and accurately.

When the user clicks the jump control, the client responds by updating the progress of the play progress control, adjusting the playing progress of the target audio program content to the time node of the manuscript content lying on the same reference line as the jump control, and starting playback from that time node. When no jump control is displayed, the client updates the play progress control and adjusts the playing progress of the target audio program content directly in response to the user's manuscript drag operation.

For example, as shown in Fig. 6A, the user aligns the line "bright at dawn and dim at dusk, its scenery changes in countless ways. This is the grand view of Yueyang Tower" with the reference line. When the user clicks the jump control S61, the playing progress in the play control area is adjusted to the time node 02:53 and playback starts from there. The reference line shown in Fig. 6A need not actually be displayed on the manuscript display page.

Besides starting playback from the line "bright at dawn and dim at dusk ... Yueyang Tower" as above, playback can also start from the beginning of the paragraph containing that line; in that case playback starts at the time node corresponding to the sentence "I behold the splendid scenery of Baling".
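
The jump can be sketched as mapping the reference line to a display line and then to a time node. The Python below is illustrative only: it assumes fixed-height lines, a known scroll offset in pixels, and a start time for each display line (which per-word timestamps, if recorded, could supply). `by_paragraph=True` gives the paragraph-start variant just described.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    start_ms: int   # assumed: time at which this display line starts in the audio
    paragraph: int  # index of the paragraph the line belongs to


def line_at_reference(scroll_px: int, reference_y_px: int, line_height_px: int, line_count: int) -> int:
    """Index of the line under the reference line, assuming all lines have equal height."""
    idx = (scroll_px + reference_y_px) // line_height_px
    return min(max(idx, 0), line_count - 1)


def jump_target_ms(lines: list[Line], scroll_px: int, reference_y_px: int,
                   line_height_px: int, by_paragraph: bool = False) -> int:
    """Time node to seek to when the jump control is clicked."""
    idx = line_at_reference(scroll_px, reference_y_px, line_height_px, len(lines))
    if not by_paragraph:
        return lines[idx].start_ms
    para = lines[idx].paragraph
    # Earliest start time among the lines of that paragraph = start of the paragraph.
    return min(line.start_ms for line in lines if line.paragraph == para)
```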

In addition, the user can return to the position corresponding to the current playing progress by clicking the "return to playing position" button in dashed box S62 of Fig. 6A.

In the above embodiment, the user can search the audio content through the manuscript, click a specific paragraph or sentence to play from it, and quickly read and locate a specific segment of the target audio program content. This enables fast and accurate retrieval and makes searching audio content considerably more efficient.

Optionally, besides the ways of adjusting progress described above, the user can also adjust progress with the play progress control, as follows: in response to an operation on the play progress control, the client updates the playing progress of the target audio program content and plays it from the time node corresponding to the updated progress, and moves the manuscript content displayed in the manuscript display page or the manuscript overview area to the content corresponding to the audio played at that time node.

For example, when the user drags the progress bar from the position in Fig. 4A to that in Fig. 4B, the client updates the playing progress of the target audio program content to 2 minutes 23 seconds and plays from that point, and the manuscript content displayed on the manuscript display page or manuscript overview area likewise jumps to the content at 2 minutes 23 seconds.

It should be noted that, besides locating a specific segment quickly by dragging, jumping and the like, a switching control may be displayed at a specified position of the manuscript display page or the manuscript overview area. The manuscript displayed there can scroll or turn pages automatically with the playing progress, and when the next chapter starts playing, the display switches automatically to the manuscript of that chapter. A "read next chapter" switching control can also be displayed at the bottom (i.e. the specified position) of the manuscript display page or the manuscript overview area. The control takes the form of a button whose second line shows the title of the next chapter in small type, and clicking it switches directly to the next chapter. For example, a "read next chapter" switching control is displayed at the very bottom of the page shown in Fig. 6B, and it also shows the title of the next chapter, "drunk family".

When the user clicks the switching control, the client responds to the switching operation by displaying, in the manuscript display area or the manuscript overview area, the manuscript content of the audio program content that follows the target audio program content in the target play queue, and by switching the target audio program content currently playing in the play control page to that next audio program content and playing it.
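
A minimal Python sketch of the target play queue behind the switching control and the playlist. The class and field names are hypothetical, and returning `None` at the end of the queue (leaving the current program in place) is an assumed behaviour that the description does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Program:
    program_id: str
    title: str
    manuscript: list[str]  # sentences of this program's manuscript


class PlayQueue:
    """Illustrative target play queue."""

    def __init__(self, programs: list[Program], current: int = 0) -> None:
        self.programs = programs
        self.current = current

    @property
    def now_playing(self) -> Program:
        return self.programs[self.current]

    def next_title(self) -> Optional[str]:
        """Title shown in small type on the 'read next chapter' button."""
        nxt = self.current + 1
        return self.programs[nxt].title if nxt < len(self.programs) else None

    def switch_to_next(self) -> Optional[Program]:
        """Advance; the caller then plays the returned program and shows its manuscript."""
        if self.current + 1 >= len(self.programs):
            return None  # end of queue: keep the current program
        self.current += 1
        return self.now_playing

    def select(self, index: int) -> Program:
        """Playlist selection, e.g. choosing a chapter from a floating playlist."""
        if not 0 <= index < len(self.programs):
            raise IndexError("no such chapter")
        self.current = index
        return self.now_playing
```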

It should be noted that a novel in an audiobook usually has many chapters. The above takes jumping directly to the next chapter as an example; if the content is divided in another way, playback can likewise jump directly to the next division, or a "read next chapter" style switching control can be displayed. The specific implementation is not repeated here.

Optionally, besides the manner shown in Fig. 6B, a playlist button for the audio novel can be displayed at a specified position in the manuscript display page or the manuscript overview area. After the user clicks the "playlist" button, the page shown in Fig. 6C appears and the corresponding playlist is displayed as a floating layer. In Fig. 6C, the book currently playing is "Appreciation of Tang Poems and Song Ci", and the playlist shows the titles of its 52 chapters. After the user selects the fifth chapter, playback jumps to the fifth chapter, and the content of the manuscript display page switches to the manuscript of the fifth chapter.

In addition, the user can click "Close" at the bottom of Fig. 6C to close the playlist, or click to download the manuscript so that its content is saved locally for reading and the like.

It should be noted that Figs. 6A to 6C all use the manuscript display page as an example; the same operations can be performed in the manuscript overview area in the same way, and the details are not repeated here.

Having introduced the above embodiments, the technical solution of the present application is summarized below. Fig. 7 is a schematic diagram of a method for displaying the manuscript of audio program content according to an embodiment of the present application.

In the embodiment of the present application, speech recognition is first performed on the audio file of the audio program content through a speech transcription interface. The recognized text is then divided into sentences, and the timestamp of each sentence is recorded. Besides the sentence timestamps, the timestamp of each word in a sentence can also be recorded.

Machine learning models such as a deep full-sequence convolutional neural network can be used for speech transcription; such models can convert long audio program content (up to 5 hours) into text.

In addition, sentences can be grouped into paragraphs using the 50 ms interval, and the timestamp of each paragraph is recorded. The floating layer of the manuscript display page can then show the paragraph whose timestamp matches the current audio playing progress, and highlight the paragraph currently playing as playback proceeds.

In this embodiment, time-based segmentation, scrolling playback, highlighting of the current position, and jumps guided by the jump control allow the user to switch flexibly between text and speech and to locate a position in the audio precisely through the text, giving a better listening experience.

Referring to fig. 8, which is a flowchart of a second method for controlling the manuscript display of audio program content according to an embodiment of the present application, the method is applied to a server, and its specific flow is as follows:

S81: perform speech recognition on the target audio program content to obtain the manuscript content corresponding to its audio content, and add corresponding timestamps to the manuscript content according to the audio content;

S82: after receiving a play request for the target audio program content from a client, send the manuscript content corresponding to the audio content to the client, so that the client plays the audio content and, according to the playing progress of the target audio program content, scroll-displays the manuscript content corresponding to the current playing progress, sentence by sentence, in the manuscript display area of a play control page. The play request is sent by the client in response to a play operation for the target audio program content; the play control page further includes a play control area for the audio content, which is used to control playback of the audio program content.

After responding to the play operation for the target audio program content, the client may send a play request to the server. The request contains identification information that uniquely identifies the target audio program content. On receiving the play request, the server looks up the manuscript content corresponding to that identifier and sends it to the client.
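The server-side lookup can be sketched as a small request handler. The in-memory `MANUSCRIPTS` store, the `program_id` field and the JSON layout are hypothetical; the text only requires that the request carry an identifier that uniquely identifies the program and that the server return the matching manuscript content with its timestamps.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical store filled by step S81: program identifier ->
# list of (start_ms, end_ms, sentence_text) produced by speech recognition.
MANUSCRIPTS: Dict[str, List[Tuple[int, int, str]]] = {}


def handle_play_request(request_body: str) -> str:
    """Look up the manuscript by the identifier in the play request and
    return it as JSON; field names are illustrative assumptions."""
    program_id = json.loads(request_body)["program_id"]
    sentences = MANUSCRIPTS.get(program_id)
    if sentences is None:
        return json.dumps({"program_id": program_id, "error": "not_found"})
    return json.dumps({
        "program_id": program_id,
        "sentences": [
            {"start_ms": start, "end_ms": end, "text": text}
            for start, end, text in sentences
        ],
    })
```

Sending the timestamps along with the text lets the client drive scrolling and highlighting without further round trips.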

In addition, after the client first requests the manuscript content of the target audio program content from the server, it may store that content locally. If the client later responds to another play operation for the same target audio program content, it does not need to send a play request to the server and can instead look up the manuscript content locally for display.
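The first-request-then-local-cache behaviour might look like the following sketch. `ManuscriptClient` and the injected `fetch_from_server` callable are hypothetical names, and the cache is an in-memory dictionary only for brevity; a real client would presumably persist the manuscript to local storage.

```python
from typing import Callable, Dict, List


class ManuscriptClient:
    """Hypothetical client: asks the server for a program's manuscript only
    on first play and serves later plays from its local copy."""

    def __init__(self, fetch_from_server: Callable[[str], List[str]]):
        # fetch_from_server stands in for the play request: it takes the
        # program identifier and returns the manuscript content.
        self._fetch = fetch_from_server
        self._local: Dict[str, List[str]] = {}

    def manuscript_for(self, program_id: str) -> List[str]:
        if program_id not in self._local:  # first play: go to the server
            self._local[program_id] = self._fetch(program_id)
        return self._local[program_id]     # later plays: local lookup only
```

Injecting the fetch function keeps the caching logic independent of the transport used for the play request.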

Optionally, performing speech recognition on the target audio program content to obtain the document content corresponding to its audio content specifically includes:

It should be noted that, when dividing the text obtained by speech recognition of the target audio program content into the document content corresponding to the audio content, the server first divides the text into sentences and then divides the sentences into paragraphs according to the playing intervals between them; the specific process is as described in the above embodiment and is not repeated here.

As in the client-side embodiment, time-based segmentation, scrolling playback, highlighting of the current position, and jumps guided by the play key allow the user to switch flexibly between text and speech and to locate a position in the audio precisely through the text, giving a better listening experience.

Fig. 9 is an interaction sequence diagram of a method for controlling the manuscript display of audio program content. The specific flow is as follows:

step S91: the server performs speech recognition on the target audio program content to obtain the manuscript content corresponding to its audio content, and adds corresponding timestamps to the manuscript content according to the audio content;

step S92: in response to a play operation for the target audio program content, the client sends a play request for the target audio program content to the server;

step S93: the server sends the manuscript content corresponding to the target audio program content to the client;

step S94: the client displays the play control page and plays the audio content of the target audio program content;

step S95: according to the playing progress of the target audio program content, the client scroll-displays the manuscript content corresponding to the current playing progress, sentence by sentence, in the manuscript display area while the audio content plays;

step S96: in response to a target operation on the manuscript display control, the client displays a manuscript display page;

step S97: in response to a manuscript dragging operation triggered on the manuscript display page, the client drags the manuscript content displayed in the manuscript display page accordingly, and displays a jump control for adjusting the audio playing progress at a specified position in the manuscript display page;

step S98: in response to a trigger operation on the jump control, the client adjusts the playing progress of the target audio program content to the time node corresponding to the manuscript content that lies on the same reference line as the jump control, and starts playing the target audio program content from that time node.
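Step S98 needs a mapping from the dragged manuscript position to a time node. The sketch below assumes a fixed line height, a known scroll offset and a start timestamp for every displayed line; all parameter names are illustrative, and a real layout engine would use measured line positions instead.

```python
from typing import List


def time_at_reference_line(line_start_ms: List[int], scroll_offset_px: float,
                           reference_y_px: float, line_height_px: float) -> int:
    """Return the start time of the manuscript line lying on the reference line.

    line_start_ms[i] is the start timestamp of displayed line i (assumed
    non-empty); scroll_offset_px is how far the manuscript has been dragged
    up; reference_y_px is the reference line's y position inside the page.
    """
    index = int((scroll_offset_px + reference_y_px) // line_height_px)
    index = min(max(index, 0), len(line_start_ms) - 1)  # clamp to valid lines
    return line_start_ms[index]
```

For instance, with 40 px lines starting at 0, 4000, 9000 and 15000 ms, a 60 px drag and a reference line 30 px from the top select line 2, so playback would resume at 9000 ms.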

Based on the same inventive concept, an embodiment of the present application further provides an apparatus for controlling the manuscript display of audio program content. As shown in fig. 10, the apparatus 1000 for controlling the document display of audio program content may include:

a first response unit 1001, configured to, in response to a play operation for the target audio program content, display a play control page and play the audio content of the target audio program content, where the play control page includes a play control area for the audio content and a document display area, and the play control area is used to control playback of the audio program content; and

Optionally, the document content includes the sentences divided during speech recognition and the punctuation marks added between them.

Optionally, the play control area further includes a play/pause control; the apparatus further includes:

a second response unit 1002, configured to pause playback of the target audio program content in response to a trigger operation on the play/pause control; and

a third response unit 1003, configured to display a document presentation page in response to a trigger operation on the document presentation control, where the document presentation page displays at least one line of the document content corresponding to the current playing progress, together with at least one of the M preceding lines and the N following lines adjacent to it, M and N being positive integers; or

Optionally, when the document content includes at least two paragraphs, then for every two adjacent paragraphs, the playing interval in the corresponding audio content between the last sentence of the preceding paragraph and the first sentence of the following paragraph is greater than a preset time threshold, while the playing interval between every two adjacent sentences within the same paragraph is not greater than the preset time threshold.

Optionally, the third response unit 1003 is specifically configured to:

display, in the document presentation page or the document overview area, the target paragraph containing the at least one line of document content in a first display mode, and the content of the other paragraphs in a second display mode.
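The window of M preceding and N following lines, and the split into a first and a second display mode, could be computed as follows. This is a sketch under assumptions: lines are indexed from 0, only one current line is considered, and the strings `"first"` and `"second"` merely stand in for whatever visual styles (for example highlighted versus dimmed) an implementation chooses.

```python
from typing import List, Tuple


def visible_window(num_lines: int, current: int, m: int, n: int) -> Tuple[int, int]:
    """Half-open range [first, last) of lines to show: the current line plus
    up to m lines before it and up to n lines after it, clipped to the text."""
    first = max(current - m, 0)
    last = min(current + n + 1, num_lines)
    return first, last


def display_modes(paragraph_of_line: List[int], current: int) -> List[str]:
    """Label each line 'first' if it belongs to the target paragraph (the one
    containing the current line), otherwise 'second'."""
    target = paragraph_of_line[current]
    return ["first" if p == target else "second" for p in paragraph_of_line]
```

Near the start or end of the manuscript the window is simply clipped, so fewer than M or N neighbouring lines may be shown.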

a fourth response unit 1004, configured to, in response to a document dragging operation triggered on the document presentation page or the document overview area, drag the document content displayed in the document presentation page according to the dragging operation;

and update the playing progress shown by the playing progress control, adjust the playing progress of the target audio program content to the time node corresponding to the document content lying on the same reference line as the jump control, and start playing the target audio program content from that time node.

Optionally, the apparatus further comprises:

a fifth response unit 1005, configured to, in response to an operation on the playing progress control, update the playing progress of the target audio program content and start playing it from the time node corresponding to the updated playing progress;

and jump the manuscript content displayed in the manuscript display page or the manuscript overview area to the manuscript content corresponding to the target audio program content played at that time node.
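The reverse direction, moving the manuscript to the content played at the new time node, can be sketched with a binary search over sentence start times. It assumes, purely for simplicity, that each sentence occupies one line of fixed height, so the returned offset places that sentence on the same reference line used by the jump control; the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List


def sentence_at(start_ms: List[int], time_node_ms: int) -> int:
    """Index of the sentence playing at time_node_ms, given the sorted start
    timestamps of all sentences (assumed non-empty)."""
    return max(bisect_right(start_ms, time_node_ms) - 1, 0)


def scroll_offset_for(start_ms: List[int], time_node_ms: int,
                      reference_y_px: float, line_height_px: float) -> float:
    """Scroll offset that puts the sentence playing at time_node_ms on the
    reference line, assuming one sentence per line of fixed height."""
    index = sentence_at(start_ms, time_node_ms)
    return max(float(index) * line_height_px - reference_y_px, 0.0)
```

Seeking to 10000 ms with sentences starting at 0, 4000, 9000 and 15000 ms selects sentence 2; with 40 px lines and a reference line 30 px from the top, the manuscript scrolls by 50 px.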

Optionally, a switching control for switching document content is displayed at a specified position of the document presentation page or the document overview area; the apparatus further includes:

a sixth response unit 1006, configured to, in response to a switching operation triggered on the switching control, display, in the document display area or the document overview area, the document content corresponding to the audio program content that follows the target audio program content in the target play queue; and

Optionally, the play control area includes a first selection control for confirming that the audio is played and the document is displayed, and the first response unit 1001 is specifically configured to:

Optionally, the play control area includes a second selection control for confirming that the document is displayed without playing the audio, and the first response unit 1001 is further configured to:

Optionally, the play control page further includes a video playing area, and the target audio program content further includes video content corresponding to the audio content; the apparatus further includes:

a playing unit 1007, configured to play the video content in the video playing area.

Based on the same inventive concept, an embodiment of the present application further provides another apparatus for controlling the manuscript display of audio program content. As shown in fig. 11, the manuscript display control apparatus 1100 for audio program content may include:

a voice transcription unit 1101, configured to perform speech recognition on the target audio program content to obtain the document content corresponding to its audio content, and add corresponding timestamps to the document content according to the audio content;

a transmission unit 1102, configured to, after receiving a play request for the target audio program content from a client, send the document content corresponding to the audio content to the client, so that the client plays the audio content and, according to the playing progress of the target audio program content, scroll-displays the document content corresponding to the current playing progress, sentence by sentence, in a document display area of a play control page; the play request is sent by the client in response to a play operation for the target audio program content, and the play control page further includes a play control area for the audio content, which is used to control playback of the audio program content.

Optionally, the voice transcription unit 1101 is specifically configured to:

For convenience of description, the above parts are described separately as modules (or units) divided by function. Of course, when implementing the present application, the functions of the modules (or units) may be implemented in the same one or more pieces of software or hardware.

Having described the method and apparatus for controlling the manuscript display of audio program content according to exemplary embodiments of the present application, an electronic device according to another exemplary embodiment of the present application is described next.

As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."

Fig. 12 is a block diagram of an electronic device 1200 according to an exemplary embodiment; the electronic device includes:

a processor 1210;

a memory 1220 for storing instructions executable by the processor 1210;

wherein the processor 1210 is configured to execute the instructions to implement the manuscript display control method of the audio program content in the embodiment of the present application.

In an exemplary embodiment, a storage medium including instructions is also provided, such as the memory 1220 including instructions, which are executable by the processor 1210 of the electronic device 1200 to perform the above method. Optionally, the storage medium may be a non-transitory computer-readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.

Based on the same inventive concept, an embodiment of the present application further provides a terminal device 210, which may be an electronic device such as a smartphone, a tablet computer, a laptop computer, or a PC. Referring to fig. 13, which is a block diagram of the terminal device 210 according to an exemplary embodiment, the terminal device 210 includes a display unit 1340, a processor 1380 and a memory 1320. The display unit 1340 includes a display panel 1341 for displaying information input by the user, information provided to the user, and the various object selection pages of the terminal device 210; in this embodiment it is mainly used to display the pages, shortcut windows and the like of applications installed on the terminal device 210. Optionally, the display panel 1341 may be an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode) panel.

The processor 1380 is configured to read a computer program and execute the method it defines; for example, the processor 1380 reads a social application program, runs the application on the terminal device 210, and displays its pages on the display unit 1340. The processor 1380 may include one or more general-purpose processors and may also include one or more DSPs (Digital Signal Processors) for performing the related operations that implement the solutions provided by the embodiments of the present application.

The memory 1320 generally includes internal memory and external memory. The internal memory may be random access memory (RAM), read-only memory (ROM), cache memory (CACHE), and the like; the external memory may be a hard disk, an optical disc, a USB flash drive, a floppy disk, a tape drive, and so on. The memory 1320 stores computer programs, including the application programs corresponding to applications, and other data, which may include data generated after the operating system or application programs run, such as system data (e.g., configuration parameters of the operating system) and user data. In the embodiment of the present application, program instructions are stored in the memory 1320, and the processor 1380 executes them to implement the manuscript display control method of audio program content discussed above, or the function of adapting an application discussed above.

In addition, the terminal device 210 may further include a display unit 1340 for receiving input numerical information, character information, or touch operations/non-contact gestures, and for generating signal inputs related to user settings and function control of the terminal device 210. Specifically, in the embodiment of the present application, the display unit 1340 may include the display panel 1341. The display panel 1341, such as a touch screen, can collect touch operations of the user on or near it (for example, operations performed by the user on or near the display panel 1341 with a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connected device according to a preset program. Optionally, the display panel 1341 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 1380, and can receive and execute commands sent by the processor 1380.

The display panel 1341 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the display unit 1340, the terminal device 210 may also include an input unit 1330, which may include, but is not limited to, one or more of a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, a joystick, and the like. In fig. 13, the input unit 1330 is shown as including an image input device 1331 and other input devices 1332.

In addition to the above, the terminal device 210 may further include a power supply 1390 for powering the other modules, an audio circuit 1360, a near field communication module 1370, and an RF circuit 1310. The terminal device 210 may also include one or more sensors 1350, such as acceleration sensors, light sensors, and pressure sensors. The audio circuit 1360 specifically includes a speaker 1361 and a microphone 1362; for example, when the user uses voice control, the terminal device 210 may collect the user's voice through the microphone 1362 and process it, and when the user needs to be prompted, play a corresponding prompt sound through the speaker 1361.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.

A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A manuscript display control method for an audio program content, comprising:

2. The method of claim 1, wherein the document content includes sentences divided in a speech recognition process and punctuation marks added between the sentences.

3. The method of claim 1, wherein the play control area further comprises a play pause control; the method further comprises the following steps:

responding to the triggering operation of the playing pause control, and pausing the playing of the target audio program content; and

4. The method of claim 1, wherein the document presentation area further comprises a document presentation control, the method further comprising:

responding to the triggering operation of the manuscript display control, and displaying a manuscript display page, wherein the manuscript display page displays at least one line of manuscript content part corresponding to the current playing progress, and at least one of front M lines of content adjacent to the at least one line of manuscript content part and back N lines of content adjacent to the at least one line of manuscript content part, and M and N are positive integers; or

5. The method of claim 4, wherein when the document content includes at least two paragraphs, then for every two adjacent paragraphs, a playing interval in the corresponding audio content between a last sentence of the previous paragraph and a first sentence of the next paragraph is greater than a preset time threshold, and a playing interval between every two adjacent sentences in the same paragraph is not greater than the preset time threshold.

6. The method of claim 5, wherein in the document presentation page or the document overview area, a target paragraph containing the at least one line of document content portion is displayed in a first display mode, and content of paragraphs other than the target paragraph is displayed in a second display mode.

7. The method of claim 4, wherein the play control page includes a playing progress control for adjusting the playing progress of the audio content, the method further comprising:

in response to a document dragging operation triggered for the document display page or the document overview region, dragging the document content displayed in the document display page according to the document dragging operation;

8. The method of claim 7, wherein the method further comprises:

responding to the operation aiming at the playing progress control, updating the playing progress of the target audio program content, and starting to play the target audio program content from a time node corresponding to the updated playing progress;

9. The method of claim 4, wherein a switching control is displayed at a specified position of the document presentation page or the document overview area, the switching control being for controlling switching of document contents; the method further comprises the following steps:

responding to a switching operation triggered by the switching control, and displaying the manuscript content corresponding to the next audio program content of the target audio program content in the target play queue in the manuscript display area or the manuscript overview area; and

10. The method according to any one of claims 1 to 9, wherein the play control area includes a first selection control for confirming that the audio is played and the manuscript is presented, and playing the audio content of the target audio program content specifically includes:

11. A method for controlling manuscript display of audio program content, the method comprising:

12. A manuscript display control device for audio program contents, comprising:

13. A manuscript display control device for audio program contents, comprising:

14. An electronic device, comprising a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 10 or the steps of the method of claim 11.

15. A computer-readable storage medium, characterized in that it comprises program code for causing an electronic device to perform the steps of the method of any of claims 1-10 or the steps of the method of claim 11, when said program code is run on the electronic device.