CN117215440A - Audio production method and device for written works and computer readable storage medium - Google Patents


Info

Publication number
CN117215440A
CN117215440A (application CN202210529229.9A)
Authority
CN
China
Prior art keywords
character
role
target
name
dubbing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210529229.9A
Other languages
Chinese (zh)
Inventor
吴玥璇
蒋维明
史小静
许亚东
陆飞
姜伟
鹿畅
程龙
林国雯
赵亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210529229.9A priority Critical patent/CN117215440A/en
Publication of CN117215440A publication Critical patent/CN117215440A/en
Pending legal-status Critical Current


Abstract

The application discloses an audio production method and apparatus for a written work, and a computer-readable storage medium. In response to a work-import operation on a target written work, the method performs character analysis on the target written work and displays a character name display interface corresponding to it. In response to a trigger operation for entering a character dubbing interface, the character dubbing interface is displayed; it comprises a first character name list of the characters to be dubbed and a first candidate timbre model list, the first character name list being generated from the character names shown in the character name display interface. Timbre model setting operations for each character in the first character name list, received in the character dubbing interface, are then obtained, and the dialogue text of each character is dubbed with that character's set timbre model to obtain the target audio of the target written work. The method can effectively improve the audio production efficiency of written works.

Description

Audio production method and device for written works and computer readable storage medium
Technical Field
The application relates to the technical field of audio processing, in particular to an audio production method and device of a written work and a computer readable storage medium.
Background
With the continuous development of internet technology, daily life has become inseparable from the internet. In the internet era, with the continuous development of intelligent terminal technology and the continuous reduction of data traffic costs, the way knowledge is disseminated has also been greatly transformed. Electronic book applications, characterized by large collections, low cost and easy portability, have gradually become a favored way of reading in the internet era.
To further enrich the reading experience of electronic books, developers in the industry have devised read-aloud formats such as the audio novel. An audio novel plays the text content of a book (including dialogue, narration and the like) as audio on a terminal such as a mobile phone, providing readers with an immersive reading experience.
At present, audio novels are produced by manual dubbing: voice actors record the lines, and the recordings are then assembled into the audio novel. Manual dubbing is expensive, so the production cost is high; the production cycle is also very long, and the production efficiency is very low.
Disclosure of Invention
The embodiment of the application provides an audio production method and device of a written work and a computer readable storage medium.
The first aspect of the application provides an audio production method of a written work, comprising the following steps:
responding to a work-import operation on a target written work, performing character analysis on the target written work and displaying a character name display interface corresponding to the target written work, wherein the character name display interface comprises a plurality of character name display areas, and each target character name display area displays at least one character name of a corresponding target character;
responding to a trigger operation for entering a character dubbing interface, displaying the character dubbing interface, wherein the character dubbing interface comprises a first character name list of the characters to be dubbed and a first candidate timbre model list, the first character name list being a list generated from the character names displayed in the character name display interface;
and obtaining timbre model setting operations, received in the character dubbing interface, for each character in the first character name list, and dubbing the dialogue text of each character based on that character's set timbre model to obtain the target audio of the target written work.
Accordingly, a second aspect of the present application provides an audio production apparatus for a written work, the apparatus comprising:
a first display unit, configured to respond to a work-import operation on a target written work by performing character analysis on the target written work and displaying a character name display interface corresponding to the target written work, wherein the character name display interface comprises a plurality of character name display areas, and each target character name display area displays at least one character name of a corresponding target character;
a second display unit, configured to respond to a trigger operation for entering a character dubbing interface by displaying the character dubbing interface, wherein the character dubbing interface comprises a first character name list of the characters to be dubbed and a first candidate timbre model list, the first character name list being a list generated from the character names displayed in the character name display interface;
and a dubbing unit, configured to obtain timbre model setting operations, received in the character dubbing interface, for each character in the first character name list, and to dub the dialogue text of each character based on that character's set timbre model to obtain the target audio of the target written work.
In some embodiments, the audio production device for a written work provided by the application further comprises:
the display device comprises a first display subunit, a second display subunit and a third display subunit, wherein the first display subunit is used for displaying a text content display page, the text content display page comprises a text display area and a dubbing setting area, the text content display area displays a target text, and the dubbing setting area displays a second character name list of characters contained in the target text;
A second display subunit configured to display a second candidate tone model list of a target role name in response to a touch operation for the target role name displayed in the second role name list;
and the first dubbing subunit is used for updating dubbing of the target role dialogue text by adopting the first target candidate tone model in response to the selected operation of the first target candidate tone model in the second candidate tone model list.
In some embodiments, the character dubbing interface further comprises a narration dubbing setting control, and the audio production device for written works provided by the application further comprises:
a third display subunit, configured to display a third candidate timbre model list for the narration text in response to a touch operation on the narration dubbing setting control;
and a second dubbing subunit, configured to dub the narration text based on a second target candidate timbre model in response to a selection operation on the second target candidate timbre model in the third candidate timbre model list.
In some embodiments, the audio production device for a written work provided by the application further comprises:
a fourth display subunit, configured to display a volume adjusting component and a speech-speed adjusting component on the character dubbing interface;
and the dubbing unit is configured to:
obtain the timbre model setting operation, volume adjusting operation and speech-speed adjusting operation, received in the character dubbing interface, for each character in the first character name list, and dub the dialogue text of each character based on that character's set timbre model, set volume and set speech speed to obtain the target audio of the target written work.
In some embodiments, the first candidate timbre model list comprises a plurality of timbre type controls, and the dubbing unit comprises:
a determining subunit, configured to determine, in response to a first touch operation on any selected character name in the first character name list, the target dialogue text associated with the selected character name in the target written work;
a fifth display subunit, configured to display, in response to a touch operation on a target timbre type control in the first candidate timbre model list, a plurality of timbre model controls corresponding to the target timbre type;
an obtaining subunit, configured to obtain a target timbre model in response to a touch operation on a target timbre model control among the plurality of timbre model controls;
and a third dubbing subunit, configured to dub the target dialogue text with the target timbre model and to traverse every character name in this way to obtain the target audio of the target written work.
In some embodiments, the audio production device for a written work provided by the application further comprises:
a sixth display subunit, configured to display, in response to a second touch operation on any character name selected in the character name display interface, a character name modification interface for the selected character name;
and a modification subunit, configured to obtain a modification operation received by the character name modification interface and modify the selected character name according to the modification operation.
In some embodiments, the audio production device for a written work provided by the application further comprises:
a seventh display subunit, configured to display, in response to a third touch operation on any character name selected in the character name display interface, a character name deletion interface for the selected character name;
and a deletion subunit, configured to delete the selected character name from the first character name list in response to a deletion confirmation operation received by the character name deletion interface.
In some embodiments, the first display unit includes:
an eighth display subunit, configured to display an audio production interface for written works, wherein the audio production interface comprises a written-work import control;
a ninth display subunit, configured to display a work selection interface in response to a touch operation on the written-work import control, wherein the work selection interface comprises display labels of a plurality of written works;
an importing subunit, configured to import, in response to a selection operation on a target display label among the display labels of the plurality of written works, the target written work corresponding to the target display label;
a parsing subunit, configured to perform character analysis on the target written work;
and a tenth display subunit, configured to display, according to the character analysis result, a character name display interface corresponding to the target written work, wherein the character name display interface comprises a plurality of character name display areas, and each target character name display area displays at least one character name of a corresponding target character.
In some embodiments, the parsing subunit includes:
a first recognition module, configured to recognize reference words in the target written work, wherein a reference word is a word or phrase conforming to a preset part of speech;
and a second recognition module, configured to input the reference words into a first preset neural network model for character name recognition to obtain a plurality of output character names.
In some embodiments, the audio production device for a written work provided by the application further comprises:
a first determining module, configured to determine the character to which each of the plurality of output character names refers, obtaining a mapping relation between character names and characters;
and a second determining module, configured to determine, according to the mapping relation, the character name set corresponding to each character.
In some embodiments, the first determining module comprises:
a recognition sub-module, configured to input the plurality of output character names into a second preset neural network model for coreference resolution, obtaining a plurality of output character name clusters;
and a determining sub-module, configured to determine the character corresponding to each character name cluster, obtaining the mapping relation between character names and characters.
The third aspect of the present application also provides a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the audio production method for a written work provided by the first aspect of the present application.
A fourth aspect of the present application provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps in the audio production method for a written work provided in the first aspect of the present application when the computer program is executed.
A fifth aspect of the application provides a computer program product comprising computer programs/instructions which when executed by a processor implement the steps in the audio production method of a written work provided in the first aspect.
According to the audio production method for a written work provided by the application, in response to a work-import operation on a target written work, character analysis is performed on the target written work and a character name display interface corresponding to the target written work is displayed, wherein the character name display interface comprises a plurality of character name display areas and each target character name display area displays at least one character name of a corresponding target character; in response to a trigger operation for entering a character dubbing interface, the character dubbing interface is displayed, wherein the character dubbing interface comprises a first character name list of the characters to be dubbed and a first candidate timbre model list, the first character name list being generated from the character names displayed in the character name display interface; and timbre model setting operations for each character in the first character name list, received in the character dubbing interface, are obtained, and the dialogue text of each character is dubbed based on that character's set timbre model to obtain the target audio of the target written work.
Therefore, with the audio production method for written works provided by the application, the characters in a written work, and the several character names of each character, can be analyzed and displayed automatically simply by importing the work to be dubbed. Dubbing of the imported work is then achieved merely by setting a dubbing timbre model for the dialogue text of each character, which greatly improves the audio production efficiency of written works.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of a scene of the audio production of a written work in accordance with the present application;
FIG. 2 is a flow chart of an audio production method of a written work provided by the application;
FIG. 3A is a schematic view of another scenario of the audio production method of a written work provided by the present application;
FIG. 3B is a schematic view of another scenario of the audio production method of a written work provided by the present application;
FIG. 3C is a schematic view of another scenario of the audio production method of a written work provided by the present application;
FIG. 3D is a schematic view of another scenario of the audio production method of a written work provided by the present application;
FIG. 3E is a schematic view of another scenario of the audio production method of a written work provided by the present application;
FIG. 4A is a schematic view of another scenario of the audio production method of a written work provided by the present application;
FIG. 4B is a schematic view of another scenario of the audio production method of a written work provided by the present application;
FIG. 4C is a schematic view of another scenario of the audio production method of a written work provided by the present application;
FIG. 4D is a schematic view of still another scenario of the audio production method of a written work provided by the present application;
FIG. 5 is another flow chart of the audio production method of the written works provided by the application;
FIG. 6 is a schematic diagram of an audio production device for a written work according to the present application;
fig. 7 is a schematic structural diagram of a computer device provided by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides an audio production method and apparatus for written works, a computer-readable storage medium and a computer device. The audio production method may be used in an audio production apparatus for written works, which may be integrated in a computer device; the computer device may be a terminal or a server. The terminal may be a mobile phone, a tablet computer, a notebook computer, a smart television, a wearable smart device, a personal computer (PC) or a vehicle-mounted terminal, among other devices. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, big data and artificial intelligence platforms. The server may also be a node in a blockchain.
Referring to FIG. 1, a schematic view of a scenario of the audio production method for a written work according to the present application is shown. As shown in the figure, terminal B receives a target written work transmitted by server A; then, in response to a work-import operation on the target written work, it performs character analysis on the work and displays a character name display interface corresponding to it, wherein the character name display interface comprises a plurality of character name display areas and each target character name display area displays at least one character name of a corresponding target character. In response to a trigger operation for entering a character dubbing interface, the character dubbing interface is displayed; it comprises a first character name list of the characters to be dubbed and a first candidate timbre model list, the first character name list being generated from the character names displayed in the character name display interface. Timbre model setting operations for each character in the first character name list, received in the character dubbing interface, are obtained, and the dialogue text of each character is dubbed based on that character's set timbre model to obtain the target audio of the target written work. Further, terminal B may send the target audio corresponding to the target written work back to server A.
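Purely as an illustration of the flow just described (the patent specifies interfaces, not code), the pipeline on terminal B can be sketched in a few lines. All identifiers here are hypothetical: `parse_roles` uses a naive regex in place of the real character analysis, and the timbre model names stand in for the voices of an actual TTS engine.

```python
import re

def parse_roles(text: str) -> list[str]:
    # Toy character analysis: a "character" is any capitalized name
    # that precedes the verb "said" (a crude stand-in for NER).
    return sorted(set(re.findall(r"\b([A-Z][a-z]+) said\b", text)))

def make_audio(text: str, timbre_by_role: dict[str, str]) -> list[tuple[str, str]]:
    # Pair each dialogue line with the timbre model set for its speaker;
    # a real system would hand each pair to a TTS voice for synthesis.
    return [
        (timbre_by_role.get(name, "narration-voice"), line)
        for name, line in re.findall(r'\b([A-Z][a-z]+) said, "([^"]+)"', text)
    ]

work = 'Ann said, "Hello." Bob said, "Hi."'
roles = parse_roles(work)  # the character list shown to the user
clips = make_audio(work, {"Ann": "soft-female", "Bob": "deep-male"})
```

The user-facing steps of the scenario map onto the two calls: `parse_roles` corresponds to the character name display interface, and the `timbre_by_role` dict corresponds to the per-character timbre model settings collected in the dubbing interface.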
It should be noted that the audio production scene shown in FIG. 1 is only an example; the scene described in the embodiment of the present application is intended to explain the technical solution of the application more clearly and does not constitute a limitation on it. As those skilled in the art will appreciate, as audio production scenes for written works evolve and new service scenes emerge, the technical solution provided by the application is equally applicable to similar technical problems.
The following describes the above-described embodiments in detail.
In the related art, when an audio novel is produced, the dialogue text of each character is generally dubbed manually, and the recorded manual dubbing is then used to assemble the audio novel. Manual dubbing is time-consuming, and different characters require different voice actors, so its cost is high. In response, practitioners in the audio novel industry have proposed generating audio novels automatically by dubbing with a timbre model, which improves production efficiency and reduces production cost. However, existing automatic dubbing methods based on a timbre model can only use one and the same timbre model to dub the narration and the dialogue text of every character; the resulting dubbing is monotonous and cannot deliver an immersive listening experience. The application therefore provides an audio production method for written works that can, to a certain extent, render the dialogue audio of different characters distinctly and improve the listening experience of audio novels.
The embodiment of the application will be described from the perspective of an audio production apparatus for written works, which may be integrated in a computer device. The computer device may be a terminal or a server. The terminal may be a mobile phone, a tablet computer, a notebook computer, a smart television, a wearable smart device, a personal computer (PC) or a vehicle-mounted terminal, among other devices. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, big data and artificial intelligence platforms. As shown in FIG. 2, a flow chart of the audio production method for a written work provided by the application, the method comprises the following steps:
Step 101: in response to a work-import operation on a target written work, perform character analysis on the target written work and display a character name display interface corresponding to the target written work, wherein the character name display interface comprises a plurality of character name display areas and each target character name display area displays at least one character name of a corresponding target character.
The audio production method for written works provided by the application can automatically generate corresponding audio for an imported written work, that is, automatically generate the corresponding audio work. For example, when the written work is a novel, an audio novel corresponding to the novel can be generated automatically; when the written work is a poem, corresponding audio poetry can be generated automatically, and so on. That is, the genre of the written work here may be of various types, without limitation. In the embodiment of the application, a novel is taken as the example of a written work to explain the technical solution of the application in detail.
When the audio production apparatus imports a written work for which corresponding audio is to be generated, that work is determined as the target written work. The apparatus may first perform character analysis on the target written work to determine the character information it contains. Specifically, when the target written work is a novel, the characters in the novel can be analyzed to obtain the character information of the novel. It should be appreciated that, in a novel or other written work, a character can have several different character names, which may include, for example, the character's name, alias, nickname, title and so forth. After the character name information of the characters in the novel is obtained, the apparatus can further display a character name display interface, in which the several character names of each character can be displayed.
For example, FIG. 3A is a schematic view of a scenario of the audio production method for a written work provided by the present application. As shown in the figure, a plurality of character name display areas 11 are displayed in the character name display interface 10, and each character name display area 11 displays at least one character name of one character. Here, the character Zhang San has, in addition to his character name, the alias San'er; the character Li Si has, in addition to his character name, the aliases Xiao Si and Si'er; the character Wang Wu has only his character name and no alias. Whether a given name is treated as a character's character name or as an alias can be decided from how often the different names of the same character appear in the written work: when a character has several names, the name that appears most often in the work is determined to be the character name, and the other names are determined to be aliases.
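The frequency rule just described can be sketched in a few lines of Python. The function name and the plain substring counting are illustrative simplifications for demonstration, not the patented implementation.

```python
from collections import Counter

def split_primary_and_aliases(text: str, names: list[str]) -> tuple[str, list[str]]:
    # Count how often each known name of one character occurs in the work;
    # the most frequent name becomes the character name, the rest aliases.
    counts = Counter({name: text.count(name) for name in names})
    primary, _ = counts.most_common(1)[0]
    return primary, [n for n in names if n != primary]

sample = "Zhang San arrived. Everyone greeted Zhang San, though some still called him San'er."
primary, aliases = split_primary_and_aliases(sample, ["Zhang San", "San'er"])
```

Here "Zhang San" occurs twice and "San'er" once, so "Zhang San" is chosen as the character name and "San'er" becomes an alias, matching the rule stated above.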
In some embodiments, responding to a work-import operation on a target written work by performing character analysis on the target written work and displaying a character name display interface corresponding to the target written work, the character name display interface comprising a plurality of character name display areas, each target character name display area displaying at least one character name of a corresponding target character, comprises:
1. displaying an audio production interface for written works, wherein the audio production interface comprises a written-work import control;
2. in response to a touch operation on the written-work import control, displaying a work selection interface, wherein the work selection interface comprises display labels of a plurality of written works;
3. in response to a selection operation on a target display label among the display labels of the plurality of written works, importing the target written work corresponding to the target display label;
4. performing character analysis on the target written work;
5. displaying, according to the character analysis result, a character name display interface corresponding to the target written work, wherein the character name display interface comprises a plurality of character name display areas, and each target character name display area displays at least one character name of a corresponding target character.
In the embodiment of the application, a method for importing written works is provided to facilitate importing and analyzing the written work whose audio is to be generated. Specifically, an audio production interface for written works may be provided; for example, when an audio production application for written works is opened, its audio production interface may be displayed, and this interface includes a written-work import control. When the user clicks the import control, a candidate work selection interface is displayed, which may include display labels of a plurality of written works. When the user then clicks the label of the desired target written work, the target written work is imported and character analysis is performed on it, after which the character name display interface corresponding to the target written work is displayed according to the analysis result.
FIG. 3B is a schematic view of another scenario of the audio production method for a written work according to the present application. As shown, when a user opens an audio production application for written works, the audio production interface 20 may be displayed, with a written-work import control 21 displayed in it. In response to a touch operation on the import control 21, a work selection interface as shown in FIG. 3C is displayed. In the work selection interface 30, display labels of a plurality of written works are shown: a first display label 31 corresponding to a first written work, a second display label 32 corresponding to a second written work, and a third display label 33 corresponding to a third written work. The user can select and import the corresponding written work by clicking the label of the desired work.
In some embodiments, when the audio production apparatus supports multi-application split-screen display, a preset import area may be provided in the audio production interface 20, and the user may import the target written work by dragging it into the preset import area.
In some embodiments, performing character analysis on the target written work comprises:
4.1. identifying reference words in the target written work, wherein a reference word is a word or phrase conforming to a preset part of speech;
4.2. inputting the reference words into a first preset neural network model for character name recognition to obtain a plurality of character names.
In the embodiment of the application, character analysis of the target written work may specifically mean identifying the character names of the characters in the target written work. In an embodiment of the application, all character names in the target written work may be identified with a named entity recognition (NER) algorithm from natural language processing (NLP). NER is by now a mature technology: its methods have evolved over time from early dictionary- and rule-based approaches, through traditional machine learning, to deep learning-based methods and, more recently, research directions such as attention mechanisms and graph neural networks. NER techniques will therefore not be described in detail here.
When the NER technique is adopted to identify the character names in the target written work, the specific process may first identify the reference words in the target written work, where a reference word may also be called a mention. A mention is a word or phrase of some particular part of speech in the target written work, such as a noun phrase or a pronoun. After these reference words are identified, they may be input into a predetermined neural network model for character name identification; to distinguish it from other neural network models in the present application, it may be referred to herein as the first predetermined neural network model. The first predetermined neural network model may specifically be a trained character name recognition network model that can recognize character names among a plurality of reference words.
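The patent does not disclose concrete model code for steps 4.1 and 4.2. The following is a minimal illustrative sketch of the two-stage flow, where a naive capitalization/pronoun filter stands in for the part-of-speech-based reference-word extraction and a simple membership check stands in for the first predetermined neural network model; all function names and the sample text are hypothetical.

```python
import re

def extract_reference_words(text):
    """Step 4.1 (toy version): collect candidate reference words.
    Capitalized tokens and common pronouns stand in for the
    part-of-speech filter described in the embodiment."""
    pronouns = {"he", "she", "they"}
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if t[0].isupper() or t.lower() in pronouns]

def recognize_character_names(reference_words, known_names):
    """Step 4.2 (toy version): stand-in for the first predetermined
    neural network model - keep only reference words the (hypothetical)
    model classifies as character names."""
    return [w for w in reference_words if w in known_names]

text = "Alice opened the door. He smiled at Bob, and she waved back."
refs = extract_reference_words(text)
names = recognize_character_names(refs, known_names={"Alice", "Bob"})
```

In a real system both stages would be trained models (e.g. an NER tagger), not rule-based filters; the sketch only shows the data flow from text to reference words to character names.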
In some embodiments, the audio production method of the written works provided by the application further comprises the following steps:
4.3, determining the role pointed to by each of the plurality of outputted character names to obtain a mapping relation between character names and roles;
and 4.4, determining a role name set corresponding to each role according to the mapping relation.
Wherein the character names outputted after character name recognition is performed on the reference words by the character name recognition network model contain all character names in the target written work. As previously described, in a written work a character may have one character name or a plurality of character names. When setting the tone model corresponding to a character's dialogue text according to that character's names, it is preferable that each character be indicated by a single character name; otherwise the same character may be assigned different tone models, or the tone model for the same character may need to be set repeatedly, which affects the audio production efficiency of the written work. Therefore, after all character names in the written work are identified, it is further necessary to determine which character each character name belongs to. After all character names in the target text are obtained, the role pointed to by each character name is further determined to obtain a mapping relation between character names and roles, and the character name set corresponding to each character is determined according to the mapping relation, for display on the character name display interface.
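Steps 4.3 and 4.4 can be sketched as a simple mapping inversion, assuming the name-to-role mapping of step 4.3 has already been obtained; the role identifiers and sample names below are hypothetical, not from the patent.

```python
from collections import defaultdict

def name_sets_by_role(name_to_role):
    """Invert a character-name -> role mapping (output of step 4.3)
    into the role -> set-of-names mapping used for display (step 4.4)."""
    role_names = defaultdict(set)
    for name, role in name_to_role.items():
        role_names[role].add(name)
    return dict(role_names)

mapping = {"Zhang San": "role_1", "Old Zhang": "role_1", "Li Si": "role_2"}
sets = name_sets_by_role(mapping)
```

Each resulting set is what the character name display interface would show as one character's collection of names and aliases.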
In some embodiments, determining the role to which each of the plurality of outputted role names points, resulting in a mapping between the role name and the role, includes:
4.3.1, inputting the outputted plurality of character names into a second preset neural network model for coreference resolution identification, and obtaining a plurality of outputted character name clusters;
and 4.3.2, determining the roles corresponding to each role name cluster, and obtaining the mapping relation between the role names and the roles.
In the embodiment of the application, a method of identification by coreference resolution is provided for determining the role to which each character name belongs. Coreference resolution (also called reference resolution) is the process of finding all words or phrases in a text that point to the same entity. Because people often refer to the same entity in different ways, such as by pronouns or noun phrases, these expressions are collectively called mentions (references), and the goal of coreference resolution is to find them.
Coreference resolution has long been a core problem in natural language understanding, with important applications in fields such as machine translation, information extraction, automatic summarization, and automatic question answering. Fully automatic coreference resolution is a difficult task for computers, because it not only requires understanding the semantics of the whole text but also requires resolving many ambiguities. For example, a model may mistakenly treat semantically similar words, such as two unrelated mentions of "school", as coreferent.
Specifically, after all character names in the target written work are identified from the reference words through the character name identification model, the character names can be further input into a second preset neural network model to perform coreference resolution identification. The second predetermined neural network model may be a trained coreference resolution recognition model. After all the character names in the target written work are identified by the coreference resolution identification model, a plurality of character name clusters are output, wherein each character name cluster points to one character. Then the role corresponding to each role name cluster can be further determined, and the mapping relation between each role name and the role can be obtained.
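Assuming the second predetermined neural network model has already produced its character name clusters, steps 4.3.1 and 4.3.2 reduce to flattening those clusters into a name-to-role mapping. The sketch below mocks the model output; the cluster contents and role-id scheme are hypothetical.

```python
def clusters_to_mapping(name_clusters):
    """Step 4.3.2 (toy version): each cluster output by the
    (hypothetical) coreference resolution model points at one role;
    assign synthetic role ids and flatten to a name -> role mapping."""
    mapping = {}
    for i, cluster in enumerate(name_clusters):
        for name in cluster:
            mapping[name] = f"role_{i + 1}"
    return mapping

# Mocked output of the coreference resolution recognition model (step 4.3.1).
clusters = [["Zhang San", "Old Zhang", "Brother Zhang"], ["Li Si", "Four Child"]]
m = clusters_to_mapping(clusters)
```

The resulting mapping is exactly the name-to-role relation that the previous embodiment inverts into per-role name sets.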
In some embodiments, the audio production method of the written works provided by the application further comprises the following steps:
A. responding to a second touch operation aiming at any selected character name in the character name display interface, and displaying a character name modification interface of the selected character name;
B. and acquiring the modification operation received by the character name modification interface, and modifying the selected character name according to the modification operation.
In the embodiment of the application, the identified character names can be modified in the character name display interface. The character name recognition results displayed in the character name display interface are produced by the character name recognition model and the coreference resolution recognition model, and since neural network models have limited precision, the absolute accuracy of the recognition results cannot be guaranteed. That is, recognition errors may occur; therefore, in the embodiment of the present application, a modification function may be provided for the character names displayed on the character name display interface. Specifically, a second touch operation can be performed on any character name that needs to be modified in the character name display interface; the second touch operation may be a long-press operation, which displays the character name modification interface of the selected character name. The user can then input a modification operation in the character name modification interface, thereby modifying the corresponding character name.
Specifically, as shown in fig. 3D, when the user performs a long press operation on the alias "four child" of the fourth character, a character name modification interface of the alias "four child" may be displayed, specifically, a character name modification area 12 of the alias "four child" may be displayed in the character name display interface 10, and the user may input the modified character name at the character name modification area 12, thereby implementing modification of the alias "four child".
In some embodiments, the audio production method of the written works provided by the application further comprises the following steps:
a. responding to a third touch operation for selecting a character name in the character name display interface, and displaying a character name deleting interface for selecting the character name;
b. and deleting the selected character name in the first character name list in response to the confirmation deleting operation received by the character name deleting interface.
Wherein, in some cases, model recognition errors can also result in words that are not character names being recognized as character names. Therefore, the embodiment of the application also provides a method for deleting identified character names in the character name display interface. Specifically, the user may perform a third touch operation on any character name that needs to be deleted in the character name display interface, where the third touch operation may be a double-click operation or a click operation. As described in the foregoing embodiments, modifying a character name displayed on the character name display interface also requires a touch operation on that name; to distinguish touch operations with different purposes, the second touch operation and the third touch operation may be set to be different touch operations. After the third touch operation is performed on any character name that needs to be deleted in the character name display interface, the character name deletion interface of the selected character name can be displayed. The user may then confirm in the character name deletion interface to delete the corresponding character name.
Specifically, as shown in fig. 3E, when the user performs the third touch operation on "three" in the character name display interface, a corresponding character name deletion interface 13 is displayed, a deletion control and a cancel control may also be displayed in the character name deletion interface 13, and when the user clicks the deletion control, it is determined that the character name is deleted, and then the character name is deleted from the character name display interface.
Step 102, responding to a triggering operation of entering a role dubbing interface, displaying the role dubbing interface, wherein the role dubbing interface comprises a first role name list of a role to be dubbed and a first candidate tone model list, and the first role name list is a list generated according to the role names displayed in a role name display interface.
After the identified character name information is displayed, a character dubbing interface can be further displayed according to the identified character names. In the character dubbing interface, dubbing can be set for each character; specifically, a tone model corresponding to each character can be set. The triggering operation for entering the character dubbing interface may be as follows: a control for entering the next step is displayed in the character name display interface, and when the user touches the control, entry into the character dubbing interface is triggered.
Specifically, a list of character names to be dubbed and a list of candidate timbre models may be displayed in the character dubbing interface. The list of character names to be dubbed is determined according to the character names displayed in the character name display interface. As described above, the character name display interface 10 displays the character name and the aliases of each character; to avoid the confusion caused by one character being assigned multiple dubbings, and to improve dubbing-setting efficiency, only the character name of each character may be displayed in the character dubbing interface, and the aliases of the character may be omitted.
Fig. 4A is a schematic view of another scenario of the audio production method of the written work provided by the present application. As shown, in the character dubbing interface 40, a first character name list 42 of characters to be dubbed and a first candidate tone model list 41 are displayed. The user may select a character to be dubbed in the first character name list 42, for example, select character Zhang, and then further select a tone model in the first candidate tone model list 41, for example, select a tone model corresponding to "army", so as to set a tone model of a dialog text corresponding to character Zhang to "army", that is, dub character Zhang with a tone model corresponding to "army".
In some embodiments, the first character name list 42 may also display a dubbing setting status corresponding to each character name, where the dubbing setting status corresponding to each character name displays whether the corresponding dubbing model is set for the character corresponding to the character name, so that a user can clearly confirm which characters still need to be dubbed, thereby avoiding repetitive labor and improving the audio generating efficiency of the written work.
Wherein, in some embodiments, a narration dubbing setting control can also be displayed in the character dubbing interface. The audio production method of the written works provided by the application can also comprise the following steps:
1. responding to the touch operation of the narration dubbing setting control, and displaying a third candidate tone model list for the narration text;
2. and in response to a selection operation of a second target candidate timbre model in the third candidate timbre model list, dubbing the narration text based on the second target candidate timbre model.
Since a written work includes narration content in addition to the characters' dialogue, when producing the corresponding sound work of the written work, namely the audio of the written work, the dialogue text of each character needs to be dubbed and the narration needs to be dubbed as well. Therefore, in addition to the dubbing settings for each character's dialogue text, a dubbing setting for the narration content is also required. Thus, in the embodiment of the application, a narration dubbing setting control can be displayed in the character dubbing interface, and the user may make the narration dubbing setting by clicking on this control. Specifically, after the narration dubbing setting control is clicked, a candidate tone model list corresponding to the narration text may be displayed, and a corresponding tone model may then be selected in that list to dub the narration text.
As shown in fig. 4B, a narration dubbing setting control 43 may also be displayed in the character dubbing interface 40, and in response to a touch operation on the narration dubbing setting control 43, a third candidate timbre model list may be displayed in place of the first candidate timbre model list 41. The user may then select a corresponding timbre model in the third candidate timbre model list to dub the narration.
In some embodiments, the audio production method of the written works provided by the application further comprises the following steps:
A. displaying a volume adjusting component and a speech-rate adjusting component on the character dubbing interface;
B. and receiving setting operations for the volume adjusting component and the speech-rate adjusting component, and adjusting the volume and speech rate of each character's dubbing according to the setting operations.
In the embodiment of the application, in order to give listeners an immersive audiobook experience, the volume and speech rate of each character's dubbing can be further adjusted. Specifically, the volume adjusting component and the speech-rate adjusting component can be displayed in the character dubbing interface; then, when the user sets a tone model for a character, the volume and speech rate of that character's dubbing can be set at the same time. Referring to fig. 4C, a volume adjusting component 44 and a speech-rate adjusting component 45 may also be displayed in the character dubbing interface 40, and the user may set the volume and speech rate of each character's dubbing while setting that character's dubbing.
Step 103, acquiring tone model setting operation of each character in the first character name list received in the character dubbing interface, dubbing dialogue text of the character based on the set tone model of each character, and obtaining target audio of target literal works.
After a user sets a corresponding tone model of dubbing for each character on a character dubbing interface, the corresponding tone model can be used for dubbing the dialogue text of each character to obtain the target audio of the target written work. When the target written work is a novel, a voiced novel of the novel is generated.
In some embodiments, where the first candidate tone model list includes a plurality of tone type controls, acquiring the tone model setting operation for each character in the first character name list received in the character dubbing interface, dubbing the dialogue text of each character based on that character's set tone model, and obtaining the target audio of the target written work includes:
1. in response to a first touch operation for any selected character name in the first character name list, determining target dialog text associated with the selected character name in the target written work;
2. Responding to touch operation of a target tone type control in the first candidate tone model list, and displaying a plurality of tone model controls corresponding to the target tone type;
3. responding to touch operation of a target tone model control in the plurality of tone model controls, and acquiring a target tone model;
4. and dubbing the target dialogue text by adopting a target tone model, and traversing each character name to obtain target audio of the target written work.
In the embodiment of the application, when the number of candidate timbre models in the candidate timbre model list is large, the timbre models can be classified, and a tone type control corresponding to each class of timbre models can then be displayed in the first candidate tone model list. After the user selects a character name, the target tone type control to be selected can be touched in the first candidate tone model list, and the next-level labels corresponding to the target tone type control, namely the plurality of tone model controls belonging to that tone type, are then displayed. The user may further select a target timbre model among the plurality of timbre model controls to complete the dubbing setting for the selected character. Further, the target dialogue text related to the selected character name in the target written work can be determined, and the target dialogue text is then dubbed with the target tone model, so that all the dialogue audio corresponding to that character is obtained. Each character name is traversed in this way to obtain the target audio of the target written work.
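The traversal described above can be sketched as follows, assuming a dialogue-per-role mapping has already been extracted from the work. The `synthesize` function is a hypothetical stand-in for a real TTS engine call (the patent does not name one), and real output would be audio data rather than tagged strings; segment ordering is also simplified.

```python
def synthesize(text, timbre):
    """Hypothetical stand-in for a TTS engine call: returns a tagged
    segment string instead of real waveform data."""
    return f"<{timbre}:{text}>"

def dub_work(dialogues_by_role, timbre_by_role, narration, narration_timbre):
    """Traverse the narration and every role's dialogue text, and
    concatenate the synthesized segments into the target audio."""
    segments = [synthesize(narration, narration_timbre)]
    for role, lines in dialogues_by_role.items():
        timbre = timbre_by_role[role]  # the tone model set for this role
        segments.extend(synthesize(line, timbre) for line in lines)
    return segments

audio = dub_work(
    {"Zhang San": ["Hello!"], "Li Si": ["Hi."]},
    {"Zhang San": "army", "Li Si": "sweet"},
    narration="It was a quiet morning.",
    narration_timbre="calm",
)
```

A production system would instead interleave narration and dialogue segments in their original document order before concatenating the audio.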
In some embodiments, the audio production method of the written works provided by the application further comprises the following steps:
A. displaying a text content display page, wherein the text content display page comprises a text display area and a dubbing setting area, the text display area displays a target text, and the dubbing setting area displays a second character name list containing the characters in the target text;
B. in response to a touch operation for the target character name displayed in the second character name list, displaying a second candidate tone model list of the target character name;
C. and in response to a selection operation for the first target candidate timbre model in the second candidate timbre model list, updating dubbing of the target character dialogue text by adopting the first target candidate timbre model.
The setting of the tone model, dubbing volume and dubbing speed for each character in the character dubbing interface, and the corresponding settings for the narration, are setting operations performed in the global dimension. In the embodiment of the application, after the dubbing settings of the global dimension are completed, the set character dubbing can be further refined.
Specifically, when the character dubbing is refined, a text content display page can be displayed, and the text content of the target written work is displayed in the text content display page. The generated dubbing can be played synchronously while the text content of the target written work is displayed. When the user feels that the currently set dubbing needs to be adjusted, the dubbing of the corresponding character can be further adjusted in the dubbing setting area displayed in the text content display page.
Specifically, a text content presentation page may be displayed, the text content presentation page including a text presentation area that displays a target text and a dubbing setting area that displays a second list of personas that includes the personas in the target text. And then displaying a second candidate timbre model list of the target character names in response to a touch operation for the target character names displayed in the second character name list. Then, in response to a selection operation of the target candidate timbre model in the second candidate timbre model list, the selected target candidate timbre model is employed to update the dubbing of the target character conversation text.
Referring to fig. 4D, a text display area 51 and a dubbing setting area 52 are displayed in the text content display page 50, and the dubbing setting area 52 displays the character names of the characters appearing in the target text shown in the text display area 51. When the dubbing of a certain character needs to be adjusted, the control corresponding to that character can be touched to display the corresponding tone model list for resetting.
According to the above description, in the audio production method of the written works provided by the embodiment of the application, by responding to the work importing operation of the target written works, the role of the target written works is analyzed and the role name display interface corresponding to the target written works is displayed, wherein the role name display interface comprises a plurality of role name display areas, and each target role name display area displays at least one role name corresponding to the target role; responding to a triggering operation of entering a role dubbing interface, displaying the role dubbing interface, wherein the role dubbing interface comprises a first role name list of a role to be dubbed and a first candidate tone model list, and the first role name list is a list generated according to the role names displayed in a role name display interface; and acquiring tone model setting operation of each character in the first character name list received in the character dubbing interface, dubbing dialogue text of the character based on the set tone model of each character, and obtaining target audio of target literal works.
Therefore, the audio production method of the written works provided by the application can automatically analyze and display the characters in the written works and the plurality of character names corresponding to each character by only importing the written works to be dubbed. Then, dubbing of the imported literal works can be realized only by setting a dubbing tone model of the dialogue text corresponding to each role, and the method greatly improves the audio production efficiency of the literal works.
The application also provides an audio production method of the written works, which can be used in computer equipment, wherein the computer equipment can be a terminal, and the terminal can be a mobile phone or other terminals such as a personal computer, a tablet and the like. As shown in fig. 5, another flow chart of the audio production method of the written works provided by the application is shown, and the method specifically includes:
in step 201, the computer device displays a novel audio production interface.
When the user opens the audio novel making application, the audio novel making interface can be displayed. An import novel control may be displayed in the novel audio production interface.
In step 202, in response to a touch operation to an import novel control, the computer device displays a list of candidate novels.
When the user clicks the import novel control, the computer opens the folder storing the novels and displays a candidate novel list. The user can select any target novel to be produced in the candidate novel list.
In step 203, in response to a selection operation on the target novels in the candidate novels list, the computer device inputs the target novels into a preset coreference resolution recognition model, and obtains a name and an alias of each role in the target novels.
When the user selects a target novel in the candidate novel list and imports it, the computer device parses the character information in the target novel. Specifically, the computer device may employ a coreference resolution algorithm to identify the plurality of character names of each character in the target novel. Identifying the plurality of character names of each character with a coreference resolution algorithm may specifically mean inputting the novel text content of the target novel into a trained coreference resolution recognition model to obtain a plurality of character name clusters output by the model, where each character name cluster comprises the plurality of character names corresponding to one character.
Further, the plurality of character names may be divided into the character's name and the character's aliases according to the number of times each character name occurs in the target text. The character name with the largest number of occurrences may be used as the character's name, and the other character names may be used as the character's aliases.
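The frequency-based split can be sketched as below; the counting strategy (raw substring counts over the full text) and the sample names are illustrative assumptions, not the patent's disclosed implementation.

```python
from collections import Counter

def split_name_and_aliases(cluster, text):
    """Within one character-name cluster, take the most frequent name
    in the text as the character's primary name; the rest are aliases."""
    counts = Counter({name: text.count(name) for name in cluster})
    primary, _ = counts.most_common(1)[0]
    aliases = [n for n in cluster if n != primary]
    return primary, aliases

text = "Zhang San laughed. Old Zhang shook his head. Zhang San left."
primary, aliases = split_name_and_aliases(["Zhang San", "Old Zhang"], text)
```

A robust implementation would count token-aligned matches rather than raw substrings, so that one name being a substring of another does not inflate its count.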
The coreference resolution algorithm has been introduced in the previous embodiments. A coreference resolution model is broadly divided into two steps. First step: find all the mentions in the text. Second step: for each mention, form a mention pair with each mention preceding it, score each mention pair, and select the highest-scoring pair to determine the coreference relation.
The study of coreference resolution can be divided into five phases: (1) linguistic methods based on syntactic analysis, representative methods being the Hobbs algorithm and centering theory; (2) classification methods based on mention pairs and clustering methods based on vector similarity; (3) methods based on global optimization; (4) methods introducing background knowledge and semantic knowledge; (5) methods based on deep learning. With the development of deep learning in recent years, end-to-end coreference resolution algorithms with better performance have appeared, exceeding all previous models without requiring syntactic analysis or named entity recognition. Such a model uses a span-ranking method to rank spans directly (a span is a contiguous substring of the text; a sentence of length n has n(n+1)/2 spans). The embedding layer of the end-to-end network structure uses a pre-trained model, and the feature extraction layer can use a BiLSTM (bidirectional long short-term memory) network or any of various mainstream pre-trained language models to obtain a mention score for each span. Then, for each pair of spans, an antecedent score is computed indicating whether the two spans can corefer, and combining the two mention scores with the antecedent score gives the final coreference score.
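The span-ranking scoring just described combines two mention scores with an antecedent score, s(i, j) = s_m(i) + s_m(j) + s_a(i, j), and each span also has a dummy antecedent with a fixed score of 0 (meaning "no coreference"). A toy numeric sketch of this decision rule, with made-up scores in place of real network outputs:

```python
def coreference_score(mention_score_i, mention_score_j, antecedent_score):
    """Final pairwise score in the span-ranking formulation:
    s(i, j) = s_m(i) + s_m(j) + s_a(i, j)."""
    return mention_score_i + mention_score_j + antecedent_score

def best_antecedent(mention_scores, antecedent_scores, j):
    """Pick the highest-scoring antecedent i < j for span j; the dummy
    antecedent (no coreference) has a fixed score of 0."""
    best, best_score = None, 0.0  # None stands for the dummy antecedent
    for i in range(j):
        s = coreference_score(mention_scores[i], mention_scores[j],
                              antecedent_scores[(i, j)])
        if s > best_score:
            best, best_score = i, s
    return best

m = [2.0, 1.5, 1.0]              # toy mention scores s_m for three spans
a = {(0, 2): 0.5, (1, 2): -3.0}  # toy antecedent scores s_a
```

Here span 2 links to span 0 because 2.0 + 1.0 + 0.5 = 3.5 beats both the dummy score of 0 and the pairing with span 1 (1.5 + 1.0 − 3.0 = −0.5). In the full model these scores come from feed-forward heads over the span representations.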
Step 204, the computer device displays a character name presentation interface.
After determining the role name and alias for each role, the computer device may further display a role name presentation interface where the computer device presents the role name for each role and all aliases for that role.
In step 205, in response to the modification or deletion operation for the character names displayed in the character name display interface, the computer device modifies or deletes the character names in the character name display interface accordingly.
After the computer equipment displays the character names and all aliases of each character in the character name display interface, the user can modify or delete the character names and aliases displayed in the character name display interface, and the computer equipment can update the character name information of the character according to the modification and deletion operations of the user.
In step 206, in response to the triggering operation of entering the character dubbing interface, the computer device displays the character dubbing interface.
After the user has finished modifying the character name information in the character name display interface, the character name information of each character of the target novel is determined; at this point, entry into the character dubbing interface can be triggered by clicking a confirmation control or a control for entering the next step.
In the character dubbing interface, the character name of each character and a narration entry can be displayed; in addition, a plurality of candidate timbre models, a volume adjusting component, and a speech-rate adjusting component may also be displayed. Thus, the user can set a tone model, dubbing volume, and dubbing speed for each character or for the narration in the character dubbing interface.
In step 207, the computer device obtains the tone model setting, the dubbing volume setting and the dubbing speed setting for each character received in the character dubbing interface, and dubbing is performed on the dialogue text of each character based on the setting, so as to obtain the target audio of the target novel.
After the user performs the dubbing settings for each character and for the narration in the character dubbing interface, the dialogue text of each character and the narration can be dubbed according to the user's settings, namely according to the tone model, volume, and speech rate set for each character and for the narration, so as to obtain the target audio of the target novel, that is, to produce the voiced novel of the target novel.
In step 208, the computer device displays the audio modification interface of the target novel in response to a triggering operation to enter the audio modification interface of the target novel.
After the voiced novel of the target novel is produced, the production result can be further checked. That is, in response to a triggering operation for entering the audio modification interface of the target novel, the computer device may display the audio modification interface of the target novel. The text content of the target novel can be displayed on the audio modification interface, and the voiced novel corresponding to the target novel can be played synchronously, so that the user can check it accordingly.
In step 209, the computer device obtains the audio modification settings for the target character received in the audio modification interface, and updates the dubbing of the target character in the target audio based on the audio modification settings.
When the user feels during checking that the dubbing of a certain character needs to be modified, the dubbing of the selected character can be reset in the dubbing setting area displayed in the audio modification interface, and the dubbing of the selected character is then adjusted and updated based on the new settings.
According to the above description, according to the audio production method of the written works, the character analysis is performed on the target written works and the character name display interface corresponding to the target written works is displayed in response to the work importing operation of the target written works, wherein the character name display interface comprises a plurality of character name display areas, and each target character name display area displays at least one character name corresponding to the target character; responding to a triggering operation of entering a role dubbing interface, displaying the role dubbing interface, wherein the role dubbing interface comprises a first role name list of a role to be dubbed and a first candidate tone model list, and the first role name list is a list generated according to the role names displayed in a role name display interface; and acquiring tone model setting operation of each character in the first character name list received in the character dubbing interface, dubbing dialogue text of the character based on the set tone model of each character, and obtaining target audio of target literal works.
Therefore, with the audio production method for written works provided by the application, the user only needs to import the written work to be dubbed, and the characters in the work, together with the multiple character names corresponding to each character, are automatically analyzed and displayed. Dubbing of the imported written work is then completed simply by setting a dubbing tone model for the dialogue text of each character, which greatly improves the audio production efficiency of written works.
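The overall workflow described above — import, register parsed characters, assign a tone model per character, dub the dialogue — can be sketched as a toy data model (class and method names are illustrative, not from the patent; the TTS engine is stubbed):

```python
from dataclasses import dataclass, field


@dataclass
class AudiobookProject:
    """Toy model of the workflow: register parsed roles, assign a tone
    model per role, then dub the dialogue lines in order."""
    roles: dict = field(default_factory=dict)        # role -> set of its names
    tone_models: dict = field(default_factory=dict)  # role -> tone model id

    def register_role(self, role, names):
        self.roles.setdefault(role, set()).update(names)

    def set_tone_model(self, role, model_id):
        self.tone_models[role] = model_id

    def dub(self, dialogues, synthesize):
        # dialogues: list of (role, line); synthesize stands in for a TTS engine.
        # Roles with no explicit tone model fall back to a default narrator voice.
        return [synthesize(line, self.tone_models.get(role, "narrator_default"))
                for role, line in dialogues]
```

A call sequence then mirrors the three steps of the method: parse and register the roles, set one tone model per role in the character dubbing interface, and call `dub` to produce the target audio.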
To better implement the audio production method for written works, an embodiment of the application further provides an audio production device for written works, which can be integrated in a terminal or a server.
For example, fig. 6 is a schematic structural diagram of an audio production device for written works according to an embodiment of the present application; the device may include a first display unit 301, a second display unit 302, and a dubbing unit 303, where:
a first display unit 301, configured to perform role resolution on a target written work and display a role name display interface corresponding to the target written work in response to a work importing operation on the target written work, where the role name display interface includes a plurality of role name display areas, and each target role name display area displays at least one role name corresponding to a target role;
The second display unit 302 is configured to display a role dubbing interface in response to a trigger operation of entering the role dubbing interface, where the role dubbing interface includes a first role name list of a role to be dubbed and a first candidate tone model list, and the first role name list is a list generated according to role names displayed in a role name display interface;
and the dubbing unit 303 is configured to obtain a tone model setting operation for each character in the first character name list received in the character dubbing interface, and dub the dialogue text of the character based on the set tone model of each character, so as to obtain the target audio of the target written work.
In some embodiments, the audio production device for a written work provided by the application further comprises:
the first display subunit is used for displaying a text content display page, wherein the text content display page comprises a text content display area and a dubbing setting area, the text content display area displays a target text, and the dubbing setting area displays a second character name list of characters contained in the target text;
a second display subunit, configured to display a second candidate tone model list for the target character name in response to a touch operation on the target character name displayed in the second character name list;
and the first dubbing subunit is used for updating the dubbing of the target character's dialogue text by adopting the first target candidate tone model in response to a selection operation on the first target candidate tone model in the second candidate tone model list.
In some embodiments, the character dubbing interface further includes a narration dubbing setting control, and the audio production device for written works provided by the application further includes:
the third display subunit is used for displaying a third candidate tone model list for the narration text in response to a touch operation on the narration dubbing setting control;
and the second dubbing subunit is used for dubbing the narration text based on the second target candidate tone model in response to a selection operation on the second target candidate tone model in the third candidate tone model list.
In some embodiments, the audio production device for a written work provided by the application further comprises:
the fourth display subunit is used for displaying a volume adjusting component and a speech speed adjusting component on the character dubbing interface;
dubbing unit is used for:
and acquiring the tone model setting operation, volume adjusting operation, and speech speed adjusting operation of each character in the first character name list received in the character dubbing interface, and dubbing the dialogue text of each character based on the character's set tone model, set volume, and set speech speed to obtain the target audio of the target written work.
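Merging a character's tone model, volume, and speech-speed settings into one synthesis request could look like the following sketch (the request fields, the defaults, and the clamping ranges are assumptions for illustration, not values from the patent):

```python
def build_tts_request(role, text, settings):
    """Combine one role's tone model, volume, and speech-speed settings
    into a single synthesis request dict.

    settings: role -> {'tone_model': str, 'volume': float, 'speed': float}.
    Volume and speed are clamped to an assumed valid range; missing
    values fall back to neutral defaults of 1.0.
    """
    cfg = settings[role]
    return {
        "voice": cfg["tone_model"],
        "text": text,
        "volume": min(max(cfg.get("volume", 1.0), 0.0), 2.0),
        "speed": min(max(cfg.get("speed", 1.0), 0.5), 2.0),
    }
```

One such request would be issued per dialogue line, so adjusting a character's sliders in the dubbing interface only changes that character's entries in `settings`.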
In some embodiments, the first candidate tone model list includes a plurality of tone type controls, and the dubbing unit includes:
a determining subunit configured to determine, in response to a first touch operation for any selected character name in the first character name list, the target dialogue text related to the selected character name in the target written work;
a fifth display subunit, configured to respond to a touch operation on the target tone type control in the first candidate tone model list, and display a plurality of tone model controls corresponding to the target tone type;
the acquisition subunit is used for responding to touch operation of a target tone model control in the plurality of tone model controls and acquiring a target tone model;
and the third dubbing subunit is used for dubbing the target dialogue text by adopting the target tone model and traversing each character name to obtain the target audio of the target written work.
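The determining subunit's step — finding the dialogue text related to a selected character name — can be sketched as follows, assuming a simple `Speaker: utterance` line convention (the convention and all names are hypothetical; the patent's actual attribution of dialogue to speakers is not specified here):

```python
def collect_role_dialogue(lines, role_names):
    """Pick out the utterances attributed to any of one role's known names.

    lines: strings in a 'Speaker: utterance' convention.
    role_names: every name that refers to the same role, e.g. an alias set.
    """
    names = set(role_names)
    picked = []
    for line in lines:
        speaker, sep, utterance = line.partition(":")
        # Lines without a 'Speaker:' prefix (narration) have sep == "".
        if sep and speaker.strip() in names:
            picked.append(utterance.strip())
    return picked
```

Because the lookup is against the whole alias set, a line attributed to "Lizzy" and a line attributed to "Elizabeth" both land in the same role's target dialogue text.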
In some embodiments, the audio production device for a written work provided by the application further comprises:
a sixth display subunit, configured to display, in response to a second touch operation for any selected character name in the character name display interface, a character name modification interface for the selected character name;
And the modification subunit is used for acquiring the modification operation received by the role name modification interface and modifying the selected role name according to the modification operation.
In some embodiments, the audio production device for a written work provided by the application further comprises:
a seventh display subunit, configured to display, in response to a third touch operation for any selected character name in the character name display interface, a character name deletion interface for the selected character name;
and the deleting subunit is used for deleting the selected character name in the first character name list in response to the confirmation deleting operation received by the character name deleting interface.
In some embodiments, the first display unit includes:
an eighth display subunit, configured to display an audio production interface of the written work, where the audio production interface includes a written work importing control;
a ninth display subunit, configured to display a written work selection interface in response to a touch operation on the written work importing control, where the written work selection interface includes display tags of a plurality of written works;
an importing subunit, configured to import, in response to a selection operation on a target display tag among the display tags of the plurality of written works, the target written work corresponding to the target display tag;
The analysis subunit is used for carrying out role analysis on the target written works;
and the tenth display subunit is used for displaying a character name display interface corresponding to the target written work according to the character analysis result, wherein the character name display interface comprises a plurality of character name display areas, and each target character name display area displays at least one character name corresponding to the target character.
In some embodiments, the parsing subunit includes:
the first recognition module is used for recognizing a reference word in the target written works, wherein the reference word is a word or phrase conforming to a preset part of speech;
and the second recognition module is used for inputting the reference word into the first preset neural network model to perform character name recognition to obtain a plurality of output character names.
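A crude stand-in for the two recognition modules (the real system uses a part-of-speech filter plus a neural network model; this sketch substitutes a capitalization heuristic and is purely illustrative):

```python
import re
from collections import Counter


def extract_reference_words(text):
    """Rough stand-in for the 'preset part of speech' filter: count
    capitalized words that do not start a sentence, as candidate
    character names. A real system would use POS tagging plus a
    trained name-recognition model instead."""
    counts = Counter()
    for sentence in re.split(r"[.!?]\s+", text):
        tokens = sentence.split()
        for token in tokens[1:]:          # skip the sentence-initial word
            word = token.strip('",;:.!?')
            if word[:1].isupper():
                counts[word] += 1
    return counts
```

The counter's keys would then be fed to the name-recognition model; frequency gives a cheap prior on which candidates are actual character names.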
In some embodiments, the audio production device for a written work provided by the application further comprises:
the first determining module is used for determining the role pointed by each role name in the plurality of outputted role names to obtain the mapping relation between the role names and the roles;
and the second determining module is used for determining a role name set corresponding to each role according to the mapping relation.
In some embodiments, the first determining module comprises:
the identification sub-module is used for inputting the plurality of outputted character names into a second preset neural network model to perform coreference resolution identification, so as to obtain a plurality of outputted character name clusters;
And the determining submodule is used for determining the roles corresponding to each role name cluster and obtaining the mapping relation between the role names and the roles.
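The clustering step — grouping the names that refer to the same role — can be sketched with union-find over pairwise alias links; the links themselves would come from the second preset neural network model, which this stand-in does not attempt to reproduce:

```python
def cluster_role_names(names, alias_pairs):
    """Group character names into clusters from pairwise alias links:
    a union-find stand-in for the neural coreference-resolution model."""
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in alias_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    # Largest clusters first: main characters tend to have the most names.
    return sorted(clusters.values(), key=len, reverse=True)
```

Each resulting cluster corresponds to one role, giving exactly the role-name-to-role mapping the determining submodule produces.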
In implementation, each of the above units may be implemented as an independent entity, or may be combined arbitrarily and implemented as the same entity or several entities; for the specific implementation of each unit, reference may be made to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in the audio production device for written works according to the embodiments of the present application, in response to a work importing operation on a target written work, the first display unit 301 performs role analysis on the target written work and displays a role name display interface corresponding to the target written work, where the role name display interface includes a plurality of role name display areas, and each target role name display area displays at least one role name corresponding to a target role; in response to a trigger operation of entering a character dubbing interface, the second display unit 302 displays the character dubbing interface, where the character dubbing interface comprises a first character name list of characters to be dubbed and a first candidate tone model list, and the first character name list is generated according to the character names displayed in the character name display interface; and the dubbing unit 303 obtains the tone model setting operation for each character in the first character name list received in the character dubbing interface, and dubs the dialogue text of each character based on the character's set tone model, so as to obtain the target audio of the target written work.
Therefore, with the audio production device for written works provided by the application, the user only needs to import the written work to be dubbed, and the characters in the work, together with the multiple character names corresponding to each character, are automatically analyzed and displayed. Dubbing of the imported written work is then completed simply by setting a dubbing tone model for the dialogue text of each character, which greatly improves the audio production efficiency of written works.
The embodiment of the application also provides a computer device, which may be a terminal or a server; fig. 7 is a schematic structural diagram of the computer device provided by the application.
The computer device may include a processing unit 401 with one or more processing cores, a storage unit 402 with one or more computer-readable storage media, a power module 403, an input module 404, and other components. Those skilled in the art will appreciate that the computer device structure shown in fig. 7 does not limit the computer device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
the processing unit 401 is the control center of the computer device; it connects the respective parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the storage unit 402 and calling the data stored in the storage unit 402. Optionally, the processing unit 401 may include one or more processing cores; preferably, the processing unit 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processing unit 401.
The storage unit 402 may be used to store software programs and modules, and the processing unit 401 executes various functional applications and performs data processing by running the software programs and modules stored in the storage unit 402. The storage unit 402 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function (such as a sound playing function, an image playing function, and web page access), and the data storage area may store data created according to the use of the computer device. In addition, the storage unit 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the storage unit 402 may also include a memory controller to provide the processing unit 401 with access to the storage unit 402.
The computer device further comprises a power module 403 for supplying power to the respective components. Preferably, the power module 403 may be logically connected to the processing unit 401 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The power module 403 may also include one or more of a direct current or alternating current power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other components.
The computer device may also include an input module 404, which input module 404 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to object settings and function control.
Although not shown, the computer device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processing unit 401 in the computer device loads executable files corresponding to the processes of one or more application programs into the storage unit 402 according to the following instructions, and the processing unit 401 executes the application programs stored in the storage unit 402, so as to implement various functions as follows:
responding to a work importing operation on the target written work, performing role analysis on the target written work and displaying a role name display interface corresponding to the target written work, wherein the role name display interface comprises a plurality of role name display areas, and each target role name display area displays at least one role name of a corresponding target role; responding to a triggering operation of entering a role dubbing interface, displaying the role dubbing interface, wherein the role dubbing interface comprises a first role name list of roles to be dubbed and a first candidate tone model list, and the first role name list is a list generated according to the role names displayed in the role name display interface; and acquiring the tone model setting operation of each role in the first role name list received in the role dubbing interface, and dubbing the dialogue text of each role based on the role's set tone model to obtain the target audio of the target written work.
It should be noted that, the computer device provided in the embodiment of the present application and the method in the foregoing embodiment belong to the same concept, and the specific implementation of each operation above may refer to the foregoing embodiment, which is not described herein.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any of the methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
responding to a work importing operation on the target written work, performing role analysis on the target written work and displaying a role name display interface corresponding to the target written work, wherein the role name display interface comprises a plurality of role name display areas, and each target role name display area displays at least one role name of a corresponding target role; responding to a triggering operation of entering a role dubbing interface, displaying the role dubbing interface, wherein the role dubbing interface comprises a first role name list of roles to be dubbed and a first candidate tone model list, and the first role name list is a list generated according to the role names displayed in the role name display interface; and acquiring the tone model setting operation of each role in the first role name list received in the role dubbing interface, and dubbing the dialogue text of each role based on the role's set tone model to obtain the target audio of the target written work.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the computer-readable storage medium may comprise: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc, and the like.
Since the instructions stored in the computer readable storage medium may perform the steps in any of the methods provided in the embodiments of the present application, the beneficial effects that any of the methods provided in the embodiments of the present application can be achieved are detailed in the previous embodiments, and are not described herein.
According to an aspect of the application, a computer program product or a computer program is provided, the computer program product or computer program comprising computer instructions stored in a storage medium. A processor of a computer device reads the computer instructions from the storage medium and executes them, causing the computer device to perform the method provided in the various optional implementations of the audio production method for written works described above.
The audio production method, device, and computer-readable storage medium for written works provided by the embodiments of the present application have been described above in detail; specific examples are applied herein to illustrate the principles and implementations of the application, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementation and application scope according to the idea of the application. In summary, the content of this specification should not be construed as limiting the application.

Claims (15)

1. A method of audio production of a written work, the method comprising:
responding to a work importing operation of a target written work, performing role analysis on the target written work and displaying a role name display interface corresponding to the target written work, wherein the role name display interface comprises a plurality of role name display areas, and each target role name display area displays at least one role name of a corresponding target role;
responding to a triggering operation of entering a role dubbing interface, displaying the role dubbing interface, wherein the role dubbing interface comprises a first role name list of a role to be dubbed and a first candidate tone model list, and the first role name list is a list generated according to the role names displayed in the role name display interface;
and acquiring tone model setting operation of each character in the first character name list received in the character dubbing interface, dubbing dialogue text of the character based on the set tone model of each character, and obtaining target audio of the target written work.
2. The method according to claim 1, wherein the method further comprises:
displaying a text content display page, wherein the text content display page comprises a text content display area and a dubbing setting area, the text content display area displays a target text, and the dubbing setting area displays a second character name list of characters contained in the target text;
responding to a touch operation for the target character name displayed in the second character name list, and displaying a second candidate tone model list of the target character name;
and in response to a selection operation for a first target candidate tone model in the second candidate tone model list, updating the dubbing of the target character's dialogue text by adopting the first target candidate tone model.
3. The method of claim 1, wherein the character dubbing interface further comprises a narration dubbing setting control, the method further comprising:
responding to a touch operation on the narration dubbing setting control, and displaying a third candidate tone model list for the narration text;
and in response to a selection operation for a second target candidate tone model in the third candidate tone model list, dubbing the narration text based on the second target candidate tone model.
4. A method according to any one of claims 1 to 3, further comprising:
displaying a volume adjusting component and a speech speed adjusting component on the character dubbing interface;
the step of obtaining the tone model setting operation of each character in the first character name list received in the character dubbing interface, and dubbing the dialogue text of the character based on the set tone model of each character to obtain the target audio of the target written work, comprising:
And acquiring tone model setting operation, volume adjusting operation and speech speed adjusting operation of each character in the first character name list received in the character dubbing interface, and dubbing dialogue texts of the characters based on the set tone model, the set volume and the set speech speed of each character to obtain target audio of the target written works.
5. The method of claim 1, wherein the first candidate tone model list includes a plurality of tone type controls, and the obtaining a tone model setting operation received in the character dubbing interface for each character in the first character name list, and dubbing dialogue text of the character based on the set tone model of each character, to obtain target audio of the target written work, comprises:
determining target dialogue text related to any selected character name in the target written works in response to a first touch operation for the selected character name in the first character name list;
responding to touch operation of a target tone type control in the first candidate tone model list, and displaying a plurality of tone model controls corresponding to the target tone type;
Responding to touch operation of a target tone model control in the plurality of tone model controls, and acquiring a target tone model;
and dubbing the target dialogue text by adopting the target tone model, and traversing each character name to obtain target audio of the target written work.
6. The method according to claim 1, wherein the method further comprises:
responding to a second touch operation aiming at any selected character name in the character name display interface, and displaying a character name modification interface of the selected character name;
and acquiring the modification operation received by the role name modification interface, and modifying the selected role name according to the modification operation.
7. The method of claim 6, wherein the method further comprises:
responding to a third touch operation for any selected character name in the character name display interface, and displaying a character name deletion interface for the selected character name;
and deleting the selected character name in the first character name list in response to a confirmation deleting operation received by the character name deleting interface.
8. The method of claim 1, wherein, in response to a work importing operation on a target written work, performing role analysis on the target written work and displaying a role name display interface corresponding to the target written work, the role name display interface including a plurality of role name display areas, each target role name display area displaying at least one role name corresponding to a target role, comprises:
Displaying an audio production interface of the written works, wherein the audio production interface comprises a written work importing control;
responding to the touch operation of the written work importing control, displaying a written work selecting interface, wherein the written work selecting interface comprises display labels of a plurality of written works;
responding to the selection operation of a target display label in the display labels of the plurality of written works, and importing the target written works corresponding to the target display label;
performing role analysis on the target written works;
and displaying a role name display interface corresponding to the target written work according to the role analysis result, wherein the role name display interface comprises a plurality of role name display areas, and each target role name display area displays at least one role name corresponding to the target role.
9. The method of claim 8, wherein said performing character resolution on said target written work comprises:
identifying a reference word in the target written work, wherein the reference word is a word or phrase conforming to a preset part of speech;
inputting the reference word into a first preset neural network model to perform character name recognition, and obtaining a plurality of outputted character names.
10. The method according to claim 9, wherein the method further comprises:
determining the role pointed by each role name in the plurality of outputted role names to obtain a mapping relation between the role names and the roles;
and determining a role name set corresponding to each role according to the mapping relation.
11. The method of claim 10, wherein the determining the role to which each of the plurality of outputted role names points, obtains a mapping relationship between a role name and a role, comprises:
inputting the outputted plurality of character names into a second preset neural network model for coreference resolution identification to obtain outputted plurality of character name clusters;
and determining the roles corresponding to each role name cluster to obtain the mapping relation between the role names and the roles.
12. An audio production device for a written work, the device comprising:
the first display unit is used for responding to the work importing operation of the target written works, carrying out role analysis on the target written works and displaying a role name display interface corresponding to the target written works, wherein the role name display interface comprises a plurality of role name display areas, and each target role name display area displays at least one role name of a corresponding target role;
The second display unit is used for responding to the triggering operation of entering the role dubbing interface, and displaying the role dubbing interface, wherein the role dubbing interface comprises a first role name list of a role to be dubbed and a first candidate tone model list, and the first role name list is a list generated according to the role names displayed in the role name display interface;
and the dubbing unit is used for acquiring the tone model setting operation of each character in the first character name list received in the character dubbing interface, and dubbing the dialogue text of the character based on the set tone model of each character to obtain the target audio of the target written work.
13. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the audio production method of a written work of any one of claims 1 to 11.
14. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps in the audio production method of a written work of any of claims 1 to 11 when the computer program is executed.
15. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps in the audio production method of a written work according to any one of claims 1 to 11.
CN202210529229.9A 2022-05-16 2022-05-16 Audio production method and device for written works and computer readable storage medium Pending CN117215440A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210529229.9A CN117215440A (en) 2022-05-16 2022-05-16 Audio production method and device for written works and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN117215440A true CN117215440A (en) 2023-12-12

Family

ID=89037574


Country Status (1)

Country Link
CN (1) CN117215440A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination