CN114023358B - Audio generation method for dialogue novels, electronic equipment and storage medium - Google Patents


Info

Publication number: CN114023358B
Authority: CN (China)
Prior art keywords: audio file, novel, text, audio, sound image
Legal status: Active
Application number: CN202111424100.3A
Other languages: Chinese (zh)
Other versions: CN114023358A
Inventors: 李铭瀚, 刘龙
Current Assignee: Zhangyue Technology Co Ltd
Original Assignee: Zhangyue Technology Co Ltd

Events:
- Application filed by Zhangyue Technology Co Ltd (application number CN202111424100.3A)
- Publication of CN114023358A
- Application granted
- Publication of CN114023358B

Classifications

    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B 27/02: Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B 27/031: Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683: Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/685: Retrieval characterised by using metadata automatically derived from the content, using automatically derived transcript of audio data, e.g. lyrics
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/165: Management of the audio stream, e.g. setting of volume, audio stream path
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 1/00: Substation equipment, e.g. for use by subscribers
    • H04M 1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M 1/7243: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality, with interactive means for internal management of messages
    • H04M 1/72433: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality, with interactive means for internal management of messages for voice messaging, e.g. dictaphones

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to an audio generation method for a dialogue novel, an electronic device, and a storage medium. The audio generation method comprises the following steps: obtaining a plurality of audio files corresponding to a dialogue novel, where the dialogue novel comprises a plurality of character texts and each audio file corresponds to one character text; determining, according to the novel character to which each audio file belongs, a target sound image position corresponding to that audio file; and, for each audio file, adjusting the audio file from an initial sound image position to the target sound image position to obtain an adjusted audio file. According to the embodiments of the present disclosure, the reading experience of the user can be improved.

Description

Audio generation method for dialogue novels, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an audio generation method for a dialogue novel, an electronic device, and a storage medium.
Background
To enhance the user's electronic reading experience, dialogue novels are attracting increasing attention from practitioners in the field. A dialogue novel is a new type of novel that uses character dialogue as its basic structure and presentation form: events are narrated, plots unfolded, settings established, and characters portrayed through the conversations between the characters of the work.
However, at the current stage a dialogue novel can only offer the user a visual reading mode, which makes the reading experience monotonous and detracts from it.
Disclosure of Invention
In order to solve the above technical problems, or at least partially solve them, the present disclosure provides an audio generation method for a dialogue novel, an electronic device, and a storage medium.
In a first aspect, the present disclosure provides an audio generation method for a dialogue novel, including:
obtaining a plurality of audio files corresponding to a dialogue novel, where the dialogue novel comprises a plurality of character texts and each audio file corresponds to one character text;
determining, according to the novel character to which each audio file belongs, a target sound image position corresponding to that audio file; and
for each audio file, adjusting the audio file from an initial sound image position to the target sound image position to obtain an adjusted audio file.
In a second aspect, the present disclosure provides an electronic device comprising a processor and a memory for storing executable instructions that cause the processor to:
obtain a plurality of audio files corresponding to a dialogue novel, where the dialogue novel comprises a plurality of character texts and each audio file corresponds to one character text;
determine, according to the novel character to which each audio file belongs, a target sound image position corresponding to that audio file; and
for each audio file, adjust the audio file from an initial sound image position to the target sound image position to obtain an adjusted audio file.
In a third aspect, the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the audio generation method of the first aspect.
Compared with the prior art, the technical solution provided by the embodiments of the present disclosure has the following advantages:
According to the audio generation method, electronic device, and storage medium for a dialogue novel, after the audio files of the plurality of novel characters of the dialogue novel are acquired, the target sound image position of each audio file can be determined according to its novel character. The sound image position of each audio file is then adjusted to its target sound image position, so that the adjusted audio files of different novel characters in the dialogue novel sit at different sound image positions. The embodiments of the present disclosure can therefore present the audio files of the novel characters at different sound image positions, allowing the user to read the dialogue novel in an audio reading mode in which each novel character sounds distinct, vivid, and interesting, thereby improving the user's reading experience.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
Fig. 1 shows a flow chart of an audio generation method for a dialogue novel according to an embodiment of the present disclosure;
Fig. 2 shows a schematic diagram of a dialogue display interface of an example dialogue novel provided by an embodiment of the present disclosure;
Fig. 3 shows a flow chart of an alternative audio generation method for a dialogue novel provided by an embodiment of the present disclosure;
Fig. 4 shows a flow chart of another alternative audio generation method for a dialogue novel provided by an embodiment of the present disclosure;
Fig. 5 shows a flow chart of yet another alternative audio generation method for a dialogue novel provided by an embodiment of the present disclosure;
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The embodiments of the present disclosure provide an audio generation method for dialogue novels, an electronic device, and a storage medium, which can adjust the audio files of different novel characters to different sound image positions.
The audio generation method for dialogue novels provided by the embodiments of the present disclosure is first described below with reference to Figs. 1 to 5.
The audio generation method for a dialogue novel provided by the embodiments of the present disclosure can be executed by an electronic device that provides an electronic book reading function. The electronic device may include, but is not limited to, mobile terminals such as smartphones, notebook computers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), in-vehicle terminals (e.g., in-vehicle navigation terminals), and wearable devices, as well as stationary terminals such as digital TVs, desktop computers, and smart home devices. It should be noted that the execution body of the audio generation method of the dialogue novel may also be another device with an audio adjustment function, such as an electronic reading platform that supports audio adjustment or the server of a text-to-speech (TTS) engine, without limitation.
Fig. 1 is a flow chart illustrating an audio generation method of a dialogue novel according to an embodiment of the present disclosure.
As shown in fig. 1, the audio generation method of the dialog novel may include the following steps.
Step S110: obtain a plurality of audio files corresponding to the dialogue novel, where the dialogue novel includes a plurality of character texts.
In the embodiments of the present disclosure, the plurality of audio files of the dialogue novel may be obtained by TTS conversion from the plurality of character texts of the dialogue novel; TTS conversion is a technique that converts text into audio.
For character text, each character text corresponds to a novel character of a dialog novel. In particular, the character text of any of the novel characters may include a collection of speaking content belonging to that novel character in the dialog novel.
In one example, Fig. 2 shows a schematic diagram of a dialogue display interface of an example dialogue novel provided by an embodiment of the present disclosure. As shown in Fig. 2, a dialogue novel may present the dialogue content of multiple novel characters, such as character A, character B, and the bystander P3.
The character text of character A may include a plurality of dialogue sub-texts, where the text content in each dialog box 101 of character A is one dialogue sub-text of character A. Similarly, the text content in each dialog box 102 of character B is one dialogue sub-text of character B.
For audio files, each audio file corresponds to one character text; that is, one character text may be converted into one audio file. It should be noted that, depending on the situation or actual requirements, only part of the sub-texts in a character text may be converted into an audio file, for example the sub-texts belonging to one chapter, without specific limitation.
Having introduced the concepts of dialogue novel, character text, and audio file, the specific implementation of S110 is described next.
In some embodiments, the implementation of S110 may include: receiving a plurality of audio files of the dialogue novel sent by a target server or a target TTS engine. The target server may be any server with a TTS conversion function, and the TTS engine may be a local offline engine or an online engine; neither is specifically limited.
Optionally, obtaining the plurality of audio files may include: sending the novel text of the dialogue novel to the target server or target TTS engine, which generates an audio file for each of the plurality of character texts based on the novel text and then returns the plurality of audio files.
In other embodiments, the implementation of S110 may include: obtaining the novel text of the dialogue novel, and performing text-to-speech conversion on it to obtain the audio files corresponding to the plurality of character texts of the dialogue novel.
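To make the acquisition step concrete, the following is a minimal Python sketch of S110. The synthesize function is a hypothetical placeholder for whichever TTS back end (target server or target TTS engine) is used; its name and signature are assumptions, not an API defined by this disclosure.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class CharacterText:
        character: str        # the novel character this text belongs to
        sub_texts: List[str]  # speaking content, one entry per dialog box

    def synthesize(text: str) -> bytes:
        """Hypothetical TTS call; replace with a real engine or server request."""
        return b""  # a real back end would return encoded audio data

    def generate_audio_files(character_texts: List[CharacterText]) -> Dict[str, bytes]:
        """One audio file per character text, as step S110 describes."""
        return {ct.character: synthesize(" ".join(ct.sub_texts))
                for ct in character_texts}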
Step S120: determine, according to the novel character to which each audio file belongs, the target sound image position corresponding to that audio file.
The target sound image position of an audio file may be the virtual sound source position of the audio file as perceived by the user's ears; the audio files of different novel characters may be set to different target sound image positions.
Optionally, the target sound image position of an audio file may be determined from data that reflects the character's characteristics, such as the first character type of the novel character, a second text label of the character text corresponding to the novel character, or the sound image position difference of the novel character compared with the other novel characters.
Next, three implementations of S120 are described in turn: steps S121 to S122, steps S123 to S124, and steps S125 to S126.
In some embodiments, the target sound image location of the audio file may be determined based on the first character type of the novel character.
Accordingly, fig. 3 shows a schematic flow chart of an alternative audio generation method for dialogue novels according to an embodiment of the disclosure. Fig. 3 is different from fig. 1 in that step S120 may specifically include step S121 and step S122.
Step S121: determine the first character type of the novel character to which each audio file belongs.
Optionally, the first character type may be divided according to the importance of the novel characters, for example into a primary character, a secondary character, and a bystander. To make the characters more vivid, the primary and secondary characters may be further divided into several primary characters and several secondary characters of different importance levels, without limitation. It should be noted that the embodiments of the present disclosure may also divide the characters into other first character types according to their importance, without specific limitation.
Optionally, text analysis may be performed on the novel text to determine the first character type of each character. Specifically, the first character type may be determined from data that reflects a character's importance, such as the number of sub-texts of the character text corresponding to the novel character, the text display position of the character text, or a first text label of the novel character in the novel text. The following sections describe each of these ways of determining the first character type in turn.
In one embodiment, the first character type may be determined based on the number of sub-text of the character text.
Accordingly, step S121 may specifically include step a11 and step a12.
Step a11: for the novel character to which each audio file belongs, determine the number of sub-texts of the character text corresponding to that novel character.
In the embodiments of the present disclosure, the text content presented in one dialog box of a novel character's character text may be treated as one sub-text. For example, the speaking content of a novel character that appears between one pair of double quotation marks in the novel text may be presented in one dialog box, in which case that speaking content is one sub-text; or each sentence of the speaking content between a pair of double quotation marks may be presented in its own dialog box, in which case each sentence is one sub-text. This is not specifically limited.
The number of sub-texts of the character text reflects how much the novel character speaks. Optionally, it may be counted over the whole dialogue novel or over one or more chapters; the statistical dimension of the sub-text count is not specifically limited.
Step a12: determine the first character type of the novel character to which each audio file belongs according to the number of sub-texts. Optionally, the number of sub-texts is proportional to the importance of the character type; for example, a primary character has more sub-texts than a secondary character. Further, the primary characters may be divided into several importance levels according to the number of sub-texts, where the higher the level, the larger the number of sub-texts; the secondary characters may similarly be divided into several levels.
In one example, the dialogue characters may be ranked by their number of sub-texts from high to low; the first preset number of characters are then taken as primary characters of successively decreasing importance, and the remaining characters as secondary characters of successively decreasing importance. The preset number may be a preset value, or may be set according to the actual situation or specific requirements, without limitation.
It should be noted that the embodiments of the present disclosure may also derive the first character type from the number of sub-texts in other ways, without specific limitation.
Because an important character in a novel usually speaks a great deal, the sub-text count, which reflects how much a novel character speaks, allows the novel characters to be accurately divided into the several first character types according to their importance.
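As a concrete illustration of steps a11 and a12, the following sketch ranks novel characters by their sub-text counts and splits them into primary and secondary characters. The function name, the preset_number default, and the type labels are assumptions for illustration only.

    from typing import Dict

    def classify_by_subtext_count(counts: Dict[str, int],
                                  preset_number: int = 1) -> Dict[str, str]:
        """counts maps a novel character name to its number of sub-texts."""
        ranked = sorted(counts, key=counts.get, reverse=True)
        return {name: ("primary" if i < preset_number else "secondary")
                for i, name in enumerate(ranked)}

    # Character A speaks most, so A becomes the primary character.
    print(classify_by_subtext_count({"A": 12, "B": 7, "P": 3}))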
Having described determining the first character type from the number of sub-texts of the character text, we next describe determining it from the text display position of the character text.
In another embodiment, the first character type may be determined according to a text display position of the character text.
Accordingly, step S121 may specifically include step a13 and step a14.
Step a13: for the novel character to which each audio file belongs, determine the text display position of the character text corresponding to that novel character.
The text display position of the character text is the position at which the character text of the novel character is displayed in the dialogue display interface.
Illustratively, continuing with Fig. 2, the character text of character A is displayed on the right of the dialogue display interface, the character text of character B on the left, and the bystander (narration) text in the middle region.
Step a14: determine the first character type of the novel character to which each audio file belongs according to the text display position.
Optionally, a novel character displayed on the right of the dialogue display interface may be determined as a primary character; a novel character displayed on the left as a secondary character; and the novel character displayed in the middle as the bystander.
Illustratively, continuing with Fig. 2, according to where each novel character's text is displayed in the dialogue display interface, character A may be determined as a primary character, character B as a secondary character, and P1 as the bystander.
Optionally, if the user switches the display position of a novel character's content, the first character type of that character can be switched accordingly, which in turn allows the target sound image position of the novel character to be switched flexibly.
In this embodiment, because the primary character's dialog boxes usually sit on the right, the secondary character's on the left, and the narration in the middle, the text display position of the character text accurately reflects the different first character types, so the novel characters can be accurately divided into several first character types according to their importance.
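A minimal sketch of steps a13 and a14 under the mapping just described; the position strings and type labels are assumed conventions.

    # Display position of a character's dialog boxes -> first character type.
    POSITION_TO_TYPE = {
        "right": "primary",     # primary character's boxes sit on the right
        "left": "secondary",    # secondary character's boxes sit on the left
        "middle": "bystander",  # narration text sits in the middle
    }

    def classify_by_display_position(position: str) -> str:
        return POSITION_TO_TYPE.get(position, "secondary")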
Having described determining the first character type from the text display position, we next describe determining it from the first text label.
In yet another embodiment, the first character type may be determined from a first text label of the character text corresponding to the novel character.
Accordingly, step S121 may specifically include step a15 and step a16.
Step a15: for the novel character to which each audio file belongs, parse the first text label of the corresponding character text from the novel text of the dialogue novel. The first text label may be a text label conforming to a preset format, such as a Speech Synthesis Markup Language (SSML) label. Optionally, the first text label may be written by the user into the novel text, or added by the user to the novel text corresponding to the audio file while editing the audio file with an audio editing tool; this is not specifically limited.
Optionally, the first text label is a label that reflects the importance of the novel character. For example, it may be a character type label of the novel character and/or a text display position label of the novel character: the character type label marks the first character type to which the novel character belongs, and the text display position label marks the position at which the character text of the novel character is displayed on the dialogue display interface. For the first character type and the text display position, refer to the related descriptions above, which are not repeated here.
Step a16: determine the first character type of the novel character to which each audio file belongs according to the first text label.
Optionally, if the first text label is a character type label, the first character type of the novel character may be read directly from the label.
Optionally, if the first text label is a text display position label, determining the first character type from it is similar to determining the first character type from the text display position described above, and is not repeated here.
It should be noted that other parameters reflecting a character's importance may also be chosen to determine the first character type, according to the actual scenario and specific requirements, without specific limitation.
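The following sketch illustrates steps a15 and a16 under an assumed label format: a custom, SSML-like character tag embedded in the novel text. The tag schema is hypothetical, not a standard SSML element.

    import re

    # Assumed label format: <character type="primary">A</character>
    LABEL = re.compile(r'<character type="(?P<type>\w+)">(?P<name>[^<]+)</character>')

    def parse_first_text_labels(novel_text: str) -> dict:
        """Map each labelled novel character name to its first character type."""
        return {m.group("name"): m.group("type") for m in LABEL.finditer(novel_text)}

    print(parse_first_text_labels('<character type="primary">A</character> smiled.'))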
Having introduced step S121, step S122 is explained next.
Step S122: determine the target sound image position corresponding to each audio file according to the first character type. Optionally, the target sound image position may be expressed as a sound image offset from the initial sound image position.
In one embodiment, different target sound image positions may be determined for different character types. In one example, the primary character may be offset by a first offset in one sound image direction and the secondary character by a second offset in the other direction; for example, the primary character's sound image may be shifted to the right and the secondary character's to the left. The first and second offsets may be the same or different, without limitation. Optionally, the bystander's sound image position may be left unshifted, or moved in a direction different from those of the primary and secondary characters, without specific limitation.
Optionally, different offsets may be set for primary characters of different importance levels and for secondary characters of different importance levels; for example, the lower the importance level, the larger the offset. The sound image directions of multiple characters belonging to the same first character type may be the same or different, without specific limitation.
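To make step S122 concrete, the sketch below maps each first character type to a pan position in [-1, 1] (-1 fully left, +1 fully right) and widens the offset for lower importance levels. All concrete values and names here are assumptions for illustration.

    TYPE_TO_PAN = {
        "primary": +0.5,    # shifted towards the right
        "secondary": -0.5,  # shifted towards the left
        "bystander": 0.0,   # left at the initial (centre) sound image position
    }

    def target_sound_image(character_type: str, level: int = 0) -> float:
        """level: 0 for the most important character of its type; higher values
        mean lower importance and, per the text above, a larger offset."""
        base = TYPE_TO_PAN.get(character_type, 0.0)
        widen = 0.1 * level  # assumed per-level widening
        pan = base + widen if base >= 0 else base - widen
        return max(-1.0, min(1.0, pan))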
Having introduced the first implementation of step S120 through steps S121 and S122, the second implementation is explained next through steps S123 and S124.
In other embodiments, the target sound image location may be determined from a second text label of the corresponding character text of the novel character.
Accordingly, fig. 4 shows a flow diagram of another alternative audio generation method for dialogue novels provided by embodiments of the present disclosure. Fig. 4 differs from fig. 1 in that step S120 may specifically include step S123 and step S124.
Step S123: for the novel character to which each audio file belongs, obtain from the novel text of the dialogue novel a second text label of the character text corresponding to that novel character. The second text label is a sound image position label of the audio file corresponding to the novel character; optionally, it marks the target sound image position, or the sound image offset, of the audio corresponding to the novel character.
Other details of the second text label are similar to those of the first text label and are not repeated here.
Step S124: determine the target sound image position corresponding to each audio file according to the second text label.
Specifically, the target sound image position may be read directly from the target sound image position or sound image offset marked by the second text label, which is not described further here.
It should be noted that by determining the target sound image position of each novel character's audio file from the second text label, the user can flexibly adjust the target sound image position through the label, which improves the convenience of audio generation for the dialogue novel.
Having introduced the second implementation of step S120 through steps S123 and S124, the third implementation is explained next through steps S125 and S126.
In still other embodiments, the target sound image position may be determined from the sound image position difference of the novel character compared with the other novel characters.
Accordingly, fig. 5 shows a flow diagram of yet another alternative audio generation method for dialog novels provided by embodiments of the present disclosure. Fig. 5 differs from fig. 1 in that step S120 may specifically include step S125 and step S126.
Step S125: for the novel character to which each audio file belongs, determine from the novel text of the dialogue novel the sound image position difference of that novel character compared with the other novel characters.
Optionally, the sound image position difference may be obtained by analysing distance keywords in the novel text that reflect changes in the distance between novel characters, for example "character A came to the side/ear of character B" or "character A walked far away".
In one example, the distance keywords may be quantised into different sound image position differences, for example with a larger quantised value for a larger distance, without specific limitation. The sound image position difference corresponding to each distance keyword may be set according to the actual situation and specific scene.
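A sketch of the keyword quantisation in step S125; the keywords and quantised values are illustrative assumptions.

    from typing import Optional

    # Assumed quantisation: larger distance -> larger sound image difference.
    DISTANCE_KEYWORDS = {
        "came to the side of": 0.1,
        "came to the ear of": 0.05,
        "walked far away": 0.8,
    }

    def sound_image_difference(sentence: str) -> Optional[float]:
        for keyword, diff in DISTANCE_KEYWORDS.items():
            if keyword in sentence:
                return diff
        return None  # no distance keyword: keep the previous position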
Step S126: determine the target sound image position corresponding to each audio file according to the sound image position difference and the sound image positions of the other novel characters.
In one example, the target sound image position calculated in step S126 may be applied to the audio of the sub-texts lying between the current distance change and the next distance change between the novel character and the other novel characters. That is, for the audio file of one and the same character text, the target sound image position may vary as the distance between characters changes.
It should be noted that other character feature data may also be chosen to determine the target sound image position of an audio file, according to the specific scene and actual requirements, without specific limitation.
Step S130: for each audio file, adjust the audio file from the initial sound image position to the target sound image position to obtain the adjusted audio file. That is, the sound image position of any adjusted audio file is at the target sound image position corresponding to that audio file.
In some embodiments, S130 may specifically include step b11 and step b12 described below.
Step b11: determine the audio adjustment amount of the audio file according to the obtained target sound image position.
The audio adjustment amount is an audio parameter that can shift the sound image position of the audio file. Optionally, it comprises at least one of: a level-difference adjustment, a time-difference adjustment, and the channel to which the audio file belongs.
The level-difference adjustment is a difference applied between the levels of the left- and right-channel audio of the audio file.
The time-difference adjustment is a difference applied between the times of the left- and right-channel audio of the audio file.
The channel to which the audio file belongs may be entirely the left channel, entirely the right channel, or both the left and right channels, in which case the sound image sits in the middle.
Step b12: adjust the audio file according to the audio adjustment amount.
Through these embodiments, the audio file can be adjusted precisely via the audio adjustment amount, improving the adjustment accuracy. Moreover, with this adjustment, the adjusted audio files are presented as a stereo effect, which makes reading more engaging and thereby improves the user's reading experience.
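The level-difference branch of steps b11 and b12 can be illustrated with constant-power panning, a standard way to realise a sound image offset through a channel level difference; a time-difference adjustment would analogously delay one channel by a few samples. A minimal NumPy sketch:

    import numpy as np

    def pan_to_target(mono: np.ndarray, pan: float) -> np.ndarray:
        """pan in [-1, 1]; returns an (n, 2) stereo array whose level
        difference places the sound image at the target position."""
        theta = (pan + 1.0) * np.pi / 4.0  # map [-1, 1] to [0, pi/2]
        left = np.cos(theta) * mono
        right = np.sin(theta) * mono
        return np.stack([left, right], axis=1)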
In some embodiments, S130 may specifically include step b21 described below.
Step b21: adjust the virtual sound source of the audio file from the initial sound image position to the target sound image position using a head-related impulse response (HRIR) algorithm, obtaining the adjusted audio file.
In one example, the HRIR algorithm may be used to determine the HRIR parameters corresponding to the target sound image position, and the audio file is then adjusted with those parameters.
Through these embodiments, the audio file can be adjusted precisely via the HRIR algorithm, improving the adjustment accuracy. Moreover, with the HRIR algorithm the adjusted audio files are presented as a virtual surround effect, which further increases the interest of reading and thereby further improves the user's reading experience.
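A sketch of step b21. Here load_hrir is a hypothetical stand-in for a lookup into a measured HRIR data set (the disclosure does not name one); the identity impulse used below is only a stub so that the sketch runs.

    import numpy as np

    def load_hrir(azimuth_deg: float) -> tuple:
        """Hypothetical lookup into a measured HRIR set."""
        h = np.zeros(128)
        h[0] = 1.0  # identity impulse as a stub
        return h, h  # (left-ear impulse response, right-ear impulse response)

    def apply_hrir(mono: np.ndarray, azimuth_deg: float) -> np.ndarray:
        h_l, h_r = load_hrir(azimuth_deg)
        left = np.convolve(mono, h_l)    # filtering with each ear's impulse
        right = np.convolve(mono, h_r)   # response places the virtual source
        return np.stack([left, right], axis=1)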
In some embodiments, to further increase the interest of the user in reading, S130 may specifically include the following step b31.
Step b31: for each audio file, adjust the audio file from the initial sound image position to the target sound image position, and adjust the timbre of the audio file according to timbre parameters, to obtain the adjusted audio file. That is, the sound image position of the adjusted audio file is at the target sound image position, and its timbre conforms to the timbre parameters.
To further increase the interest of reading and to help distinguish the characters, different speech speeds, volumes, and the like may also be set for the audio files of different novel characters, without limitation.
Optionally, before step b31, the audio generation method of the dialogue novel may further include steps b32 and b33.
Step b32: determine a second character feature of each novel character.
The second character feature is a character feature that reflects the timbre characteristics of the novel character; for example, male, female, child, elderly, young, bystander, and the like.
In one example, the second character feature may be obtained by text analysis of the novel text.
In another example, the second character feature may be determined from a character label of each novel character's character text in the novel text. The character label may be a label such as male, female, child, elderly, or young, or a timbre template label generated after a timbre template is selected for the character text from a preset timbre template library; this is not specifically limited. For other details of the character label, refer to the description of the first text label above, which is not repeated here.
Step b33: determine the timbre parameters corresponding to each audio file according to the second character feature.
Optionally, the timbre parameters corresponding to each audio file may be preset or set by the user in real time. For example, a timbre parameter label of each novel character's character text in the novel text may be used. As another example, the timbre parameters of a target timbre template may be obtained through a timbre selection entry, where the target timbre template is the template the user selects for the novel character from a preset timbre template library. This is not specifically limited.
In the embodiments of the present disclosure, after the audio files of the plurality of novel characters of the dialogue novel are acquired, the target sound image position of each audio file may be determined according to its novel character. The sound image position of each audio file is then adjusted to the target sound image position, so that the adjusted audio files of different novel characters in the dialogue novel sit at different sound image positions. The embodiments of the present disclosure can therefore present the audio files of the novel characters at different sound image positions, allowing the user to read the dialogue novel in an audio reading mode in which each novel character sounds distinct, vivid, and interesting, thereby improving the user's reading experience.
In some embodiments of the present disclosure, after step S130, the audio generation method of the dialog novel may further include:
Step c1: generate mixed audio of the dialogue novel from the plurality of adjusted audio files. The mixed audio may be a multi-sound-image audio file mixed from audio files at a plurality of sound image positions. In some embodiments, the mixed audio of the dialogue novel may be generated locally, or the plurality of adjusted audio files may be sent to the target TTS engine or target server for mixing, after which the mixed audio returned by the target TTS engine or target server is received.
Optionally, step c1 may comprise generating virtual surround audio from the plurality of adjusted audio files. For example, for virtual surround audio, the primary character's sound image may be adjusted forward or upward, and the secondary character's sound image further rearward or rendered directly behind the listener, so that the sound sources of different novel characters' audio are presented at different sound image positions.
Alternatively, step c1 may comprise generating stereo audio from the plurality of adjusted audio files.
The embodiments of the present disclosure are not limited to these specific examples; other types of multi-sound-image mixed audio may also be generated.
In the embodiments of the present disclosure, by mixing the plurality of adjusted audio files into stereo audio or virtual surround audio, the dialogue novel can be presented with a stereo or virtual surround effect, which makes the characters more vivid and interesting and improves the user's reading experience. Moreover, with the stereo and virtual surround effects, the dialogue can be presented as if the novel characters surrounded the user at different positions, improving the immersiveness of the reading experience.
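A minimal sketch of the mixing in step c1, assuming each adjusted audio file is already a stereo NumPy array; the zero-padding and normalisation strategy are assumptions.

    import numpy as np

    def mix(adjusted: list) -> np.ndarray:
        """Sum the (n_i, 2) stereo arrays sample by sample; shorter files are
        zero-padded, and the result is normalised to avoid clipping."""
        n = max(a.shape[0] for a in adjusted)
        out = np.zeros((n, 2))
        for a in adjusted:
            out[:a.shape[0]] += a
        peak = np.abs(out).max()
        return out / peak if peak > 1.0 else out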
In some embodiments of the present disclosure, after S130, the audio generation method of the dialog novel further includes steps d11 to d13.
Step d11: detect the real-time playing environment of the user terminal. Optionally, it may be detected whether the playback device connected by the user supports multiple channels: for example, whether the user is wearing a pair of true wireless stereo (TWS) Bluetooth earphones, or whether some channels of multi-channel headphones are damaged.
Step d12: if the real-time playing environment does not support multiple channels, mix the mixed audio generated from the plurality of adjusted audio files down to mono audio.
Optionally, if the real-time playing environment supports only mono, it may be determined that it does not support multiple channels; for example, it is detected that the user is wearing only one TWS earbud.
Step d13: play the mono audio.
Through this embodiment, when the user's playing environment does not support multiple channels, the adjusted audio is played as mono audio, which avoids the sound being too quiet or inaudible because part of the audio is concentrated on unavailable channels, improving the user's audio reading experience and effect.
In one embodiment, after step d11, the audio generation method of the dialog novel further comprises step d14.
Step d14: if the real-time playing environment supports multiple channels, play the mixed audio generated from the plurality of adjusted audio files.
For the specific content of the mixed audio, refer to the related descriptions above, which are not repeated here.
Through this embodiment, the adjusted audio files are played as mono when the playing environment does not support multiple channels and as mixed audio when it does, so the playback mode can be adapted flexibly to the user's playing environment, improving the user's audio reading experience and effect.
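Steps d12 and d14 amount to choosing the playback form from the detected environment. A minimal sketch, where the function and flag names are illustrative and the boolean stands in for the result of the detection in step d11:

    import numpy as np

    def prepare_for_playback(mixed: np.ndarray, multichannel: bool) -> np.ndarray:
        """mixed is an (n, 2) stereo array; multichannel is the result of the
        real-time playing environment detection in step d11."""
        if multichannel:
            return mixed            # step d14: play the mixed audio as-is
        return mixed.mean(axis=1)   # step d12: average the channels to mono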
In one embodiment, after step d11, the audio generation method of the dialog novel further comprises step d15.
Step d15: if the real-time playing environment does not support multiple channels, display target prompt information on the user terminal, the target prompt being used to prompt the user to wear earphones. Optionally, the earphones may be earphones that support stereo playback, such as TWS earphones, without specific limitation.
Through this embodiment, when the user's playing environment does not support multiple channels, the user can be reminded in time to wear earphones, improving the user experience.
In some embodiments of the present disclosure, after S130, the audio generation method of the dialog novel further includes step e11 and step e12.
Step e11: display the plurality of character texts, where each character text includes at least one sub-text.
In particular, details of the character text and the sub-text may be referred to the relevant descriptions of the above parts of the embodiments of the disclosure, and are not repeated herein.
Alternatively, if the execution subject of the audio generation method of the dialog novel is the target server or the target TTS engine, the terminal device of the user may be controlled to display a plurality of character texts.
Alternatively, if the execution subject of the audio generation method of the dialog novel is a terminal device of the user, it may be controlled to display a plurality of character texts on a display screen of the terminal device.
Step e12: in response to the user's triggering operation on a target sub-text, play the audio corresponding to the target sub-text.
Alternatively, the triggering operation may include a gesture control operation such as clicking, double clicking, long pressing, etc. on the target sub-text, a voice control operation, or an expression control operation, etc., which is not limited herein.
Illustratively, continuing with Fig. 2, if the user triggers the sub-text "Just repaired?", the audio of "Just repaired?" is played.
Optionally, if the execution subject of the audio generating method of the dialog novel is the target server or the target TTS engine, the terminal device of the user may be controlled to play the audio corresponding to the target sub-text.
Optionally, if the execution subject of the audio generating method of the dialog novel is a terminal device of the user, the playing module of the terminal device may be controlled to play the audio corresponding to the target sub-text.
In the embodiments of the present disclosure, audio can be played in response to the user's triggering operations on the dialogue display interface, achieving interesting human-machine interaction and improving the user's reading experience.
Optionally, to keep the user's audiovisual experience in sync, the target sub-text may be highlighted while its audio is played, for example by enlarging its text box and/or font, or by changing other display characteristics, without specific limitation.
In some embodiments of the present disclosure, after S130, the audio generation method of the dialog novel further includes step e13 and step e14.
Step e13: display the plurality of character texts, where each character text comprises at least one sub-text.
Specifically, the details of step e13 may be referred to the relevant descriptions of the above parts of the embodiments of the disclosure, which are not repeated here.
Step e14: in response to the user's triggering operation on a target sub-text, play in sequence the audio corresponding to the target sub-text and its associated sub-texts, where an associated sub-text is positionally continuous with the target sub-text and belongs to the same character text. Illustratively, continuing with Fig. 2, "Just repaired?" and "Are you sure?" are associated sub-texts of each other.
Alternatively, the triggering operation may include a gesture control operation such as clicking, double clicking, long pressing, etc. on the target sub-text, a voice control operation, or an expression control operation, etc., which is not limited herein.
Illustratively, continuing with Fig. 2, if the user triggers the sub-text "Just repaired?", the audio of "Just repaired? Are you sure?" is played.
In the embodiments of the present disclosure, after the user triggers a target sub-text, the associated sub-texts are played automatically in sequence without further touch operations, improving the smoothness and convenience of reading and the user's reading experience.
Optionally, to keep the user's audiovisual experience in sync, the text being played is highlighted while the audio of either the target sub-text or an associated sub-text is played, for example by enlarging the text box and/or font of the text being played, or by changing other display characteristics, without specific limitation.
In some embodiments of the present disclosure, the audio generation method of the dialog novel may further include step f11 and step f12 before S120.
Step f11: for each dialogue character, if the playing times of a plurality of sub-audios in the audio file are determined to be continuous, keep only the prompt content in the first sub-audio. The prompt content marks which dialogue character the speaking content belongs to, distinguishing it from other text content; for example, the prompt content may be "xx says" or the like.
In one embodiment, the continuity of the playing times of the sub-audios corresponding to the sub-texts may be determined from the continuity of the sub-texts within the character text of the dialogue character. For example, if no text content of another character text lies between adjacent sub-texts belonging to the same character text, the playing times of the corresponding sub-audios are determined to be continuous.
In another embodiment, the continuity of the playing times of the sub-audios is determined from the display continuity of the sub-texts of the dialogue character's character text on the dialogue display interface.
For example, if no text content of the narration or of another character is displayed between adjacent sub-texts belonging to the same character text, the playing times of the corresponding sub-audios are determined to be continuous.
Step f12: splice each run of continuous sub-audios into one sub-audio.
For example, continuing with Fig. 2, the spliced sub-audio corresponding to the sub-texts "Just repaired?" and "Are you sure?" may be: "Character A says: Just repaired? Are you sure?"
Through this embodiment, for a plurality of sub-audios of the same novel character with continuous playing times, only one prompt content is kept to distinguish them from the sub-audios of other novel characters, and the sub-audios can be played back continuously, improving the audio playback effect.
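A minimal sketch of steps f11 and f12, representing the prompt audio and the sub-audios as NumPy arrays; keeping a single prompt and concatenating the continuous run is the whole operation.

    import numpy as np

    def splice(prompt: np.ndarray, sub_audios: list) -> np.ndarray:
        """Keep one prompt (e.g. the audio of "Character A says:"), then join
        the sub-audios whose playing times are continuous."""
        return np.concatenate([prompt] + list(sub_audios))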
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
The electronic device provided by the embodiments of the present disclosure may be an electronic device that supports an electronic book reading function, including but not limited to mobile terminals such as smartphones, notebook computers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), in-vehicle terminals (e.g., in-vehicle navigation terminals), and wearable devices, as well as stationary terminals such as digital TVs, desktop computers, and smart home devices. As noted above, the execution body of the audio generation method of the dialogue novel may also be another device with an audio adjustment function, such as an electronic reading platform or a text-to-speech (TTS) engine that supports audio adjustment, without limitation.
It should be noted that the electronic device 600 shown in fig. 6 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
The electronic device 600 conventionally comprises a processor 610 and a computer program product or computer-readable medium in the form of a memory 620. The memory 620 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 620 has a storage space 621 for executable instructions (or program code) 6211 for performing any of the steps of the audio generation method described above. For example, the storage space 621 may include respective executable instructions 6211 for implementing the various steps of that method. The executable instructions may be read from or written to one or more computer program products, which comprise a program code carrier such as a hard disk, a compact disk (CD), a memory card, or a floppy disk. Such computer program products are typically portable or fixed storage units, which may have memory segments or memory spaces arranged similarly to the memory 620 of the electronic device 600 of Fig. 6. The executable instructions may be compressed, for example, in a suitable form. In general, the storage unit includes executable instructions for performing the steps of the audio generation method according to the present disclosure, i.e., code readable by a processor such as the processor 610; when executed by the electronic device 600, this code causes the electronic device 600 to perform the various steps of the method described above.
Of course, for simplicity, only some of the components of the electronic device 600 that are relevant to the present disclosure are shown in fig. 6; components such as buses, input/output interfaces, input devices, and output devices are omitted. In addition, the electronic device 600 may include any other suitable components depending on the particular application.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the audio generation method for dialogue novels provided by the embodiments of the present disclosure.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
In the embodiments of the present disclosure, program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The application discloses:
A1, an audio generation method of a dialogue novel comprises the following steps:
A plurality of audio files corresponding to dialogue novels are obtained, the dialogue novels comprise a plurality of character texts, and each audio file corresponds to one character text;
determining a target sound image position corresponding to each audio file according to the novel role of each audio file;
and aiming at each audio file, adjusting the audio file from the initial sound image position to the target sound image position to obtain an adjusted audio file.
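One non-normative way to realize the adjusting step of A1 is to move each mono audio file onto a stereo pair at a target azimuth with constant-power panning; the angle-to-gain mapping below is an assumption made for illustration, not a formula prescribed by the disclosure:

```python
import numpy as np

def pan_to_position(mono, azimuth_deg):
    """Place a mono signal at a target azimuth in [-90, 90] degrees
    (0 = center) using constant-power stereo panning."""
    theta = np.deg2rad((azimuth_deg + 90.0) / 2.0)   # map to [0, 90] degrees
    return np.stack([np.cos(theta) * mono,           # left-channel gain
                     np.sin(theta) * mono], axis=0)  # right-channel gain
```

For example, one novel character could be panned to -45 degrees and another to +45 degrees, so that their adjusted audio files occupy clearly different sound image positions.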
A2, the method according to A1, wherein the determining a target sound image position corresponding to each audio file according to the novel role to which the audio file belongs comprises:
determining a first role type of a novel role to which each audio file belongs;
and determining the target sound image position corresponding to each audio file according to the first character type.
A3, the method according to A2, wherein the determining a first role type of the novel role to which each audio file belongs comprises:
determining the number of the sub-texts of the role texts corresponding to the novel roles according to the novel roles of each audio file;
and determining a first role type of the novel role to which each audio file belongs according to the number of the sub-texts.
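A3 can be sketched by counting how many dialogue sub-texts each novel character carries; the threshold and type names below are assumptions made for the example, not values from the disclosure:

```python
from collections import Counter

def classify_roles(dialogue_lines, main_threshold=10):
    """dialogue_lines: list of (role_name, sub_text) pairs parsed from
    the novel text. Characters with many sub-texts are treated as main
    roles, the rest as supporting roles."""
    counts = Counter(role for role, _ in dialogue_lines)
    return {role: "main" if n >= main_threshold else "supporting"
            for role, n in counts.items()}
```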
A4, the method according to A2, wherein the determining a first role type of the novel role to which each audio file belongs comprises:
determining a text display position of a character text corresponding to the novel role according to the novel role to which each audio file belongs;
and determining a first character type of the novel character to which each audio file belongs according to the text display position.
A5, the method according to A2, wherein the determining a first role type of the novel role to which each audio file belongs comprises:
aiming at the novel role of each audio file, a first text label of a role text corresponding to the novel role is analyzed in the novel text of the dialogue novel;
and determining a first character type of the novel character to which each audio file belongs according to the first text label.
A6, the method of A5, wherein the first text label comprises at least one of:
and the first character type label of the novel character and the text display position label of the character text corresponding to the novel character.
A7, the method according to A1, wherein the determining a target sound image position corresponding to each audio file according to the novel role to which the audio file belongs comprises:
aiming at the novel role of each audio file, acquiring a second text label of a role text corresponding to the novel role from the novel text of the dialogue novel, wherein the second text label is a sound image position label of the audio file corresponding to the novel role;
and determining the target sound image position corresponding to each audio file according to the second text label.
A8, the method according to A1, wherein the determining, for each audio file, the target sound image position corresponding to the audio file comprises:
for each of the audio files, determining a sound image position difference of the novel role compared with other novel roles in novel text of the dialogue novel;
and determining the target sound image position corresponding to each audio file according to the sound image position difference and the sound image positions of the other novel roles.
A9, the method according to any one of A1-A8, wherein the obtaining a plurality of audio files corresponding to the dialogue novel comprises:
Acquiring a novel text of a dialogue novel;
and performing text-to-speech conversion on the novel text to obtain an audio file corresponding to each role text.
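A9 does not name a particular text-to-speech engine, so a sketch can only take the synthesizer as a caller-supplied placeholder:

```python
def generate_audio_files(character_texts, synthesize):
    """character_texts: mapping of novel role -> role text.
    synthesize: any text-to-speech callable returning audio samples;
    it stands in for whatever TTS engine is actually used."""
    return {role: synthesize(text)
            for role, text in character_texts.items()}
```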
A10, the method according to any one of A1-A9, wherein the adjusting the audio file from the initial sound image position to the target sound image position to obtain the adjusted audio file comprises:
determining the audio adjustment amount of the audio file according to the obtained target sound image position;
and adjusting the audio file according to the audio adjustment amount.
A11, the method according to A10, wherein the audio adjustment amount comprises at least one of:
a level difference adjustment amount, a time difference adjustment amount, and an adjustment of the sound channel to which the audio file belongs, each applied correspondingly to the audio file.
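The level difference and time difference of A11 can be sketched as a per-channel gain and delay on a stereo buffer; the sign conventions here are illustrative assumptions:

```python
import numpy as np

def apply_level_and_time_difference(stereo, level_db=0.0, delay_ms=0.0,
                                    sample_rate=44100):
    """stereo: (2, n_samples) array. Attenuating and delaying the right
    channel pushes the perceived sound image toward the left."""
    out = stereo.astype(float).copy()
    out[1] *= 10.0 ** (-level_db / 20.0)                 # level difference
    shift = int(round(delay_ms * sample_rate / 1000.0))  # time difference
    if 0 < shift < out.shape[1]:
        out[1] = np.concatenate([np.zeros(shift), out[1][:-shift]])
    return out
```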
A12, the method according to any one of A1-A11, wherein before the adjusting the audio file from the initial sound image position to the target sound image position, the method further comprises:
determining a second character characteristic for each of the novel characters;
according to the character characteristics, determining tone parameters corresponding to each audio file;
wherein, for each audio file, the audio file is adjusted from an initial sound image position to the target sound image position, and the adjusted audio file is obtained, which includes:
And aiming at each audio file, adjusting the audio file from the initial sound image position to the target sound image position, and adjusting the tone of the audio file according to the tone parameters to obtain an adjusted audio file.
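For the tone (timbre) adjustment of A12, one crude illustration is a pitch change by resampling; a production system would use a formant-preserving method, so treat this only as a sketch:

```python
import numpy as np

def shift_tone(samples, pitch_factor):
    """pitch_factor > 1 raises the voice (e.g. a young character),
    < 1 deepens it. Plain resampling also changes duration, which a
    separate time-stretching stage would have to compensate."""
    src_idx = np.arange(0, len(samples), pitch_factor)
    return np.interp(src_idx, np.arange(len(samples)), samples)
```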
A13, the method according to A12, wherein before the determining a target sound image position corresponding to each audio file according to the novel role to which the audio file belongs, the method further comprises:
for each dialogue role, if it is determined that the play times of a plurality of sub-audios in the audio file are continuous, retaining the prompt content in the first sub-audio;
each successive sub-audio is spliced into one sub-audio.
A14, the method according to any one of A1-A13, wherein after the adjusting the audio file from an initial sound image position to the target sound image position, the method further comprises:
detecting a real-time playing environment of a user terminal;
if the real-time playing environment does not support multiple channels, mixing the mixed audio generated according to the adjusted audio files into mono audio;
playing the mono audio.
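A simple reading of the mono fallback in A14, assuming the mixed audio is held as a channels-by-samples array:

```python
import numpy as np

def downmix_to_mono(mixed):
    """mixed: (n_channels, n_samples) mix of the adjusted audio files.
    Averaging the channels discards the sound image positions but keeps
    the content playable in a single-channel environment."""
    return mixed.mean(axis=0)
```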
A15, the method according to A14, wherein after the detecting the real-time playing environment of the user terminal, the method further comprises:
And if the real-time playing environment supports multiple channels, playing the mixed audio generated by the plurality of adjusted audio files.
A16, the method according to A14, wherein after the detecting the real-time playing environment of the user terminal, the method further comprises:
and if the real-time playing environment does not support multiple channels, displaying target prompt information on the user terminal, wherein the target prompt is used for prompting the user to wear the earphone.
A17, the method according to any one of A1-A16, wherein after the adjusting the audio file from the initial sound image position to the target sound image position, the method further comprises:
displaying a plurality of character texts, wherein each character text comprises at least one sub-text;
and responding to the triggering operation of the user on the target sub-text, and playing the audio corresponding to the target sub-text.
A18, the method according to any one of A1-A16, wherein after the adjusting the audio file from an initial sound image position to the target sound image position, the method further comprises:
displaying a plurality of character texts, wherein each character text comprises at least one sub-text;
responding to a triggering operation of the user on the target sub-text, sequentially playing the audio corresponding to the target sub-text and to the associated sub-text,
and the associated sub-text and the target sub-text are continuous in position and belong to the same character text.
A19, the method according to A17 or A18, wherein the method further comprises:
highlighting the sub-text that is playing the audio.
A20, the method according to any one of A1-A19, wherein after the adjusting the audio file from an initial sound image position to the target sound image position, the method further comprises:
and generating virtual surround audio based on the plurality of adjusted audio files.
A21, the method according to A20, wherein the adjusting the audio file from the initial sound image position to the target sound image position to obtain the adjusted audio file comprises:
and adjusting the virtual sound source of the audio file from the initial sound image position to the target sound image position by utilizing a Head Related Impulse Response (HRIR) algorithm to obtain the adjusted audio file.
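The HRIR positioning of A21 amounts to convolving the dry signal with the left and right head-related impulse responses measured at the target direction; how the HRIR pair is obtained (database lookup, interpolation) is not specified by the disclosure, so it is passed in directly here:

```python
import numpy as np

def position_with_hrir(mono, hrir_left, hrir_right):
    """Render a virtual source at the target sound image position by
    convolving the mono signal with the measured HRIR pair, yielding
    binaural stereo suitable for headphone playback."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)], axis=0)
```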
B22, an electronic device comprising a processor and a memory, the memory to store executable instructions that cause the processor to:
A plurality of audio files corresponding to dialogue novels are obtained, the dialogue novels comprise a plurality of character texts, and each audio file corresponds to one character text;
determining a target sound image position corresponding to each audio file according to the novel role of each audio file;
and aiming at each audio file, adjusting the audio file from the initial sound image position to the target sound image position to obtain an adjusted audio file.
B23, the electronic device according to B22, wherein, when determining the target sound image position corresponding to each audio file according to the novel role to which the audio file belongs, the executable instructions specifically cause the processor to execute:
determining a first role type of a novel role to which each audio file belongs;
and determining the target sound image position corresponding to each audio file according to the first character type.
B24, the electronic device of B23, wherein, when executing the determining the first role type of the novel role to which each of the audio files belongs, the executable instructions specifically cause the processor to perform:
determining the number of the sub-texts of the role texts corresponding to the novel roles according to the novel roles of each audio file;
And determining a first role type of the novel role to which each audio file belongs according to the number of the sub-texts.
B25, the electronic device of B23, wherein, when executing the determining the first role type of the novel role to which each of the audio files belongs, the executable instructions specifically cause the processor to execute:
determining a text display position of a character text corresponding to the novel role according to the novel role to which each audio file belongs;
and determining a first character type of the novel character to which each audio file belongs according to the text display position.
B26, the electronic device of B23, wherein, when executing the determining the first role type of the novel role to which each of the audio files belongs, the executable instructions specifically cause the processor to perform:
aiming at the novel role of each audio file, a first text label of a role text corresponding to the novel role is analyzed in the novel text of the dialogue novel;
and determining a first character type of the novel character to which each audio file belongs according to the first text label.
B27, the electronic device of B26, wherein the first text label includes at least one of:
And the first character type label of the novel character and the text display position label of the character text corresponding to the novel character.
B28, the electronic device according to B22, wherein, when executing the determining, according to the novel role to which each audio file belongs, the target sound image position corresponding to the audio file, the executable instructions specifically cause the processor to execute:
aiming at the novel role of each audio file, acquiring a second text label of a role text corresponding to the novel role from the novel text of the dialogue novel, wherein the second text label is a sound image position label of the audio file corresponding to the novel role;
and determining the target sound image position corresponding to each audio file according to the second text label.
B29, the electronic device according to B22, wherein, when executing the determining, for each audio file, the target sound image position corresponding to the audio file, the executable instructions specifically cause the processor to execute:
For each of the audio files, determining a sound image position difference of the novel role compared with other novel roles in novel text of the dialogue novel;
and determining the target sound image position corresponding to each audio file according to the sound image position difference and the sound image positions of the other novel roles.
B30, the electronic device of any one of B22-B29, wherein, when executing the obtaining a plurality of audio files corresponding to the dialogue novel, the executable instructions specifically cause the processor to perform:
acquiring a novel text of a dialogue novel;
and performing text-to-speech conversion on the novel text to obtain an audio file corresponding to each role text.
B31, the electronic device of any one of B22-B30, wherein, when performing the adjusting the audio file from the initial sound image position to the target sound image position to obtain the adjusted audio file, the executable instructions specifically cause the processor to perform:
determining the audio adjustment amount of the audio file according to the obtained target sound image position;
and adjusting the audio file according to the audio adjustment amount.
B32, the electronic device of B31, wherein the audio adjustment amount includes at least one of:
a level difference adjustment amount, a time difference adjustment amount, and an adjustment of the sound channel to which the audio file belongs, each applied correspondingly to the audio file.
B33, the electronic device of any of B22-B32, wherein, prior to performing the adjusting the audio file from an initial sound image location to the target sound image location, the executable instructions further cause the processor to perform:
determining a second character characteristic for each of the novel characters;
according to the character characteristics, determining tone parameters corresponding to each audio file;
wherein, when executing the adjusting the audio file from the initial sound image position to the target sound image position for each of the audio files to obtain an adjusted audio file, the executable instructions specifically cause the processor to execute:
and aiming at each audio file, adjusting the audio file from the initial sound image position to the target sound image position, and adjusting the tone of the audio file according to the tone parameters to obtain an adjusted audio file.
B34, the electronic device of any one of B22-B33, wherein, prior to executing the determining the target sound image location corresponding to each of the audio files according to the novel role to which the audio file belongs, the executable instructions further cause the processor to:
for each dialogue role, if it is determined that the play times of a plurality of sub-audios in the audio file are continuous, retaining the prompt content in the first sub-audio;
each successive sub-audio is spliced into one sub-audio.
B35, the electronic device of any one of B22-B34, wherein after performing the adjusting the audio file from an initial sound image location to the target sound image location to obtain an adjusted audio file, the executable instructions further cause the processor to perform:
detecting a real-time playing environment of a user terminal;
if the real-time playing environment does not support multiple channels, mixing the mixed audio generated according to the adjusted audio files into mono audio;
playing the mono audio.
B36, the electronic device of B35, wherein after executing the detecting the real-time playback environment of the user terminal, the executable instructions further cause the processor to perform:
and if the real-time playing environment supports multiple channels, playing the mixed audio generated by the plurality of adjusted audio files.
B37, the electronic device of B35, wherein after executing the detecting the real-time playback environment of the user terminal, the executable instructions further cause the processor to perform:
And if the real-time playing environment does not support multiple channels, displaying target prompt information on the user terminal, wherein the target prompt is used for prompting the user to wear the earphone.
B38, the electronic device of any one of B22-B37, wherein after performing the adjusting the audio file from an initial sound image location to the target sound image location to obtain an adjusted audio file, the executable instructions further cause the processor to perform:
displaying a plurality of character texts, wherein each character text comprises at least one sub-text;
and responding to the triggering operation of the user on the target sub-text, and playing the audio corresponding to the target sub-text.
B39, the electronic device of any one of B22-B37, wherein after performing the adjusting the audio file from an initial sound image location to the target sound image location to obtain an adjusted audio file, the executable instructions further cause the processor to perform:
displaying a plurality of character texts, wherein each character text comprises at least one sub-text;
responding to a triggering operation of the user on the target sub-text, sequentially playing the audio corresponding to the target sub-text and to the associated sub-text,
and the associated sub-text and the target sub-text are continuous in position and belong to the same character text.
B40, the electronic device of B38 or B39, wherein the executable instructions further cause the processor to perform:
highlighting the sub-text that is playing the audio.
B41, the electronic device of any of B22-B40, wherein after performing the adjusting the audio file from an initial sound image location to the target sound image location to obtain an adjusted audio file, the executable instructions further cause the processor to perform:
and generating virtual surround audio based on the plurality of adjusted audio files.
B42, the electronic device of B41, wherein, when performing the adjusting the audio file from the initial sound image position to the target sound image position to obtain the adjusted audio file, the executable instructions specifically cause the processor to perform:
and adjusting the virtual sound source of the audio file from the initial sound image position to the target sound image position by utilizing a Head Related Impulse Response (HRIR) algorithm to obtain the adjusted audio file.
C43, a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the audio generation method of a dialogue novel according to any one of A1-A21.
Various component embodiments of the present disclosure may be implemented in whole or in part in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in an electronic device according to embodiments of the present disclosure may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present disclosure may also be embodied as a device or apparatus program (e.g., computer program and computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present disclosure may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, embodiments in which the features described above are replaced with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (43)

1. A method of audio generation for a conversation novel, the method comprising:
a plurality of audio files corresponding to dialogue novels are obtained, the dialogue novels comprise a plurality of character texts, and each audio file corresponds to one character text;
Determining a target sound image position corresponding to each audio file according to the novel role of each audio file;
and aiming at each audio file, adjusting the audio file from an initial sound image position to the target sound image position to obtain an adjusted audio file, wherein the adjusted audio files are located at different sound image positions, and the target sound image position has a sound image offset compared with the initial sound image position.
2. The method of claim 1, wherein determining the target sound image location corresponding to each audio file according to the novel role to which the audio file belongs comprises:
determining a first role type of a novel role to which each audio file belongs;
and determining the target sound image position corresponding to each audio file according to the first character type.
3. The method of claim 2, wherein determining the first character type of the novel character to which each of the audio files belongs comprises:
determining the number of the sub-texts of the role texts corresponding to the novel roles according to the novel roles of each audio file;
and determining a first role type of the novel role to which each audio file belongs according to the number of the sub-texts.
4. The method of claim 2, wherein determining the first character type of the novel character to which each of the audio files belongs comprises:
determining a text display position of a character text corresponding to the novel role according to the novel role to which each audio file belongs;
and determining a first character type of the novel character to which each audio file belongs according to the text display position.
5. The method of claim 2, wherein determining the first character type of the novel character to which each of the audio files belongs comprises:
aiming at the novel role of each audio file, a first text label of a role text corresponding to the novel role is analyzed in the novel text of the dialogue novel;
and determining a first character type of the novel character to which each audio file belongs according to the first text label.
6. The method of claim 5, wherein the first text label comprises at least one of:
and the first character type label of the novel character and the text display position label of the character text corresponding to the novel character.
7. The method of claim 1, wherein determining the target sound image location corresponding to each audio file according to the novel role to which the audio file belongs comprises:
Aiming at the novel role of each audio file, acquiring a second text label of a role text corresponding to the novel role from the novel text of the dialogue novel, wherein the second text label is a sound image position label of the audio file corresponding to the novel role;
and determining the target sound image position corresponding to each audio file according to the second text label.
8. The method of claim 1, wherein the determining, for each of the audio files, the target sound image location corresponding to the audio file comprises:
for each of the audio files, determining a sound image position difference of the novel role compared with other novel roles in novel text of the dialogue novel;
and determining the target sound image position corresponding to each audio file according to the sound image position difference and the sound image positions of the other novel roles.
9. The method of any of claims 1-8, wherein the obtaining a plurality of audio files corresponding to dialog novels comprises:
acquiring a novel text of a dialogue novel;
and performing text-to-speech conversion on the novel text to obtain an audio file corresponding to each role text.
10. The method of claim 9, wherein adjusting the audio file from an initial sound image location to the target sound image location results in an adjusted audio file, comprising:
determining the audio adjustment amount of the audio file according to the obtained target sound image position;
and adjusting the audio file according to the audio adjustment amount.
11. The method of claim 10, wherein the audio adjustment amount comprises at least one of:
a level difference adjustment amount, a time difference adjustment amount, and an adjustment of the sound channel to which the audio file belongs, each applied correspondingly to the audio file.
12. The method of claim 11, wherein prior to said adjusting said audio file from an initial sound image location to said target sound image location to obtain an adjusted audio file, the method further comprises:
determining a second character characteristic for each of the novel characters;
according to the character characteristics, determining tone parameters corresponding to each audio file;
wherein, for each audio file, the audio file is adjusted from an initial sound image position to the target sound image position, and the adjusted audio file is obtained, which includes:
And aiming at each audio file, adjusting the audio file from the initial sound image position to the target sound image position, and adjusting the tone of the audio file according to the tone parameters to obtain an adjusted audio file.
13. The method according to claim 12, wherein before determining the target sound image position corresponding to each audio file according to the novel role to which the audio file belongs, the method comprises:
for each dialogue role, if it is determined that the play times of a plurality of sub-audios in the audio file are continuous, retaining the prompt content in the first sub-audio;
each successive sub-audio is spliced into one sub-audio.
14. The method of claim 13, wherein after said adjusting said audio file from an initial sound image location to said target sound image location to obtain an adjusted audio file, said method further comprises:
detecting a real-time playing environment of a user terminal;
if the real-time playing environment does not support multiple channels, mixing the mixed audio generated according to the adjusted audio files into mono audio;
playing the mono audio.
15. The method according to claim 14, wherein after said detecting the real-time playing environment of the user terminal, the method further comprises:
and if the real-time playing environment supports multiple channels, playing the mixed audio generated by the plurality of adjusted audio files.
16. The method according to claim 14, wherein after said detecting the real-time playing environment of the user terminal, the method further comprises:
and if the real-time playing environment does not support multiple channels, displaying target prompt information on the user terminal, wherein the target prompt is used for prompting the user to wear the earphone.
17. The method of claim 16, wherein after said adjusting said audio file from an initial sound image location to said target sound image location to obtain an adjusted audio file, said method further comprises:
displaying a plurality of character texts, each character text comprising at least one sub-text;
and responding to the triggering operation of the user on the target sub-text, and playing the audio corresponding to the target sub-text.
18. The method of claim 16, wherein after said adjusting said audio file from an initial sound image location to said target sound image location to obtain an adjusted audio file, said method further comprises:
Displaying a plurality of character texts, each character text comprising at least one sub-text;
responding to a triggering operation of the user on the target sub-text, sequentially playing the audio corresponding to the target sub-text and to the associated sub-text,
and the associated sub-text and the target sub-text are continuous in position and belong to the same character text.
19. The method according to claim 17 or 18, characterized in that the method further comprises:
highlighting the sub-text that is playing the audio.
20. The method of claim 19, wherein after said adjusting said audio file from an initial sound image location to said target sound image location to obtain an adjusted audio file, said method further comprises:
and generating virtual surround audio based on the plurality of adjusted audio files.
21. The method of claim 20, wherein said adjusting the audio file from the initial sound image position to the target sound image position to obtain the adjusted audio file comprises:
and adjusting the virtual sound source of the audio file from the initial sound image position to the target sound image position by utilizing a Head Related Impulse Response (HRIR) algorithm to obtain the adjusted audio file.
22. An electronic device comprising a processor and a memory for storing executable instructions that cause the processor to:
a plurality of audio files corresponding to dialogue novels are obtained, the dialogue novels comprise a plurality of character texts, and each audio file corresponds to one character text;
determining a target sound image position corresponding to each audio file according to the novel role of each audio file;
and aiming at each audio file, adjusting the audio file from an initial sound image position to the target sound image position to obtain an adjusted audio file, wherein the adjusted audio files are located at different sound image positions, and the target sound image position has a sound image offset compared with the initial sound image position.
23. The electronic device of claim 22, wherein the executable instructions, when executed to determine the target sound image location corresponding to each of the audio files based on the novel role to which the audio file belongs, specifically cause the processor to perform:
determining a first role type of a novel role to which each audio file belongs;
And determining the target sound image position corresponding to each audio file according to the first character type.
24. The electronic device of claim 23, wherein the executable instructions, when executing the determining the first character type of the novel character to which each of the audio files belongs, specifically cause the processor to perform:
determining the number of the sub-texts of the role texts corresponding to the novel roles according to the novel roles of each audio file;
and determining a first role type of the novel role to which each audio file belongs according to the number of the sub-texts.
25. The electronic device of claim 23, wherein the executable instructions, when executing the determining the first character type of the novel character to which each of the audio files belongs, specifically cause the processor to perform:
determining a text display position of a character text corresponding to the novel role according to the novel role to which each audio file belongs;
and determining a first character type of the novel character to which each audio file belongs according to the text display position.
26. The electronic device of claim 23, wherein the executable instructions, when executing the determining the first character type of the novel character to which each of the audio files belongs, specifically cause the processor to perform:
Aiming at the novel role of each audio file, a first text label of a role text corresponding to the novel role is analyzed in the novel text of the dialogue novel;
and determining a first character type of the novel character to which each audio file belongs according to the first text label.
27. The electronic device of claim 26, wherein the first text label comprises at least one of:
and the first character type label of the novel character and the text display position label of the character text corresponding to the novel character.
28. The electronic device of claim 22, wherein in executing the determining the target sound image location corresponding to each of the audio files based on the novel role to which the audio file belongs, the executable instructions specifically cause the processor to:
aiming at the novel role of each audio file, acquiring a second text label of a role text corresponding to the novel role from the novel text of the dialogue novel, wherein the second text label is a sound image position label of the audio file corresponding to the novel role;
and determining the target sound image position corresponding to each audio file according to the second text label.
29. The electronic device of claim 22, wherein, when executing the determining, for each of the audio files, the target sound image position corresponding to the audio file, the executable instructions specifically cause the processor to perform:
for each of the audio files, determining a sound image position difference of the novel role compared with other novel roles in novel text of the dialogue novel;
and determining the target sound image position corresponding to each audio file according to the sound image position difference and the sound image positions of the other novel roles.
30. The electronic device of any of claims 22-29, wherein, when executing the obtaining a plurality of audio files corresponding to the dialogue novel, the executable instructions specifically cause the processor to perform:
acquiring a novel text of a dialogue novel;
and performing text-to-speech conversion on the novel text to obtain an audio file corresponding to each role text.
31. The electronic device of claim 30, wherein, when performing the adjusting the audio file from the initial sound image location to the target sound image location to obtain the adjusted audio file, the executable instructions specifically cause the processor to perform:
Determining the audio adjustment amount of the audio file according to the obtained target sound image position;
and adjusting the audio file according to the audio adjustment amount.
32. The electronic device of claim 31, wherein the audio adjustment amount comprises at least one of:
a level difference adjustment amount, a time difference adjustment amount, and an adjustment of the sound channel to which the audio file belongs, each applied correspondingly to the audio file.
33. The electronic device of claim 32, wherein prior to performing the adjusting the audio file from the initial sound image location to the target sound image location resulting in an adjusted audio file, the executable instructions further cause the processor to perform:
determining a second character characteristic for each of the novel characters;
according to the character characteristics, determining tone parameters corresponding to each audio file;
wherein, when executing the adjusting the audio file from the initial sound image position to the target sound image position for each of the audio files to obtain an adjusted audio file, the executable instructions specifically cause the processor to execute:
and aiming at each audio file, adjusting the audio file from the initial sound image position to the target sound image position, and adjusting the tone of the audio file according to the tone parameters to obtain an adjusted audio file.
34. The electronic device of claim 33, wherein prior to performing the determining the target sound image location corresponding to each of the audio files based on the novel role to which the audio file belongs, the executable instructions further cause the processor to perform:
for each dialogue role, if it is determined that the play times of a plurality of sub-audios in the audio file are continuous, retaining the prompt content in the first sub-audio;
each successive sub-audio is spliced into one sub-audio.
35. The electronic device of claim 34, wherein after performing the adjusting the audio file from the initial sound image location to the target sound image location resulting in an adjusted audio file, the executable instructions further cause the processor to perform:
detecting a real-time playing environment of a user terminal;
if the real-time playing environment does not support multiple channels, mixing the mixed audio generated according to the adjusted audio files into mono audio;
playing the mono audio.
36. The electronic device of claim 35, wherein the executable instructions, after executing the detecting the real-time playback environment of the user terminal, further cause the processor to perform:
And if the real-time playing environment supports multiple channels, playing the mixed audio generated by the plurality of adjusted audio files.
37. The electronic device of claim 35, wherein the executable instructions, after executing the detecting the real-time playback environment of the user terminal, further cause the processor to perform:
and if the real-time playing environment does not support multiple channels, displaying target prompt information on the user terminal, wherein the target prompt is used for prompting the user to wear the earphone.
38. The electronic device of claim 37, wherein after performing the adjusting the audio file from the initial sound image location to the target sound image location resulting in an adjusted audio file, the executable instructions further cause the processor to perform:
displaying a plurality of character texts, each character text comprising at least one sub-text;
and responding to the triggering operation of the user on the target sub-text, and playing the audio corresponding to the target sub-text.
39. The electronic device of claim 37, wherein after performing the adjusting the audio file from the initial sound image location to the target sound image location resulting in an adjusted audio file, the executable instructions further cause the processor to perform:
Displaying a plurality of character texts, each character text comprising at least one sub-text;
responding to a triggering operation of the user on the target sub-text, sequentially playing the audio corresponding to the target sub-text and to the associated sub-text,
and the associated sub-text and the target sub-text are continuous in position and belong to the same character text.
40. The electronic device of claim 38 or 39, wherein the executable instructions further cause the processor to perform:
highlighting the sub-text that is playing the audio.
41. The electronic device of claim 40, wherein after performing the adjusting the audio file from the initial sound image location to the target sound image location resulting in an adjusted audio file, the executable instructions further cause the processor to perform:
and generating virtual surround audio based on the plurality of adjusted audio files.
42. The electronic device of claim 41, wherein, when performing the adjusting the audio file from the initial sound image location to the target sound image location to obtain the adjusted audio file, the executable instructions specifically cause the processor to perform:
And adjusting the virtual sound source of the audio file from the initial sound image position to the target sound image position by utilizing a Head Related Impulse Response (HRIR) algorithm to obtain the adjusted audio file.
43. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, causes the processor to implement the audio generation method of a dialogue novel according to any one of claims 1-21.
CN202111424100.3A 2021-11-26 2021-11-26 Audio generation method for dialogue novels, electronic equipment and storage medium Active CN114023358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111424100.3A CN114023358B (en) 2021-11-26 2021-11-26 Audio generation method for dialogue novels, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114023358A CN114023358A (en) 2022-02-08
CN114023358B true CN114023358B (en) 2023-07-18

Family

ID=80066855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111424100.3A Active CN114023358B (en) 2021-11-26 2021-11-26 Audio generation method for dialogue novels, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114023358B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2070391B1 (en) * 2006-09-14 2010-11-03 LG Electronics Inc. Dialogue enhancement techniques

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69322805T2 (en) * 1992-04-03 1999-08-26 Yamaha Corp. Method of controlling sound source position
JP4674505B2 (en) * 2005-08-01 2011-04-20 ソニー株式会社 Audio signal processing method, sound field reproduction system
EP2922313B1 (en) * 2012-11-16 2019-10-09 Yamaha Corporation Audio signal processing device and audio signal processing system
JP6965783B2 (en) * 2018-02-13 2021-11-10 トヨタ自動車株式会社 Voice provision method and voice provision system
JP2020060830A (en) * 2018-10-05 2020-04-16 本田技研工業株式会社 Agent device, agent presentation method, and program
CN112572333B (en) * 2020-12-29 2022-12-13 摩登汽车(盐城)有限公司 Method and system for adjusting position of automobile sound image
CN113658458B (en) * 2021-08-20 2024-02-13 北京得间科技有限公司 Reading processing method, computing device and storage medium for dialogue novels

Also Published As

Publication number Publication date
CN114023358A (en) 2022-02-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant