CN114121028A

CN114121028A - Voice playing method, device, equipment and storage medium

Info

Publication number: CN114121028A
Application number: CN202111137785.3A
Authority: CN
Inventors: 李曼曼
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2022-03-01

Abstract

The embodiments of the present application disclose a voice playing method, apparatus, device and storage medium, which are applicable to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, the Internet of Things and assisted driving. The method comprises: in response to a user logging in to the terminal system of a target terminal, displaying a tone customization prompt page; acquiring first audio data uploaded by the user based on the tone customization prompt page, and displaying a tone list page, wherein the tone list page comprises first tone configuration information determined from the first audio data, and the first audio data and the first tone configuration information correspond to the same tone; and in response to a setting instruction of the user for target tone configuration information in the tone list page, playing audio information through the target terminal in the tone corresponding to the target tone configuration information. With the embodiments of the present application, a tone can be set for a terminal quickly and conveniently, and the applicability is high.

Description

Voice playing method, device, equipment and storage medium

Technical Field

The present application relates to the field of the Internet of Things, and in particular to a voice playing method, apparatus, device and storage medium.

Background

With the development of science and technology, speech synthesis technology has made great progress, and machine voice broadcasting is now widely used in devices such as smart mobile terminals, smart home devices and in-vehicle audio equipment.

However, existing speech synthesis technology is often directed only at a single application; that is, the developers of different applications provide independent tone synthesis schemes only for their own applications. Taking a smart mobile terminal as an example, a user can set a specific tone only for an individual application, so tone settings are not shared between applications and the user cannot set a specific tone for the smart mobile terminal as a whole. If the user wants the smart mobile terminal to use a specific tone, the same tone has to be set separately for each application, which is cumbersome. Therefore, how to set a tone for a terminal quickly and conveniently has become an urgent problem to be solved.

Disclosure of Invention

The embodiments of the present application provide a voice playing method, apparatus, device and storage medium, which can set a tone for a terminal device quickly and conveniently and have high applicability.

In one aspect, an embodiment of the present application provides a voice playing method, where the method includes:

in response to a user logging in to the terminal system of a target terminal, displaying a tone customization prompt page;

acquiring first audio data uploaded by the user based on the tone customization prompt page, and displaying a tone list page, wherein the tone list page comprises first tone configuration information determined from the first audio data, and the first audio data and the first tone configuration information correspond to the same tone;

and in response to a setting instruction of the user for target tone configuration information in the tone list page, playing audio information through the target terminal in the tone corresponding to the target tone configuration information.

In another aspect, an embodiment of the present application provides a voice playing apparatus, including:

a prompt page display module, configured to display a tone customization prompt page in response to a user logging in to the terminal system of a target terminal;

a tone list display module, configured to acquire first audio data uploaded by the user based on the tone customization prompt page, and display a tone list page, where the tone list page includes first tone configuration information determined from the first audio data, and the first audio data and the first tone configuration information correspond to the same tone;

and a voice playing module, configured to, in response to a setting instruction of the user for target tone configuration information in the tone list page, play audio information through the target terminal in the tone corresponding to the target tone configuration information.

In another aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other;

the memory is used for storing computer programs;

the processor is configured to execute the voice playing method provided by the embodiment of the application when the computer program is called.

In another aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to implement the voice playing method provided in the embodiment of the present application.

In another aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a computer program or computer instructions, and when the computer program or the computer instructions are executed by a processor, the voice playing method provided by the embodiments of the present application is implemented.

In the embodiments of the present application, after the user logs in to the terminal system of the target terminal, the audio data uploaded by the user can be obtained based on the tone customization prompt page, so that a tone can be customized based on the audio data uploaded by the user, which improves the user experience. Moreover, a tone can be set for the target terminal quickly and conveniently based on the user's setting instruction for any tone configuration information in the tone list page, and the applicability is high.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a voice playing method provided in an embodiment of the present application;

Fig. 2 is a schematic diagram of a scene of a login page of a terminal system provided in an embodiment of the present application;

Fig. 3 is a schematic diagram of a scene of a login page of a tone recording program provided in an embodiment of the present application;

Fig. 4 is a schematic diagram of a scene of a tone recording page provided in an embodiment of the present application;

Fig. 5a is a schematic diagram of a scene of a first guide page provided in an embodiment of the present application;

Fig. 5b is a schematic diagram of a scene of a second guide page provided in an embodiment of the present application;

Fig. 5c is a schematic diagram of a scene of a third guide page provided in an embodiment of the present application;

Fig. 6 is a schematic diagram of a scene of an audio information completion page provided in an embodiment of the present application;

Fig. 7a is a schematic diagram of a scene of a tone list page provided in an embodiment of the present application;

Fig. 7b is a schematic diagram of another scene of a tone list page provided in an embodiment of the present application;

Fig. 8 is a schematic diagram of a scene of a tone setting page provided in an embodiment of the present application;

Fig. 9 is a schematic flowchart of customizing a tone at the vehicle end provided in an embodiment of the present application;

Fig. 10a is a functional framework diagram of a TTS component provided in an embodiment of the present application;

Fig. 10b is a schematic flowchart of using a TTS component provided in an embodiment of the present application;

Fig. 11a is a timing diagram of TTS service selection provided in an embodiment of the present application;

Fig. 11b is a timing diagram of setting a tone provided in an embodiment of the present application;

Fig. 12 is a schematic structural diagram of a voice playing apparatus provided in an embodiment of the present application;

Fig. 13 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The voice playing method provided by the embodiments of the present application can be applied to scenarios that involve voice broadcasting technology, such as the Internet of Things, the Internet of Vehicles and artificial intelligence; the specific scenario can be determined based on actual application requirements and is not limited herein. For example, based on the voice playing method provided by the embodiments of the present application, the broadcast tone of a vehicle end and of a device control end (such as a smart home device) in the Internet of Things can be controlled, and the applicability is high.

The voice playing method provided by the embodiments of the present application may be executed by a server, a TTS (Text To Speech) component, or a terminal, which may be determined based on actual application scenario requirements and is not limited herein. The server may be an independent physical server (such as an Internet of Vehicles server), a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services, which is not limited herein. The terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart television, and the like, but is not limited thereto.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of a voice playing method provided in an embodiment of the present application. As shown in Fig. 1, the voice playing method provided in the embodiment of the present application may include the following steps:

Step S11: in response to a user logging in to the terminal system of a target terminal, display a tone customization prompt page.

In some possible embodiments, the target terminal is a terminal that broadcasts voice to the user, such as a vehicle-mounted terminal (vehicle end). After the target terminal is started, it can display to the user a login page for logging in to its terminal system, such as a two-dimensional code login page or an account-password login page; the specific form can be determined based on actual application scenario requirements and is not limited herein.

Optionally, the login page may be displayed to the user through the target terminal in response to a login selection instruction of the user, so that the user logs in to the terminal system of the target terminal based on the login page.

The user can log in to the terminal system of the target terminal by scanning the two-dimensional code on the two-dimensional code login page with a specified application, or by entering the corresponding account and password. After logging in to the terminal system of the target terminal, the user has permission to use the target terminal, and a tone can then be customized for the user based on audio data uploaded by the user or on related setting instructions.

As shown in Fig. 2, Fig. 2 is a schematic diagram of a scene of a login page of a terminal system provided in an embodiment of the present application. The target terminal can display the two-dimensional code login page to prompt the user to scan the two-dimensional code with the specified application in order to log in. The login information sent when the user scans the two-dimensional code can then be obtained, and the login state of the user can be verified based on that login information to determine whether the user has logged in successfully. If it is determined that the user has failed to log in to the terminal system of the target terminal, failure prompt information can be displayed through the two-dimensional code login page to instruct the user to log in again.

Optionally, if it is determined that the user terminal corresponding to the user has established a communication connection with the target terminal, it may be determined that the user has logged in to the terminal system of the target terminal. For example, if the user terminal has established a Bluetooth connection with the target terminal, or the user terminal and the target terminal are connected to the same local area network, it can be determined that the user has logged in to the terminal system of the target terminal.
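The login conditions described above can be sketched as a simple predicate. This is only an illustrative sketch: the `LoginSignals` type, its field names and the `is_logged_in` function are hypothetical and are not part of the application; it merely assumes that any one of the described conditions is sufficient.

```python
from dataclasses import dataclass


@dataclass
class LoginSignals:
    """Hypothetical signals the target terminal can observe about a user."""
    credentials_verified: bool = False  # QR-code scan or account/password check passed
    bluetooth_connected: bool = False   # user terminal has a Bluetooth connection to the target terminal
    same_lan: bool = False              # user terminal and target terminal share a local area network


def is_logged_in(signals: LoginSignals) -> bool:
    """Treat the user as logged in if any one of the described conditions holds."""
    return (
        signals.credentials_verified
        or signals.bluetooth_connected
        or signals.same_lan
    )
```

For instance, `is_logged_in(LoginSignals(bluetooth_connected=True))` evaluates to `True`, while a user with no verified credentials and no connection is treated as not logged in.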

Further, in response to the user logging in to the terminal system of the target terminal, a tone customization prompt page is displayed to the user. Specifically, after the user logs in to the terminal system of the target terminal, the tone customization prompt page can be displayed through the target terminal or through the user terminal of the user, so that the page guides the user to customize the tone that the target terminal uses for voice broadcasting.

Different target terminals may correspond to different tone customization prompt pages. For example, different tone customization prompt pages may be displayed according to one or more of the model, channel number and application type (mobile terminal, vehicle-mounted terminal, etc.) of the target terminal, which may be determined based on actual application scenario requirements and is not limited herein.

When the target terminal displays the tone customization prompt page, a tone customization interface provided by the target terminal through an SDK (Software Development Kit) can be called, the page configuration information of the tone customization prompt page corresponding to the target terminal can be obtained through that interface, and the corresponding tone customization prompt page can then be displayed to the user through the target terminal based on the page configuration information.
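One way to picture the selection of page configuration information is a lookup keyed on terminal information, falling back from the most specific match to a generic page. The sketch below is an assumption for illustration only: `TerminalInfo`, `PAGE_CONFIGS`, `get_prompt_page_config` and all sample values are hypothetical, and the application does not specify how the SDK interface resolves the configuration.

```python
from typing import NamedTuple


class TerminalInfo(NamedTuple):
    model: str
    channel_number: str
    application_type: str  # e.g. "mobile" or "vehicle"


# Hypothetical page configurations, keyed by (application type, model, channel number).
# None acts as a wildcard for a less specific entry.
PAGE_CONFIGS = {
    ("vehicle", "model-x", "ch-01"): {"title": "Customize your in-car voice", "layout": "wide"},
    ("vehicle", None, None): {"title": "Customize your in-car voice", "layout": "default"},
    (None, None, None): {"title": "Customize your voice", "layout": "default"},
}


def get_prompt_page_config(info: TerminalInfo) -> dict:
    """Return the most specific page configuration available for this terminal."""
    candidates = [
        (info.application_type, info.model, info.channel_number),
        (info.application_type, None, None),
        (None, None, None),
    ]
    for key in candidates:
        if key in PAGE_CONFIGS:
            return PAGE_CONFIGS[key]
    raise LookupError("no page configuration available")
```

The fallback order is a design choice of this sketch; a real deployment could equally resolve the configuration on a server.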

In some possible embodiments, the application type, manufacturer, model, channel number and the like of a target terminal also affect whether the target terminal can perform tone customization. For example, a terminal with a certain channel number may not meet the configuration conditions for tone customization, a terminal of a certain type may fall outside the scope of tone customization, or a terminal from a certain manufacturer may not be on the tone customization whitelist; this may be determined based on actual application scenario requirements and is not limited herein.

Therefore, before the tone customization prompt page is displayed through the target terminal, the terminal information of the target terminal, which includes the items above, can be determined, and whether the target terminal has the tone customization permission can be determined based on that terminal information. If the target terminal has the tone customization permission, the tone customization prompt page is displayed through the target terminal in the manner described above.
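The permission check described above might look like the following minimal sketch. The policy sets, the `TerminalInfo` fields and the function name are hypothetical placeholders; the application only states that the decision is based on terminal information such as type, manufacturer, model and channel number.

```python
from dataclasses import dataclass

# Hypothetical policy data; a real deployment would likely load these from a server.
MANUFACTURER_WHITELIST = {"maker-a", "maker-b"}
SUPPORTED_TYPES = {"vehicle", "mobile"}
CHANNELS_WITHOUT_SUPPORT = {"ch-legacy"}


@dataclass(frozen=True)
class TerminalInfo:
    application_type: str
    manufacturer: str
    model: str
    channel_number: str


def has_tone_customization_permission(info: TerminalInfo) -> bool:
    """Apply the three example conditions from the text, in order."""
    if info.channel_number in CHANNELS_WITHOUT_SUPPORT:
        return False  # channel lacks the configuration conditions
    if info.application_type not in SUPPORTED_TYPES:
        return False  # terminal type is outside the customization scope
    return info.manufacturer in MANUFACTURER_WHITELIST  # whitelist check
```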

If the target terminal does not have the tone customization permission, then when the target terminal needs to broadcast audio information by voice, the audio information can be played through the target terminal in the default tone corresponding to the default tone configuration information of the target terminal. The default tone configuration information corresponding to the target terminal may be stored in the server corresponding to the target terminal or in the local storage space of the target terminal, which is not limited herein.

When audio information is played to the user through the target terminal in the default tone, the application corresponding to the audio information to be played can first be determined, the default tone corresponding to that application can be determined based on the default tone data, and the audio information to be played for that application can then be played through the target terminal in that default tone.
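As a rough illustration of the per-application default lookup, the sketch below resolves a default tone from a dictionary of default tone data. The structure of `DEFAULT_TONE_DATA`, the tone names, and the fallback to a terminal-wide default when an application has no entry are all assumptions made for the example.

```python
# Hypothetical default tone data; tone names and application names are placeholders.
DEFAULT_TONE_DATA = {
    "terminal_default": "standard-female",
    "per_application": {
        "navigation": "calm-male",
        "news": "anchor-female",
    },
}


def resolve_default_tone(application: str, default_tone_data: dict = DEFAULT_TONE_DATA) -> str:
    """Return the default tone for an application.

    Falling back to the terminal-wide default is an assumption of this sketch.
    """
    per_app = default_tone_data.get("per_application", {})
    return per_app.get(application, default_tone_data["terminal_default"])
```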

Step S12: acquire first audio data uploaded by the user based on the tone customization prompt page, and display a tone list page.

In some possible embodiments, the tone customization prompt page includes a tone recording control. In this case, in response to a confirmation instruction triggered through the tone recording control, a tone recording page may be displayed so as to obtain the first audio data uploaded by the user through the tone recording page.

The first audio data uploaded by the user may be audio data recorded by the user on the tone recording page, or audio data that the user obtains from a local storage space or over a network; this may be determined based on actual application scenario requirements and is not limited herein. In addition, the first audio data uploaded by the user may be audio data whose duration exceeds a duration threshold, or may be multiple segments of audio data corresponding to the same tone, which is not limited herein.

As an example, a user may upload a broadcast recording of a news anchor as the first audio data, so that tone configuration information whose tone is that of the news anchor can be determined based on the first audio data.

Specifically, the login page of the tone recording program may be displayed in response to a confirmation instruction triggered by the user through the tone customization prompt page, such as a confirmation instruction triggered through the tone recording control in that page. Then, in response to the user logging in to the tone recording program, the tone recording page is displayed to guide the user through tone customization.

As shown in Fig. 3, Fig. 3 is a schematic diagram of a scene of a login page of a tone recording program provided in an embodiment of the present application. A two-dimensional code login page of the tone recording program is displayed to the user through the target terminal or the user terminal, and the user logs in by scanning the two-dimensional code with a specified application. Whether the user has logged in to the tone recording program can then be determined based on the login information of the user. After the user has successfully logged in to the tone recording program, the tone recording page can be displayed through the target terminal or the user terminal, and the first audio data uploaded by the user can be obtained through the tone recording page.

Optionally, the tone recording page includes recording text information; that is, recording text information may be displayed on the tone recording page to prompt the user to record audio data by reading it aloud. In other words, the user needs to read the recording text information aloud to record the first audio data.

On this basis, the first audio data can be acquired if the text content corresponding to the audio data recorded by the user is consistent with the recording text information. If the text content corresponding to the audio data recorded by the user is inconsistent with the recording text information, error prompt information can be displayed through the tone recording page to inform the user that the recording differs from the recording text information and to ask the user to record the first audio data again based on the recording text information.
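The consistency check above can be sketched as a comparison between the text recognized from the recording and the displayed recording text. This sketch assumes a speech recognizer has already produced `recognized_text` (recognition itself is out of scope), and it assumes that punctuation, whitespace and letter case should be ignored; the function names are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Drop punctuation and whitespace and lowercase, so only spoken content is compared."""
    kept = []
    for ch in unicodedata.normalize("NFKC", text):
        category = unicodedata.category(ch)
        if category[0] in ("P", "Z") or ch.isspace():
            continue  # skip punctuation (P*) and separators (Z*)
        kept.append(ch.lower())
    return "".join(kept)


def matches_recording_text(recognized_text: str, recording_text: str) -> bool:
    """True when the recognized speech matches the displayed recording text."""
    return normalize(recognized_text) == normalize(recording_text)


def handle_recording(recognized_text: str, recording_text: str) -> str:
    """Accept the recording, or return an error prompt asking the user to record again."""
    if matches_recording_text(recognized_text, recording_text):
        return "accepted"
    return "error: the recording does not match the displayed text, please record again"
```

An exact match after normalization is a strict choice; a production system would more plausibly tolerate small recognition errors, for example by using an edit-distance threshold.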

As shown in Fig. 4, Fig. 4 is a schematic diagram of a scene of a tone recording page provided in an embodiment of the present application. The tone recording page shown in Fig. 4 displays the recording text information "the skilful and worried people are worried about, and the disabled people do not. If the user eats the game and travels to a different boat", and the user needs to read the recording text information aloud while long-pressing the voice input control in order to record the first audio data. That is, the text content of the first audio data recorded by the user needs to be consistent with the recording text information displayed on the tone recording page for the recorded first audio data to be uploaded successfully. During this process, a "recording in progress" prompt can be displayed on the tone recording page to indicate that the user is currently recording audio data, the recording progress can be determined by comparing the text content of the audio data recorded so far with the recording text information, and the recording progress can be displayed on the tone recording page.
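A recording progress value derived from the recognized text and the recording text could be computed as in the following sketch. It assumes progress is the fraction of the recording text matched, character by character from the start, by the text recognized so far; this metric and the function name are illustrative assumptions rather than something the application specifies.

```python
def recording_progress(recognized_so_far: str, recording_text: str) -> float:
    """Return the fraction (0.0 to 1.0) of the recording text read so far.

    Only letters and digits are compared, and matching stops at the first mismatch.
    """
    target = [c.lower() for c in recording_text if c.isalnum()]
    spoken = [c.lower() for c in recognized_so_far if c.isalnum()]
    if not target:
        return 1.0  # nothing to read, so treat as complete
    matched = 0
    for expected, actual in zip(target, spoken):
        if expected != actual:
            break
        matched += 1
    return matched / len(target)
```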

In some possible embodiments, after determining that the user logs in to the tone recording program, a guide page may be further displayed to guide the user to upload audio data through the guide page.

As an example, after determining that the user logs in the tone recording program, a first guide page of the tone recording program may be displayed to determine an upload time for uploading audio data by the user through the first guide page.

As shown in Fig. 5a, Fig. 5a is a schematic diagram of a scene of a first guide page provided in an embodiment of the present application. Through the first guide page shown in Fig. 5a, descriptive information about tone customization can be displayed to the user, together with time options for uploading audio data, such as "customize later" and "customize immediately". If the user selects the "customize immediately" option, the tone recording page can be displayed to the user, and the first audio data uploaded by the user can be acquired through the tone recording page. The time options may be determined based on actual application scenario requirements and are not limited herein.

If the user selects the "customize later" option, the tone recording page is displayed to the user at the corresponding time according to the customization time required by the user, so as to obtain the first audio data uploaded by the user through the tone recording page.

It should be particularly noted that the uploading time of the audio data is only an example, and may be determined specifically based on user settings and actual application scenario requirements, and is not limited herein.

As an example, after determining that the user logs in to the tone recording program, or after determining that the user currently needs to record audio data based on the first guide page, a second guide page may be displayed to prompt the user whether there is already uploaded audio data and guide the user to upload audio data based on the related control.

As shown in Fig. 5b, Fig. 5b is a schematic diagram of a scene of a second guide page provided in an embodiment of the present application. Through the second guide page shown in Fig. 5b, the user can be informed that no audio data has been uploaded before, and a "customize exclusive tone" control can be provided to guide the user to upload audio data. If a confirmation instruction triggered by the user through the "customize exclusive tone" control is received on this guide page, the tone recording page can be displayed to the user, and the first audio data uploaded by the user through the tone recording page can be acquired.

As an example, after determining that the user has logged in to the tone recording program, or after determining through any of the above guide pages that audio data needs to be uploaded, a third guide page of the tone recording program may be displayed to the user. The third guide page can present tone types to the user so that the tone type corresponding to the first audio data to be uploaded can be determined. When the first tone configuration information is determined from the first audio data uploaded by the user, taking the tone type selected by the user into account can improve the accuracy of the tone corresponding to the first tone configuration information.

As shown in Fig. 5c, Fig. 5c is a schematic diagram of a scene of a third guide page provided in an embodiment of the present application. Through the third guide page shown in Fig. 5c, the user can be offered a variety of tone type options, such as "female voice", "boy voice" and "girl voice", among other options. If the user selects the "female voice" type, it can be determined that the first audio data subsequently uploaded by the user corresponds to the "female voice" type; that is, the tone customized from the uploaded first audio data is of the "female voice" type desired by the user. On this basis, after the tone type selected by the user is determined, the tone recording page is displayed so as to acquire the first audio data uploaded by the user through the tone recording page.

Optionally, after the user records the first audio data based on the recording text information, an audio information completion page may be displayed. Then, in response to an upload instruction triggered by the user through the audio information completion page, the first audio data uploaded by the user is acquired. The audio information completion page can display an "audio recorded" prompt to inform the user that recording is complete, and can also be used to obtain information that the user adds to or modifies for the first audio data, such as a name for the first audio data. When the upload instruction triggered by the user through the audio information completion page is received, this information is obtained as well, and the completed first audio data is acquired.

As an example, as shown in Fig. 6, Fig. 6 is a schematic diagram of a scene of an audio information completion page provided in an embodiment of the present application. Through the audio information completion page shown in Fig. 6, the user can be prompted to complete the information for the first audio data, so that the name and other information entered by the user for the first audio data on this page can be obtained. A control for confirming the upload of the audio data can also be displayed on the audio information completion page, and the first audio data uploaded by the user is acquired in response to an upload instruction triggered by the user through this control.

In some possible embodiments, after the first audio data uploaded by the user is acquired and it is confirmed that the upload is complete, a tone list page including first tone configuration information determined from the first audio data may be displayed to the user through the target terminal or the user terminal. After the first audio data uploaded by the user is acquired, first tone configuration information corresponding to the first audio data, such as a voice packet, may be generated based on the first audio data. The tone corresponding to the first tone configuration information is the same as the tone corresponding to the first audio data; that is, first tone configuration information with the same tone can be generated from the first audio data uploaded by the user.

Optionally, the tone list page may further include at least one of tone configuration information determined from historical audio data uploaded by the user and at least one piece of default tone configuration information corresponding to the target terminal. Any two pieces of tone configuration information in the tone list page correspond to two different tones. Specifically, the tone configuration information of each tone corresponding to the user may be acquired from the server corresponding to the target terminal, and the tone list page may be generated based on that tone configuration information.
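Assembling the tone list from the newly generated configuration, the user's historical configurations and the terminal's defaults, while keeping every listed tone distinct, might be sketched as follows. The `ToneConfig` type, its fields and the rule that the first occurrence of a tone wins are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToneConfig:
    tone_id: str    # hypothetical identifier of the tone itself
    tone_type: str  # label shown in the list, e.g. "female voice"
    source: str     # "uploaded", "history" or "default"


def build_tone_list(
    first: ToneConfig,
    history: Iterable[ToneConfig],
    defaults: Iterable[ToneConfig],
) -> List[ToneConfig]:
    """Merge all sources so that any two entries correspond to different tones."""
    seen = set()
    result = []
    for config in [first, *history, *defaults]:
        if config.tone_id in seen:
            continue  # this tone is already listed; keep the earlier entry
        seen.add(config.tone_id)
        result.append(config)
    return result
```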

Optionally, each piece of tone configuration information may be represented in the tone list page by its tone type; that is, tone types may be displayed in the tone list page to stand for the corresponding tone configuration information. The tone list page can also provide a preview of the tone corresponding to each piece of tone configuration information; this may be determined based on actual application scenario requirements and is not limited herein.

As an example, as shown in Fig. 7a, Fig. 7a is a schematic diagram of a scene of a tone list page provided in an embodiment of the present application. Through the tone list page shown in Fig. 7a, the tone types and corresponding avatars of the different tone configuration information can be displayed, so that the user can intuitively identify the tone corresponding to each piece of tone configuration information. A play control can also be displayed for each tone type, so that when a play or pause instruction of the user for any play control is received through the tone list page, the preview of the corresponding tone is started or paused.

Optionally, before displaying the tone list page, tone attribute information of each tone configuration information corresponding to the user may also be determined. For any tone color configuration information, the tone color attribute information corresponding to the tone color configuration information includes at least one of a data size of the tone color configuration information, a corresponding tone color type, an information identifier of the tone color configuration information, or a storage path. When the tone list page is displayed, the tone list page may be displayed based on tone attribute information of each tone configuration information corresponding to the user.

That is, in addition to the tone types of different pieces of tone configuration information, the tone list page may display information such as the data size of each piece of tone configuration information, which can be specifically determined based on the requirements of the actual application scene and is not limited herein. Referring to fig. 7b, fig. 7b is another schematic view of a scene of a tone list page provided in an embodiment of the present application. If the tone configuration information is regarded as voice packets with different tones, then based on the tone list page shown in fig. 7b, the tone type of each voice packet, such as "xianlian" or "xianrui", can be displayed, and related information, such as the data size of each voice packet, can be displayed at the same time. Because the tone list page is determined based on the tone attribute information of each voice packet, if a download instruction for any voice packet triggered by the user based on the tone list page is detected, the voice packet (i.e. the tone configuration information) can be sent to the target terminal, so that the target terminal can store the tone configuration information locally.
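As a minimal sketch of how the tone attribute information described above could drive the tone list page, the following example models one record per voice packet and renders one display row per distinct tone type. The class name `ToneAttributes`, the function `build_tone_list_rows`, the sample values, and the row format are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToneAttributes:
    """Attribute information of one piece of tone configuration information."""
    info_id: str       # information identifier
    tone_type: str     # tone type shown in the list page
    size_bytes: int    # data size of the voice packet
    storage_path: str  # hypothetical path used when the packet is downloaded


def build_tone_list_rows(tones: List[ToneAttributes]) -> List[str]:
    """Render one display row per tone, skipping repeated tone types."""
    rows, seen = [], set()
    for tone in tones:
        if tone.tone_type in seen:
            continue  # entries in the list page correspond to different tones
        seen.add(tone.tone_type)
        rows.append(f"{tone.tone_type} ({tone.size_bytes / 1_000_000:.1f} MB)")
    return rows


# Sample data (assumed values, for illustration only).
tones = [
    ToneAttributes("t1", "standard male voice", 5_200_000, "/packets/t1.bin"),
    ToneAttributes("t2", "custom voice", 3_400_000, "/packets/t2.bin"),
    ToneAttributes("t3", "custom voice", 3_400_000, "/packets/t3.bin"),
]
rows = build_tone_list_rows(tones)
```

Keeping the storage path in the same record means a download instruction for a row can be served without a second lookup, which matches the role the storage path plays in the description.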

And step S13, responding to the setting instruction of the user for the target tone color configuration information in the tone color list page, and playing the audio information with the tone color corresponding to the target tone color configuration information through the target terminal.

In some possible embodiments, after the tone list page is displayed, the audio information may be played in the tone corresponding to the target tone configuration information through the target terminal in response to a setting instruction of the user for the target tone configuration information in the tone list page. The target tone color configuration information may be any tone color configuration information in a tone color list page.

Specifically, in response to a setting instruction of the user for the target tone configuration information in the tone list page, the target tone configuration information is synchronized to each application program of the target terminal, and the audio information of any application program can then be played through the target terminal in the tone corresponding to the target tone configuration information. That is, after the user issues a setting instruction for the target tone configuration information in the tone list page, the tone corresponding to the target tone configuration information can be determined as the broadcast tone of all the application programs in the target terminal. When the target terminal needs to play the audio information of any application program, the corresponding audio information can be played in the tone corresponding to the target tone configuration information.

Optionally, in response to a setting instruction of the user for the target tone configuration information in the tone list page, the target application program corresponding to the target tone configuration information is determined, and the audio information of the target application program can then be played through the target terminal in the tone corresponding to the target tone configuration information. That is, after the user issues a setting instruction for the target tone configuration information in the tone list page, the tone corresponding to the target tone configuration information may be determined as the broadcast tone of the target application program corresponding to the setting instruction. When the target terminal needs to play the audio information of the target application program, the audio information of the target application program can be played in the tone corresponding to the target tone configuration information.

Optionally, the tone list page includes a tone setting control, and a tone setting page is displayed in response to a touch instruction triggered by the user based on the tone setting control in the tone list page. The tone setting page may be a tone setting page for any application program, and the tone setting page includes the application scenes of the application program, tone configuration information of at least one tone customized by the user, and at least one piece of default tone configuration information. The tone configuration information set by the user for any application scene of the application program is determined based on the setting instruction of the user, so as to customize the broadcast tone corresponding to different application scenes of the application program.

Alternatively, the tone setting page may be a tone setting page for all application programs, where the tone setting page includes at least one application program, the application scenes corresponding to each application program, and multiple pieces of tone configuration information including user-customized tone configuration information and default tone configuration information. Based on the tone setting page, the broadcast tones corresponding to different application scenes can be customized simultaneously for one or more application programs. Alternatively, the broadcast tone corresponding to the same application scene can be customized uniformly, that is, the broadcast tone of the same application scene is the same across different application programs.

Fig. 8 is a scene schematic diagram of a tone setting page provided in the embodiment of the present application. The tone setting page for the navigation application shown in fig. 8 includes different voice broadcast scenes in the navigation application, such as electronic newspaper, front road condition, and safety prompt. In response to a relevant instruction by which the user sets any tone configuration information for any application scene, the broadcast tone of that application scene is determined accordingly. For example, in response to an instruction by which the user sets the "standard male voice" for the front road condition scene, the tone in which the target terminal broadcasts the audio information of the front road condition scene of the navigation application is determined to be the "standard male voice".
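The setting granularities described above (all application programs, one target application program, or one application scene of an application program) can be sketched as a small lookup structure. This is only an illustration: the class `ToneSettings`, its method names, and the precedence order (scene first, then application, then global, then default) are assumptions, since the application does not specify how overlapping settings are resolved.

```python
from typing import Dict, Optional, Tuple

DEFAULT_TONE = "default"  # stands in for the default tone configuration


class ToneSettings:
    """Stores tone settings at global, application, and scene granularity."""

    def __init__(self) -> None:
        self.global_tone: Optional[str] = None
        self.app_tones: Dict[str, str] = {}
        self.scene_tones: Dict[Tuple[str, str], str] = {}

    def set_tone(self, tone: str, app: Optional[str] = None,
                 scene: Optional[str] = None) -> None:
        """Apply a setting instruction at the requested granularity."""
        if app is None:
            self.global_tone = tone               # all application programs
        elif scene is None:
            self.app_tones[app] = tone            # one target application
        else:
            self.scene_tones[(app, scene)] = tone  # one scene of one application

    def resolve(self, app: str, scene: str) -> str:
        """Assumed precedence: scene, then application, then global, then default."""
        return (self.scene_tones.get((app, scene))
                or self.app_tones.get(app)
                or self.global_tone
                or DEFAULT_TONE)


settings = ToneSettings()
settings.set_tone("standard male voice", app="navigation",
                  scene="front road condition")
```

With this state, the front road condition scene of the navigation application resolves to the "standard male voice", while other scenes keep the default tone until a broader setting is made.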

In some possible embodiments, when the setting instruction of the user for the target tone color configuration information is obtained, the setting instruction may be obtained based on an instant messaging protocol. For example, the setting instruction of the user for the target tone color configuration information may be obtained based on a Message Queue Telemetry Transport (MQTT) protocol.
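As a hedged illustration of receiving such a setting instruction, the sketch below handles only the payload side of a message that an MQTT client would deliver. The topic name `tone/setting`, the JSON payload fields, and the `SettingInstruction` type are hypothetical; the application does not define a message format, and no particular MQTT client library is assumed.

```python
import json
from dataclasses import dataclass
from typing import Optional

SETTING_TOPIC = "tone/setting"  # hypothetical topic name


@dataclass
class SettingInstruction:
    user_id: str
    tone_id: str
    app: Optional[str] = None  # None is taken to mean "every application"


def on_message(topic: str, payload: bytes) -> Optional[SettingInstruction]:
    """Decode a setting instruction; ignore messages on other topics."""
    if topic != SETTING_TOPIC:
        return None
    data = json.loads(payload.decode("utf-8"))
    return SettingInstruction(
        user_id=data["user_id"],
        tone_id=data["tone_id"],
        app=data.get("app"),
    )


instruction = on_message(SETTING_TOPIC, b'{"user_id": "u1", "tone_id": "t2"}')
```

In a real deployment this function would be registered as the message callback of whichever MQTT client the terminal uses.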

In some feasible embodiments, when the target terminal plays the audio information with the tone corresponding to the target tone configuration information, the audio information to be played may be determined first, and the audio information to be played is processed based on the target tone configuration information, so as to obtain the audio information with the tone corresponding to the target tone configuration information, and then the processed audio information is played by the target terminal.

The audio information to be played may be information that any application program needs to output to the user by voice through the target terminal, including but not limited to navigation information, human-computer interaction information, and the like, which may be specifically determined based on actual application scene requirements and is not limited herein. For example, it may be driving instruction information that the navigation application of the target terminal needs to broadcast through the target terminal during navigation, or translated text that the translation application of the target terminal determines for the text to be translated input by the user and needs to broadcast to the user through the target terminal. That is, the audio information to be played may be information that is generated by an application program in the target terminal and needs to be output to the user, or may be result information that an application program in the target terminal determines based on query information of the user.

Specifically, query information input by the user may be acquired through the target terminal, where the query information may be voice information input to each application program by the user through the target terminal, or may also be text information input to each application program by the user through the target terminal, and may be specifically determined based on actual application scene requirements, which is not limited herein.

Further, after the query information input by the user is obtained, the text content corresponding to the query information can be determined, and the result information corresponding to the query information can be determined based on that text content. Specifically, if the query information is voice information, speech recognition may be performed on it based on Speech Technology to obtain the corresponding text information. Optionally, text analysis may be performed on the query information by a Natural Language Processing (NLP) technique to obtain the corresponding text content, semantics, and the like. The result information corresponding to the query information can then be obtained by querying based on the text content corresponding to the query information, by predicting through a prediction model, and the like.

The prediction model includes, but is not limited to, a translation model, a dialogue model, and the like, and may be constructed based on a machine learning method and the like. Machine Learning (ML) is an interdisciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. Based on machine learning and deep learning, a machine can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize its existing knowledge structure to continuously improve its own performance, thereby obtaining the corresponding translation models, dialogue models, and the like.
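The flow from query information to result information can be sketched as follows. The speech recognizer and the prediction model are passed in as callables, and the two stand-ins used here (`fake_recognizer` and `fake_translator` with a one-entry phrase table) are purely hypothetical placeholders for a real speech recognition service and a real translation model.

```python
from typing import Callable, Union


def answer_query(
    query: Union[str, bytes],
    recognize_speech: Callable[[bytes], str],
    find_result: Callable[[str], str],
) -> str:
    """Turn voice or text query information into result text."""
    # Voice input is first converted to text; text input is used directly.
    text = recognize_speech(query) if isinstance(query, bytes) else query
    return find_result(text.strip())


# Hypothetical stand-in for a speech recognizer.
def fake_recognizer(audio: bytes) -> str:
    return audio.decode("utf-8")


# Hypothetical stand-in for a translation model.
PHRASES = {"hello": "bonjour"}


def fake_translator(text: str) -> str:
    return PHRASES.get(text.lower(), "")


result = answer_query(b" Hello ", fake_recognizer, fake_translator)
```

The returned text would then be synthesized in the tone corresponding to the target tone configuration information before being played.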

In some possible embodiments, each piece of tone color configuration information may be stored in a server, a database, a cloud storage (cloud storage), or a block chain (Blockchain), and when a target terminal needs to play audio information, the target terminal may obtain the corresponding tone color configuration information online and play the audio information in the tone color corresponding to the tone color configuration information based on the corresponding tone color configuration information.

The database can be regarded as an electronic file cabinet, namely a place for storing electronic files, and can be used for storing various tone color configuration information in the application. The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A blockchain is essentially a decentralized database, a string of data blocks that are associated using cryptography. In the present application, each data block in the block chain may store each tone color configuration information. The cloud storage is a new concept extended and developed from a cloud computing concept, and means that a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network are aggregated to cooperatively work through application software or application interfaces through functions such as cluster application, a grid technology, a distributed storage file system and the like, and all tone configuration information is stored together.

Optionally, in response to a tone downloading instruction triggered by the user based on the tone list page or the tone configuration page, the corresponding tone configuration information may be obtained from a server, a block chain, a database, and the like corresponding to the target terminal and sent to the target terminal, so that the target terminal stores the corresponding tone configuration information. Therefore, when the target terminal needs to broadcast the audio information according to the tone corresponding to the tone configuration information, the target terminal can perform voice conversion on the audio information to be broadcast based on the locally stored tone configuration information to obtain the audio information with the tone corresponding to the tone configuration information, and play the processed audio information to the user.
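The two ways of obtaining tone configuration information described above, online retrieval and local storage after a download instruction, can be sketched with a small store that prefers the local copy. The class `ToneStore` and the `fake_fetch` function, which stands in for a request to the server, database, or blockchain, are hypothetical.

```python
from typing import Callable, Dict, List


class ToneStore:
    """Serves tone configuration from local storage, else fetches it online."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self._local: Dict[str, bytes] = {}

    def download(self, tone_id: str) -> None:
        """Handle a download instruction: store the packet locally."""
        self._local[tone_id] = self._fetch_remote(tone_id)

    def get(self, tone_id: str) -> bytes:
        """Use the locally stored packet if present, otherwise go online."""
        if tone_id in self._local:
            return self._local[tone_id]
        return self._fetch_remote(tone_id)


remote_calls: List[str] = []


def fake_fetch(tone_id: str) -> bytes:
    """Hypothetical stand-in for a request to the remote storage."""
    remote_calls.append(tone_id)
    return f"packet-{tone_id}".encode("utf-8")


store = ToneStore(fake_fetch)
store.download("t1")
packet = store.get("t1")  # served from local storage, no second remote call
```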

In some possible embodiments, the tone color configuration information used by the target terminal when playing the audio information may be associated with the user, that is, after the user logs in the terminal system of the target terminal, the corresponding audio information may be played by the target terminal according to the tone color configuration information set by the user. After the user logs out, the target terminal and the user do not have an association relation any more, and at the moment, the audio information can be played through the target terminal in a default tone corresponding to the default tone configuration information.

That is, if the user logs out of the terminal system of the target terminal, the tone color configuration information corresponding to the target terminal can be restored to the default tone color configuration information, and then, after the user logs out, the audio information is played through the target terminal with the default tone color corresponding to the default tone color configuration information based on the default tone color configuration information.

If each application program in the target terminal is provided with corresponding tone color configuration information, the tone color configuration information corresponding to each application program can be restored to corresponding default tone color configuration information after the user logs out. For any application program of the target terminal, the audio information of the application program can be played through the target terminal in the default tone color corresponding to the default tone color configuration information based on the default tone color configuration information corresponding to the application program.
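The association between the logged-in user and the tone configuration, including the restoration of defaults on logout, can be sketched as below. The class `TerminalTones` and the sample application and tone names are assumptions for illustration only.

```python
from typing import Dict, Optional


class TerminalTones:
    """Tracks which tone each application uses on the target terminal."""

    def __init__(self, default_tones: Dict[str, str]) -> None:
        self._defaults = dict(default_tones)  # per-application default tones
        self._active = dict(default_tones)
        self.user: Optional[str] = None

    def login(self, user: str, user_tones: Dict[str, str]) -> None:
        """Apply the tones the user has set; other applications keep defaults."""
        self.user = user
        self._active = {app: user_tones.get(app, tone)
                        for app, tone in self._defaults.items()}

    def logout(self) -> None:
        """Drop the association with the user and restore every default."""
        self.user = None
        self._active = dict(self._defaults)

    def tone_for(self, app: str) -> str:
        return self._active[app]


terminal = TerminalTones({"navigation": "default male", "translation": "default female"})
terminal.login("user-1", {"navigation": "custom voice"})
tone_while_logged_in = terminal.tone_for("navigation")
terminal.logout()
tone_after_logout = terminal.tone_for("navigation")
```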

The following further describes the voice playing method provided in the embodiment of the present application by taking the vehicle end as an example. Fig. 9 is a schematic flowchart of tone customization at the vehicle end provided by the embodiment of the present application, and the flowchart is applicable to a TTS component in the vehicle end. Through the tone setting entry of the vehicle end, the TTS component can judge whether the user has logged in to the system of the vehicle end. If the user has logged in to the system of the vehicle end, the tone customization prompt page can be displayed through the vehicle end. If it is determined that the user has not logged in to the system of the vehicle end, the vehicle end login page can be displayed through the vehicle end to prompt the user to log in, and then, in response to the user logging in to the system of the vehicle end, the tone customization prompt page is displayed through the vehicle end. Specifically, the tone customization prompt page can be popped up based on a personalized customization interface provided by the vehicle end SDK, and the tone customization prompt page can be a User Interface (UI).

Whether the user has logged in can be determined based on information such as a user identifier. The tone customization prompt page displayed after the user logs in includes related controls for prompting the user to customize the tone, such as a "record voice packet" touch button, and the tone configuration information previously downloaded by the user is loaded at the same time.

Further, if the user is confirmed to need to upload the audio data based on the tone customization prompting page, the login page of the tone recording program can be displayed through the vehicle terminal, so that the user can log in the tone recording program. And displaying the tone recording page after the user logs in, so as to acquire the audio data recorded by the user through the tone recording page and upload the audio data to the Internet of vehicles server.

Further, a tone list page may be displayed to the user, including default tone configuration information, tone configuration information determined based on audio data recorded by the user, and the like. In addition, when it is determined that the vehicle end has the tone customization permission, the tone configuration information in the tone list page can be acquired and stored in a specific data structure; for example, the tone configuration information can be stored based on a TTSdata data structure.

When each piece of tone configuration information is stored, information such as its identification information, name, tone type, data size, and storage path can be stored together, and the tone list page can be implemented by a recycling list view (for example, RecyclerView on Android).

In response to a download instruction of the user for any tone configuration information, the tone configuration information is sent to the vehicle end based on the storage path corresponding to the tone configuration information. Specifically, the download can be implemented by using a Uniform Resource Locator (URL) connection that supports Hypertext Transfer Protocol (HTTP) specific functions.

After the vehicle end finishes downloading the tone configuration information, the tone corresponding to the tone configuration information can be synchronized to each application program of the vehicle end in response to a setting instruction for any tone configuration information received at the vehicle end. When the vehicle end needs to play audio information, voice broadcasting can be performed in the tone corresponding to the tone configuration information.

The functions of the TTS component described above are further described below with reference to fig. 10a. Fig. 10a is a functional framework diagram of a TTS component provided by an embodiment of the present application. After a user logs in to the system of the vehicle end, the TTS component can determine, based on information such as the channel number of the vehicle end, whether the vehicle end uses a default playing engine. If the vehicle end uses the default playing engine, the vehicle end broadcasts audio information in the default tone during voice broadcasting. If it is determined that the vehicle end uses the TTS component to customize the tone, the user side of the TTS component can, by initializing the TTS, determine tone configuration information according to the audio data recorded by the user, synchronize the tone configuration information to each application program, confirm through the account service interface that the user has logged in to the tone customization program, and ensure through the network interface that the tone configuration information is uploaded in time. The service side of the TTS component can provide an offline TTS service or an online TTS service for the user. That is, the TTS component can play audio information in a customized tone through the vehicle end based on locally stored tone configuration information, or it can acquire the tone configuration information in real time through the network and play audio information in the customized tone through the vehicle end based on the acquired tone configuration information. The user side and the service side of the TTS component are connected through a fixed logic language, for example, the Android Interface Definition Language (AIDL).

With further reference to FIG. 10b, FIG. 10b is a schematic flow chart illustrating the use of a TTS component provided by embodiments of the present application. As shown in fig. 10b, the TTS component may provide an offline TTS service and an online TTS service, wherein when the offline TTS service is provided, the audio information may be played through the vehicle end in a customized tone or a default tone. When the audio information is played through the vehicle end in a customized tone, the optimal scheme matched with the vehicle end can be selected based on TTS service priority.

If a TTS application package is stored at the vehicle end, the TTS application package is preferentially used to provide tone customization and voice playing services for the vehicle end. The application package includes, but is not limited to, an Android Application Package (APK) and application packages supporting other operating systems. If no TTS application package is stored but a TTS SDK is stored at the vehicle end, tone customization and voice playing services are provided for the vehicle end based on the built-in TTS SDK. Otherwise, the default TTS component at the vehicle end is used to provide tone customization and voice playing services for the vehicle end. If none of the above TTS services can operate normally, tone customization and voice playing services can be provided for the vehicle end based on the highest-version bundled TTS service, that is, based on the TTS component that is compatible with old versions.
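The priority order described above can be sketched as a simple selection function. The enum `TtsProvider`, the function `select_tts_provider`, and the `is_working` health check are hypothetical names, and the sketch assumes that the fallback to the old-version-compatible component is triggered when the selected service fails the health check.

```python
from enum import Enum
from typing import Callable


class TtsProvider(Enum):
    APP_PACKAGE = "TTS application package"
    BUILT_IN_SDK = "built-in TTS SDK"
    DEFAULT_COMPONENT = "default TTS component"
    LEGACY_COMPATIBLE = "old-version-compatible TTS component"


def select_tts_provider(
    has_app_package: bool,
    has_sdk: bool,
    is_working: Callable[[TtsProvider], bool] = lambda provider: True,
) -> TtsProvider:
    """Pick the TTS service by priority, with an assumed health-check fallback."""
    if has_app_package:
        candidate = TtsProvider.APP_PACKAGE
    elif has_sdk:
        candidate = TtsProvider.BUILT_IN_SDK
    else:
        candidate = TtsProvider.DEFAULT_COMPONENT
    # Assumption: fall back when the selected service cannot run normally.
    return candidate if is_working(candidate) else TtsProvider.LEGACY_COMPATIBLE


chosen = select_tts_provider(has_app_package=False, has_sdk=True)
```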

Referring to fig. 11a, fig. 11a is a timing diagram illustrating TTS service selection provided in the embodiment of the present application. As shown in fig. 11a, after acquiring the channel number (or other terminal information) of the vehicle end, the TTS management module of the TTS component may determine whether the TTS service corresponding to the vehicle end is an online TTS service or an offline TTS service. If the TTS service corresponding to the vehicle end is the offline TTS service, the voice playing engine selected by the vehicle end is determined. If the default playing engine of the vehicle end is used, voice broadcasting can be performed in the default tone of the vehicle end. If the playing engine corresponding to the TTS component is used, voice broadcasting can be performed based on the tone configuration information set by the user. If the TTS service corresponding to the vehicle end is the online TTS service, the voice playing engine selected by the vehicle end is determined to be the playing engine corresponding to the TTS component.
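The selection logic of fig. 11a can be sketched as follows. The channel numbers in `CHANNEL_SERVICE`, the treatment of unknown channels as offline, and the returned labels are all hypothetical; the application only states that the service type is determined from the channel number or other terminal information.

```python
from typing import Dict, Tuple

# Hypothetical mapping from channel number to the TTS service type.
CHANNEL_SERVICE: Dict[str, str] = {"1001": "offline", "2002": "online"}


def select_engine(channel: str, use_default_engine: bool,
                  user_tone: str) -> Tuple[str, str]:
    """Return (engine, tone) for a vehicle end identified by its channel number."""
    # Assumption: unknown channels are treated as offline.
    service = CHANNEL_SERVICE.get(channel, "offline")
    if service == "offline" and use_default_engine:
        return ("default engine", "default tone")
    # Online service, or offline service with the TTS component engine selected.
    return ("TTS component engine", user_tone)


engine, tone = select_engine("2002", use_default_engine=True, user_tone="custom voice")
```

For an online service the `use_default_engine` flag has no effect, which reflects the statement that the online TTS service always uses the playing engine of the TTS component.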

Referring to fig. 11b, fig. 11b is a schematic timing diagram of setting a tone according to an embodiment of the present application. As shown in fig. 11b, the TTS management module of the TTS component may, in response to a tone customization related instruction from the vehicle end, enter display pages such as the tone customization prompt page and the tone customization page, so as to obtain audio data recorded by the user. Meanwhile, the logic module can acquire the tone list corresponding to the user from the internet of vehicles server and display each tone corresponding to the user (namely, the tone configuration information in the present application) through the tone list page, including the tone determined based on the audio data recorded by the user. Further, based on the logic module and the TTS service module, the corresponding tone setting can be completed according to a setting instruction of the user. For example, the voice broadcast tone at the vehicle end is determined to be a "boy voice" type tone, and the TTS service module can synchronize this type of tone to the internet of vehicles server, so that the internet of vehicles server synchronizes the tone to each application program at the vehicle end and each application program at the vehicle end uses the same tone.

By adopting the embodiment of the application, after the user logs in the terminal system of the target terminal, the audio data uploaded by the user can be obtained based on the tone customization prompting page, so that the tone can be customized based on the audio data uploaded by the user, and the user experience is improved. Meanwhile, based on a setting instruction of a user for each tone configuration information in the tone list page, the tone can be quickly and conveniently customized for any application scene of any application program in the target terminal, and user experience is further improved.

Referring to fig. 12, fig. 12 is a schematic structural diagram of a voice playing apparatus according to an embodiment of the present application. The voice playing device provided by the embodiment of the application comprises:

a prompt page display module 121, configured to display a tone customization prompt page in response to a user logging in a terminal system of a target terminal;

a tone list display module 122, configured to obtain first audio data uploaded by the user on the basis of the tone customization prompt page, and display a tone list page, where the tone list page includes the first tone configuration information determined by the first audio data, and the first audio data and the first tone configuration information correspond to the same tone;

and the voice playing module 123 is configured to respond to a setting instruction of the user for the target tone color configuration information in the tone color list page, and play the audio information with the tone color corresponding to the target tone color configuration information through the target terminal.

In some possible embodiments, the tone list display module 122 is configured to:

responding to a confirmation instruction triggered by the user through the tone customization prompting page, and displaying a login page of a tone recording program;

responding to the user to log in the tone recording program and displaying a tone recording page;

and acquiring first audio data uploaded by the user based on the tone recording page.

In some possible embodiments, the tone recording page includes recording text information for prompting the user to record audio data based on the recording text information;

the tone list display module 122 is configured to:

and responding to the consistency of the text content corresponding to the first audio data recorded by the user and the recorded text information, and acquiring the first audio data.
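The consistency check between the recognized text of the recording and the recording text information can be sketched as a normalized string comparison. The functions `normalize` and `is_consistent` are hypothetical, the recognized text is assumed to come from a separate speech recognition step, and exact equality after normalization is only one possible definition of "consistent"; the application does not specify a matching rule.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation and extra whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def is_consistent(recognized_text: str, prompt_text: str) -> bool:
    """Accept the recording only if it matches the prompted text."""
    return normalize(recognized_text) == normalize(prompt_text)


accepted = is_consistent("Turn left, in 200 meters!", "turn left in 200 meters")
```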

In some possible embodiments, the voice playing module 123 is further configured to:

synchronizing the target tone color configuration information to each application program of the target terminal;

and responding to a setting instruction of the user for target tone color configuration information in the tone color list page, and playing the audio information of any application program by the target terminal in a tone color corresponding to the target tone color configuration information.

In some possible embodiments, the voice playing module 123 is configured to:

responding to a setting instruction of the user for target tone color configuration information in the tone color list, and determining a target application program corresponding to the target tone color configuration information;

and playing the audio information of the target application program by the target terminal according to the tone corresponding to the target tone configuration information.

In some possible embodiments, the prompt page display module 121 is further configured to:

determining tone attribute information of each piece of tone configuration information corresponding to the user, wherein the tone attribute information of any one piece of tone configuration information comprises at least one of the data size of the tone configuration information, the corresponding tone type, an information identifier or a storage path;

and displaying the tone list page determined by the tone attribute information.

In some possible embodiments, the voice playing module 123 is configured to:

acquiring query information input by the user, and determining text content corresponding to the query information;

and determining result information corresponding to the query information based on the text content corresponding to the query information, and playing the result information by the target terminal in the tone corresponding to the target tone configuration information.

In some possible embodiments, the prompt page display module 121 is configured to:

and responding to the fact that the target terminal has the tone customization permission based on the terminal information of the target terminal, and displaying a tone customization prompting page.

and in response to the fact that the target terminal does not have the tone customization authority based on the terminal information of the target terminal, playing the audio information by the target terminal according to the default tone corresponding to the target terminal.

and responding to the user log-out of the terminal system, and playing the audio information by the target terminal according to the default tone corresponding to the target terminal.

and acquiring a setting instruction of the user for the target tone color configuration information based on a message queue telemetry transmission protocol.

In a specific implementation, the voice playing apparatus may execute the implementation manners provided in the steps in fig. 1 through the built-in function modules, which may specifically refer to the implementation manners provided in the steps, and will not be described herein again.

Referring to fig. 13, fig. 13 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in fig. 13, the electronic device 1000 in the present embodiment may include a processor 1001, a network interface 1004, and a memory 1005, and the electronic device 1000 may further include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and optionally the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 13, the memory 1005, which is a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.

In the electronic device 1000 shown in fig. 13, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing the user with input; and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement the steps of the voice playing method described above, including, in some possible embodiments, displaying the tone list page determined by the tone attribute information.

It should be understood that in some possible embodiments, the processor 1001 may be a Central Processing Unit (CPU), or it may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

In a specific implementation, the electronic device 1000 may execute, through its built-in functional modules, the implementation manners provided in the steps in fig. 1. For details, refer to the implementation manners provided in those steps, which are not described herein again.

An embodiment of the present application further provides a computer-readable storage medium in which a computer program is stored. The computer program is executed by a processor to implement the method provided in each step in fig. 1. For details, refer to the implementation manner provided in each step, which is not described herein again.

The computer-readable storage medium may be an internal storage unit of the voice playing apparatus or of the electronic device, such as a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. The computer-readable storage medium may further include a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), and the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the electronic device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.

An embodiment of the present application further provides a computer program product, which includes a computer program or computer instructions. When the computer program or the computer instructions are executed by a processor, the method provided by each step in fig. 1 is performed.

The terms "first", "second", and the like in the claims and in the description and drawings of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or electronic device that comprises a list of steps or elements is not limited to only those steps or elements listed, but may optionally include other steps or elements not listed, or inherent to such process, method, article, or electronic device. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. The term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not intended to limit the scope of the present application, which is defined by the appended claims.

Claims

1. A method for playing speech, the method comprising:

and responding to a setting instruction of the user for target tone color configuration information in the tone color list page, and playing audio information by the target terminal in a tone color corresponding to the target tone color configuration information.

2. The method of claim 1, wherein the obtaining of the first audio data uploaded by the user based on the tone customization prompting page comprises:

responding to the user login of the tone recording program, and displaying a tone recording page;

3. The method of claim 2, wherein the tone recording page includes a recording text message for prompting the user to record audio data based on the recording text message;

the acquiring of the first audio data uploaded by the user based on the tone recording page includes:

4. The method of claim 1, further comprising:

the responding to the setting instruction of the user for the target tone color configuration information in the tone color list page, playing the audio information by the target terminal in the tone color corresponding to the target tone color configuration information, including:

5. The method according to claim 1, wherein the playing, by the target terminal, the audio information in the tone corresponding to the target tone configuration information in response to the setting instruction of the user for the target tone configuration information in the tone list page comprises:

and playing the audio information of the target application program through the target terminal according to the tone corresponding to the target tone configuration information.

6. The method of claim 1, further comprising:

determining tone attribute information of each piece of tone configuration information corresponding to the user, wherein the tone attribute information of any piece of tone configuration information comprises at least one of the data size of the tone configuration information, the corresponding tone type, an information identifier or a storage path;

the displaying of the tone list page includes:

and displaying the tone list page determined by the tone attribute information.
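For illustration, the tone attribute information of claim 6 can be pictured as a small record per piece of tone configuration information, from which the rows of the tone list page are derived. This is a minimal sketch, not the claimed implementation: the names `TimbreAttributes` and `build_list_page`, the row layout, and the sort order are assumptions, and the sketch populates all four attributes even though the claim requires only at least one of them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimbreAttributes:
    """Hypothetical record holding the four attributes named in claim 6."""
    info_id: str       # information identifier
    timbre_type: str   # corresponding tone type
    size_bytes: int    # data size of the tone configuration information
    storage_path: str  # storage path of the tone configuration information


def build_list_page(entries: List[TimbreAttributes]) -> List[str]:
    """Return one display row per tone configuration, sorted by identifier.

    The row format is an assumption made for this sketch.
    """
    rows = []
    for entry in sorted(entries, key=lambda e: e.info_id):
        size_kib = entry.size_bytes / 1024
        rows.append(f"{entry.info_id} | {entry.timbre_type} | {size_kib:.1f} KiB")
    return rows
```

The storage path is kept in the record but not shown in the row, on the assumption that it is needed to load the configuration rather than to display it.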

7. The method according to claim 1, wherein the playing, by the target terminal, the audio information in the tone corresponding to the target tone configuration information comprises:

8. The method of claim 1, wherein displaying the timbre customization prompt page comprises:

9. The method of claim 8, further comprising:

and in response to the fact that the target terminal does not have the tone customization permission based on the terminal information of the target terminal, playing audio information in a default tone corresponding to the target terminal through the target terminal.

10. The method of claim 1, further comprising:

and responding to the fact that the user logs out of the terminal system, and playing audio information through the target terminal according to the default tone corresponding to the target terminal.
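The default-tone fallbacks recited in claims 9 and 10 can be sketched as a small state holder: the terminal uses the tone set by the logged-in user, and falls back to its default tone when the user logs out or when the terminal lacks the tone customization permission. The class `TerminalVoice`, its method names, and the choice to remember each user's selection across logins are assumptions made for this sketch and are not taken from the application.

```python
from typing import Dict, Optional


class TerminalVoice:
    """Hypothetical model of which tone a terminal uses for playback."""

    def __init__(self, default_timbre: str, allows_customization: bool = True):
        self.default_timbre = default_timbre
        self.allows_customization = allows_customization
        self.logged_in_user: Optional[str] = None
        self._selected: Dict[str, str] = {}  # user id -> selected tone id

    def login(self, user_id: str) -> None:
        self.logged_in_user = user_id

    def logout(self) -> None:
        self.logged_in_user = None

    def set_timbre(self, timbre_id: str) -> bool:
        """Apply a setting instruction; return False if it is not permitted."""
        if self.logged_in_user is None or not self.allows_customization:
            return False
        self._selected[self.logged_in_user] = timbre_id
        return True

    def current_timbre(self) -> str:
        """Return the tone to play audio in, falling back to the default."""
        if self.logged_in_user is None or not self.allows_customization:
            return self.default_timbre
        return self._selected.get(self.logged_in_user, self.default_timbre)
```

Resolving the tone at playback time, rather than storing a single "active tone", means a logout needs no extra reset step: with no logged-in user, `current_timbre` simply returns the default.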

11. The method of claim 1, further comprising:

12. A voice playback apparatus, characterized in that the apparatus comprises:

a tone list display module, configured to obtain first audio data uploaded by the user on the basis of the tone customization prompting page, and display a tone list page, where the tone list page includes first tone configuration information determined by the first audio data, and the first audio data and the first tone configuration information correspond to the same tone;

and the voice playing module is used for responding to a setting instruction of the user for the target tone color configuration information in the tone color list page, and playing the audio information by the target terminal according to the tone color corresponding to the target tone color configuration information.

13. An electronic device comprising a processor and a memory, the processor and the memory being interconnected;

the memory is used for storing a computer program;

the processor is configured to perform the method of any of claims 1 to 11 when the computer program is invoked.

14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method of any one of claims 1 to 11.

15. A computer program product, characterized in that it comprises a computer program or computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 11.