WO2020134896A1 - Method and device for invoking speech synthesis file - Google Patents

Info

Publication number
WO2020134896A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
app
file
registered
synthesis file
Application number
PCT/CN2019/122545
Inventor
韩喆
王磊
傅春霖
Original Assignee
Alibaba Group Holding Limited
Application filed by Alibaba Group Holding Limited
Publication of WO2020134896A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/63 Routing a service request depending on the request content or context

Definitions

  • This specification relates to the field of computers, and in particular to a method and device for calling a speech synthesis file.
  • The embodiments of this specification provide a method and a device for calling a speech synthesis file, which solve the problems identified in the background art.
  • If the speech synthesis file required by a registered APP does not exist on the client, the speech synthesis file is downloaded from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP; the voice configuration file contains the download address of the speech synthesis file.
  • If the speech synthesis file exists on the client, the client's speech synthesis file is invoked so that the registered APP performs voice playback according to it.
  • Before detecting whether the client has the speech synthesis file required by the registered APP, the method further includes:
  • receiving the voice configuration file delivered by the server corresponding to the registered APP, where the delivered voice configuration file includes first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file;
  • Determining whether the first verification information matches the second verification information pre-stored by the client specifically includes:
  • The method further includes:
  • the voice basic training model is a model trained on several voice samples provided in advance according to the voice playback needs of registered APPs, and it can be shared by registered APPs.
  • The method further includes:
  • the registered APP performs voice playback according to the speech synthesis file.
  • The registered APP performing voice playback according to the speech synthesis file specifically includes: the server corresponding to the registered APP encrypts the speech synthesis file according to a preset rule, and after the encrypted file is decrypted by the built-in decryption module, the registered APP performs voice playback.
  • An apparatus for invoking a speech synthesis file provided by an embodiment of this specification includes:
  • a detection unit, configured to detect whether the client has the speech synthesis file required by a registered APP, the registered APP being an APP registered in advance that needs to use speech synthesis files;
  • a downloading unit, configured to, if the speech synthesis file does not exist on the client, download it from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file;
  • a calling unit, configured to, if the client is detected to have the speech synthesis file, invoke the client's speech synthesis file so that the registered APP performs voice playback according to it.
  • The device further includes:
  • a pulling unit, configured to pull the voice configuration file from the server corresponding to the registered APP;
  • a receiving unit, configured to receive the voice configuration file delivered by the server corresponding to the registered APP, where the delivered voice configuration file includes first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file;
  • a judging unit, configured to judge whether the first verification information matches the second verification information pre-stored by the client;
  • a verification unit, configured to confirm that the delivered voice configuration file is correct when the first verification information is judged to match the second verification information pre-stored by the client.
  • The judging unit is specifically configured to:
  • The device further includes:
  • a training unit, configured to send voice data, provided by the APP developer and reflecting the developer's characteristics, to the server corresponding to the registered APP, so that the server trains the APP developer's customized voice model through the built-in voice basic training model and uses that model to generate, from pre-stored text, the speech synthesis file corresponding to the registered APP.
  • The voice basic training model is a model trained on several voice samples provided in advance according to the voice playback needs of registered APPs, and it can be shared by registered APPs.
  • The device further includes:
  • a calculation unit, configured to calculate a first digest value corresponding to the speech synthesis file;
  • the judging unit is further configured to judge whether the second digest value of the speech synthesis file, pre-stored in the voice configuration file, is the same as the first digest value;
  • if the judging unit judges that the second digest value is the same as the first digest value, the registered APP performs voice playback according to the speech synthesis file.
  • The registered APP performing voice playback according to the speech synthesis file specifically includes: the server corresponding to the registered APP encrypts the speech synthesis file according to a preset rule, and after the encrypted file is decrypted by the built-in decryption module, the registered APP performs voice playback.
  • A voice system provided by an embodiment of this specification includes a terminal and a server; the terminal includes a voice SDK running on the terminal, a registered APP, and an APP developer terminal.
  • The APP developer terminal is configured to send voice data, provided by the APP developer and reflecting the developer's characteristics, to the server corresponding to the registered APP.
  • The server is configured to train the APP developer's customized voice model through the built-in voice basic training model, and to input pre-stored text into that model to generate the speech synthesis file required by the registered APP. The voice basic training model is a model trained on several voice samples provided in advance according to the voice playback needs of registered APPs, and it can be shared by registered APPs.
  • The voice SDK is configured to: pull the voice configuration file from the server corresponding to the registered APP; receive the voice configuration file delivered by that server, which includes first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file; judge whether the first verification information matches the second verification information pre-stored by the client, and if it does, confirm that the delivered voice configuration file is correct; detect whether the client has the speech synthesis file required by the registered APP, a registered APP being an APP registered in advance that needs to use speech synthesis files; if the speech synthesis file does not exist on the client, download it from the server corresponding to the registered APP according to the voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file; and if it exists, invoke the client's speech synthesis file so that the registered APP performs voice playback according to it.
  • A computer-readable medium provided by an embodiment of this specification stores computer-readable instructions that can be executed by a processor to perform the following steps: detecting whether the client has the speech synthesis file required by a registered APP; if it does not, downloading the speech synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file; and if it does, invoking the client's speech synthesis file so that the registered APP performs voice playback according to it.
  • An apparatus for calling a speech synthesis file includes a memory for storing computer program instructions and a processor for executing them; when executed by the processor, the instructions trigger the apparatus to perform the following steps: detecting whether the client has the speech synthesis file required by a registered APP (an APP registered in advance that needs to use speech synthesis files); if the file does not exist on the client, downloading it from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, which contains the download address of the speech synthesis file; and if it exists, invoking the client's speech synthesis file so that the registered APP performs voice playback according to it.
  • With this solution, the APP developer can train a customized voice model through the server corresponding to the registered APP and input pre-stored text into it to generate the developer's speech synthesis file; when the registered APP needs that file, the corresponding speech synthesis file is downloaded for the registered APP to play.
  • The voice system can support multiple registered APPs, which improves the utilization of the voice system.
  • FIG. 1 is a schematic flowchart of a method for invoking a speech synthesis file provided in Embodiment 1 of the present specification;
  • FIG. 2 is a schematic flowchart of a method for invoking a speech synthesis file provided in Embodiment 2 of this specification;
  • FIG. 3 is a schematic structural diagram of an apparatus for invoking a speech synthesis file provided in Embodiment 3 of this specification;
  • FIG. 4 is a schematic structural diagram of a voice system provided in Embodiment 4 of the present specification.
  • FIG. 1 is a schematic flowchart of a method for calling a speech synthesis file provided by an embodiment of the present specification.
  • The flowchart includes the following steps:
  • Step S101 Detect whether the client has the speech synthesis file required by the registered APP; if it does, execute step S102; if not, execute step S103.
  • In step S101 of this embodiment, detecting whether the client has the speech synthesis file required by the registered APP can be performed by the voice SDK.
  • The voice SDK provides an interface through which multiple APPs can connect at the same time; an APP registering with the voice SDK means connecting the APP's data to the voice SDK.
  • A registered APP is an application that has registered with the voice SDK in advance and requires a speech synthesis file.
  • The voice SDK is a framework that APP developers use when developing software.
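The registration model above (one SDK, many APPs, only registered APPs may use speech synthesis files) can be sketched as a small in-memory registry. This is a minimal illustration under assumptions: the class `VoiceSDK`, its methods, and the string APP identifiers are hypothetical, and registration is reduced to recording an identifier rather than connecting the APP's data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VoiceSDK:
    """Hypothetical voice SDK that several APPs can register with at the same time."""

    registered_apps: set[str] = field(default_factory=set)

    def register(self, app_id: str) -> None:
        # Registration is modeled only as recording the APP's identifier.
        self.registered_apps.add(app_id)

    def is_registered(self, app_id: str) -> bool:
        return app_id in self.registered_apps

    def require_registered(self, app_id: str) -> None:
        # Only APPs registered in advance may use speech synthesis files.
        if not self.is_registered(app_id):
            raise PermissionError(f"APP {app_id!r} is not registered with the voice SDK")


sdk = VoiceSDK()
sdk.register("payments-app")
sdk.register("navigation-app")
```

A set keeps the check for "is this APP registered?" constant-time no matter how many APPs share the SDK.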
  • The speech synthesis file is produced through training by the server corresponding to the registered APP according to the APP developer's needs.
  • The APP developer sends voice data reflecting the developer's characteristics to the server corresponding to the registered APP; the server trains the developer's customized voice model through the built-in voice basic training model and inputs pre-stored text into that model to generate the speech synthesis file required by the registered APP.
  • The voice basic training model can be shared by registered APPs; it is trained on several voice samples provided in advance according to the voice playback needs of registered APPs. These voice samples are high-quality voice data stored on the server corresponding to the registered APP.
  • The voice basic training model determines the sampling duration of the high-quality voice data according to the accuracy required of the entire voice system: the sampling duration can be 300 hours, and when the required accuracy is not high, 100 hours of high-quality voice data can be sampled instead.
  • In step S101 of this embodiment, after the server corresponding to the registered APP has trained the voice basic training model, the APP developer uploads voice data reflecting the developer's characteristics to that server, and a customized voice model for the APP developer is trained on top of the voice basic training model.
  • The voice data reflecting the APP developer's characteristics is recorded in the language environment the APP developer requires; the APP developer only needs to upload a small amount of such voice data to the server corresponding to the registered APP.
  • The voice basic training model can be understood as an intermediate model, built on a large data set, that the server corresponding to the registered APP provides to the APP developer; this intermediate model is then tuned on the voice data uploaded by the APP developer to obtain a customized voice model reflecting the developer's characteristics.
  • In step S101 of this embodiment, the voice data uploaded by the APP developer needs to be reviewed.
  • The review is conducted by the administrators of the voice system.
  • One review mechanism is that the customized voice model reflecting the APP developer's characteristics can be used only after it is approved: even if the model has been generated, it cannot be used until a reviewer approves it. Alternatively, the registered APP can use the model regardless of the review outcome, but as soon as a reviewer finds the model unqualified, the model becomes invalid.
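The two review mechanisms differ only in how a pending review is treated. A minimal sketch, assuming three review states; the names `AuditPolicy`, `AuditStatus`, and `model_usable` are hypothetical and not defined by the specification.

```python
from enum import Enum


class AuditPolicy(Enum):
    APPROVE_BEFORE_USE = "approve_before_use"  # usable only after approval
    USE_UNTIL_REJECTED = "use_until_rejected"  # usable until found unqualified


class AuditStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


def model_usable(policy: AuditPolicy, status: AuditStatus) -> bool:
    """Return whether a developer's customized voice model may be used."""
    if status is AuditStatus.REJECTED:
        # Under both mechanisms an unqualified model is invalid.
        return False
    if policy is AuditPolicy.APPROVE_BEFORE_USE:
        return status is AuditStatus.APPROVED
    # USE_UNTIL_REJECTED: a pending or approved model can be used.
    return True
```

Keeping the policy as an explicit parameter lets the same voice system switch mechanisms without changing the model-loading code.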
  • In step S101 of this embodiment, if the APP developer does not adopt this solution but uses a traditional method to meet customization requirements, there are two options.
  • In the first, the APP developer directly uploads voice data reflecting the developer's characteristics without any processing, which results in low robustness; in the second, the APP developer separately produces a customized voice model reflecting the developer's characteristics, which takes a long time and cannot guarantee the quality of the model.
  • The voice system can also be applied to a video system, in which case a video basic training model is stored in the server corresponding to the registered APP.
  • Step S102 Invoke the speech synthesis file on the client.
  • When an APP registered with the voice SDK needs to use a speech synthesis file, the voice SDK first checks whether the file exists on the client; if the client already has the required file, the stored speech synthesis file is invoked and the registered APP plays voice according to it.
  • Step S103 Download the speech synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP.
  • In step S103 of this embodiment, the speech synthesis file is generated from pre-stored text by the APP developer's customized voice model. If the judgment in step S101 finds that the speech synthesis file does not exist, the registered APP has never downloaded it before.
  • The voice configuration file contains the download address of the speech synthesis file; the registered APP downloads the required file from that address and then uses it for voice playback.
  • In step S103 of this embodiment, before the registered APP performs voice playback according to the speech synthesis file, the file also needs to be verified; the specific steps may be:
  • Step 1 Calculate the first digest value corresponding to the speech synthesis file.
  • The first digest value is a check value used to determine whether the downloaded speech synthesis file contains errors or has been tampered with.
  • It can be implemented with an MD5 digest.
  • MD5 is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value, which can be used to check whether a downloaded file is corrupted or has been tampered with. For example, on Unix, many software downloads come with a companion file of the same name and a .md5 extension. This file usually contains only one line of text, with the following general structure:
  • MD5 treats the entire file as one large text message and, through its irreversible transformation, produces the file's MD5 message digest.
  • Just as every person has a unique fingerprint, which is often the most trusted way for the judiciary to identify criminals, MD5 can produce a "digital fingerprint" for any file, regardless of its size or format; if anyone changes the file in any way, its MD5 value, that is, its "digital fingerprint", changes.
  • The MD5 value of a file thus acts as its "digital fingerprint": different files practically always have different MD5 values, and any change to a file changes its MD5 value.
  • The download server publishes the MD5 value of a file in advance; after the user downloads the file, the MD5 value of the downloaded copy is recalculated, and comparing the two values shows whether the downloaded file is corrupted or has been tampered with.
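The publish-then-recompute check above can be sketched with Python's standard `hashlib` and `hmac` modules. This is a minimal sketch, not the specification's implementation: the helper names and the temporary file standing in for a downloaded speech synthesis file are assumptions, and `algorithm` can be switched from `"md5"` to `"sha256"`.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 64 * 1024) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_digest(path: Path, published_hex: str, algorithm: str = "md5") -> bool:
    """Recompute the downloaded file's digest and compare it with the published value."""
    return hmac.compare_digest(file_digest(path, algorithm), published_hex.strip().lower())


# Demonstration with a throwaway file standing in for a downloaded speech synthesis file.
with tempfile.TemporaryDirectory() as tmp:
    voice_file = Path(tmp) / "greeting.voice"
    voice_file.write_bytes(b"synthesized audio bytes")
    published = file_digest(voice_file)  # what the download server would publish
    intact = matches_published_digest(voice_file, published)
    voice_file.write_bytes(b"tampered audio bytes")
    tampered = matches_published_digest(voice_file, published)
```

Here `intact` is `True` and `tampered` is `False`; the 32-character hex string corresponds to MD5's 128-bit output.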
  • In step 1 of this embodiment, the first digest value is calculated to check whether the downloaded speech synthesis file is corrupted or has been tampered with. This enables real-time error detection: once an error occurs in the content of the speech synthesis file, it is reported clearly, which prevents the error from spreading through the application.
  • The check on speech synthesis files can also be implemented with SHA-256 digests, which, unlike MD5, are still considered collision-resistant.
  • Step 2 Determine whether the second digest value of the speech synthesis file, pre-stored in the voice configuration file, is the same as the first digest value. If they are the same, perform step 3; if not, return to step S103.
  • Step 3 The registered APP performs voice playback according to the speech synthesis file.
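Steps S103, 1, 2, and 3 form a download-verify loop. The sketch below uses SHA-256, which the text permits as an alternative to MD5. The `download` callable, the placeholder path, and the `max_attempts` bound are assumptions: the text only says to return to step S103 when the digests differ, without limiting retries.

```python
import hashlib
from typing import Callable


def fetch_verified(download: Callable[[str], bytes], url: str, expected_sha256: str,
                   max_attempts: int = 3) -> bytes:
    """Download a speech synthesis file, re-downloading until its digest matches.

    `download` is a hypothetical transport callable; `max_attempts` is an assumed bound.
    """
    for _ in range(max_attempts):
        data = download(url)                             # step S103: download
        first_digest = hashlib.sha256(data).hexdigest()  # step 1: first digest value
        if first_digest == expected_sha256.lower():      # step 2: compare with config
            return data                                  # step 3: safe to play
    raise ValueError(f"digest mismatch after {max_attempts} attempts for {url}")


# Usage: the first simulated download is truncated, the second one is intact.
good = b"speech synthesis payload"
responses = iter([good[:-3], good])
payload = fetch_verified(lambda _url: next(responses), "voice/greeting.bin",
                         hashlib.sha256(good).hexdigest())
```

Bounding the retries keeps a persistently corrupted or tampered source from looping forever; the caller can surface the `ValueError` as the error report the text describes.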
  • The server corresponding to the registered APP can encrypt the speech synthesis file with its built-in private key.
  • Before playback, the file is decrypted with the public key stored in the decryption module, and the voice is then played.
  • A general voice database is configured in the voice basic training model; it includes voice broadcasts of transaction amounts and times. When the text contains numbers, the APP developer's customized voice model can convert them directly into a speech synthesis file that reads them as a transaction amount or a time rather than as plain digits.
  • For example, when the text contains "5:00", the voice played from the speech synthesis file says "five o'clock".
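The specification handles times through a general voice database inside the model. As a plainly different, swapped-in technique, the same effect can be approximated by rule-based text normalization before synthesis. The sketch below covers only clock times; the regular expression and the English spoken forms are assumptions, not the specification's rules.

```python
import re

# Matches 24-hour clock times such as "5:00" or "17:30".
TIME_PATTERN = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


def normalize_times(text: str) -> str:
    """Rewrite clock times so a TTS model reads them as times rather than digits."""

    def spoken(match: re.Match) -> str:
        hour, minute = int(match.group(1)), int(match.group(2))
        if minute == 0:
            return f"{hour} o'clock"
        if minute < 10:
            return f"{hour} oh {minute}"
        return f"{hour} {minute}"

    return TIME_PATTERN.sub(spoken, text)
```

For example, `normalize_times("Arriving at 5:00")` yields `"Arriving at 5 o'clock"`; transaction amounts would need a separate rule set.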
  • When a registered APP needs a speech synthesis file, the voice SDK checks whether the client has cached it and, if so, preferentially invokes the cached file, which reduces the response time of the entire voice system.
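The cache-first flow of steps S101 to S103 can be sketched as follows. Assumptions: the cache is a local directory keyed by file name, and `download` is a hypothetical stand-in for the SDK's network call that receives the download address read from the voice configuration file.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_voice_file(cache_dir: Path, voice_name: str, download_url: str,
                   download: Callable[[str], bytes]) -> Path:
    """Return a local speech synthesis file, preferring the client-side cache."""
    cached = cache_dir / voice_name
    if cached.exists():                # S101: the file already exists on the client
        return cached                  # S102: invoke the cached file
    data = download(download_url)      # S103: fetch it from the APP's server
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return cached


# Usage: the second request is served from the cache without a download.
download_log: list = []


def fake_download(url: str) -> bytes:
    download_log.append(url)
    return b"audio"


with tempfile.TemporaryDirectory() as tmp:
    first = get_voice_file(Path(tmp), "greeting.voice", "voice/greeting.voice", fake_download)
    second = get_voice_file(Path(tmp), "greeting.voice", "voice/greeting.voice", fake_download)
    served_from_cache = first == second and first.read_bytes() == b"audio"
```

Only one download is recorded for two requests, which is the response-time saving the text attributes to preferring the cache. A production version would also run the digest check before writing to the cache.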
  • FIG. 2 is a schematic flowchart of a method for calling a speech synthesis file provided by an embodiment of the present specification.
  • The flowchart includes the following steps:
  • Step S201 Pull the voice configuration file from the server corresponding to the registered APP.
  • In step S201 of this embodiment, the customized voice model corresponding to the registered APP converts pre-stored text into speech synthesis files, and the voice configuration file corresponding to the registered APP includes the voice list of those speech synthesis files.
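The voice configuration file is described as holding a voice list, each file's download address, and the pre-stored second digest value. A minimal sketch, assuming a JSON encoding; every field name, the APP identifier, and the URL are hypothetical placeholders, and the digest shown is simply the SHA-256 of empty input.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEntry:
    name: str          # entry in the voice list
    download_url: str  # built-in download address of the speech synthesis file
    sha256: str        # pre-stored "second digest value" used in step 2


@dataclass(frozen=True)
class VoiceConfig:
    app_id: str
    voices: tuple[VoiceEntry, ...]

    @classmethod
    def from_json(cls, raw: str) -> VoiceConfig:
        doc = json.loads(raw)
        return cls(app_id=doc["app_id"],
                   voices=tuple(VoiceEntry(**v) for v in doc["voices"]))

    def entry(self, name: str) -> VoiceEntry:
        for voice in self.voices:
            if voice.name == name:
                return voice
        raise KeyError(name)


SAMPLE_CONFIG = """
{
  "app_id": "payments-app",
  "voices": [
    {"name": "payment_received",
     "download_url": "https://voice.example.com/payments-app/payment_received.bin",
     "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
  ]
}
"""

config = VoiceConfig.from_json(SAMPLE_CONFIG)
```

Frozen dataclasses keep a verified configuration from being modified after step S204 confirms it.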
  • Step S202 Receive the voice configuration file delivered by the server corresponding to the registered APP.
  • The delivered voice configuration file includes first verification information, which the server corresponding to the registered APP assigns to the registered APP after encrypting the delivered voice configuration file.
  • In step S202 of this embodiment, the developer's APP registers with the voice SDK, and the voice SDK is connected to a decryption module.
  • The decryption module can be issued a public key for decryption through a TSM (Trusted Service Manager).
  • The public key corresponds to the registered APP.
  • Because each public key is unique, the server is configured with the corresponding private key, and the server corresponding to the registered APP encrypts the delivered voice configuration file with that private key.
  • The public key and private key form a key pair: the public key is the public part of the pair, and the private key is the non-public part.
  • The key pair composed of the public key and the private key is guaranteed to be unique.
  • When this key pair is used, data encrypted with one key must be decrypted with the other: data encrypted with the public key can only be decrypted with the private key, and data encrypted with the private key can only be decrypted with the public key; otherwise decryption fails.
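The property that each key of a pair undoes the other can be illustrated with textbook RSA. This is a toy sketch with the classic tiny parameters, not how the specification's server or SE module is implemented. Real systems use large keys with padding through a vetted library, and "encrypting with the private key" so that the public key can check it is, in practice, a digital signature.

```python
# Classic textbook RSA parameters. They are far too small to be secure and are
# used only to show that the two keys of a pair undo each other.
P, Q = 61, 53
N = P * Q                 # modulus shared by both keys (3233)
PHI = (P - 1) * (Q - 1)   # Euler's totient of N (3120)
E = 17                    # public exponent
D = pow(E, -1, PHI)       # private exponent, the inverse of E mod PHI (Python 3.8+)


def apply_key(value: int, exponent: int) -> int:
    """Raw RSA operation: value ** exponent mod N (no padding, illustration only)."""
    return pow(value, exponent, N)


message = 65
with_public = apply_key(message, E)   # "encrypt" with the public key
with_private = apply_key(message, D)  # "encrypt" with the private key
```

Here `D` is 2753 and `with_public` is 2790; applying `D` to 2790 returns 65, and applying `E` to `with_private` likewise returns 65.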
  • The decryption module may be an SE (secure element) module, that is, a module that ensures system security.
  • It uses a security chip and a chip operating system (COS) to implement functions such as secure data storage and encryption and decryption operations.
  • The main functions of the SE module in the security system are secure key storage, data encryption operations, and secure information storage.
  • Secure key storage establishes a relatively complete key management system that ensures keys cannot be read out.
  • Data encryption operations include support for reliable security algorithms, ciphertext transmission of sensitive data, and tamper resistance during data transmission.
  • Secure information storage refers to a strict file access permission mechanism together with reliable authentication algorithms and processes.
  • The public key is placed in the SE module.
  • SE modules can be packaged in various forms; common ones include smart cards and embedded secure elements (eSE).
  • An eSE has a built-in secure operating system that meets the terminal's needs for secure key storage and data encryption services.
  • The voice system can be widely used in finance, map navigation, urban transportation, medical care, retail, and other fields, and it protects the security of the system in use.
  • Step S203 Determine whether the first verification information matches the second verification information pre-stored by the client. If so, execute step S204; if not, end the process.
  • In step S203 of this embodiment, the second verification information corresponding to the registered APP, pre-stored in the client's secure operating environment, is located according to the registered APP's identifier, and the first and second verification information are then compared.
  • The identifier of the registered APP is the registered APP's identity information.
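Step S203 can be sketched as a lookup by APP identifier followed by a comparison. Assumptions: the verification information is a byte string, and a plain dictionary stands in for the client's secure operating environment (a real client would keep it in the SE). `SECURE_STORE` and `config_is_authentic` are hypothetical names.

```python
from __future__ import annotations

import hmac

# Hypothetical stand-in for the client's secure operating environment (for example
# an SE), mapping registered APP identifiers to pre-stored second verification info.
SECURE_STORE: dict[str, bytes] = {"payments-app": bytes.fromhex("8f12a044c3d1")}


def config_is_authentic(app_id: str, first_verification: bytes) -> bool:
    """Step S203: locate the second verification info by APP identifier and compare."""
    second_verification = SECURE_STORE.get(app_id)
    if second_verification is None:
        return False  # no pre-stored info for this APP: treat as a mismatch and end
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(first_verification, second_verification)
```

Using `hmac.compare_digest` instead of `==` is a conservative choice for any secret-dependent comparison.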
  • Step S204 Confirm that the delivered voice configuration file is correct.
  • Step S205 Detect whether the client has the speech synthesis file required by the registered APP; if it does, execute step S206; if not, execute step S207.
  • Step S205 of this embodiment is the same as step S101 above and is not repeated here.
  • Step S206 Invoke the speech synthesis file on the client.
  • Step S206 of this embodiment is the same as step S102 above and is not repeated here.
  • Step S207 Download the speech synthesis file from the server corresponding to the registered APP according to the voice configuration file corresponding to the registered APP.
  • Step S207 of this embodiment is the same as step S103 above and is not repeated here.
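The ordering of Embodiment 2 can be sketched as a small orchestrator in which verification gates the download flow. Every callable here is a hypothetical hook: `pull_config` covers S201/S202, `verify` covers S203, and `ensure_voice_file` covers S205 to S207.

```python
from __future__ import annotations

from typing import Callable, Optional


def run_embodiment_two(
    pull_config: Callable[[], tuple[bytes, bytes]],
    verify: Callable[[bytes], bool],
    ensure_voice_file: Callable[[bytes], bytes],
) -> Optional[bytes]:
    """Order the steps of Embodiment 2; every callable is a hypothetical hook."""
    config, first_verification = pull_config()  # S201/S202: pull and receive
    if not verify(first_verification):           # S203: compare verification info
        return None                              # mismatch: the process ends
    # S204: the delivered configuration is confirmed correct.
    return ensure_voice_file(config)             # S205-S207: cache or download


# Usage with trivial stand-ins that record which steps ran.
steps: list = []


def fake_ensure(cfg: bytes) -> bytes:
    steps.append("S205")
    return b"audio"


result_ok = run_embodiment_two(lambda: (b"config", b"token"),
                               lambda info: info == b"token", fake_ensure)
result_bad = run_embodiment_two(lambda: (b"config", b"forged"),
                                lambda info: info == b"token", fake_ensure)
```

With forged verification information the run stops before S205, so no file is fetched from an unverified configuration.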
  • The voice system in this embodiment also has to keep the server and the registered APP synchronized.
  • To do so, the server can support active push: when a speech synthesis file used by the client changes, the server actively pushes the change to the client.
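Active push can be sketched as an observer pattern in which the server notifies subscribed clients and a client marks its cached copy stale. The version counter, class names, and in-process callbacks are assumptions; a real deployment would push over a network channel.

```python
from __future__ import annotations

from typing import Callable


class VoiceFileServer:
    """Hypothetical server that actively pushes change notices to subscribers."""

    def __init__(self) -> None:
        self._versions: dict[str, int] = {}
        self._subscribers: list[Callable[[str, int], None]] = []

    def subscribe(self, callback: Callable[[str, int], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, voice_name: str) -> int:
        # A regenerated speech synthesis file gets a new version that is pushed at once.
        version = self._versions.get(voice_name, 0) + 1
        self._versions[voice_name] = version
        for notify in self._subscribers:
            notify(voice_name, version)
        return version


class ClientCache:
    """Client side: remembers cached versions and which files are out of date."""

    def __init__(self) -> None:
        self.versions: dict[str, int] = {}
        self.stale: set[str] = set()

    def store(self, voice_name: str, version: int) -> None:
        self.versions[voice_name] = version
        self.stale.discard(voice_name)

    def on_push(self, voice_name: str, version: int) -> None:
        # A newer server version makes the cached copy stale, so the next lookup
        # falls through to the download step.
        if self.versions.get(voice_name, 0) < version:
            self.stale.add(voice_name)


server = VoiceFileServer()
cache = ClientCache()
server.subscribe(cache.on_push)
cache.store("greeting", server.publish("greeting"))  # client downloads version 1
server.publish("greeting")                            # version 2 is pushed
```

After the second publish the client still holds version 1 but knows it is stale, so the cache-first lookup can be bypassed for that file.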
  • FIG. 3 is a schematic structural diagram of a calling device for a speech synthesis file provided by an embodiment of the present specification.
  • The schematic structural diagram includes: a detection unit 1, a calling unit 2, a downloading unit 3, a pulling unit 4, a receiving unit 5, a judging unit 6, a verification unit 7, a training unit 8, and a calculation unit 9.
  • The detection unit 1 is configured to detect whether the client has the speech synthesis file required by a registered APP, the registered APP being an APP registered in advance that needs to use speech synthesis files.
  • The calling unit 2 is configured to, if the client is detected to have the speech synthesis file, invoke the client's speech synthesis file so that the registered APP performs voice playback according to it.
  • The downloading unit 3 is configured to, if the speech synthesis file does not exist on the client, download it from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP; the voice configuration file contains the download address of the speech synthesis file.
  • The pulling unit 4 is configured to pull the voice configuration file from the server corresponding to the registered APP.
  • The receiving unit 5 is configured to receive the voice configuration file delivered by the server corresponding to the registered APP; the delivered voice configuration file includes first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file.
  • The judging unit 6 is configured to judge whether the first verification information matches the second verification information pre-stored by the client.
  • The verification unit 7 is configured to confirm that the delivered voice configuration file is correct when the first verification information is judged to match the second verification information pre-stored by the client.
  • The judging unit 6 is specifically configured to:
  • determine, according to the identifier of the registered APP, the second verification information corresponding to the registered APP that is pre-stored in the client's secure operating environment;
  • The training unit 8 is configured to send voice data, provided by the APP developer and reflecting the developer's characteristics, to the server corresponding to the registered APP, so that the server trains the APP developer's customized voice model through the built-in voice basic training model and uses that model to generate, from pre-stored text, the speech synthesis file corresponding to the registered APP.
  • The voice basic training model is trained according to the voice playback needs of registered APPs, and the resulting model can be shared by registered APPs.
  • The calculation unit 9 is configured to calculate the first digest value corresponding to the speech synthesis file.
  • The judging unit 6 is further configured to judge whether the second digest value of the speech synthesis file, pre-stored in the voice configuration file, is the same as the first digest value.
  • If the two digest values are the same, the registered APP performs voice playback according to the speech synthesis file.
  • The registered APP performing voice playback according to the speech synthesis file includes: the server corresponding to the registered APP encrypts the speech synthesis file according to a preset rule, and after the encrypted file is decrypted by the built-in decryption module, the registered APP performs voice playback.
  • the embodiments of the present specification also provide a computer-readable medium on which computer-readable instructions are stored.
  • the computer-readable instructions can be executed by a processor to perform the following steps:
  • if it is detected that the client does not have the voice synthesis file, downloading the voice synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the voice synthesis file;
  • if it is detected that the client has the voice synthesis file, calling the client's voice synthesis file so that the registered APP performs voice playback according to it.
  • An embodiment of the present specification also provides a calling device for a speech synthesis file.
  • the device includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to perform the following steps:
  • the detection unit is configured to detect whether the client has the voice synthesis file required by the registered APP, the registered APP being an APP that has registered in advance as needing to use a voice synthesis file;
  • the downloading unit is configured to, if the voice synthesis file does not exist on the client, download the voice synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the voice synthesis file;
  • the calling unit is used to call the voice synthesis file of the client if it is detected that the client has a voice synthesis file, so that the registered APP can perform voice playback according to the voice synthesis file.
  • a voice system provided by an embodiment of this specification includes a terminal and a server, and the terminal includes a voice SDK running in the terminal, a registered APP, and an APP developer terminal;
  • the APP developer terminal is used to send the voice data provided by the APP developer reflecting the characteristics of the APP developer to the server corresponding to the registered APP;
  • the server is used to train the APP developer's customized voice model through the built-in voice basic training model, and enter the pre-stored text into the APP developer's customized voice model to generate the voice synthesis file required by the registered APP.
  • the basic voice training model is trained, according to the registered APPs' voice playback needs, on several voice samples provided in advance, and can be shared by the registered APPs;
  • the voice SDK is used to pull the voice configuration file from the server corresponding to the registered APP; receive the voice configuration file delivered by the server corresponding to the registered APP.
  • the delivered voice configuration file includes first verification information that the server corresponding to the registered APP assigns to the registered APP after encrypting the delivered voice configuration file; the voice SDK determines whether the first verification information matches the second verification information pre-stored by the client; when they match, it verifies that the delivered voice configuration file is correct; it detects whether the client has the voice synthesis file required by the registered APP, the registered APP being an APP that has registered in advance as needing the voice synthesis file; if the client does not have the voice synthesis file, it downloads the file from the server corresponding to the registered APP according to the voice configuration file,
  • which contains the download address of the voice synthesis file; if the client has the voice synthesis file, it calls the client's voice synthesis file so that the registered APP performs voice playback according to it.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.
  • each flow and/or block in the flowchart and/or block diagram and a combination of the flow and/or block in the flowchart and/or block diagram may be implemented by computer program instructions.
  • These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • the computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
  • Computer-readable media, including persistent and non-persistent, removable and non-removable media, can store information by any method or technology.
  • the information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

Abstract

A method and a device for invoking a speech synthesis file. The method comprises: detecting whether a client has the speech synthesis file needed by a registered app (S101), the registered app being an app that has registered in advance as requiring a speech synthesis file; if it is detected that the client does not have the speech synthesis file, downloading the speech synthesis file from a server side corresponding to the registered app according to a pre-stored speech configuration file corresponding to the registered app (S103), the speech configuration file including the download address of the speech synthesis file; and if it is detected that the client has the speech synthesis file, invoking the client's speech synthesis file (S102), so that the registered app performs speech playback according to it. When a registered app needs a speech synthesis file, the client is checked for that file; if the client has it, the copy cached on the client is invoked first, reducing the response time of the entire speech system.

Description

Method and device for invoking a speech synthesis file
Technical field
This specification relates to the field of computers, and in particular to a method and device for invoking a speech synthesis file.
Background art
With the development of the Internet, multi-party cooperation has become common in more and more areas. When a large-scale voice system is built, the operator builds the terminal framework and the server, but the applications on the terminal must be completed jointly by multiple ISVs (independent software vendors).
In existing large-scale voice systems, whenever an APP developed by an ISV calls a speech synthesis file for voice playback, the server must synthesize that file anew and then download it to the terminal to be called. This process increases the system's response time and, in severe cases, can paralyze the entire system and disrupt its normal operation.
Summary of the invention
The embodiments of this specification provide a method and device for invoking a speech synthesis file that solve the problems raised in the background art above.
To solve the above technical problems, the embodiments of this specification are implemented as follows:
A method for invoking a speech synthesis file provided by an embodiment of this specification includes:
detecting whether the client has the speech synthesis file required by a registered APP, the registered APP being an APP that has registered in advance as needing to use a speech synthesis file;
if it is detected that the client does not have the speech synthesis file, downloading the speech synthesis file from the server corresponding to the registered APP according to a pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file;
if it is detected that the client has the speech synthesis file, calling the client's speech synthesis file so that the registered APP performs voice playback according to it.
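The cache-first logic of these three steps (S101 to S103) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: `VoiceConfig`, its fields, the per-APP cache directory layout, and the injected `fetch` callable are hypothetical names introduced for illustration; the specification only says that the voice configuration file contains the download address of the speech synthesis file.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class VoiceConfig:
    """Hypothetical per-APP voice configuration.

    The specification only states that it contains the download address
    of the speech synthesis file; the other fields are assumptions.
    """
    app_id: str
    file_name: str
    download_url: str


def get_synthesis_file(
    config: VoiceConfig,
    cache_dir: Path,
    fetch: Callable[[str], bytes],
) -> Path:
    """Return a local path to the speech synthesis file, downloading only on a cache miss.

    `fetch` is a hypothetical injected download function; a real client
    would issue a network request to config.download_url.
    """
    local_path = cache_dir / config.app_id / config.file_name
    if local_path.exists():
        # S102: the client already has the file, so the cached copy is used.
        return local_path
    # S103: not cached, so download it from the address in the voice configuration file.
    data = fetch(config.download_url)
    local_path.parent.mkdir(parents=True, exist_ok=True)
    local_path.write_bytes(data)
    return local_path
```

Injecting `fetch` keeps the sketch testable offline: calling `get_synthesis_file` twice with the same configuration invokes `fetch` only once, and the second call is served from the client cache.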
Optionally, before detecting whether the client has the speech synthesis file required by the registered APP, the method further includes:
pulling the voice configuration file from the server corresponding to the registered APP;
receiving the voice configuration file delivered by the server corresponding to the registered APP, where the delivered voice configuration file includes first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file;
determining whether the first verification information matches second verification information pre-stored by the client;
when it is determined that the first verification information matches the second verification information pre-stored by the client, verifying that the delivered voice configuration file is correct.
Optionally, determining whether the first verification information matches the second verification information pre-stored by the client specifically includes:
obtaining, according to the identifier of the registered APP, the second verification information corresponding to the registered APP that is pre-stored in the client's built-in secure operating environment;
determining whether the first verification information matches the second verification information.
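The matching step can be sketched as follows, assuming the verification information is an opaque byte string and using a plain dictionary as a stand-in for the client's secure operating environment. Both are assumptions: the specification defines neither the format of the verification information nor the secure-storage interface, and `verify_config` is a hypothetical name. The constant-time comparison via Python's `hmac.compare_digest` is a design choice added here, not something the specification prescribes.

```python
import hmac
from typing import Mapping


def verify_config(
    app_id: str,
    first_verification: bytes,
    secure_store: Mapping[str, bytes],
) -> bool:
    """Return True if the delivered voice configuration file passes verification.

    secure_store stands in (hypothetically) for the client's secure operating
    environment: it maps a registered APP's identifier to the second
    verification information pre-stored for that APP.
    """
    second_verification = secure_store.get(app_id)
    if second_verification is None:
        # Nothing pre-stored for this APP identifier, so the config cannot be verified.
        return False
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(first_verification, second_verification)
```

A configuration that fails this check would simply not be used, so the later download step never reads a download address from an unverified file.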
Optionally, before the voice configuration file is pulled from the server corresponding to the registered APP, the method further includes:
sending voice data that is provided by the APP developer and reflects the APP developer's characteristics to the server corresponding to the registered APP, so that the server trains a voice model customized for the APP developer through a built-in basic voice training model and inputs pre-stored text into that customized voice model to generate the speech synthesis file required by the registered APP. The basic voice training model is trained, according to the registered APPs' voice playback needs, on several voice samples provided in advance, and can be shared by the registered APPs.
Optionally, before the registered APP performs voice playback according to the speech synthesis file, the method further includes:
calculating a first digest value corresponding to the speech synthesis file;
determining whether a second digest value corresponding to the speech synthesis file, pre-stored in the voice configuration file, is the same as the first digest value;
if it is determined that the second digest value is the same as the first digest value, the registered APP performs voice playback according to the speech synthesis file.
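A minimal sketch of this digest comparison follows. SHA-256 and hex-encoded digests are assumptions (the specification names neither a digest algorithm nor an encoding), and `digest_matches` is a hypothetical helper. The file is hashed in chunks so that a large audio file need not be loaded into memory at once.

```python
import hashlib
from pathlib import Path


def digest_matches(synthesis_file: Path, second_digest_hex: str) -> bool:
    """Compare the file's first digest value with the second digest value from the config.

    SHA-256 with hex encoding is an assumption made for this sketch.
    """
    h = hashlib.sha256()
    with synthesis_file.open("rb") as f:
        # Read 64 KiB at a time until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    first_digest_hex = h.hexdigest()  # hexdigest() is always lowercase
    return first_digest_hex == second_digest_hex.lower()
```

Only when this returns `True` would the registered APP go on to play the file; a mismatch suggests a corrupted or tampered download.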
Optionally, the registered APP performing voice playback according to the speech synthesis file specifically includes: the server corresponding to the registered APP encrypts the speech synthesis file according to a preset rule; after the encrypted speech synthesis file is decrypted by a built-in decryption module, the registered APP performs voice playback.
An apparatus for invoking a speech synthesis file provided by an embodiment of this specification includes:
a detection unit, configured to detect whether the client has the speech synthesis file required by a registered APP, the registered APP being an APP that has registered in advance as needing to use a speech synthesis file;
a downloading unit, configured to, if it is detected that the client does not have the speech synthesis file, download the speech synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file;
a calling unit, configured to, if it is detected that the client has the speech synthesis file, call the client's speech synthesis file so that the registered APP performs voice playback according to it.
Optionally, the apparatus further includes:
a pulling unit, configured to pull the voice configuration file from the server corresponding to the registered APP;
a receiving unit, configured to receive the voice configuration file delivered by the server corresponding to the registered APP, the delivered voice configuration file including first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file;
a judging unit, configured to determine whether the first verification information matches the second verification information pre-stored by the client;
a verification unit, configured to verify that the delivered voice configuration file is correct when it is determined that the first verification information matches the second verification information pre-stored by the client.
Optionally, the judging unit is specifically configured to:
obtain, according to the identifier of the registered APP, the second verification information corresponding to the registered APP that is pre-stored in the client's built-in secure operating environment;
determine whether the first verification information matches the second verification information.
Optionally, the apparatus further includes:
a training unit, configured to send voice data that is provided by the APP developer and reflects the APP developer's characteristics to the server corresponding to the registered APP, so that the server trains a voice model customized for the APP developer through the built-in basic voice training model and, from pre-stored text, generates the speech synthesis file corresponding to the registered APP using that customized voice model. The basic voice training model is trained, according to the registered APPs' voice playback needs, on several voice samples provided in advance, and can be shared by the registered APPs.
Optionally, the apparatus further includes:
a calculation unit, configured to calculate a first digest value corresponding to the speech synthesis file;
the judging unit is further configured to determine whether the second digest value corresponding to the speech synthesis file, pre-stored in the voice configuration file, is the same as the first digest value;
if the judging unit determines that the second digest value is the same as the first digest value, the registered APP performs voice playback according to the speech synthesis file.
Optionally, the registered APP performing voice playback according to the speech synthesis file specifically includes: the server corresponding to the registered APP encrypts the speech synthesis file according to a preset rule; after the encrypted speech synthesis file is decrypted by a built-in decryption module, the registered APP performs voice playback.
A voice system provided by an embodiment of this specification includes a terminal and a server. The terminal includes a voice SDK running on the terminal, a registered APP, and an APP developer terminal;
the APP developer terminal is configured to send voice data that is provided by the APP developer and reflects the APP developer's characteristics to the server corresponding to the registered APP;
the server is configured to train a voice model customized for the APP developer through the built-in basic voice training model, and to input pre-stored text into that customized voice model to generate the speech synthesis file required by the registered APP. The basic voice training model is trained, according to the registered APPs' voice playback needs, on several voice samples provided in advance, and can be shared by the registered APPs;
the voice SDK is configured to: pull the voice configuration file from the server corresponding to the registered APP; receive the voice configuration file delivered by that server, the delivered voice configuration file including first verification information that the server assigns to the registered APP after encrypting the delivered voice configuration file; determine whether the first verification information matches the second verification information pre-stored by the client; when they match, verify that the delivered voice configuration file is correct; detect whether the client has the speech synthesis file required by the registered APP, the registered APP being an APP that has registered in advance as needing to use a speech synthesis file; if the client does not have the speech synthesis file, download it from the server corresponding to the registered APP according to the voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file; and if the client has the speech synthesis file, call the client's speech synthesis file so that the registered APP performs voice playback according to it.
A computer-readable medium provided by an embodiment of this specification stores computer-readable instructions that can be executed by a processor to perform the following steps:
detecting whether the client has the speech synthesis file required by a registered APP, the registered APP being an APP that has registered in advance as needing to use a speech synthesis file;
if it is detected that the client does not have the speech synthesis file, downloading the speech synthesis file from the server corresponding to the registered APP according to a pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file;
if it is detected that the client has the speech synthesis file, calling the client's speech synthesis file so that the registered APP performs voice playback according to it.
A device for invoking a speech synthesis file provided by an embodiment of this specification includes a memory for storing computer program instructions and a processor for executing the program instructions, where, when the computer program instructions are executed by the processor, the device is triggered to perform the following steps:
a detection unit, configured to detect whether the client has the speech synthesis file required by a registered APP, the registered APP being an APP that has registered in advance as needing to use a speech synthesis file;
a downloading unit, configured to, if it is detected that the client does not have the speech synthesis file, download the speech synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file;
a calling unit, configured to, if it is detected that the client has the speech synthesis file, call the client's speech synthesis file so that the registered APP performs voice playback according to it.
At least one of the above technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects:
1. When a registered APP needs a speech synthesis file, the client is checked for a cached copy of that file; if one exists, the copy cached on the client is called first, reducing the response time of the whole voice system;
2. Through the server corresponding to the registered APP, the APP developer can train a voice model customized for the APP developer and then input pre-stored text into it to generate the speech synthesis files the APP developer needs; when the registered APP needs one of these files, the corresponding file is downloaded for the registered APP to play;
3. The voice system can support multiple registered APPs, so that the voice system is fully utilized.
Brief description of the drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments recorded in this specification; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for invoking a speech synthesis file provided in Embodiment 1 of this specification;
FIG. 2 is a schematic flowchart of a method for invoking a speech synthesis file provided in Embodiment 2 of this specification;
FIG. 3 is a schematic structural diagram of an apparatus for invoking a speech synthesis file provided in Embodiment 3 of this specification;
FIG. 4 is a schematic structural diagram of a voice system provided in Embodiment 4 of this specification.
Detailed description
To enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative effort fall within the protection scope of this application.
FIG. 1 is a schematic flowchart of a method for invoking a speech synthesis file provided by an embodiment of this specification. The flow includes:
Step S101: detect whether the client has the speech synthesis file required by the registered APP. If it does, perform step S102; if not, perform step S103.
In step S101 of this embodiment, the voice SDK may perform the step of detecting whether the client has the speech synthesis file required by the registered APP. The voice SDK provides an interface through which multiple APPs can connect at the same time; an APP registers with the voice SDK by connecting its APP data to the voice SDK. A registered APP is an APP that has registered with the voice SDK in advance and needs to use a speech synthesis file. In this embodiment, the voice SDK is the framework that APP developers use when developing software.
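Registration through the voice SDK can be sketched as a registry keyed by APP identifier. The `VoiceSDK` class and its methods are hypothetical and only illustrate the idea that the SDK serves speech synthesis files to APPs that have registered in advance; in this sketch `request_synthesis_file` simply returns the cache key that step S101 would check, rather than performing the check itself.

```python
class VoiceSDK:
    """Hypothetical voice SDK offering one interface to many APPs."""

    def __init__(self) -> None:
        # Maps an APP identifier to the APP data connected at registration.
        self._registered: dict[str, dict] = {}

    def register(self, app_id: str, app_data: dict) -> None:
        # Registering means connecting the APP's data to the SDK.
        self._registered[app_id] = app_data

    def is_registered(self, app_id: str) -> bool:
        return app_id in self._registered

    def request_synthesis_file(self, app_id: str, file_name: str) -> str:
        """Reject unregistered APPs; otherwise return the key step S101 would look up."""
        if not self.is_registered(app_id):
            raise PermissionError(f"{app_id} has not registered with the voice SDK")
        return f"{app_id}/{file_name}"
```

Keying the cache by APP identifier keeps one APP's synthesis files separate from another's, which matters when several ISVs share the same SDK on one terminal.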
In step S101 of this embodiment, the speech synthesis file is generated by the server corresponding to the registered APP according to the APP developer's needs. First, the APP developer sends voice data that it provides and that reflects its characteristics to the server corresponding to the registered APP, so that the server trains a voice model customized for the APP developer through the built-in basic voice training model and inputs pre-stored text into that customized voice model to generate the speech synthesis file required by the registered APP. The basic voice training model is trained, according to the registered APPs' voice playback needs, on several voice samples provided in advance, and can be shared by the registered APPs. These voice samples are high-quality voice data stored on the server corresponding to the registered APP.
Further, in step S101 of this embodiment, the sampling duration of the high-quality voice data used for the basic voice training model is determined by the accuracy required of the whole voice system: when the required accuracy is high, the sampling duration may be 300 hours; when the required accuracy is not high, a sampling duration of 100 hours is chosen.
在本说明书实施例的步骤S101中,已注册APP对应的服务端在训练出语音基础训练模型后,APP开发者上传反映APP开发者特征的语音数据至已注册APP对应的服务端,通过语音基础训练模型训练出APP开发者定制化的语音模型。反映APP开发者特征的语音数据是根据APP开发者需求的语言环境而录制的语音数据。此时APP开发者只需将少量上传的语音数据上传至已注册APP对应的服务端。其中,语音基础训练模型可以理解为已注册APP对应的服务端为APP开发者提供的数据集很大的中间模型,然后将此中间模型为APP开发者上传的语音数据进行调优训练,从而得出反映APP开发者特征的定制化的语音模型。In step S101 of the embodiment of the present specification, after the server corresponding to the registered APP trains the voice basic training model, the APP developer uploads voice data reflecting the characteristics of the APP developer to the server corresponding to the registered APP, through the voice basis The training model trains a customized voice model for APP developers. The voice data reflecting the characteristics of the APP developer is the voice data recorded according to the language environment required by the APP developer. At this time, the APP developer only needs to upload a small amount of uploaded voice data to the server corresponding to the registered APP. Among them, the voice basic training model can be understood as an intermediate model with a large data set provided by the server corresponding to the registered APP to the APP developer, and then the intermediate model is tuned for the voice data uploaded by the APP developer to obtain training A customized voice model reflecting the characteristics of APP developers.
在本说明书实施例的步骤S101中,APP开发者上传的语音数据需要进行审核,在生成反映APP开发者特征的定制化的语音模型后,由该语音系统的管理人员进行审核,此时审核的机制可以为审核通过后该反映APP开发者特征的定制化的语音模型才可以正常使用,也就是说即使生成了反映APP开发者特征的定制化的语音模型,但未经过审核人员审核通过,该反映APP开发者特征的定制化的语音模型也是无法正常使用的;同时审核的机制也可以为不管该反映APP开发者特征的定制化的语音模型的审核结果是否通过,已注册的APP皆可正常使用,但是审核人员一旦检测出该反映APP开发者特征的定制化的语音模型不合格时,该该反映APP开发者特征的定制化的语音模型即失效。In step S101 of the embodiment of the present specification, the voice data uploaded by the APP developer needs to be reviewed. After generating a customized voice model reflecting the characteristics of the APP developer, the management personnel of the voice system conducts the review. The mechanism can be that the customized voice model that reflects the characteristics of the APP developer can be used normally after being approved. That is to say, even if a customized voice model that reflects the characteristics of the APP developer is generated but has not been approved by the reviewer, the The customized voice model reflecting the characteristics of the APP developer cannot be used normally; at the same time, the audit mechanism can also be that regardless of whether the audit result of the customized voice model reflecting the characteristics of the APP developer passes, the registered APP can be normal. Used, but once the reviewer detects that the customized voice model reflecting the characteristics of the APP developer is unqualified, the customized voice model reflecting the characteristics of the APP developer becomes invalid.
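The two review mechanisms differ only in how a model that has not yet been reviewed is treated. The following is a minimal sketch using hypothetical `ReviewMode` and `ReviewStatus` enums. Neither enum is named in the specification.

```python
from enum import Enum


class ReviewMode(Enum):
    PRE_APPROVAL = "pre_approval"  # usable only after a reviewer approves it
    POST_REVIEW = "post_review"    # usable immediately, invalidated if rejected


class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


def model_is_usable(mode: ReviewMode, status: ReviewStatus) -> bool:
    """Return whether a customized voice model may be used by registered APPs."""
    if status is ReviewStatus.REJECTED:
        # Under both mechanisms, a model found unqualified is invalid.
        return False
    if mode is ReviewMode.PRE_APPROVAL:
        return status is ReviewStatus.APPROVED
    # POST_REVIEW: pending and approved models are both usable.
    return True
```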
In step S101 of the embodiment of this specification, an APP developer that does not adopt this solution can meet its customization requirements with a traditional method in one of two ways. In the first, the APP developer uploads the voice data reflecting its characteristics directly, without any processing, which results in low robustness. In the second, the APP developer builds its own customized voice model. This takes a long time and gives no guarantee of the customized model's quality.
In step S101 of the embodiment of this specification, the voice system can also be applied to a video system. In that case, a basic video training model is stored on the server corresponding to the registered APP.
Step S102: invoke the speech synthesis file on the client.
In step S102 of the embodiment of this specification, when an APP registered with the voice SDK needs a speech synthesis file, the voice SDK first checks whether the file exists on the client. If the client holds the required file, the speech synthesis file stored on the client is invoked, and the registered APP can play voice according to it.
Step S103: download the speech synthesis file from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP.
In step S103 of the embodiment of this specification, the speech synthesis file is generated from pre-stored text by the voice model customized for the APP developer. If the check in step S101 finds that the speech synthesis file does not exist on the client, the registered APP has never downloaded that file before.
In step S103 of the embodiment of this specification, the voice configuration file contains the download address of the speech synthesis file. The registered APP downloads the speech synthesis file it needs from this address and then plays voice according to the file.
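The specification does not define the format of the voice configuration file. Purely as an assumption, the sketch below models it as JSON. For each entry in the voice list, it carries the download address and the pre-stored digest used later for verification. The field names (`voices`, `name`, `url`, `digest`, `algorithm`) are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEntry:
    name: str       # speech synthesis file name (one entry of the voice list)
    url: str        # download address contained in the voice configuration file
    digest: str     # pre-stored ("second") digest value used to verify the download
    algorithm: str  # digest algorithm, e.g. "md5" or "sha256"


def parse_voice_config(raw: str) -> dict[str, VoiceEntry]:
    """Parse a hypothetical JSON voice configuration file into a name -> entry map."""
    data = json.loads(raw)
    entries: dict[str, VoiceEntry] = {}
    for item in data["voices"]:
        entry = VoiceEntry(
            name=item["name"],
            url=item["url"],
            digest=item["digest"].lower(),  # normalize hex case for later comparison
            algorithm=item.get("algorithm", "md5"),
        )
        entries[entry.name] = entry
    return entries
```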
In step S103 of the embodiment of this specification, the speech synthesis file must be verified before the registered APP plays voice according to it. The specific steps may be as follows:
Step 1: calculate the first digest value of the speech synthesis file.
In step 1 of the embodiment of this specification, the first digest value is used to check whether the downloaded speech synthesis file is corrupted or has been tampered with. In this embodiment, it can be implemented with an MD5 digest. MD5 is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value, which can be used to check a downloaded file for corruption or tampering. For example, on Unix, many software packages are distributed together with a file that has the same name and the extension .md5. This file usually contains a single line of text, roughly of the form:
MD5(tanajiya.tar.gz)=38b8c2c1093dd0fec383a9d9ac940515
This line is the MD5 digest of the tanajiya.tar.gz file, often loosely called its digital signature. MD5 treats the entire file as one large text message and, through an irreversible transformation, produces this MD5 message digest. Every person has a unique fingerprint, which is often the most trusted way for judicial authorities to identify criminals. In a similar way, MD5 produces a practically unique "digital fingerprint" for any file, regardless of its size or format. If anyone changes the file in any way, its MD5 value (its "digital fingerprint") changes. Download sites publish MD5 values so that users can run an MD5 check on a downloaded file with dedicated software (such as Windows MD5 Check). This confirms that the file they obtained is identical to the one the site provides. In practice, the download server publishes an MD5 value for a file in advance. After downloading, the user recalculates the MD5 value of the downloaded file and compares the two values to determine whether the download is corrupted or has been tampered with.
In step 1 of the embodiment of this specification, the first digest value is calculated to check whether the downloaded speech synthesis file is corrupted or has been tampered with. This enables real-time detection of errors in the file: if the content of the speech synthesis file is wrong, an error is reported immediately, which prevents the error from propagating through the application. The check can also be implemented with a SHA-256 digest. SHA-256 is the more robust choice against deliberate tampering, because practical collision attacks on MD5 are known.
Step 2: determine whether the second digest value of the speech synthesis file, pre-stored in the voice configuration file, is the same as the first digest value. If the two values are the same, go to step 3. If they differ, return to step S103.
Step 3: the registered APP plays voice according to the speech synthesis file.
In step 3 of the embodiment of this specification, the server corresponding to the registered APP may encrypt the speech synthesis file with a built-in private key. Before the encrypted speech synthesis file can be played, it must be decrypted with the public key stored in the decryption module.
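Steps 1 to 3 amount to a compute, compare, and retry loop. The sketch below uses Python's `hashlib`, with MD5 by default and SHA-256 available by passing `"sha256"`, and uses `hmac.compare_digest` for the comparison. The `fetch` callback, the helper names, and the bounded `max_attempts` are assumptions; the specification simply returns to step S103 on a mismatch. The sketch also parses the `MD5(name)=digest` line format shown above.

```python
from __future__ import annotations

import hashlib
import hmac
import re
from typing import Callable


def parse_md5_line(line: str) -> tuple[str, str]:
    """Parse a line such as 'MD5(file.tar.gz)=<32 hex digits>' into (name, digest)."""
    m = re.fullmatch(r"MD5\((.+)\)\s*=\s*([0-9a-fA-F]{32})", line.strip())
    if m is None:
        raise ValueError(f"not an MD5 digest line: {line!r}")
    return m.group(1), m.group(2).lower()


def first_digest(data: bytes, algorithm: str = "md5") -> str:
    # Step 1: compute the digest of the downloaded speech synthesis file.
    return hashlib.new(algorithm, data).hexdigest()


def download_verified(fetch: Callable[[], bytes], second_digest: str,
                      algorithm: str = "md5", max_attempts: int = 3) -> bytes:
    """Download, compare against the pre-stored digest, and re-download on mismatch."""
    for _ in range(max_attempts):
        data = fetch()
        # Step 2: compare with the digest pre-stored in the voice configuration file.
        if hmac.compare_digest(first_digest(data, algorithm), second_digest.lower()):
            return data  # Step 3: the file can now be handed to the player.
    raise OSError("speech synthesis file failed digest verification")
```

With this design, a transient corruption is retried a few times, while a persistently bad file surfaces as an error instead of looping forever.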
In step S103 of the embodiment of this specification, the basic voice training model is configured with a general-purpose voice database that covers spoken announcements of transaction amounts and times. When the APP developer enters numbers in the text, the customized voice model can therefore convert them directly into a speech synthesis file that reads them as a transaction amount or a time, rather than as plain digits. For example, when the text contains "5:00", the speech synthesis file announces the time as "5 o'clock".
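The specification does not say how the customized voice model handles amounts and times. One common way to get the behavior described, assumed here rather than taken from the source, is a text-normalization pass that rewrites amounts and clock times as words before synthesis. The `¥` prefix, the yuan/fen wording, and the output phrasing are illustrative choices.

```python
from __future__ import annotations

import re

_AMOUNT = re.compile(r"¥(\d+)(?:\.(\d{1,2}))?")
_TIME = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


def _say_amount(m: re.Match) -> str:
    yuan = int(m.group(1))
    # Pad one-digit decimals ("3.5" -> 50 fen); 1 yuan = 100 fen.
    fen = int((m.group(2) or "").ljust(2, "0"))
    return f"{yuan} yuan" + (f" {fen} fen" if fen else "")


def _say_time(m: re.Match) -> str:
    hour, minute = int(m.group(1)), int(m.group(2))
    return f"{hour} o'clock" if minute == 0 else f"{hour} hours {minute} minutes"


def normalize_for_tts(text: str) -> str:
    """Rewrite amounts and clock times as words that a TTS model should speak."""
    text = _AMOUNT.sub(_say_amount, text)
    return _TIME.sub(_say_time, text)
```

For example, `normalize_for_tts("Meeting at 5:00")` yields `"Meeting at 5 o'clock"`, matching the example in the text.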
In the above steps, when a registered APP needs a speech synthesis file, the client is checked for a cached copy of that file. If one exists, the cached file is invoked first, which reduces the response time of the whole voice system.
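The cache-first lookup of steps S101 to S103 can be sketched as below. The sketch assumes that the cache is a local directory and that the download function is injected, so the example needs no network. Digest verification is deliberately left out to keep the control flow visible. The function name and parameters are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def get_synthesis_file(cache_dir: Path, name: str, url: str,
                       download: Callable[[str], bytes]) -> bytes:
    """Return a speech synthesis file, preferring the client-side cache."""
    cached = cache_dir / name
    if cached.is_file():
        # S101/S102: the file already exists on the client, so invoke it directly.
        return cached.read_bytes()
    # S103: not cached, so fetch it from the address in the voice configuration file.
    data = download(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)  # keep a copy so the next request is served locally
    return data
```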
Further, to allow the voice system to be used in a secure environment, the above embodiment is modified. FIG. 2 is a schematic flowchart of a method for invoking a speech synthesis file provided by an embodiment of this specification. The flow includes:
Step S201: pull the voice configuration file from the server corresponding to the registered APP.
In step S201 of the embodiment of this specification, the customized voice model corresponding to the registered APP converts pre-stored text into speech synthesis files. The voice configuration file corresponding to the registered APP includes the voice list of these speech synthesis files.
Step S202: receive the voice configuration file delivered by the server corresponding to the registered APP. The delivered voice configuration file includes first verification information, which the server assigns to the registered APP after encrypting the delivered voice configuration file.
In step S202 of the embodiment of this specification, the developer's APP registers with the voice SDK, and the voice SDK is connected to a decryption module. The public key used for decryption can be delivered to the decryption module through a TSM (trusted service manager). This public key is unique to the registered APP, and the server holds the corresponding private key. The server corresponding to the registered APP encrypts the delivered voice configuration file with that private key. The public key and the private key form a key pair: the public key is the published part of the pair, and the private key is the secret part. Each key pair is unique. Data encrypted with one key of the pair must be decrypted with the other. Data encrypted with the public key can only be decrypted with the private key, and data encrypted with the private key can only be decrypted with the public key. Otherwise, decryption fails.
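The key-pair property described above can be demonstrated with textbook RSA on tiny primes. This toy example is not secure; it only illustrates that one key undoes the other. In practice, "encrypting with the private key" so that anyone holding the public key can check the result corresponds to a digital signature. A real system would use a vetted library with padded schemes such as RSA-PSS or RSA-OAEP rather than anything like this sketch.

```python
from __future__ import annotations


def make_toy_keypair(p: int = 61, q: int = 53, e: int = 17):
    """Textbook RSA with tiny primes: for illustration only, never for real use."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)       # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)     # (public key, private key)


def apply_key(key: tuple[int, int], m: int) -> int:
    """Raise m to the key's exponent modulo n (encrypt or decrypt)."""
    exponent, n = key
    return pow(m, exponent, n)
```

With the default primes, the public key is (17, 3233) and the private key is (2753, 3233). Applying the public key to 65 gives 2790, and applying the private key to 2790 recovers 65. The reverse direction also works: private key first, then public key.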
Further, in step S202 of the embodiment of this specification, the decryption module may be an SE (secure element) module. The SE module safeguards system security. It uses a security chip and a chip operating system (COS) to provide functions such as secure data storage and encryption and decryption operations. Its main functions in the security architecture are secure key storage, data encryption operations, and secure storage of information. Secure key storage makes it possible to build a relatively complete key management system and ensures that keys cannot be read out. Data encryption operations include support for reliable security algorithms, ciphertext transmission of sensitive data, and tamper resistance during data transmission. Secure storage of information refers to a strict file access permission mechanism together with reliable authentication algorithms and processes. In this embodiment, the public key is placed in the SE module. SE modules come in various form factors, commonly smart cards and embedded secure elements (eSE). In this embodiment, an eSE can be embedded for the voice SDK of the voice system. The eSE uses a smart security chip that meets the CC EAL5+ security level and has a built-in secure operating system, satisfying the terminal's needs for secure key storage and data encryption services. This allows the voice system to be widely used in finance, map navigation, urban transportation, healthcare, retail, and other fields while protecting the system's security in use.
Step S203: determine whether the first verification information matches the second verification information pre-stored by the client. If it does, go to step S204. If not, end the process.
In step S203 of the embodiment of this specification, the second verification information corresponding to the registered APP is retrieved, according to the registered APP's identifier, from the copy pre-stored in the client's secure execution environment. It is then determined whether the first verification information matches the second verification information. The identifier of the registered APP is the registered APP's identity information.
Step S204: confirm that the delivered voice configuration file is correct.
Step S205: detect whether the client holds the speech synthesis file required by the registered APP. If it does, go to step S206. If not, go to step S207.
Step S205 in the embodiment of this specification is the same as step S101 above and is not described again here.
Step S206: invoke the speech synthesis file on the client.
Step S206 in the embodiment of this specification is the same as step S102 above and is not described again here.
Step S207: download the speech synthesis file from the server corresponding to the registered APP according to the voice configuration file corresponding to the registered APP.
Step S207 in the embodiment of this specification is the same as step S103 above and is not described again here.
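The matching in step S203 can be sketched as a lookup by APP identifier followed by a constant-time comparison. The `SecureStore` class below is a hypothetical in-memory stand-in for the client's secure execution environment (an SE in the embodiment). Treating the verification information as raw bytes is an assumption.

```python
from __future__ import annotations

import hmac


class SecureStore:
    """Hypothetical stand-in for the client's secure execution environment."""

    def __init__(self, verification_by_app: dict[str, bytes]) -> None:
        self._verification_by_app = dict(verification_by_app)

    def second_verification_info(self, app_id: str) -> bytes | None:
        return self._verification_by_app.get(app_id)


def config_is_verified(store: SecureStore, app_id: str, first_info: bytes) -> bool:
    """S203/S204: fetch the second verification info by APP identifier and compare."""
    second_info = store.second_verification_info(app_id)
    if second_info is None:
        return False  # nothing pre-stored for this APP, so the process ends
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(first_info, second_info)
```

Only when `config_is_verified` returns True would the flow continue to the cache check of step S205.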
Further, the voice system in this embodiment must also keep the server and the registered APP synchronized. To address this, the server can support active push: when a speech synthesis file changes, the server actively pushes it to the client.
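The specification does not describe the push mechanism further. The following is a minimal sketch, assuming that each pushed speech synthesis file carries a monotonically increasing version number (not mentioned in the source). This lets the client ignore stale or duplicate pushes. The `ClientCache` class is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClientCache:
    """Hypothetical client-side cache of speech synthesis files keyed by name."""

    files: dict[str, bytes] = field(default_factory=dict)
    versions: dict[str, int] = field(default_factory=dict)

    def on_server_push(self, name: str, version: int, data: bytes) -> bool:
        """Apply a pushed file if it is newer than the cached copy.

        Returns True when the cache was updated.
        """
        if version <= self.versions.get(name, -1):
            return False  # stale or duplicate push: keep the current copy
        self.files[name] = data
        self.versions[name] = version
        return True
```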
FIG. 3 is a schematic structural diagram of a device for invoking a speech synthesis file provided by an embodiment of this specification. The device includes a detection unit 1, an invoking unit 2, a downloading unit 3, a pulling unit 4, a receiving unit 5, a judging unit 6, a verification unit 7, a training unit 8, and a calculation unit 9.
The detection unit 1 is configured to detect whether the client holds the speech synthesis file required by a registered APP. A registered APP is an APP that has registered in advance as needing speech synthesis files.
The invoking unit 2 is configured to invoke the speech synthesis file on the client if the client is detected to hold it, so that the registered APP can play voice according to the file.
The downloading unit 3 is configured to download the speech synthesis file from the server corresponding to the registered APP if the client is detected not to hold it. The download uses the pre-stored voice configuration file corresponding to the registered APP, which contains the download address of the speech synthesis file.
The pulling unit 4 is configured to pull the voice configuration file from the server corresponding to the registered APP.
The receiving unit 5 is configured to receive the voice configuration file delivered by the server corresponding to the registered APP. The delivered voice configuration file includes first verification information, which the server assigns to the registered APP after encrypting the delivered file.
The judging unit 6 is configured to determine whether the first verification information matches the second verification information pre-stored by the client.
The verification unit 7 is configured to confirm that the delivered voice configuration file is correct when the first verification information is determined to match the second verification information pre-stored by the client.
The judging unit 6 is specifically configured to:
retrieve, according to the identifier of the registered APP, the second verification information corresponding to the registered APP that is pre-stored in the client's secure execution environment; and
determine whether the first verification information matches the second verification information.
The training unit 8 is configured to send the voice data provided by the APP developer, reflecting the developer's characteristics, to the server corresponding to the registered APP. The server then trains a voice model customized for the APP developer with its built-in basic voice training model, and uses the customized voice model to generate the speech synthesis files corresponding to the registered APP from pre-stored text. The basic voice training model is shared by registered APPs and is trained in advance on a number of pre-supplied voice samples to meet the registered APPs' voice-playback needs.
The calculation unit 9 is configured to calculate the first digest value of the speech synthesis file.
The judging unit 6 is further configured to determine whether the second digest value of the speech synthesis file, pre-stored in the voice configuration file, is the same as the first digest value.
If the judging unit 6 determines that the second digest value is the same as the first digest value, the registered APP plays voice according to the speech synthesis file.
Playing voice according to the speech synthesis file specifically includes the following: the server corresponding to the registered APP encrypts the speech synthesis file according to preset rules, and after the built-in decryption module decrypts the encrypted file, the registered APP plays it.
An embodiment of this specification further provides a computer-readable medium storing computer-readable instructions. When executed by a processor, the instructions perform the following steps:
detecting whether the client holds the speech synthesis file required by a registered APP, a registered APP being an APP that has registered in advance as needing speech synthesis files;
if the client is detected not to hold the speech synthesis file, downloading it from the server corresponding to the registered APP according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file; and
if the client is detected to hold the speech synthesis file, invoking it so that the registered APP can play voice according to the file.
An embodiment of this specification further provides a device for invoking a speech synthesis file. The device includes a memory for storing computer program instructions and a processor for executing the program instructions. When executed by the processor, the computer program instructions cause the device to implement the following:
a detection unit, configured to detect whether the client holds the speech synthesis file required by a registered APP, a registered APP being an APP that has registered in advance as needing speech synthesis files;
a downloading unit, configured to download the speech synthesis file from the server corresponding to the registered APP if the client is detected not to hold it, according to the pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file; and
an invoking unit, configured to invoke the speech synthesis file on the client if the client is detected to hold it, so that the registered APP can play voice according to the file.
An embodiment of this specification provides a voice system comprising a terminal and a server. The terminal includes a voice SDK running on the terminal, a registered APP, and an APP developer terminal, where:
the APP developer terminal is configured to send the voice data provided by the APP developer, reflecting the developer's characteristics, to the server corresponding to the registered APP;
the server is configured to train a voice model customized for the APP developer with its built-in basic voice training model, and to feed pre-stored text into the customized voice model to generate the speech synthesis files that the registered APP needs, the basic voice training model being shared by registered APPs and trained in advance on a number of pre-supplied voice samples to meet the registered APPs' voice-playback needs; and
the voice SDK is configured to: pull the voice configuration file from the server corresponding to the registered APP; receive the voice configuration file delivered by that server, the delivered voice configuration file including first verification information that the server assigns to the registered APP after encrypting the delivered file; determine whether the first verification information matches the second verification information pre-stored by the client; confirm that the delivered voice configuration file is correct when the two match; detect whether the client holds the speech synthesis file required by the registered APP, a registered APP being an APP that has registered in advance as needing speech synthesis files; if the client does not hold the file, download it from the server corresponding to the registered APP according to the voice configuration file corresponding to the registered APP, the voice configuration file containing the download address of the speech synthesis file; and if the client holds the file, invoke it so that the registered APP can play voice according to the file.
Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and each combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine. The instructions executed by that processor then produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include forms of computer-readable media such as non-persistent memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes that element.
The above are merely embodiments of this specification and are not intended to limit it. For those skilled in the art, this specification may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this specification shall fall within the scope of the claims of this specification.

Claims (15)

  1. A method for invoking a speech synthesis file, characterized in that the method comprises:
    detecting whether a speech synthesis file required by a registered APP exists on a client, the registered APP being an APP that has been pre-registered as needing to use a speech synthesis file;
    if it is detected that the speech synthesis file does not exist on the client, downloading the speech synthesis file from a server corresponding to the registered APP according to a pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing a built-in download address of the speech synthesis file;
    if it is detected that the speech synthesis file exists on the client, invoking the speech synthesis file on the client, so that the registered APP performs voice playback according to the speech synthesis file.
  2. The method for invoking a speech synthesis file according to claim 1, wherein, before detecting whether the speech synthesis file required by the registered APP exists on the client, the method further comprises:
    pulling the voice configuration file from the server corresponding to the registered APP;
    receiving the voice configuration file delivered by the server corresponding to the registered APP, the delivered voice configuration file including first verification information that the server corresponding to the registered APP assigned to the registered APP after encrypting the delivered voice configuration file;
    determining whether the first verification information matches second verification information pre-stored by the client;
    when it is determined that the first verification information matches the second verification information pre-stored by the client, verifying that the delivered voice configuration file is correct.
  3. The method for invoking a speech synthesis file according to claim 2, wherein determining whether the first verification information matches the second verification information pre-stored by the client specifically comprises:
    obtaining, according to an identifier of the registered APP, the second verification information corresponding to the registered APP that is pre-stored in a secure operating environment built into the client;
    determining whether the first verification information matches the second verification information.
  4. The method for invoking a speech synthesis file according to claim 2, wherein, before pulling the voice configuration file from the server corresponding to the registered APP, the method further comprises:
    sending, to the server corresponding to the registered APP, voice data provided by the APP developer that reflects the characteristics of the APP developer, so that the server corresponding to the registered APP trains a voice model customized for the APP developer using a built-in basic voice training model, and inputs pre-stored text into the customized voice model to generate the speech synthesis file required by the registered APP, the basic voice training model being a model that is trained on a number of pre-provided voice samples according to the voice playback needs of the registered APPs and that can be shared by the registered APPs.
  5. The method for invoking a speech synthesis file according to claim 1, wherein, before the registered APP performs voice playback according to the speech synthesis file, the method further comprises:
    calculating a first digest value corresponding to the speech synthesis file;
    determining whether a second digest value corresponding to the speech synthesis file, pre-stored in the voice configuration file, is identical to the first digest value;
    if it is determined that the second digest value is identical to the first digest value, performing, by the registered APP, voice playback according to the speech synthesis file.
  6. The method for invoking a speech synthesis file according to claim 1, wherein the registered APP performing voice playback according to the speech synthesis file specifically comprises: encrypting, by the server corresponding to the registered APP, the speech synthesis file according to a preset rule; and, after the encrypted speech synthesis file is decrypted by a built-in decryption module, performing voice playback by the registered APP.
  7. An apparatus for invoking a speech synthesis file, characterized in that the apparatus comprises:
    a detection unit, configured to detect whether a speech synthesis file required by a registered APP exists on a client, the registered APP being an APP that has been pre-registered as needing to use a speech synthesis file;
    a download unit, configured to, if it is detected that the speech synthesis file does not exist on the client, download the speech synthesis file from a server corresponding to the registered APP according to a pre-stored voice configuration file corresponding to the registered APP, the voice configuration file containing a built-in download address of the speech synthesis file;
    an invoking unit, configured to, if it is detected that the speech synthesis file exists on the client, invoke the speech synthesis file on the client, so that the registered APP performs voice playback according to the speech synthesis file.
  8. The apparatus for invoking a speech synthesis file according to claim 7, wherein the apparatus further comprises:
    a pulling unit, configured to pull the voice configuration file from the server corresponding to the registered APP;
    a receiving unit, configured to receive the voice configuration file delivered by the server corresponding to the registered APP, the delivered voice configuration file including first verification information that the server corresponding to the registered APP assigned to the registered APP after encrypting the delivered voice configuration file;
    a judging unit, configured to determine whether the first verification information matches second verification information pre-stored by the client;
    a verification unit, configured to verify that the delivered voice configuration file is correct when it is determined that the first verification information matches the second verification information pre-stored by the client.
  9. The apparatus for invoking a speech synthesis file according to claim 8, wherein the judging unit is specifically configured to:
    obtain, according to an identifier of the registered APP, the second verification information corresponding to the registered APP that is pre-stored in a secure operating environment built into the client;
    determine whether the first verification information matches the second verification information.
  10. The apparatus for invoking a speech synthesis file according to claim 8, wherein the apparatus further comprises:
    a training unit, configured to send, to the server corresponding to the registered APP, voice data provided by the APP developer that reflects the characteristics of the APP developer, so that the server corresponding to the registered APP trains a voice model customized for the APP developer using a built-in basic voice training model and generates, from pre-stored text, the speech synthesis file corresponding to the registered APP using the customized voice model, the basic voice training model being a model that is trained on a number of pre-provided voice samples according to the voice playback needs of the registered APPs and that can be shared by the registered APPs.
  11. The apparatus for invoking a speech synthesis file according to claim 7, wherein the apparatus further comprises:
    a calculation unit, configured to calculate a first digest value corresponding to the speech synthesis file;
    the judging unit is further configured to determine whether a second digest value corresponding to the speech synthesis file, pre-stored in the voice configuration file, is identical to the first digest value;
    if the judging unit determines that the second digest value is identical to the first digest value, the registered APP performs voice playback according to the speech synthesis file.
  12. The apparatus for invoking a speech synthesis file according to claim 7, wherein the registered APP performing voice playback according to the speech synthesis file specifically comprises:
    encrypting, by the server corresponding to the registered APP, the speech synthesis file according to a preset rule; and, after the encrypted speech synthesis file is decrypted by a built-in decryption module, performing voice playback by the registered APP.
  13. A voice system, characterized by comprising a terminal and a server, the terminal including a voice SDK running in the terminal, a registered APP, and an APP developer side;
    the APP developer side is configured to send, to the server corresponding to the registered APP, voice data provided by the APP developer that reflects the characteristics of the APP developer;
    the server is configured to train a voice model customized for the APP developer using a built-in basic voice training model, and to input pre-stored text into the customized voice model to generate the speech synthesis file required by the registered APP, the basic voice training model being a model that is trained on a number of pre-provided voice samples according to the voice playback needs of the registered APPs and that can be shared by the registered APPs;
    the voice SDK is configured to: pull the voice configuration file from the server corresponding to the registered APP; receive the voice configuration file delivered by the server corresponding to the registered APP, the delivered voice configuration file including first verification information that the server corresponding to the registered APP assigned to the registered APP after encrypting the delivered voice configuration file; determine whether the first verification information matches second verification information pre-stored by the client; when it is determined that the first verification information matches the second verification information pre-stored by the client, verify that the delivered voice configuration file is correct; detect whether a speech synthesis file required by the registered APP exists on the client, the registered APP being an APP that has been pre-registered as needing to use a speech synthesis file; if it is detected that the speech synthesis file does not exist on the client, download the speech synthesis file from the server corresponding to the registered APP according to the voice configuration file corresponding to the registered APP, the voice configuration file containing a built-in download address of the speech synthesis file; and, if it is detected that the speech synthesis file exists on the client, invoke the speech synthesis file on the client, so that the registered APP performs voice playback according to the speech synthesis file.
  14. A computer-readable medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method according to any one of claims 1 to 6.
  15. A device for invoking a speech synthesis file, comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to perform the method according to any one of claims 1 to 6.
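The client-side flow of claims 1, 2, 3 and 5 (verify the delivered voice configuration file, reuse a cached speech synthesis file or download it from the address in the configuration, then check its digest before playback) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `VoiceConfig` fields, the `VoiceSDK` class and its method names, SHA-256 as the digest algorithm, an opaque token as the "verification information", a plain dictionary standing in for the client's secure operating environment, and the injected `fetch` callable are all hypothetical. The configuration-file encryption of claim 2 and the file encryption of claims 6 and 12 are not modeled.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class VoiceConfig:
    """Hypothetical shape of the voice configuration file delivered by the server."""
    app_id: str         # identifier of the registered APP
    download_url: str   # built-in download address of the speech synthesis file
    file_name: str      # local cache name for the file (assumed field)
    sha256_hex: str     # pre-stored "second digest value" (SHA-256 is an assumption)
    verification: str   # "first verification information" assigned by the server


class ConfigVerificationError(Exception):
    pass


class DigestMismatchError(Exception):
    pass


class VoiceSDK:
    def __init__(self, cache_dir: str, trusted_tokens: Dict[str, str],
                 fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        # Stand-in for the client's secure operating environment: maps an APP
        # identifier to its pre-stored "second verification information".
        self.trusted_tokens = trusted_tokens
        # Injected downloader (hypothetical), so the sketch runs without a network.
        self.fetch = fetch

    def verify_config(self, cfg: VoiceConfig) -> None:
        """Claims 2-3: look up the second verification info by APP id and compare."""
        expected = self.trusted_tokens.get(cfg.app_id)
        if expected is None or not hmac.compare_digest(
                expected.encode(), cfg.verification.encode()):
            raise ConfigVerificationError(f"config for {cfg.app_id!r} failed verification")

    @staticmethod
    def check_digest(data: bytes, cfg: VoiceConfig) -> None:
        """Claim 5: compare the computed first digest with the pre-stored second one."""
        actual = hashlib.sha256(data).hexdigest()
        if not hmac.compare_digest(actual.encode(), cfg.sha256_hex.lower().encode()):
            raise DigestMismatchError(f"digest mismatch for {cfg.file_name!r}")

    def get_speech_file(self, cfg: VoiceConfig) -> str:
        """Claim 1: use the file on the client if present, else download it."""
        self.verify_config(cfg)
        path = os.path.join(self.cache_dir, cfg.file_name)
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
            self.check_digest(data, cfg)
        else:
            data = self.fetch(cfg.download_url)
            self.check_digest(data, cfg)  # never cache a file that fails the check
            tmp = path + ".part"
            with open(tmp, "wb") as f:
                f.write(data)
            # Rename into place so a half-written file is never mistaken for a cached one.
            os.replace(tmp, path)
        return path  # the registered APP would hand this path to its audio player
```

In practice `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`; injecting it keeps the sketch testable offline. Checking the digest before writing the download means a corrupted transfer is never reused on a later call.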
PCT/CN2019/122545 2018-12-26 2019-12-03 Method and device for invoking speech synthesis file WO2020134896A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811596879.5 2018-12-26
CN201811596879.5A CN110021291B (en) 2018-12-26 2018-12-26 Method and device for calling voice synthesis file

Publications (1)

Publication Number Publication Date
WO2020134896A1 true WO2020134896A1 (en) 2020-07-02

Family

ID=67188692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/122545 WO2020134896A1 (en) 2018-12-26 2019-12-03 Method and device for invoking speech synthesis file

Country Status (3)

Country Link
CN (1) CN110021291B (en)
TW (1) TW202027027A (en)
WO (1) WO2020134896A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113421542A (en) * 2021-06-22 2021-09-21 广州小鹏汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021291B (en) * 2018-12-26 2021-01-29 创新先进技术有限公司 Method and device for calling voice synthesis file
CN111953853A (en) * 2020-07-31 2020-11-17 中国工商银行股份有限公司 Voice reading processing method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030187658A1 (en) * 2002-03-29 2003-10-02 Jari Selin Method for text-to-speech service utilizing a uniform resource identifier
WO2005002160A1 (en) * 2003-06-30 2005-01-06 Nortel Networks Limited Method and system for providing text-to-speech instant messaging
CN101098507A (en) * 2007-06-29 2008-01-02 中兴通讯股份有限公司 System and method for providing speech synthesis application united development platform
CN104992703A (en) * 2015-07-24 2015-10-21 百度在线网络技术(北京)有限公司 Speech synthesis method and system
CN105161091A (en) * 2015-08-24 2015-12-16 北京开元智信通软件有限公司 Vehicle-mounted TTS voice broadcast method, system and vehicle-mounted terminal
CN105354096A (en) * 2015-10-29 2016-02-24 中国电子科技集团公司第二十八研究所 BS architecture-based speech automatic generating and broadcasting method
US20180075838A1 (en) * 2015-11-10 2018-03-15 Paul Wendell Mason Method and system for Using A Vocal Sample to Customize Text to Speech Applications
CN110021291A (en) * 2018-12-26 2019-07-16 阿里巴巴集团控股有限公司 A kind of call method and device of speech synthesis file

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7418382B1 (en) * 1998-10-02 2008-08-26 International Business Machines Corporation Structure skeletons for efficient voice navigation through generic hierarchical objects
US6901431B1 (en) * 1999-09-03 2005-05-31 Cisco Technology, Inc. Application server providing personalized voice enabled web application services using extensible markup language documents
CN102968992B (en) * 2012-11-26 2014-11-05 北京奇虎科技有限公司 Voice identification processing method for internet explorer and internet explorer
CN103118002A (en) * 2012-12-21 2013-05-22 北京飞漫软件技术有限公司 Method of speech sound used as secret key to achieve data resource cloud storage management
US9430465B2 (en) * 2013-05-13 2016-08-30 Facebook, Inc. Hybrid, offline/online speech translation system
US9444935B2 (en) * 2014-11-12 2016-09-13 24/7 Customer, Inc. Method and apparatus for facilitating speech application testing
CN107315958A (en) * 2016-04-26 2017-11-03 北京京东尚科信息技术有限公司 The legality identification method and device of data object
US20170345410A1 (en) * 2016-05-26 2017-11-30 Tyler Murray Smith Text to speech system with real-time amendment capability
KR101806499B1 (en) * 2016-06-10 2017-12-07 주식회사 지어소프트 Method for managing files and apparatus using the same
CN107123424B (en) * 2017-04-27 2022-03-11 腾讯科技(深圳)有限公司 Audio file processing method and device
CN107391168B (en) * 2017-06-08 2018-07-03 腾讯科技(深圳)有限公司 animation loading method and device and request processing method and device
CN107517252A (en) * 2017-08-22 2017-12-26 福建中金在线信息科技有限公司 A kind of file download control method, apparatus and system
CN108234636A (en) * 2017-12-29 2018-06-29 阿里巴巴集团控股有限公司 Voice broadcast method, device, system and intellectual broadcast equipment
CN108809960A (en) * 2018-05-23 2018-11-13 北京五八信息技术有限公司 A kind of file uploads and method for down loading, device, equipment, system and storage medium

Also Published As

Publication number Publication date
CN110021291A (en) 2019-07-16
CN110021291B (en) 2021-01-29
TW202027027A (en) 2020-07-16

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19903203; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19903203; Country of ref document: EP; Kind code of ref document: A1)