WO2022247267A1 - Voice playback system, voice playback timbre configuration method, and related apparatus

Info

Publication number
WO2022247267A1
Authority
WO
WIPO (PCT)
Prior art keywords
timbre
configuration information
field communication
voice
voice playback
Prior art date
Application number
PCT/CN2021/141962
Other languages
English (en)
French (fr)
Inventor
王中一
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to KR1020227032912A (KR20220137771A)
Priority to JP2022552530A (JP7432000B2)
Priority to US17/895,154 (US20220407562A1)
Publication of WO2022247267A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/80 Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication

Definitions

  • The present disclosure relates to the technical field of data processing, specifically to voice playback and near-field communication, and in particular to a voice playback system, a voice playback timbre configuration method and device, an electronic device, a computer-readable storage medium, and a computer program product.
  • Existing intelligent voice playback devices acquire configuration information through human-computer interaction in only a limited way.
  • Typically, the device downloads the corresponding configuration information from a server, or receives it from another storage device, in response to the user's voice instructions, button presses, or similar input.
  • Embodiments of the present disclosure provide a voice playback system, a voice playback timbre configuration method and device, an electronic device, a computer-readable storage medium, and a computer program product.
  • In a first aspect, an embodiment of the present disclosure provides a voice playback system, including: a near-field communication information storage for storing timbre configuration information that can be read through a near-field communication mechanism; and a voice playback body equipped with a near-field communication scanner, used to read the timbre configuration information in the near-field communication information storage through the scanner and to play voice content in the timbre corresponding to that information.
  • In a second aspect, an embodiment of the present disclosure provides a voice playback timbre configuration method applied to the voice playback system described in any implementation of the first aspect, including: in response to reading multiple different pieces of timbre configuration information within a preset time period, generating fused timbre configuration information based on them; and playing the voice content in the fused timbre corresponding to the fused timbre configuration information.
  • An embodiment of the present disclosure further provides a voice playback timbre configuration device applied to the voice playback body in the voice playback system described in any implementation of the first aspect, including: a timbre fusion unit configured to generate fused timbre configuration information based on multiple different pieces of timbre configuration information in response to reading them within a preset time period; and a voice playback unit configured to play the voice content in the fused timbre corresponding to the fused timbre configuration information.
  • An embodiment of the present disclosure further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable it to implement the voice playback timbre configuration method described in any implementation of the second aspect.
  • An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions that enable a computer to implement the voice playback timbre configuration method described in any implementation of the second aspect.
  • An embodiment of the present disclosure further provides a computer program product including a computer program. When the computer program is executed by a processor, the voice playback timbre configuration method described in any implementation of the second aspect can be implemented.
  • The voice playback system includes: a near-field communication information storage for storing timbre configuration information that can be read through a near-field communication mechanism; and a voice playback body provided with a near-field communication scanner, used to read the timbre configuration information in the near-field communication information storage through the scanner and to present voice playback corresponding to that information.
  • Because the voice playback system stores the timbre configuration information independently in the near-field communication information storage, the voice playback body can read it through the near-field communication mechanism, configure the timbre used to play voice content accordingly, and play the voice content in the configured timbre. The timbre can therefore be changed flexibly by swapping in a storage that holds different timbre configuration information.
  • FIG. 1 is a schematic structural diagram of a voice playback system provided by an embodiment of the present disclosure
  • FIG. 2 is an exemplary schematic diagram of another voice playback system provided by an embodiment of the present disclosure
  • FIG. 3 is a flow chart of a voice playback timbre configuration method provided by an embodiment of the present disclosure
  • FIG. 4 is a structural block diagram of a voice playback tone color configuration device provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of an electronic device suitable for implementing a voice playback tone color configuration method provided by an embodiment of the present disclosure.
  • The acquisition, storage, and application of any user personal information involved (for example, when the timbre corresponding to the timbre configuration information is the user's own voice) comply with relevant laws and regulations, are protected by the necessary confidentiality measures, and do not violate public order and good customs.
  • FIG. 1 shows a schematic structural diagram of a voice playback system 100.
  • The voice playback system 100 includes a voice playback body 101 and a near-field communication information storage 102.
  • The near-field communication information storage 102 stores timbre configuration information that can be read through a near-field communication mechanism.
  • The timbre configuration information instructs the voice playback body 101 to adjust the timbre used to play voice content to a target timbre.
  • For example, when the timbre configuration information corresponds to the timbre of cartoon character A, the sound output parameters are adjusted to match the sound parameters of cartoon character A, thereby reproducing that character's timbre.
  • The sound parameters include information such as treble amplitude, bass amplitude, and the vibration frequency of the audio.
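As a minimal sketch of how such timbre configuration information might be represented in code, the record below holds the three sound parameters named above and round-trips them through a compact byte payload. The class name `TimbreConfig`, the field names, the units, the JSON encoding, and the sample values are all assumptions made for illustration; the disclosure does not define a data format.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class TimbreConfig:
    """Hypothetical record of the sound parameters named in the text."""
    name: str
    treble_amplitude: float        # relative gain, assumed range 0.0-1.0
    bass_amplitude: float          # relative gain, assumed range 0.0-1.0
    vibration_frequency_hz: float  # assumed unit: hertz

    def to_bytes(self) -> bytes:
        """Serialize to a compact JSON payload that a storage could hold."""
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")

    @staticmethod
    def from_bytes(payload: bytes) -> "TimbreConfig":
        """Rebuild the record from a payload read by the scanner."""
        return TimbreConfig(**json.loads(payload.decode("utf-8")))


# Illustrative values only.
cartoon_a = TimbreConfig("cartoon_character_A", 0.8, 0.3, 440.0)
restored = TimbreConfig.from_bytes(cartoon_a.to_bytes())
```

A frozen dataclass is used here so that a configuration read from a storage cannot be modified accidentally during playback.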
  • the types of timbres are usually not limited to real timbres, and may also include virtual timbres, synthetic timbres, and the like.
  • The voice content played by the voice playback body can be voice content imported by the user and acquired in real time; voice content obtained in advance over the network from a server or from the storage medium of a non-local terminal; or voice content produced by converting text obtained in any of these ways through text-to-speech (TTS) technology.
  • For example, the user can obtain the text of the voice content to be played in advance by sending an instruction to the voice playback body 101 through a terminal device or by operating the voice playback body 101 directly.
  • The voice playback body 101 then reads the timbre configuration information in the near-field communication information storage through the near-field communication scanner, configures the timbre corresponding to that information, converts the text into voice content through TTS, and finally plays the voice content in the configured timbre.
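The read, configure, synthesize, and play sequence described above can be sketched as a small pipeline. The function `play_with_tag_timbre` and its three callable parameters are hypothetical stand-ins for the scanner, the TTS engine, and the audio output; none of them is an API from the disclosure, and the lambdas below only imitate hardware so that the sketch runs.

```python
from typing import Callable, Dict, List


def play_with_tag_timbre(
    text: str,
    read_tag: Callable[[], Dict[str, float]],
    synthesize: Callable[[str, Dict[str, float]], bytes],
    output: Callable[[bytes], None],
) -> None:
    """Read timbre configuration, synthesize the text with it, then play the audio."""
    timbre = read_tag()               # step 1: the scanner reads the storage
    audio = synthesize(text, timbre)  # step 2: TTS using the configured timbre
    output(audio)                     # step 3: playback


# Stand-ins for the scanner, TTS engine, and speaker (assumptions for illustration).
played: List[bytes] = []
play_with_tag_timbre(
    "Once upon a time",
    read_tag=lambda: {"treble": 0.8, "bass": 0.3},
    synthesize=lambda text, timbre: f"{text}|treble={timbre['treble']}".encode("utf-8"),
    output=played.append,
)
```

Passing the three stages in as callables keeps the ordering logic testable without real NFC or audio hardware.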
  • The voice playback system also includes a voice package storage server that is communicatively connected to the voice playback body and returns the voice package corresponding to a download request from the voice playback body, which broadens the available voice content to meet users' needs.
  • The information storage capacity of the near-field communication information storage 102 is generally small. If the specific timbre configuration information is large and exceeds the effective storage limit of the near-field communication information storage 102, the storage can instead hold only an index or link through which the specific timbre configuration information can be found.
  • The near-field communication scanner provided on the voice playback body 101 reads this index or link, which serves as "pseudo timbre configuration information" that helps obtain the real timbre configuration information.
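The choice between storing the full configuration and storing only a link can be sketched as follows. The capacity of 144 bytes, the record layout with `kind` and `data` keys, the function name `build_tag_payload`, and the example link are assumptions for illustration, not values taken from the disclosure.

```python
import json

# Assumed usable capacity of a small storage, in bytes (illustrative only).
TAG_CAPACITY_BYTES = 144


def build_tag_payload(config: dict, link: str, capacity: int = TAG_CAPACITY_BYTES) -> bytes:
    """Store the full configuration if it fits, otherwise only a link to it."""
    full = json.dumps({"kind": "config", "data": config}, separators=(",", ":")).encode("utf-8")
    if len(full) <= capacity:
        return full
    pointer = json.dumps({"kind": "link", "data": link}, separators=(",", ":")).encode("utf-8")
    if len(pointer) > capacity:
        raise ValueError("even the link does not fit in the storage")
    return pointer


# A small configuration fits directly; a large one is replaced by its link.
small = build_tag_payload({"treble": 0.8, "bass": 0.3}, "https://example.invalid/t/a")
large = build_tag_payload({"samples": [0.1] * 200}, "https://example.invalid/t/a")
```

Tagging each payload with its kind lets the reader tell real configuration data from a pointer without guessing from the content.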
  • Near-field communication (NFC) builds on radio-frequency identification (RFID) and interconnection technologies: by integrating the functions of an inductive card reader, an inductive card, and point-to-point communication on a single chip, it lets mobile terminals support applications such as mobile payment, electronic ticketing, access control, mobile identification, and anti-counterfeiting.
  • Besides NFC, technologies such as RFID, infrared, and Bluetooth that exchange data over short distances also count as near-field communication methods. Compared with Bluetooth and infrared, near-field communication based on NFC chips is relatively inexpensive.
  • the near-field communication information memory is a near-field communication chip (NFC chip) storing tone color configuration information.
  • the present disclosure applies near-field communication technology to the field of voice playback, and conveniently adjusts the timbre for playing voice content presented by the voice playback body by means of a near-field communication information storage that independently stores timbre configuration information.
  • the voice playback body 101 may be embodied as a voice playback device without other functions, or may be a smart speaker or a smart mobile terminal integrated with voice playback functional components.
  • The voice playback system provided in this embodiment stores the timbre configuration information independently in the near-field communication information storage, so that the voice playback body can read it through the near-field communication mechanism, configure the corresponding timbre, and play the voice content in that timbre. The timbre can be changed flexibly by swapping in a storage that holds different timbre configuration information.
  • The present disclosure also provides a schematic diagram of another voice playback system in FIG. 2.
  • The NFC chip that stores the timbre configuration information can be embedded in a bottle cap 1021, a badge 1022, or a card 1023, so that a carrier with a larger surface area and sturdier material protects the data stored in the NFC chip.
  • Larger carriers such as toys, boxes, or bases can also be used. The size and shape of the carrier are not limited here and can be chosen flexibly according to actual needs.
  • When the near-field communication information storage is an NFC chip, the corresponding scanner can read the information from the chip without the chip being powered, so the carrier does not need its own power supply.
  • If the near-field communication information storage instead uses Bluetooth or infrared technology, corresponding power supply components must be added as needed.
  • A timbre storage server 103 is added in FIG. 2.
  • The timbre storage server 103 communicates with the voice playback body 101 and returns the target timbre configuration information corresponding to a download request from the voice playback body. In other words, the timbre storage server 103 lets the voice playback body obtain the real, complete timbre configuration information; the voice playback body then completes the timbre configuration according to the received target timbre configuration information and performs the corresponding voice playback.
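A minimal sketch of this lookup, assuming the storage yields a record whose `kind` field is either `config` (full data) or `link` (a pointer to the server), might look like the following. The dictionary `FAKE_SERVER` stands in for the timbre storage server 103, and `fetch_from_server`, `resolve_timbre`, and the `timbre://` link scheme are hypothetical names; a real device would issue a network download request instead.

```python
from typing import Callable, Dict

# Stand-in for the timbre storage server 103: maps a link to full configuration data.
FAKE_SERVER: Dict[str, dict] = {
    "timbre://voice-actor-a": {"treble": 0.7, "bass": 0.4, "frequency_hz": 220.0},
}


def fetch_from_server(link: str) -> dict:
    """Hypothetical download request; a real device would use the network."""
    if link not in FAKE_SERVER:
        raise KeyError(f"no timbre configuration stored for {link}")
    return FAKE_SERVER[link]


def resolve_timbre(tag_record: dict, fetch: Callable[[str], dict] = fetch_from_server) -> dict:
    """Return full configuration, downloading it when the record only holds a link."""
    if tag_record["kind"] == "config":
        return tag_record["data"]
    if tag_record["kind"] == "link":
        return fetch(tag_record["data"])
    raise ValueError(f"unknown record kind: {tag_record['kind']}")


inline = resolve_timbre({"kind": "config", "data": {"treble": 0.8, "bass": 0.3}})
downloaded = resolve_timbre({"kind": "link", "data": "timbre://voice-actor-a"})
```

Injecting the `fetch` callable makes it straightforward to swap the in-memory stand-in for a real client.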
  • To hold the near-field communication information storage 102 and keep it within the required near-field communication distance, the voice playback body 101 can also be provided with an opening for placing, accommodating, or wrapping the storage; the shape of the opening corresponds to the shape of the carrier in which the near-field communication information storage 102 is embedded.
  • For example, when the carrier of the near-field communication information storage 102 is a coin-shaped plastic medal, the voice playback body 101 can be provided with a coin-shaped slot that holds the medal, or with an interior space into which the medal can be inserted.
  • The near-field communication information storage 102 can also be attached magnetically to the outer surface of the voice playback body 101: opposite magnetic poles are arranged on the two parts so that they attract each other and the storage is held on the outer surface by magnetic force. Depending on where the magnetic poles are placed, the near-field communication information storage 102 can also be held inside the voice playback body 101.
  • A shielding storage box (not shown in FIG. 1 or FIG. 2) may also be provided on the voice playback body 101. It prevents the near-field communication scanner from reading the timbre configuration information held by any near-field communication information storage placed inside the box; for example, the box can be made of a material, or a particular weave of a material, that blocks signal transmission.
  • The shielding storage box embodiment above targets the scenario in which configuration information is read from only one near-field communication information storage 102 within near-field communication distance, that is, the scenario in which the voice playback body cannot process different configuration information read from different near-field communication information storages 102.
  • Where the voice playback body can read different configuration information from different near-field communication information storages 102, the different configuration information can also be fused to obtain fused timbre configuration information.
  • The fused timbre configuration information can be entirely new timbre configuration information, different from any of the inputs, generated by operations such as superposition and replacement on the different pieces of timbre configuration information; or it can be a configuration, assembled from the obtained pieces according to preset configuration rules, in which the timbre corresponding to each piece is used to play one part of the complete voice content.
  • For example, a soprano timbre and a bass timbre can be fused to obtain a new fused timbre.
  • During fusion, the timbres can also be combined according to predetermined weight rules to obtain a variety of different fused timbres. The weight rules can be adjusted as needed, so the same combination of timbres can yield different fused timbres under different weight rules, giving a richer range of timbres.
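One simple way to realize weight-based fusion is a per-parameter weighted average, sketched below. This is an assumption made for illustration: the text names only "superposition and replacement" and "weight rules" without fixing a formula, and the function `fuse_timbres`, the parameter names, and the sample values are hypothetical.

```python
from typing import Dict, Sequence


def fuse_timbres(configs: Sequence[Dict[str, float]], weights: Sequence[float]) -> Dict[str, float]:
    """Weighted average of each parameter shared by all configurations."""
    if len(configs) != len(weights) or not configs:
        raise ValueError("need one weight per configuration")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Only parameters present in every configuration can be averaged.
    shared_keys = set(configs[0]).intersection(*configs[1:])
    return {
        key: sum(w * c[key] for c, w in zip(configs, weights)) / total
        for key in sorted(shared_keys)
    }


soprano = {"treble": 0.9, "bass": 0.1}
bass = {"treble": 0.1, "bass": 0.9}
even = fuse_timbres([soprano, bass], [1, 1])           # equal weights
soprano_heavy = fuse_timbres([soprano, bass], [3, 1])  # favors the soprano timbre
```

With the same two inputs, changing only the weights produces a different fused result, which is the behavior the weight rules are meant to provide.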
  • For example, the voice content to be played is a storybook with several different characters A, B, and C. The timbre configuration information stored in the first near-field communication information storage corresponds to a soprano timbre, that in the second to an alto timbre, and that in the third to a bass timbre, and the three timbres are assigned to characters A, B, and C respectively.
  • The storybook is then played in full with each character voiced in its assigned timbre, so that different playback timbres are used within the same voice content to achieve effects such as multiple roles and multiple scenes and to improve the playback of the voice content.
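The storybook example can be sketched as a mapping from character to timbre that is applied line by line. The script format (a list of speaker and text pairs), the function `plan_playback`, and the use of a default timbre for unassigned speakers are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Assumed mapping from storybook character to the timbre read from each storage.
ROLE_TIMBRES: Dict[str, str] = {"A": "soprano", "B": "alto", "C": "bass"}


def plan_playback(
    script: List[Tuple[str, str]],
    role_timbres: Dict[str, str],
    default_timbre: str = "default",
) -> List[Tuple[str, str]]:
    """Pair each line of the script with the timbre assigned to its speaker."""
    return [(role_timbres.get(speaker, default_timbre), text) for speaker, text in script]


story = [
    ("A", "Good morning!"),
    ("B", "Hello there."),
    ("Narrator", "They walked on."),  # no storage assigned, so the default is used
    ("C", "Wait for me."),
]
plan = plan_playback(story, ROLE_TIMBRES)
```

Each entry of `plan` tells the playback component which timbre to use for that part of the voice content.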
  • FIG. 3 is a flow chart of a method for configuring voice playback timbre provided by an embodiment of the present disclosure, wherein the process 300 includes the following steps:
  • Step 301 In response to reading a plurality of different timbre configuration information within a preset time period, generating fused timbre configuration information based on the plurality of timbre configuration information;
  • In this step, the execution subject of the voice playback timbre configuration method (for example, the voice playback body 101 shown in FIG. 1) reads a plurality of different pieces of timbre configuration information within a preset time period and fuses them into fused timbre configuration information, so that the corresponding fused timbre can be derived and used to play the voice content.
  • The different pieces of timbre configuration information usually come from different near-field communication information storages 102, although the special case in which one storage holds several of them is not ruled out. The preset time period can be set to 5 seconds, 10 seconds, or a custom duration. Preset time periods of different durations can also be defined in advance, with the weight rule described above chosen according to the interval since the previous timbre configuration information was read, yielding a corresponding fusion mechanism.
  • For example, suppose two different pieces of timbre configuration information are read in succession. When the reading interval is less than 5 seconds, the fusion weights of the first and second timbre configuration information are set to 2:1 when generating the fused timbre configuration information; when the interval is greater than 5 seconds and less than 10 seconds, the weights are 1:1; and when the interval is greater than 10 seconds, the weights are 1:2.
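The interval rule in this example maps directly onto a small lookup function. The function name `fusion_weights` is hypothetical, and because the text does not say what happens at exactly 5 or 10 seconds, this sketch assigns both boundary values to the 1:1 band as an assumption.

```python
from typing import Tuple


def fusion_weights(interval_seconds: float) -> Tuple[int, int]:
    """Return (first_weight, second_weight) for the interval between two reads.

    The boundaries at exactly 5 s and 10 s are unspecified in the text; this
    sketch places both in the middle band as an assumption.
    """
    if interval_seconds < 0:
        raise ValueError("interval must be non-negative")
    if interval_seconds < 5:
        return (2, 1)
    if interval_seconds <= 10:
        return (1, 1)
    return (1, 2)
```

For instance, `fusion_weights(3)` yields a 2:1 weighting in favor of the first timbre, while `fusion_weights(12)` yields 1:2 in favor of the second.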
  • Step 302 Play the voice content according to the fusion tone corresponding to the fusion tone configuration information.
  • In this step, the above execution body (for example, the voice playback body 101) plays the voice content in the fused timbre corresponding to the fused timbre configuration information.
  • When the voice playback body 101 has not read any timbre configuration information for a continuous preset period of time (that is, no near-field communication information storage 102 storing timbre configuration information is within near-field communication distance), it presents voice playback corresponding to the default timbre configuration information and no longer plays voice in the timbre corresponding to the timbre configuration information previously read from the near-field communication information storage 102.
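The fallback to a default timbre can be sketched as a small state holder that remembers when timbre configuration information was last read. The class `TimbreSelector`, its method names, and the injectable clock are assumptions for illustration; the timeout value is whatever preset period the device uses.

```python
import time
from typing import Callable, Optional


class TimbreSelector:
    """Tracks the last successful read and falls back to a default timbre on timeout."""

    def __init__(self, default_timbre: str, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._default = default_timbre
        self._timeout = timeout_seconds
        self._clock = clock
        self._current: Optional[str] = None
        self._last_read: Optional[float] = None

    def on_tag_read(self, timbre: str) -> None:
        """Record a successful read of timbre configuration information."""
        self._current = timbre
        self._last_read = self._clock()

    def active_timbre(self) -> str:
        """Return the timbre to use for playback right now."""
        if self._last_read is None or self._clock() - self._last_read > self._timeout:
            return self._default
        return self._current


# A fake clock makes the timeout behavior observable without waiting.
fake_now = [0.0]
selector = TimbreSelector("default", timeout_seconds=10.0, clock=lambda: fake_now[0])
before = selector.active_timbre()
selector.on_tag_read("voice_actor_A")
fake_now[0] = 5.0
during = selector.active_timbre()
fake_now[0] = 20.0
after = selector.active_timbre()
```

A monotonic clock is the default so that wall-clock adjustments on the device do not trigger or suppress the fallback.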
  • Suppose the user has obtained a limited-edition commemorative badge N of an anime character; the badge records a network link to the timbre configuration information of voice actor A, who voices that character.
  • After obtaining badge N, the user can place it next to a smart speaker at home that supports voice playback, so that the smart speaker reads from badge N, through near-field communication, the link to the timbre configuration information matching voice actor A's timbre. Following the link, the smart speaker downloads that timbre configuration information from the storage server, and the functional components that control voice playback are configured with it so that voice content is played in voice actor A's voice.
  • Later, badge N is accidentally discarded by the user. Having failed to detect badge N within near-field communication distance for two consecutive weeks, the smart speaker stops playing voice content in voice actor A's timbre and reverts to the default timbre.
  • To prevent the local default timbre from being overwritten by timbre configuration information obtained in the past, the smart speaker can also be set to delete timbre configuration information held in its historical data after a preset time, or its data write permission can be restricted to specific users.
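Deleting historical timbre configuration information after a preset time can be sketched as a pruning pass over stored entries. The `History` layout (timbre name mapped to a storage time and configuration data), the function `prune_history`, and the sample times are assumptions for illustration.

```python
from typing import Dict, Tuple

# history maps a timbre name to (time stored in seconds, configuration data).
History = Dict[str, Tuple[float, dict]]


def prune_history(history: History, now: float, max_age_seconds: float) -> History:
    """Return a copy of history without entries older than max_age_seconds."""
    return {
        name: (stored_at, config)
        for name, (stored_at, config) in history.items()
        if now - stored_at <= max_age_seconds
    }


history: History = {
    "voice_actor_A": (0.0, {"treble": 0.7}),
    "soprano": (900.0, {"treble": 0.9}),
}
# At time 1000 with a 600-second limit, only the recently stored entry survives.
recent = prune_history(history, now=1000.0, max_age_seconds=600.0)
```

Returning a new dictionary rather than mutating the input keeps the original history available, for example for auditing before the deletion is committed.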
  • The present disclosure also provides, in FIG. 4, an embodiment of a device for configuring voice playback timbre.
  • This device embodiment corresponds to the method embodiment shown in FIG. 3, and the device can be applied in various electronic equipment.
  • The device 400 for configuring voice playback timbre in this embodiment may include a timbre fusion unit 401 and a voice playback unit 402.
  • The timbre fusion unit 401 is configured to generate fused timbre configuration information based on a plurality of different pieces of timbre configuration information in response to reading them within a preset time period.
  • The voice playback unit 402 is configured to play the voice content in the fused timbre corresponding to the fused timbre configuration information.
  • For the specific processing performed by the timbre fusion unit 401 and the voice playback unit 402, and the technical effects they bring, refer to the description of steps 301-302 in the corresponding method embodiment; it is not repeated here.
  • the voice playback timbre configuration device 400 may also include:
  • the failure recovery default unit is configured to modify the playing tone to be the default tone in response to not reading the tone configuration information for a continuous preset period of time.
  • The present disclosure also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable it to implement any of the voice playback timbre configuration methods described above.
  • The present disclosure also provides a readable storage medium storing computer instructions that enable a computer to implement any of the voice playback timbre configuration methods described above.
  • An embodiment of the present disclosure provides a computer program product; when its computer program is executed by a processor, any of the voice playback timbre configuration methods described above can be implemented.
  • FIG. 5 shows a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure.
  • Electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • The device 500 includes a computing unit 501 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random-access memory (RAM) 503. The RAM 503 can also store the various programs and data required for the operation of the device 500.
  • The computing unit 501, ROM 502, and RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504.
  • Multiple components of the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or a mouse; an output unit 507, such as various types of displays and speakers; a storage unit 508, such as a magnetic disk or an optical disk; and a communication unit 509, such as a network card, a modem, or a wireless communication transceiver.
  • the communication unit 509 allows the device 500 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
  • The computing unit 501 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Examples of the computing unit 501 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, or microcontroller.
  • The computing unit 501 executes the methods and processes described above, such as the voice playback timbre configuration method.
  • the voice playback timbre configuration method can be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 508 .
  • part or all of the computer program may be loaded and/or installed on the device 500 via the ROM 502 and/or the communication unit 509.
  • When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the voice playback timbre configuration method described above can be performed.
  • the computing unit 501 may be configured in any other appropriate way (for example, by means of firmware) to execute the voice playback timbre configuration method.
  • Various implementations of the systems and techniques described above can be realized in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • The programmable processor can be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that, when executed by the processor or controller, it causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
  • A computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
  • The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the high management difficulty and weak business scalability of traditional physical hosts and virtual private server (VPS) services.
  • It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above.
  • For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; as long as the desired result of the technical solution disclosed herein can be achieved, no limitation is imposed herein.

Abstract

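A voice playback system, a voice playback timbre configuration method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product, relating to the technical fields of voice playback and near-field communication. The voice playback system (100) includes: a near-field communication information storage (102) for storing timbre configuration information readable through a near-field communication mechanism; and a voice playback body (101) provided with a near-field communication scanner, for reading the timbre configuration information in the near-field communication information storage (102) through the near-field communication scanner and playing voice content in the timbre corresponding to the timbre configuration information. The system enables flexible timbre configuration and improves the efficiency of interaction between users and smart devices.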

Description

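Voice playback system, voice playback timbre configuration method and related apparatus
Cross-reference to related applications
This patent application claims priority to Chinese patent application No. 202110570865.1, filed on May 25, 2021 and entitled "Voice playback system, voice playback timbre configuration method and related apparatus", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of data processing, specifically to the technical fields of voice playback and near-field communication, and in particular to a voice playback system, a voice playback timbre configuration method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Existing smart voice playback devices obtain configuration information through human-computer interaction in a rather limited way. Typically, the corresponding configuration information is downloaded from a server or received from another storage device according to the user's voice instruction, key-press instruction, or the like.
How to further enrich the ways in which users interact with smart devices and improve interaction efficiency is a research focus for those skilled in the art.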
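Summary
Embodiments of the present disclosure provide a voice playback system, a voice playback timbre configuration method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
In a first aspect, an embodiment of the present disclosure provides a voice playback system, including: a near-field communication information storage for storing timbre configuration information readable through a near-field communication mechanism; and a voice playback body provided with a near-field communication scanner, for reading the timbre configuration information in the near-field communication information storage through the near-field communication scanner and playing voice content in the timbre corresponding to the timbre configuration information.
In a second aspect, an embodiment of the present disclosure provides a voice playback timbre configuration method applied to the voice playback system described in any implementation of the first aspect, including: in response to reading a plurality of different pieces of timbre configuration information within a preset time period, generating fused timbre configuration information based on the plurality of pieces of timbre configuration information; and playing voice content in the fused timbre corresponding to the fused timbre configuration information.
In a third aspect, an embodiment of the present disclosure provides a voice playback timbre configuration apparatus applied to the voice playback body in the voice playback system described in any implementation of the first aspect, including: a timbre fusion unit configured to, in response to reading a plurality of different pieces of timbre configuration information within a preset time period, generate fused timbre configuration information based on the plurality of pieces of timbre configuration information; and a voice playback unit configured to play voice content in the fused timbre corresponding to the fused timbre configuration information.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to implement the voice playback timbre configuration method described in any implementation of the second aspect.
In a fifth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to enable a computer, when executing them, to implement the voice playback timbre configuration method described in any implementation of the second aspect.
In a sixth aspect, an embodiment of the present disclosure provides a computer program product including a computer program which, when executed by a processor, implements the voice playback timbre configuration method described in any implementation of the second aspect.
The voice playback system provided by the embodiments of the present disclosure includes: a near-field communication information storage for storing timbre configuration information readable through a near-field communication mechanism; and a voice playback body provided with a near-field communication scanner, for reading the timbre configuration information in the near-field communication information storage through the near-field communication scanner and presenting voice playback corresponding to the timbre configuration information.
The voice playback system provided by the embodiments of the present disclosure stores the timbre configuration information independently in the near-field communication information storage, so that the voice playback body can read the timbre configuration information from the storage contactlessly through the near-field identification mechanism, configure the timbre used for playing voice content according to that information, and play the voice content in the configured timbre. The timbre used for playing voice content can thus be configured flexibly by swapping in storages that hold different timbre configuration information.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.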
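Brief description of the drawings
Other features, objects, and advantages of the present disclosure will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a schematic structural diagram of a voice playback system provided by an embodiment of the present disclosure;
FIG. 2 is an exemplary schematic diagram of another voice playback system provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of a voice playback timbre configuration method provided by an embodiment of the present disclosure;
FIG. 4 is a structural block diagram of a voice playback timbre configuration apparatus provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device suitable for executing the voice playback timbre configuration method, provided by an embodiment of the present disclosure.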
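Detailed description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description. It should be noted that, provided there is no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with one another.
In the technical solution of the present disclosure, the acquisition, storage, and use of any user personal information involved (for example, where the timbre corresponding to the timbre configuration information is the user's own timbre) comply with relevant laws and regulations, are subject to the necessary confidentiality measures, and do not violate public order and good morals.
FIG. 1 shows a schematic structural diagram of a voice playback system 100.
The voice playback system 100 includes a voice playback body 101 and a near-field communication information storage 102. The near-field communication information storage 102 is used for storing timbre configuration information readable through a near-field communication mechanism. The voice playback body 101 is provided with a near-field communication scanner and is used for reading the timbre configuration information in the near-field communication information storage through the scanner and playing voice content in the timbre corresponding to the timbre configuration information.
The timbre configuration information is configuration information that instructs the voice playback body 101 to adjust the timbre subsequently used for playing voice content to a target timbre. For example, when voice content is to be played in the timbre of cartoon character A, the body can be configured according to the timbre configuration corresponding to that character's timbre; that is, the sound output parameters are adjusted to match the sound parameters of cartoon character A so as to reproduce the character's timbre. These sound parameters typically include the treble amplitude, the bass amplitude, the vibration frequency of the audio, and similar information. Of course, timbres are not limited to real human voices and may also include virtual timbres, synthesized timbres, and so on.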
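As a minimal sketch of the idea above, timbre configuration information can be modelled as a small record of sound parameters that the playback body copies into its output settings. The class names, field names, and numeric values below are hypothetical illustrations; the disclosure only names treble amplitude, bass amplitude, and vibration frequency as typical parameters and does not prescribe a data format.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TimbreConfig:
    """Hypothetical record of the sound parameters named in the text."""
    name: str
    treble_amplitude: float        # relative gain of the high band (assumed unit)
    bass_amplitude: float          # relative gain of the low band (assumed unit)
    vibration_frequency_hz: float  # vibration frequency of the audio


class VoicePlaybackBody:
    """Stand-in for voice playback body 101; it only tracks output parameters."""

    def __init__(self, default: TimbreConfig) -> None:
        self.active = default

    def apply(self, config: TimbreConfig) -> None:
        # Adjust the output parameters so they match the target timbre.
        self.active = config

    def output_parameters(self) -> dict:
        return asdict(self.active)


# Illustrative values only; they are not taken from the disclosure.
default_timbre = TimbreConfig("default", 1.0, 1.0, 200.0)
cartoon_a = TimbreConfig("cartoon_character_A", 1.5, 0.5, 320.0)

body = VoicePlaybackBody(default_timbre)
body.apply(cartoon_a)
print(body.output_parameters())
```

A real device would feed these parameters into its synthesis or equalisation stage; the sketch stops at recording which configuration is active.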
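The voice content played by the voice playback body may be voice content supplied by the user and obtained in real time, or voice content obtained in advance, for example over a network, from a server or from a storage medium of a non-local terminal. It may also be voice content converted from text information obtained in the same ways, using text-to-speech (TTS) or a similar technique.
For example, the user may send an instruction to the voice playback body in advance through a terminal device, or operate the voice playback body 101 directly, so that the body obtains the text of the voice content to be played. After the user then presents to the voice playback body 101 a near-field communication information storage holding timbre configuration information readable through the near-field communication mechanism, the voice playback body 101 reads the timbre configuration information from the storage through the near-field communication scanner and configures the corresponding timbre. It then converts the text into voice content through TTS and finally plays that voice content in the configured timbre.
In some optional embodiments, the voice playback system further includes a voice package storage server communicatively connected to the voice playback body. The server can return the voice package corresponding to a download request from the voice playback body, which broadens the variety of voice content and better meets the user's needs.
In addition, the storage capacity of the near-field communication information storage 102 is generally small. If the specific timbre configuration information is large and exceeds the effective storage limit of the near-field communication information storage 102, only an index or link through which the specific timbre configuration information can be looked up may be stored in the near-field communication information storage 102, to be read by the near-field communication scanner provided on the voice playback body 101. This index or link serves as "pseudo timbre configuration information" that helps obtain the real timbre configuration information.
Near-field communication (NFC) is an emerging technology. Devices that use NFC (for example, mobile phones) can exchange data when they are close to each other. NFC evolved from the integration of contactless radio-frequency identification (RFID) and interconnection technologies; by integrating the functions of an inductive card reader, an inductive card, and peer-to-peer communication on a single chip, it enables applications such as mobile payment, electronic ticketing, access control, mobile identity recognition, and anti-counterfeiting on mobile terminals. Besides RFID-based NFC, technologies such as infrared and Bluetooth that can exchange data over short distances are also regarded here as forms of near-field communication. Compared with Bluetooth and infrared, near-field communication based on an NFC chip costs relatively little. In that case, the near-field communication information storage is a near-field communication chip (NFC chip) storing the timbre configuration information.
The present disclosure applies near-field communication technology to the field of voice playback: with the help of a near-field communication information storage that independently stores timbre configuration information, the timbre the voice playback body uses to play voice content can be adjusted conveniently.
Specifically, the voice playback body 101 may take the form of a voice playback device with no other functions, or of a smart speaker, smart mobile terminal, or the like that integrates a voice playback component.
The voice playback system provided by this embodiment stores the timbre configuration information independently in the near-field communication information storage, so that the voice playback body can read the timbre configuration information from the storage contactlessly through the near-field identification mechanism, configure the corresponding timbre according to that information, and play the voice content in that timbre. The timbre can be changed flexibly by swapping in storages that hold different timbre configuration information.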
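On the basis of the above embodiment, the present disclosure further provides, through FIG. 2, a schematic diagram of another voice playback system.
As shown in FIG. 2, with the usability of the near-field communication information storage in mind, a variety of carriers are provided for it in line with practical circumstances, such as the bottle cap 1021, badge 1022, and card 1023 shown in FIG. 2. Taking an NFC chip as the specific near-field communication information storage, the NFC chip storing timbre configuration information can be embedded in the bottle cap 1021, the badge 1022, or the card 1023, so that a carrier with a larger surface area and sturdier material protects the data stored in the NFC chip. Besides small carriers such as bottle caps, badges, cards, and medals, larger carriers such as toys, boxes, and bases may also be used. The size and form of the carrier are not limited here and can be chosen flexibly according to actual needs.
It should be noted that when the near-field communication information storage is an NFC chip, a corresponding scanner can read the information in the NFC chip without power being supplied to the chip, so no power supply component needs to be provided in the carrier. If, however, the near-field communication information storage uses a technology such as Bluetooth or infrared, a corresponding power supply component needs to be added as required.
In addition, FIG. 2 adds a timbre storage server 103, which is communicatively connected to the voice playback body 101 and is used to return, in response to a download request from the voice playback body, the target timbre configuration information corresponding to that request. That is, when the voice playback body cannot read complete timbre configuration information directly from the near-field communication information storage 102, it uses the "pseudo timbre configuration information" it has read to obtain the real, complete timbre configuration information from the timbre storage server 103, then completes the timbre configuration according to the received target timbre configuration information and performs the corresponding voice playback.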
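The lookup described above can be sketched as a small resolver: if the payload read from the storage is only a link, the playback body asks the timbre storage server for the full configuration; otherwise it uses the payload directly. Everything concrete here is an assumption for illustration: the `link:` prefix, the JSON payload format, and the in-memory `FAKE_SERVER` standing in for timbre storage server 103 are not specified by the disclosure.

```python
import json
from typing import Callable, Dict

LINK_PREFIX = "link:"  # hypothetical marker for "pseudo" timbre configuration


def resolve_timbre_config(payload: str, fetch: Callable[[str], Dict]) -> Dict:
    """Return the full timbre configuration for a payload read from the tag.

    If the payload is only a link, ask the timbre storage server (through
    `fetch`) for the target configuration; otherwise parse it directly.
    """
    if payload.startswith(LINK_PREFIX):
        return fetch(payload[len(LINK_PREFIX):])
    return json.loads(payload)


# In-memory stand-in for timbre storage server 103 (illustrative values).
FAKE_SERVER = {
    "voice-actor-a": {"treble_amplitude": 1.5, "bass_amplitude": 0.5},
}


def fake_fetch(key: str) -> Dict:
    return FAKE_SERVER[key]


inline = resolve_timbre_config(
    '{"treble_amplitude": 1.0, "bass_amplitude": 1.0}', fake_fetch
)
linked = resolve_timbre_config("link:voice-actor-a", fake_fetch)
print(inline, linked)
```

Passing the fetch function in as a parameter keeps the sketch runnable without a network and leaves the actual download mechanism open.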
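On the basis of any of the above embodiments, to make it easier to store the near-field communication information storage 102 and to maintain the required near-field communication distance, the voice playback body 101 may further be provided with an opening for placing, accommodating, or enclosing the near-field communication information storage 102, the shape of the opening corresponding to the shape of the carrier in which the storage is embedded. For example, when the carrier is a coin-shaped plastic medal, the voice playback body 101 may be provided with a coin-shaped slot in which to place the medal, or with an internal space into which the medal can be dropped.
Besides the opening design above, the near-field communication information storage 102 may instead be attached to the outer surface of the voice playback body 101 magnetically; that is, opposite, mutually attracting magnetic poles are provided on the voice playback body 101 and on the near-field communication information storage 102 respectively, so that magnetic force holds the storage against the outer surface of the body. Depending on where the magnetic poles are placed, the storage may also be held inside the voice playback body 101.
On the basis of any of the above embodiments, to avoid interference in reading configuration information when several near-field communication information storages 102 are within near-field communication distance at the same time, a shielded storage box (not shown in FIG. 1 or FIG. 2) may further be provided on the voice playback body 101. The shielded storage box prevents the near-field communication scanner from reading the timbre configuration information stored in any near-field communication information storage located inside the box; it may be made, for example, from a specific material that blocks signal transmission or with a specific weave of material.
The preceding shielded-storage-box embodiment targets scenarios in which the configuration information of only one near-field communication information storage 102 is to be read within near-field communication distance, that is, scenarios that do not support handling different configuration information read from different storages. In scenarios that do support this, the different pieces of timbre configuration information read from different near-field communication information storages 102 may be fused to obtain fused timbre configuration information. The fused timbre configuration information may be generated by superimposing, substituting, or otherwise combining the different pieces of timbre configuration information, yielding a configuration entirely different from any single one of them; alternatively, the multiple pieces of timbre configuration information may be arranged according to a preset configuration rule, so that the timbres corresponding to the different pieces are each used to play a part of the complete voice content.
For example, when the timbre configuration information stored in a first near-field communication information storage corresponds to a soprano timbre and that stored in a second near-field communication information storage corresponds to an alto timbre, the soprano and alto timbres can be fused to obtain a brand-new fused timbre. The fusion may also be configured according to a predetermined weight rule to obtain a variety of different fused timbres. The weight rule can be adjusted proportionally according to actual needs; that is, the same combination of timbres can yield different fused results under different weight rules, so that a richer range of timbres can be obtained.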
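One simple way to realise a weight rule is a per-parameter weighted average, sketched below. This is an assumption made for illustration: the disclosure mentions superimposing or substituting configurations under a weight rule but does not fix the arithmetic, and the parameter names and values are hypothetical.

```python
from typing import Dict, Sequence


def fuse_timbres(configs: Sequence[Dict[str, float]],
                 weights: Sequence[float]) -> Dict[str, float]:
    """Weighted average of each parameter across the given configurations."""
    if not configs or len(configs) != len(weights):
        raise ValueError("need exactly one weight per configuration")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        key: sum(cfg[key] * w for cfg, w in zip(configs, weights)) / total
        for key in configs[0]
    }


# Illustrative parameter values for the two timbres in the example.
soprano = {"treble_amplitude": 1.5, "bass_amplitude": 0.5,
           "vibration_frequency_hz": 500.0}
alto = {"treble_amplitude": 0.5, "bass_amplitude": 1.5,
        "vibration_frequency_hz": 200.0}

even_blend = fuse_timbres([soprano, alto], [1, 1])
soprano_heavy = fuse_timbres([soprano, alto], [2, 1])
print(even_blend)
print(soprano_heavy)
```

Changing only the weights moves the result between the two source timbres, which is the sense in which the same combination of timbres can yield different fused results.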
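For example, the voice content to be played is a storybook script with several different roles A, B, and C. The timbre configuration information stored in a first near-field communication information storage corresponds to a soprano timbre, that in a second near-field communication information storage to an alto timbre, and that in a third near-field communication information storage to a bass timbre. The soprano, alto, and bass timbres are assigned to the lines of roles A, B, and C respectively, and the storybook is then played in full. Using different playback timbres within the same piece of voice content achieves effects such as multiple roles and multiple scenes and improves the playback of the voice content.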
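The role-based variant can be sketched as a mapping from roles to timbres followed by a per-line playback plan. The assignment rule used here (roles in order of first appearance receive timbres in the order they were read) is an assumption; the disclosure only says that the timbres are assigned to the lines of roles A, B, and C. The script lines are invented for illustration.

```python
from typing import Dict, List, Tuple


def assign_timbres(script: List[Tuple[str, str]],
                   timbres: List[str]) -> List[Tuple[str, str]]:
    """Pair every line of the script with the timbre of its role.

    Assumed rule: roles, in order of first appearance, receive the
    timbres in the order in which those timbres were read.
    """
    roles: List[str] = []
    for role, _ in script:
        if role not in roles:
            roles.append(role)
    if len(roles) > len(timbres):
        raise ValueError("fewer timbres than roles")
    role_to_timbre: Dict[str, str] = dict(zip(roles, timbres))
    return [(role_to_timbre[role], line) for role, line in script]


script = [
    ("A", "Once upon a time..."),
    ("B", "Who is there?"),
    ("C", "It is I."),
    ("A", "The end."),
]
plan = assign_timbres(script, ["soprano", "alto", "bass"])
for timbre, line in plan:
    print(f"[{timbre}] {line}")
```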
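For a specific implementation, see the flowchart in FIG. 3, which is a flowchart of a voice playback timbre configuration method provided by an embodiment of the present disclosure. The flow 300 includes the following steps:
Step 301: in response to reading a plurality of different pieces of timbre configuration information within a preset time period, generate fused timbre configuration information based on the plurality of pieces of timbre configuration information;
In this step, the execution body of the voice playback timbre configuration method (for example, the voice playback body 101 shown in FIG. 1), having read a plurality of different pieces of timbre configuration information within the preset time period, fuses them to obtain fused timbre configuration information, so that the corresponding fused timbre can be derived from it and used to play voice content.
The plurality of different pieces of timbre configuration information usually come from different near-field communication information storages 102, although the special case in which one near-field communication information storage 102 holds several different pieces is not excluded. The preset time period may be set to 5 seconds, 10 seconds, or a user-defined duration. Preset time periods of different lengths may also be set in advance, and the weight rule described above, and hence the fusion mechanism, may be determined according to the interval since the previous piece of timbre configuration information was read. For example, when two different pieces of timbre configuration information are read consecutively: if the read interval is less than 5 seconds, the fusion weight ratio of the first piece to the second piece is 2:1; if the interval is greater than 5 seconds and less than 10 seconds, the ratio is 1:1; and if the interval is greater than 10 seconds, the ratio is 1:2.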
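The interval-to-weight example above maps directly onto a small lookup function. The text leaves the behaviour at exactly 5 and 10 seconds unspecified; this sketch assumes that 5 seconds falls into the 1:1 band and 10 seconds into the 1:2 band. The function name is hypothetical.

```python
from typing import Tuple


def fusion_weights(read_interval_s: float) -> Tuple[int, int]:
    """Return (first_weight, second_weight) for two consecutive reads.

    Thresholds follow the example in the text. Boundary handling at
    exactly 5 s and 10 s is an assumption made for this sketch.
    """
    if read_interval_s < 0:
        raise ValueError("interval must be non-negative")
    if read_interval_s < 5:
        return (2, 1)
    if read_interval_s < 10:
        return (1, 1)
    return (1, 2)


for interval in (3, 7, 12):
    print(interval, fusion_weights(interval))
```

The returned pair can be passed as the weights of whatever fusion operation the device uses.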
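Step 302: play voice content in the fused timbre corresponding to the fused timbre configuration information.
In addition, to encourage users to keep acquiring near-field communication information storages that hold new timbre configuration information, when the execution body (for example, the voice playback body 101 shown in FIG. 1) has read no timbre configuration information for a continuous preset duration (that is, no near-field communication information storage 102 holding timbre configuration information has been within near-field communication distance), it may present voice playback corresponding to default timbre configuration information and stop playing voice content in the timbre corresponding to the timbre configuration information previously read from a near-field communication information storage 102.
For example, the user previously obtained a limited-edition commemorative badge N of an anime character. The badge records a network link to the timbre configuration information of voice actor A, who voices that character. After obtaining badge N, the user can place it next to a smart speaker at home that supports voice playback, so that the smart speaker reads from badge N, via near-field communication, the link to the timbre configuration information corresponding to voice actor A's timbre. The smart speaker downloads that timbre configuration information from the storage server according to the link, configures its voice playback component accordingly, and then plays voice content in voice actor A's timbre.
If, however, the user later discards badge N by accident and the smart speaker fails to detect the badge within near-field communication distance for two consecutive weeks, the speaker stops playing voice content in voice actor A's timbre and reverts to the default timbre.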
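The fallback behaviour can be sketched as a selector that remembers when a tag was last read and returns the default timbre once the preset duration has elapsed. The class and method names are hypothetical, and timestamps are passed in explicitly (rather than read from a clock) so that the sketch is deterministic; the two-week value comes from the example above.

```python
from typing import Optional

TWO_WEEKS_S = 14 * 24 * 60 * 60  # preset duration used in the example


class TimbreSelector:
    """Tracks the active timbre and falls back to the default on timeout."""

    def __init__(self, default_timbre: str, timeout_s: float) -> None:
        self.default_timbre = default_timbre
        self.timeout_s = timeout_s
        self._active = default_timbre
        self._last_read_at: Optional[float] = None

    def on_tag_read(self, timbre: str, now: float) -> None:
        # Called whenever the scanner reads timbre configuration information.
        self._active = timbre
        self._last_read_at = now

    def current(self, now: float) -> str:
        expired = (self._last_read_at is None
                   or now - self._last_read_at >= self.timeout_s)
        if expired:
            self._active = self.default_timbre
        return self._active


selector = TimbreSelector("default", TWO_WEEKS_S)
selector.on_tag_read("voice_actor_A", now=0.0)
after_one_day = selector.current(now=24 * 60 * 60)
after_two_weeks = selector.current(now=TWO_WEEKS_S)
print(after_one_day, after_two_weeks)
```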
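Further, a user might tamper with the local data of the smart speaker (the voice playback body) by illegitimate means, replacing the timbre configuration information of the default timbre with previously obtained timbre configuration information so that the local default timbre becomes one of those earlier timbres. To prevent this, the smart speaker may be set to delete previously configured timbre configuration information from its history data after a preset time, or write permission to the smart speaker's data may be restricted to specific users.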
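Both safeguards can be sketched in a few lines: a purge that drops history entries older than a retention period, and a write-permission check against a set of authorised users. The function names, the history layout (timbre name mapped to the time it was configured), and the user names are hypothetical; the disclosure does not describe how the history or permissions are stored.

```python
from typing import Dict, Set


def purge_expired(history: Dict[str, float], now: float,
                  retention_s: float) -> Dict[str, float]:
    """Keep only timbre configurations configured within the retention period."""
    return {
        name: configured_at
        for name, configured_at in history.items()
        if now - configured_at < retention_s
    }


def may_write(user: str, authorized_users: Set[str]) -> bool:
    """Allow writes to the device's local data only for specific users."""
    return user in authorized_users


# Illustrative history: timbre name -> time (in seconds) it was configured.
history = {"voice_actor_A": 0.0, "cartoon_character_A": 900.0}
kept = purge_expired(history, now=1000.0, retention_s=500.0)
print(kept)
print(may_write("guest", {"owner"}))
```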
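As an implementation of the method shown in FIG. 3, the present disclosure further provides, through FIG. 4, an embodiment of a voice playback timbre configuration apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 3, and the apparatus can be applied in various electronic devices.
As shown in FIG. 4, the voice playback timbre configuration apparatus 400 of this embodiment may include a timbre fusion unit 401 and a voice playback unit 402. The timbre fusion unit 401 is configured to, in response to reading a plurality of different pieces of timbre configuration information within a preset time period, generate fused timbre configuration information based on the plurality of pieces of timbre configuration information; the voice playback unit 402 is configured to play voice content in the fused timbre corresponding to the fused timbre configuration information.
In this embodiment, for the specific processing of the timbre fusion unit 401 and the voice playback unit 402 of the voice playback timbre configuration apparatus 400 and the technical effects they bring, refer to the descriptions of steps 301-302 in the embodiment corresponding to FIG. 3; they are not repeated here.
In some optional implementations of this embodiment, the voice playback timbre configuration apparatus 400 may further include:
a default-restoring unit, configured to, in response to no timbre configuration information being read for a continuous preset duration, reset the playback timbre to the default timbre.
According to an embodiment of the present disclosure, the present disclosure further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to implement the voice playback timbre configuration method described in any of the above.
According to an embodiment of the present disclosure, the present disclosure further provides a readable storage medium storing computer instructions, the computer instructions being used to enable a computer, when executing them, to implement the voice playback timbre configuration method described in any of the above.
An embodiment of the present disclosure provides a computer program product including a computer program which, when executed by a processor, implements the voice playback timbre configuration method described in any of the above.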
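FIG. 5 shows a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 5, the device 500 includes a computing unit 501, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components of the device 500 are connected to the I/O interface 505, including: an input unit 506, such as a keyboard or a mouse; an output unit 507, such as various types of displays and speakers; a storage unit 508, such as a magnetic disk or an optical disc; and a communication unit 509, such as a network card, a modem, or a wireless communication transceiver. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and so on. The computing unit 501 executes the methods and processing described above, such as the voice playback timbre configuration method. For example, in some embodiments, the voice playback timbre configuration method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the voice playback timbre configuration method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured in any other appropriate way (for example, by means of firmware) to execute the voice playback timbre configuration method.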
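Various implementations of the systems and techniques described above can be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor; it may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the high management difficulty and weak business scalability of traditional physical hosts and virtual private server (VPS) services.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; as long as the desired result of the technical solution disclosed herein can be achieved, no limitation is imposed herein.
The specific implementations above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.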

Claims (14)

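  1. A voice playback system, comprising:
    a near-field communication information storage, configured to store timbre configuration information readable through a near-field communication mechanism; and
    a voice playback body provided with a near-field communication scanner, configured to read the timbre configuration information in the near-field communication information storage through the near-field communication scanner and to play voice content in the timbre corresponding to the timbre configuration information.
  2. The voice playback system according to claim 1, wherein the near-field communication information storage is a near-field communication chip storing the timbre configuration information.
  3. The voice playback system according to claim 2, wherein the near-field communication chip is embedded in a medal, a badge, a card, or a bottle cap.
  4. The voice playback system according to claim 1, wherein the voice playback body is provided with an opening for placing, accommodating, or enclosing the near-field communication information storage, and the shape of the opening corresponds to the shape of a carrier in which the near-field communication information storage is embedded.
  5. The voice playback system according to claim 1, wherein the near-field communication information storage is attached to the outer surface of the voice playback body magnetically.
  6. The voice playback system according to claim 1, wherein the voice playback body is provided with a shielded storage box, and the shielded storage box is configured to prevent the near-field communication scanner from reading timbre configuration information stored in a near-field communication information storage located inside the box.
  7. The voice playback system according to any one of claims 1-6, further comprising:
    a voice package storage server, communicatively connected to the voice playback body and configured to return, in response to a download request from the voice playback body, a voice package corresponding to the download request.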
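  8. A voice playback timbre configuration method, applied to the voice playback system according to any one of claims 1-7, comprising:
    in response to reading a plurality of different pieces of timbre configuration information within a preset time period, generating fused timbre configuration information based on the plurality of pieces of timbre configuration information; and
    playing voice content in the fused timbre corresponding to the fused timbre configuration information.
  9. The method according to claim 8, further comprising:
    in response to no timbre configuration information being read for a continuous preset duration, resetting the playback timbre to a default timbre.
  10. A voice playback timbre configuration apparatus, applied to the voice playback body in the voice playback system according to any one of claims 1-7, comprising:
    a timbre fusion unit, configured to, in response to reading a plurality of different pieces of timbre configuration information within a preset time period, generate fused timbre configuration information based on the plurality of pieces of timbre configuration information; and
    a voice playback unit, configured to play voice content in the fused timbre corresponding to the fused timbre configuration information.
  11. The apparatus according to claim 10, further comprising:
    a default-restoring unit, configured to, in response to no timbre configuration information being read for a continuous preset duration, reset the playback timbre to a default timbre.
  12. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the voice playback timbre configuration method according to claim 8 or 9.
  13. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the voice playback timbre configuration method according to claim 8 or 9.
  14. A computer program product, comprising a computer program which, when executed by a processor, implements the voice playback timbre configuration method according to claim 8 or 9.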
PCT/CN2021/141962 2021-05-25 2021-12-28 Voice playback system, voice playback timbre configuration method and related apparatus WO2022247267A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
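KR1020227032912A KR20220137771A (ko) 2021-05-25 2021-12-28 Voice playback system, voice playback timbre configuration method and related apparatus
JP2022552530A JP7432000B2 (ja) 2021-05-25 2021-12-28 Voice playback system, timbre configuration method and apparatus for voice playback, electronic device, storage medium, and computer program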
US17/895,154 US20220407562A1 (en) 2021-05-25 2022-08-25 System for playing voice, method for configuring voice playing timbre and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110570865.1 2021-05-25
CN202110570865.1A CN113257223A (zh) 2021-05-25 2021-05-25 Voice playback system, voice playback timbre configuration method and related apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/895,154 Continuation US20220407562A1 (en) 2021-05-25 2022-08-25 System for playing voice, method for configuring voice playing timbre and related apparatus

Publications (1)

Publication Number Publication Date
WO2022247267A1 true WO2022247267A1 (zh) 2022-12-01

Family

ID=77184221

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141962 WO2022247267A1 (zh) 2021-05-25 2021-12-28 Voice playback system, voice playback timbre configuration method and related apparatus

Country Status (2)

Country Link
CN (1) CN113257223A (zh)
WO (1) WO2022247267A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113257223A (zh) * 2021-05-25 2021-08-13 北京百度网讯科技有限公司 Voice playback system, voice playback timbre configuration method and related apparatus

Also Published As

Publication number Publication date
CN113257223A (zh) 2021-08-13

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022552530

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20227032912

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21942830

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE