WO2019007308A1 - Voice broadcast method and apparatus - Google Patents

Voice broadcast method and apparatus

Info

Publication number
WO2019007308A1
Authority
WO
WIPO (PCT)
Prior art keywords
broadcast
label set
label
target
object type
Prior art date
Application number
PCT/CN2018/094116
Other languages
English (en)
French (fr)
Inventor
徐凌锦
康永国
徐扬凯
徐犇
袁海光
徐冉
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 filed Critical 百度在线网络技术(北京)有限公司
Priority to EP18828877.3A priority Critical patent/EP3651152A4/en
Priority to KR1020197002335A priority patent/KR102305992B1/ko
Priority to JP2019503523A priority patent/JP6928642B2/ja
Priority to US16/616,611 priority patent/US20200184948A1/en
Publication of WO2019007308A1 publication Critical patent/WO2019007308A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G10L2013/105 Duration

Definitions

  • the present disclosure relates to the field of voice processing technologies, and in particular, to a voice broadcast method and apparatus.
  • The broadcast effect of a fully human live broadcast can satisfy the user's expectations and convey emotion; however, the labor cost of a full live broadcast is high.
  • The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
  • The first object of the present disclosure is to provide a voice broadcast method that conveys to the listener, during the broadcast, the emotion carried by the content to be broadcast, so that the listener can feel that emotion. This addresses the problem that the existing TTS (text-to-speech) broadcast mode cannot convey emotion, leaving the listener unable to feel the emotion carried by the content or information being broadcast.
  • a second object of the present disclosure is to provide a voice broadcast device.
  • a third object of the present disclosure is to propose a smart device.
  • a fourth object of the present disclosure is to propose a computer program product.
  • a fifth object of the present disclosure is to propose a computer readable storage medium.
  • According to a first aspect, the present disclosure provides a voice broadcast method, including: acquiring an object to be broadcast; identifying a target object type to which the object belongs; acquiring, according to the target object type, a broadcast tag set that matches the object, wherein the broadcast tag set represents the broadcast rule of the object; and broadcasting the object according to the broadcast rule represented by the broadcast tag set.
  • The voice broadcast method of the embodiment of the present disclosure obtains a broadcast tag set that matches the object to be broadcast according to the target object type of that object; the broadcast tag set represents the broadcast rule of the object, and the object is broadcast according to that rule. Broadcasting the object according to broadcast tags is an implementation of the speech synthesis markup language specification, which makes it convenient for people to listen to speech through various terminal devices.
  • the second aspect of the present disclosure provides a voice broadcast apparatus, including:
  • a first acquiring module configured to acquire an object to be broadcast;
  • an identification module configured to identify a target object type to which the object to be broadcast belongs;
  • a second acquiring module configured to acquire, according to the target object type, a broadcast tag set that matches the object to be broadcast, wherein the broadcast tag set represents the broadcast rule of the object;
  • a broadcast module configured to broadcast the object according to the broadcast rule represented by the broadcast tag set.
  • The voice broadcast apparatus of the embodiment of the present disclosure acquires a broadcast tag set that matches the object to be broadcast according to the target object type of that object; the broadcast tag set represents the broadcast rule of the object, and the object is broadcast according to that rule. Broadcasting the object according to broadcast tags is an implementation of the speech synthesis markup language specification, which makes it convenient for people to listen to speech through various terminal devices.
  • According to a third aspect, the present disclosure provides a smart device including a memory and a processor, wherein the processor implements the voice broadcast method according to the first aspect of the embodiments of the present disclosure by reading executable program code stored in the memory and running a program corresponding to that code.
  • a fourth aspect of the present disclosure provides a computer program product that, when executed by a processor, executes a voice broadcast method as described in the first aspect.
  • According to a fifth aspect, the present disclosure provides a computer readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the voice broadcast method according to the first aspect is implemented.
  • FIG. 1 is a schematic flowchart diagram of a voice broadcast method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart diagram of another voice broadcast method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic flowchart diagram of another voice broadcast method according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a voice broadcast apparatus according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of another voice broadcast apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a smart device according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic flowchart diagram of a voice broadcast method according to an embodiment of the present disclosure.
  • the voice broadcast method includes the following steps:
  • The object to be broadcast is the content or information that needs to be broadcast.
  • The object to be broadcast may be obtained by a related application in the electronic device, such as the Baidu APP, in order to broadcast it.
  • The user can input the content or information to be broadcast by voice or text.
  • The electronic device is, for example, a personal computer (PC), a cloud device, or a mobile device such as a smart phone or a tablet computer.
  • For example, if the related application installed in the electronic device is the Baidu APP, when the user wants to feel the emotion carried by the object to be broadcast, the user can open the Baidu APP interface, press and hold the button in the interface, and speak the voice command "度秘" (Duer) to enter the Duer plug-in. The user can then specify the content or information to be broadcast by voice or text input, and the plug-in obtains that content or information, that is, the object to be broadcast.
  • The broadcast rules differ for different object types. Therefore, before the object to be broadcast is broadcast, its target object type needs to be identified, so that a matching broadcast rule can be selected according to that type.
  • The target object type may be identified according to key information of the object to be broadcast; for example, the object type may be poetry, weather, time, calculation, and the like.
  • The key information may be, for example, the source of the object to be broadcast (the application it comes from), its title, or its identifier, without being limited thereto.
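As a rough illustration, the key-information-based identification described above could be a keyword lookup over the source and title; the rule table, function name, and keywords below are hypothetical assumptions, not the patent's actual classifier.

```python
# Hypothetical sketch: identify the target object type from key
# information (source application and title). The keyword table is an
# illustrative assumption.
KEYWORD_RULES = {
    "poetry": ("poem", "诗"),
    "weather": ("weather", "天气"),
    "time": ("time", "时间"),
    "calculation": ("calculate", "计算"),
}

def identify_object_type(source: str, title: str) -> str:
    text = f"{source} {title}".lower()
    for object_type, keywords in KEYWORD_RULES.items():
        if any(kw.lower() in text for kw in keywords):
            return object_type
    return "unknown"
```

A real system would likely combine several signals (source, title, identifier) rather than a single keyword scan.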
  • The broadcast tag set is used to represent the broadcast rule of the object to be broadcast.
  • A broadcast tag set corresponding to each object type may be formed from that type's broadcast rule, and a mapping relationship between object types and broadcast tag sets is established in advance. When the target object type of the object to be broadcast is determined, this mapping relationship can be queried to obtain the broadcast tag set matching the object.
  • The broadcast tag set mainly includes tags such as pause, stress, volume, pitch, speaking rate, voice (sound source), audio insertion, polyphone pronunciation marking, and number-reading marking.
  • Pause tag: implements word-level, phrase-level, short-sentence-level, full-sentence-level, and timed pauses.
  • Stress tag: implements stress of different degrees.
  • Volume, pitch, rate, and timbre tags: adjust the corresponding broadcast attributes by percentage.
  • Audio insertion tag: inserts an audio file into a piece of text.
  • Polyphone tag: marks the correct reading of polyphonic characters.
  • Number-reading tag: marks the correct way to read a number, including integers, decimals, fractions, phone numbers, zip codes, and more.
  • Voice (sound-source) tag: selects the speaker.
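These basic tags correspond closely to standard SSML elements. A minimal sketch of such a mapping is shown below; the concrete attribute values and the `x-pinyin` alphabet are illustrative assumptions, not the patent's actual tag definitions.

```python
# Minimal sketch mapping the basic broadcast tags onto standard SSML
# elements (W3C Speech Synthesis Markup Language). Attribute values
# are illustrative assumptions.
BASIC_TAGS = {
    "pause":        '<break time="{ms}ms"/>',                        # timed pause
    "stress":       '<emphasis level="strong">{t}</emphasis>',       # accent
    "volume":       '<prosody volume="{pct}">{t}</prosody>',         # percentage adjust
    "rate":         '<prosody rate="{pct}">{t}</prosody>',           # slower rate lengthens duration
    "audio_insert": '<audio src="{src}"/>',                          # insert an audio file
    "number":       '<say-as interpret-as="telephone">{t}</say-as>', # number-reading marking
    "voice":        '<voice name="{speaker}">{t}</voice>',           # sound-source selection
    "polyphone":    '<phoneme alphabet="x-pinyin" ph="{ph}">{t}</phoneme>',
}

def apply_tag(tag: str, text: str = "", **attrs) -> str:
    """Render one broadcast tag as an SSML fragment."""
    return BASIC_TAGS[tag].format(t=text, **attrs)
```

For instance, `apply_tag("stress", "明")` yields an `<emphasis>` fragment around the stressed character.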
  • For example, a rate tag can be set for the line "床前明月光" ("Before my bed, the bright moonlight"): it applies a short extension to the character "光" ("light") to lengthen its broadcast time.
  • After "床前明月光" is marked in this way, the complete five-character poem can be marked, and the complete format is finally output, synthesizing a broadcast tag set matching five-character poems. This tag set includes word-level pause tags, stress tags, and rate tags.
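Put together, the marking of "床前明月光" described in this section might look like the following SSML fragment; the attribute values are assumptions for illustration.

```python
# Illustrative SSML for "床前明月光": a word-level pause after "床前",
# stress on "明", and a short duration extension (slower rate) on "光".
# Attribute values are assumptions, not the patent's actual markup.
line = (
    "床前"
    '<break time="200ms"/>'
    '<emphasis level="strong">明</emphasis>'
    "月"
    '<prosody rate="70%">光</prosody>'
)
ssml = f"<speak>{line}</speak>"
print(ssml)
```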
  • S104: Broadcast the object to be broadcast according to the broadcast rule represented by the broadcast tag set.
  • Taking the five-character poem as an example in a specific application: once the object type of the object to be broadcast is determined to be a five-character poem, the broadcast tag set matching five-character poems is added and the poem is broadcast according to the broadcast rule that set represents, achieving a reading-aloud effect.
  • The voice broadcast method of this embodiment obtains a broadcast tag set that matches the object to be broadcast according to its target object type; the broadcast tag set represents the broadcast rule of the object, and the object is broadcast according to that rule.
  • Broadcasting the object according to broadcast tags is an implementation of the Speech Synthesis Markup Language (SSML) specification, which makes it convenient for people to listen to speech through various terminal devices.
  • FIG. 2 is a schematic flowchart of another voice broadcast method according to an embodiment of the present disclosure.
  • the voice broadcast method may include the following steps:
  • The broadcast rules under different object types can be obtained in advance for each object type. For example, when the object type is poetry, the broadcast rule is the reading rule of poetry, and a broadcast tag set matching poetry can be formed.
  • the "pre-bed” can be marked according to the five-character poem reading rules.
  • Need word level pause set a pause label, which can show pause after the words "before the bed", that is, pause after the second word; "ming” needs to be reread, set a reread label,
  • the pause label can be displayed for rereading on the word "bright”, that is, rereading on the third word;
  • "light” needs to be extended for a short time, and a sonic label can be set, which can be displayed as short on the word "light”.
  • In this way, the mapping relationship between the object type and the broadcast tag set is determined in advance; at broadcast time, the mapping relationship is simply queried to obtain the broadcast tag set matching the object to be broadcast, which is easy to implement and simple to operate.
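The pre-built mapping and its lookup could be sketched as follows; the type names and tag names are illustrative assumptions.

```python
# Sketch of the mapping, built in advance, from object type to first
# broadcast tag set, plus the query performed at broadcast time.
TYPE_TO_TAG_SET = {
    "five_char_poem": ["pause", "stress", "rate"],
    "weather":        ["pause", "volume", "audio_insert"],
    "time":           ["number"],
}

def get_first_tag_set(target_type: str) -> list:
    # Query the pre-built mapping; unknown types get an empty tag set.
    return TYPE_TO_TAG_SET.get(target_type, [])
```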
  • The first broadcast tag set mainly includes tags such as pause, stress, volume, pitch, speaking rate, voice, audio insertion, polyphone pronunciation marking, and number-reading marking.
  • The user's broadcast requirement may be, for example, playing the sound of rain while the weather is being broadcast, prompting the user to take an umbrella when going out; or playing the sound of hail, prompting the user to avoid going out if possible.
  • The second broadcast tag set includes a background sound tag, an English reading tag, a poetry tag, a voice emoji tag, and the like.
  • Background sound tag: built on the basis of the audio insertion tag, the background sound tag combines the broadcast content with an audio effect.
  • Poetry tag: poems are classified according to poetry type and poem name, reading rules such as rhyme are marked for each category, and an advanced poetry-category tag is generated by combining tags from the first broadcast tag set.
  • Voice emoji tag: an audio file library covering different emotions and scenarios is created, and the corresponding resource is introduced in each scenario to generate a voice broadcast emoji; for example, when the user asks about the weather and it is rainy, a corresponding rain sound accompanies the broadcast.
  • For example, when the object to be broadcast is the weather, the second broadcast tag set matching it may be a background sound tag; by adding the background sound tag, the sound of rain or hail can be heard while the weather is broadcast.
  • The second broadcast tag set matching the object to be broadcast may also be an English reading tag, which is added to achieve an English reading effect.
  • The second broadcast tag set matching the object to be broadcast may also be a poetry tag, which is added to realize the reading effect of poetry.
  • Forming a second broadcast tag set that matches the object to be broadcast enables personalized customization of the voice broadcast, effectively improves the applicability of the voice broadcast method, and improves the user experience.
  • For poetry, the first broadcast tag set may be formed according to the reading rules, the second broadcast tag set matching the broadcast requirement is a poetry tag, and the first and second broadcast tag sets are then used together to form the broadcast tag set.
  • For weather, the first broadcast tag set can be obtained according to the content to be broadcast, the second broadcast tag set matching the broadcast requirement is a background sound tag, and the two are then used to form the broadcast tag set. Specifically, a single broadcast effect can be realized by combining the background sound tag with fixed broadcast content; the different broadcast effects for different weather conditions are marked in turn, finally generating the weather broadcast tag set.
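The weather example described above could be sketched as follows; the audio file names and rendering format are hypothetical assumptions.

```python
# Hypothetical sketch of the weather broadcast tag set: fixed broadcast
# content combined with a background-sound tag chosen per weather
# condition. File names are illustrative assumptions.
WEATHER_BACKGROUND = {
    "rain": "rain.mp3",   # prompt the user to take an umbrella
    "hail": "hail.mp3",   # prompt the user to avoid going out
}

def broadcast_weather(report: str, condition: str) -> str:
    src = WEATHER_BACKGROUND.get(condition)
    if src is None:
        return f"<speak>{report}</speak>"
    # Background sound tag built on top of audio insertion.
    return f'<speak><audio src="{src}"/>{report}</speak>'
```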
  • S210: Broadcast the object to be broadcast according to the broadcast rule represented by the broadcast tag set.
  • For example, the effects required by different users can be broadcast according to the weather broadcast tag set and the weather keyword.
  • For the implementation process of step S210, refer to the foregoing embodiment; details are not described here again.
  • The voice broadcast method of this embodiment obtains, for each object type in advance, the broadcast rule under that type, forms the corresponding broadcast tag set according to the rule, and constructs the mapping relationship between object type and broadcast tag set, which is easy to implement and simple to operate.
  • The target object type of the object to be broadcast is identified, the mapping relationship between object type and broadcast tag set is queried according to that type to obtain the first broadcast tag set matching the object, and the user's broadcast requirement is obtained; a second broadcast tag set matching the object is formed according to that requirement, the first and second broadcast tag sets are used to form the broadcast tag set, and the object is broadcast according to the broadcast rule the set represents. This enables personalized customization of the voice broadcast, effectively improves the applicability of the method, and enhances the user experience.
  • step S209 specifically includes the following sub-steps:
  • The first broadcast tag set mainly includes tags such as pause, stress, volume, pitch, speaking rate, voice, audio insertion, polyphone pronunciation marking, and number-reading marking, and a given broadcast may use only some of these tags. Therefore, the broadcast tags corresponding to this broadcast can be selected from the first broadcast tag set to form a first target broadcast tag set, which is highly targeted and improves the processing efficiency of the system.
  • Likewise, the tag set matching the user's broadcast requirement may include only some of the broadcast tags in the second broadcast tag set, for example only the background sound tag. Therefore, a subset of broadcast tags can be selected from the second broadcast tag set to form a second target broadcast tag set, which is highly targeted and improves the processing efficiency of the system.
  • the background sound tag may be selected from the second broadcast tag set to form a second target broadcast tag set.
  • a poem tag may be selected from the second set of broadcast tags to form a second target broadcast tag set.
  • In this embodiment, a partial selection of broadcast tags from the first broadcast tag set forms the first target broadcast tag set, a partial selection from the second broadcast tag set forms the second target broadcast tag set, and the first target broadcast tag set and/or the second target broadcast tag set form the final broadcast tag set. This enables personalized customization of the voice broadcast, is highly targeted, and effectively improves the processing efficiency of the system.
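The selection-and-combination step S209 could be sketched as below; the function name and tag names are illustrative assumptions.

```python
# Sketch of step S209: select only the tags needed for this broadcast
# from each set, then combine the two target sets into the final
# broadcast tag set.
def form_broadcast_tag_set(first_set, second_set, needed):
    first_target = [t for t in first_set if t in needed]
    second_target = [t for t in second_set if t in needed]
    # Union of the first and/or second target sets, preserving order.
    return first_target + [t for t in second_target if t not in first_target]
```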
  • the present disclosure also proposes a voice broadcast device.
  • FIG. 4 is a schematic structural diagram of a voice broadcast apparatus according to an embodiment of the present disclosure.
  • The voice broadcast apparatus 400 includes a first acquisition module 410, an identification module 420, a second acquisition module 430, and a broadcast module 440.
  • the first obtaining module 410 is configured to acquire an object to be broadcasted.
  • the identification module 420 is configured to identify a target object type to which the object to be broadcast belongs.
  • The identification module 420 is specifically configured to identify the target object type of the object to be broadcast according to key information of the object.
  • The second acquisition module 430 is configured to obtain, according to the target object type, a broadcast tag set that matches the object to be broadcast, wherein the broadcast tag set represents the broadcast rule of the object.
  • The broadcast module 440 is configured to broadcast the object according to the broadcast rule represented by the broadcast tag set.
  • the voice broadcast apparatus 400 further includes:
  • the construction module 450 is configured to acquire a broadcast rule under different object types for each object type, form a broadcast tag set corresponding to the object type according to the broadcast rule, and construct a mapping relationship between the object type and the broadcast tag set.
  • the second obtaining module 430 includes:
  • The query obtaining unit 431 is configured to query the mapping relationship between object type and broadcast tag set according to the target object type, and obtain a first broadcast tag set that matches the object to be broadcast.
  • The requirement obtaining unit 432 is configured to obtain the user's broadcast requirement after the first broadcast tag set matching the object to be broadcast has been obtained.
  • The first forming unit 433 is configured to form a second broadcast tag set that matches the object to be broadcast according to the broadcast requirement.
  • The second forming unit 434 is configured to form the broadcast tag set by using the first broadcast tag set and the second broadcast tag set.
  • The second forming unit 434 is specifically configured to select a subset of broadcast tags from the first broadcast tag set to form a first target broadcast tag set, select a subset from the second broadcast tag set to form a second target broadcast tag set, and form the broadcast tag set by using the first target broadcast tag set and/or the second target broadcast tag set.
  • The voice broadcast apparatus of this embodiment obtains the broadcast tag set that matches the object to be broadcast according to its target object type; the broadcast tag set represents the broadcast rule of the object, and the object is broadcast according to that rule.
  • Broadcasting the object according to broadcast tags is an implementation of the speech synthesis markup language specification, which makes it convenient for people to listen to speech through various terminal devices.
  • FIG. 6 illustrates a block diagram of an exemplary smart device 20 suitable for use in implementing embodiments of the present disclosure.
  • the smart device 20 shown in FIG. 6 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • smart device 20 is represented in the form of a general purpose computing device.
  • the components of smart device 20 may include, but are not limited to, one or more processors or processing units 21, system memory 22, and a bus 23 that connects different system components, including system memory 22 and processing unit 21.
  • Bus 23 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • These architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
  • the smart device 20 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by smart device 20, including volatile and non-volatile media, removable and non-removable media.
  • System memory 22 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32.
  • the smart device may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in Figure 6, commonly referred to as "hard disk drives").
  • Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) can be provided. In these cases, each drive can be coupled to bus 23 via one or more data medium interfaces.
  • Memory 22 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present disclosure.
  • A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 22. Such program modules 42 include, but are not limited to, an operating system, one or more applications, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • Program module 42 typically performs the functions and/or methods of the embodiments described in this disclosure.
  • The smart device 20 can also communicate with one or more external devices 50 (e.g., a keyboard, a pointing device, a display 60, etc.), with one or more devices that enable a user to interact with the smart device 20, and/or with any device (e.g., a network card, modem, etc.) that enables the smart device 20 to communicate with one or more other computing devices. This communication can take place via an input/output (I/O) interface 24.
  • The smart device 20 can also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 25. As shown, the network adapter 25 communicates with the other modules of the smart device 20 over the bus 23.
  • Although not shown, other hardware and/or software modules may be used in conjunction with the smart device 20, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
  • The processing unit 21 executes various functional applications and data processing by running programs stored in the system memory 22, for example implementing the voice broadcast method shown in FIG. 1 to FIG. 3.
  • the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • a computer readable storage medium can be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • A computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer readable program code. Such a propagated data signal can take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer, partly on the remote computer, or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the present disclosure also proposes a computer program product that, when executed by a processor, executes a voice broadcast method as described in the foregoing embodiments.
  • The present disclosure also proposes a computer readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the voice broadcast method described in the foregoing embodiments is implemented.
  • The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. A feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. Unless specifically defined otherwise, "a plurality" means at least two, such as two or three.
  • Any process or method description in the flowcharts, or otherwise described herein, may be understood to represent a module, segment or portion of code comprising one or more executable instructions for implementing the steps of a custom logic function or process.
  • The scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present disclosure pertain.
  • A "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise suitably processing it if necessary, and then stored in a computer memory.
  • Portions of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof.
  • In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art can be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
  • Each functional unit in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • The integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer-readable storage medium.
  • The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present disclosure have been shown and described above, it should be understood that the foregoing embodiments are illustrative and are not to be construed as limiting the scope of the disclosure; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Information Transfer Between Computers (AREA)
  • Circuits Of Receivers In General (AREA)

Abstract

The present disclosure proposes a voice broadcast method and apparatus. The method includes: acquiring an object to be broadcast; identifying a target object type of the object to be broadcast; acquiring, according to the target object type, a broadcast label set matching the object to be broadcast, where the broadcast label set is used to characterize broadcast rules for the object to be broadcast; and broadcasting the object to be broadcast according to the broadcast rules characterized by the broadcast label set. The method can present to listeners the emotion carried by the content being broadcast, so that listeners can audibly perceive that emotion. Broadcasting an object according to broadcast labels is also an implementation of the Speech Synthesis Markup Language (SSML) specification, which facilitates listening to speech through various terminal devices.

Description

Voice broadcast method and apparatus
Cross-Reference to Related Application
This disclosure claims priority to Chinese Patent Application No. 201710541569.2, entitled "Voice broadcast method and apparatus", filed by Baidu Online Network Technology (Beijing) Co., Ltd. on July 5, 2017.
Technical Field
The present disclosure relates to the field of speech processing technologies, and in particular to a voice broadcast method and apparatus.
Background
With the growth of voice-interactive products, the quality of voice broadcasts draws increasing attention from users. At present, broadcasts fully read by human speakers meet users' expectations and can convey emotion, but their labor cost is high.
To reduce labor costs, text-to-speech (TTS) broadcasting is now widely used to broadcast the content or information to be announced.
Summary
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first objective of the present disclosure is to propose a voice broadcast method that presents to listeners the emotion carried by the content to be broadcast, so that listeners can audibly perceive that emotion, thereby solving the problem that the existing TTS broadcast cannot convey emotion and cannot let listeners audibly perceive the emotion carried by the content or information to be broadcast.
A second objective of the present disclosure is to propose a voice broadcast apparatus.
A third objective of the present disclosure is to propose an intelligent device.
A fourth objective of the present disclosure is to propose a computer program product.
A fifth objective of the present disclosure is to propose a computer-readable storage medium.
To achieve the above objectives, an embodiment of a first aspect of the present disclosure proposes a voice broadcast method, including:
acquiring an object to be broadcast;
identifying a target object type of the object to be broadcast;
acquiring, according to the target object type, a broadcast label set matching the object to be broadcast, wherein the broadcast label set is used to characterize broadcast rules for the object to be broadcast; and
broadcasting the object to be broadcast according to the broadcast rules characterized by the broadcast label set.
In the voice broadcast method of the embodiments of the present disclosure, a broadcast label set matching the object to be broadcast is acquired according to the target object type of the object, the broadcast label set characterizing the broadcast rules for the object, and the object is then broadcast according to those rules. In this embodiment, the emotion carried by the content to be broadcast can be presented to listeners during broadcasting, so that listeners can audibly perceive it. Broadcasting an object according to broadcast labels is also an implementation of the Speech Synthesis Markup Language specification, which facilitates listening to speech through various terminal devices.
To achieve the above objectives, an embodiment of a second aspect of the present disclosure proposes a voice broadcast apparatus, including:
a first acquisition module configured to acquire an object to be broadcast;
an identification module configured to identify a target object type to which the object to be broadcast belongs;
a second acquisition module configured to acquire, according to the target object type, a broadcast label set matching the object to be broadcast, wherein the broadcast label set is used to characterize broadcast rules for the object to be broadcast; and
a broadcast module configured to broadcast the object to be broadcast according to the broadcast rules characterized by the broadcast label set.
In the voice broadcast apparatus of the embodiments of the present disclosure, a broadcast label set matching the object to be broadcast is acquired according to the target object type of the object, the broadcast label set characterizing the broadcast rules for the object, and the object is then broadcast according to those rules. In this embodiment, the emotion carried by the content to be broadcast can be presented to listeners during broadcasting, so that listeners can audibly perceive it. Broadcasting an object according to broadcast labels is also an implementation of the Speech Synthesis Markup Language specification, which facilitates listening to speech through various terminal devices.
To achieve the above objectives, an embodiment of a third aspect of the present disclosure proposes an intelligent device, including a memory and a processor, wherein the processor runs a program corresponding to executable program code stored in the memory by reading the executable program code, so as to implement the voice broadcast method according to the embodiments of the first aspect of the present disclosure.
To achieve the above objectives, an embodiment of a fourth aspect of the present disclosure proposes a computer program product; when instructions in the computer program product are executed by a processor, the voice broadcast method according to the embodiments of the first aspect is performed.
To achieve the above objectives, an embodiment of a fifth aspect of the present disclosure proposes a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the voice broadcast method according to the embodiments of the first aspect is implemented.
Additional aspects and advantages of the present disclosure will be set forth in part in the following description, and in part will become apparent from the following description or be learned by practice of the present disclosure.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure more clearly, the accompanying drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present disclosure, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a voice broadcast method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of another voice broadcast method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of yet another voice broadcast method according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a voice broadcast apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of another voice broadcast apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an intelligent device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present disclosure, and shall not be construed as limiting it.
The voice broadcast method and apparatus of the embodiments of the present disclosure are described below with reference to the drawings.
FIG. 1 is a schematic flowchart of a voice broadcast method according to an embodiment of the present disclosure.
As shown in FIG. 1, the voice broadcast method includes the following steps.
S101: acquire an object to be broadcast.
In the embodiments of the present disclosure, the object to be broadcast is the content or information that needs to be broadcast.
Optionally, a relevant application in an electronic device, such as the Baidu APP, may acquire the object to be broadcast in order to broadcast it. After the user starts the relevant application installed in the electronic device, the user may input, by voice or text, the content or information to be broadcast.
The electronic device is, for example, a personal computer (PC), a cloud device, or a mobile device such as a smartphone or a tablet computer.
For example, assume the relevant application installed in the electronic device is the Baidu APP. When the user wants to audibly perceive the emotion carried by the object to be broadcast, the user may tap into the Baidu APP interface, long-press the "hold to talk" button on the interface and say "度秘" (Duer) to enter the Duer plug-in; the user may then specify, by voice or text input, the content or information to be broadcast, whereupon the Duer plug-in acquires that content or information, that is, acquires the object to be broadcast.
S102: identify a target object type of the object to be broadcast.
Different broadcast objects have different object types, and different object types have different broadcast rules. Therefore, before the object is broadcast, its target object type needs to be identified so that matching broadcast rules can be selected according to that type.
Optionally, the target object type of the object to be broadcast may be identified according to key information of the object; for example, the object type may be poetry, weather, time, calculation, and so on.
The key information of the object to be broadcast may be, for example, the source (application) of the object, its title, or its identification code, which is not limited here.
S103: acquire, according to the target object type, a broadcast label set matching the object to be broadcast, wherein the broadcast label set is used to characterize broadcast rules for the object.
Since different object types have different broadcast rules, a broadcast label set corresponding to each object type can be formed for its broadcast rules, and a mapping relationship between object types and broadcast label sets can be established in advance. When the target object type of the object to be broadcast is determined, this mapping relationship can be queried to obtain the broadcast label set matching the object.
The broadcast label set mainly includes labels such as pause, stress, volume, pitch, speed, sound source, audio insertion, polyphone notation and digit-reading notation.
Pause labels: labels implementing pauses at the word, phrase, clause and sentence levels, as well as pauses of a specified duration.
Stress labels: stress labels of different strengths.
Volume, pitch, speed and thickness labels: labels that adjust the corresponding attribute of the broadcast by percentage.
Audio-insertion labels: labels that insert an audio file into a passage of text.
Polyphone labels: labels that annotate the correct pronunciation of polyphonic characters.
Digit-reading labels: labels that annotate the correct reading of digits, where digits include integers, digit strings, scores, fractions, telephone numbers, postal codes and the like.
Sound-source labels: labels for selecting the speaker voice.
For example, the target object type may be poetry. As part of traditional Chinese culture, poetry has distinctive rhyme and meter when read aloud, so a broadcast label set matching poetry can be formed from the rules for reading poetry. Take the five-character line "床前明月光" ("before my bed, the bright moonlight"): according to the reading rules for five-character poems, a word-level pause is needed after "床前", so a pause label is set indicating a pause after the second character; "明" needs to be stressed, so a stress label is set indicating stress on the third character; and "光" needs a short prolongation, so a speed label is set indicating a short prolongation on "光", extending its broadcast time. By adding labels from the broadcast label set, "床前明月光" is marked up; in the same way the whole poem can be marked up and output in its complete form, synthesizing a broadcast label set matching the five-character poem that includes word-level pause labels, stress labels, speed labels and the like.
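The markup just described can be illustrated with SSML-style tags. The disclosure only states that its labels implement the SSML specification; the element names used below (`break`, `emphasis`, `prosody`, per the W3C SSML standard) and the way the string is assembled are illustrative assumptions, not the patent's own format.

```python
# A minimal sketch (assumed format, not the patent's own) of marking up the
# five-character line "床前明月光" with SSML-style broadcast labels:
# a word-level pause after "床前", stress on "明", and a prolonged "光".
line = (
    "床前"
    '<break strength="medium"/>'               # pause label: word-level pause
    '<emphasis level="strong">明</emphasis>'    # stress label
    "月"
    '<prosody rate="slow">光</prosody>'         # speed label: short prolongation
)
ssml = f"<speak>{line}</speak>"
print(ssml)
```

A TTS engine that consumes SSML would then render the pause, stress and prolongation audibly when synthesizing the line.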
S104: broadcast the object to be broadcast according to the broadcast rules characterized by the broadcast label set.
Taking a five-character poem as an example: in practice, once the object type of the object to be broadcast is determined to be a five-character poem, simply adding the broadcast label set matching five-character poems and broadcasting the poem according to the broadcast rules characterized by that label set achieves a vivid, expressive reading of the poem.
In the voice broadcast method of this embodiment, a broadcast label set matching the object to be broadcast is acquired according to the target object type of the object, the broadcast label set characterizing the broadcast rules for the object, and the object is then broadcast according to those rules. In this embodiment, the emotion carried by the content to be broadcast can be presented to listeners during broadcasting, so that listeners can audibly perceive it. Broadcasting an object according to broadcast labels is also an implementation of the Speech Synthesis Markup Language (SSML) specification, which facilitates listening to speech through various terminal devices.
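Steps S101-S104 can be sketched as a small lookup-and-broadcast flow. The function names, the `LABEL_SETS` mapping, and the use of a `source` field as the key information are all assumptions for illustration, not the patent's implementation.

```python
# Hypothetical mapping from object type to its broadcast label set (S103).
LABEL_SETS = {
    "poetry":  ["pause", "stress", "speed"],
    "weather": ["background_sound", "volume"],
}

def identify_object_type(obj):
    # S102: identify the target object type from key information of the
    # object; here simply its source field (a real system could also use
    # the title or an identification code).
    return obj["source"]

def broadcast(obj):
    # S101 has already acquired `obj`; look up the matching label set via
    # the pre-built mapping (S103) and "broadcast" according to its rules
    # (S104) -- represented here by returning a description string.
    object_type = identify_object_type(obj)
    label_set = LABEL_SETS.get(object_type, [])
    return f"broadcast {obj['text']!r} with labels {label_set}"

print(broadcast({"source": "poetry", "text": "床前明月光"}))
```

An unknown object type falls back to an empty label set, i.e. a plain broadcast without rule-driven expressiveness.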
Further, the embodiments of the present disclosure may also form customized broadcast labels according to the user's broadcast needs. Specifically, see FIG. 2, which is a schematic flowchart of another voice broadcast method according to an embodiment of the present disclosure.
Referring to FIG. 2, the voice broadcast method may include the following steps.
S201: for each object type, acquire the broadcast rules under the different object types.
Since different object types have different broadcast rules, the broadcast rules under each object type can be acquired in advance. For example, when the object type is poetry, the broadcast rules are the rules for reading poetry aloud.
S202: form, according to the broadcast rules, the broadcast label set corresponding to the object type.
For example, when the object type is poetry, a broadcast label set matching poetry can be formed according to the rules for reading poetry aloud. Take the five-character line "床前明月光": according to the reading rules for five-character poems, a word-level pause is needed after "床前", so a pause label is set indicating a pause after the second character; "明" needs to be stressed, so a stress label is set indicating stress on the third character; and "光" needs a short prolongation, so a speed label is set indicating a short prolongation on "光", extending its broadcast time. By adding labels from the broadcast label set, "床前明月光" is marked up; in the same way the whole poem can be marked up and output in its complete form, synthesizing a broadcast label set matching the five-character poem that includes word-level pause labels, stress labels, speed labels and the like.
S203: construct the mapping relationship between object types and broadcast label sets.
Optionally, a mapping relationship between object types and broadcast label sets is constructed; when the target object type of the object to be broadcast is determined, the mapping can be queried to obtain the broadcast label set matching the object, which is easy to implement and simple to operate.
S204: acquire an object to be broadcast.
S205: identify the target object type of the object to be broadcast.
S206: query, according to the target object type, the mapping relationship between object types and broadcast label sets to obtain a first broadcast label set matching the object to be broadcast.
The first broadcast label set mainly includes labels such as pause, stress, volume, pitch, speed, sound source, audio insertion, polyphone notation and digit-reading notation.
For the execution of steps S204-S206, reference may be made to the foregoing embodiment; details are not repeated here.
S207: acquire the user's broadcast needs.
For example, when the target object type is weather, and especially when rainy weather is broadcast, the user's broadcast need may be that the broadcast is accompanied by the sound of rain and reminds the user to take an umbrella; or, when hail is broadcast, that the broadcast is accompanied by the sound of hail and reminds the user to avoid going out if possible.
S208: form, according to the broadcast needs, a second broadcast label set matching the object to be broadcast.
In the embodiments of the present disclosure, the second label set includes background-sound labels, English-reading labels, poetry labels, voice-emoji labels and the like.
Background-sound label: built on top of the audio-insertion label, it combines the broadcast content with an audio effect.
English-reading label: implemented similarly to the polyphone label, it distinguishes reading letter by letter from reading by word.
Poetry label: poems are classified by poem type and by cipai (tune name); reading rules such as rhyme are annotated for each class, and high-level poetry-category labels are generated by combining labels from the first broadcast label set.
Voice-emoji label: libraries of the audio files that may be used for different emotions and scenes are built, and the corresponding resources are introduced in each scene to generate voice-broadcast emojis; for example, when the weather is asked about on a rainy day, the broadcast is accompanied by the corresponding sound of rain.
For example, when the target object type is weather, the second broadcast label set matching the object to be broadcast may be a background-sound label; in practice, adding a background-sound label makes the weather broadcast accompanied by the sound of rain or hail.
For another example, when the object to be broadcast is English, the second broadcast label set matching it may be an English-reading label; adding it achieves a vivid, expressive reading of the English.
For yet another example, when the target object type is poetry, the second broadcast label set matching the object may be a poetry label; adding it achieves a vivid, expressive reading of the poem.
In this step, forming a second broadcast label set matching the object according to the user's broadcast needs enables personalized customization of voice broadcasts, effectively improving the applicability of the voice broadcast method and the user experience.
S209: form the broadcast label set using the first broadcast label set and the second broadcast label set.
Taking a poetry broadcast as an example, the first broadcast label set can be formed according to the reading rules, and the second broadcast label set matching the broadcast needs is the poetry label; the broadcast label set is then formed from the first and second broadcast label sets.
Taking a weather broadcast as an example, the first broadcast label set can be acquired according to the content to be broadcast, and the second broadcast label set matching the broadcast needs is the background-sound label; the broadcast label set is then formed from the two. Specifically, a single broadcast effect can be achieved with a background-sound label plus fixed broadcast content; the different effects for different weather conditions are annotated one by one, finally generating the broadcast label set for weather.
S210: broadcast the object to be broadcast according to the broadcast rules characterized by the broadcast label set.
Taking a weather broadcast as an example, when weather is broadcast, effects matching different user needs can be produced according to the weather broadcast label set and weather keywords.
For the execution of step S210, reference may be made to the foregoing embodiment; details are not repeated here.
In the voice broadcast method of this embodiment, the broadcast rules under each object type are acquired for that type, the broadcast label set corresponding to the object type is formed according to the broadcast rules, and the mapping relationship between object types and broadcast label sets is constructed, which is easy to implement and simple to operate. By acquiring the object to be broadcast, identifying its target object type, querying the mapping relationship according to the target object type to obtain a first broadcast label set matching the object, acquiring the user's broadcast needs, forming according to those needs a second broadcast label set matching the object, forming the broadcast label set from the first and second broadcast label sets, and broadcasting the object according to the broadcast rules characterized by the broadcast label set, personalized customization of voice broadcasts is achieved, effectively improving the applicability of the method and the user experience.
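A minimal sketch of S206-S209, under the assumption that label sets are simply sets of label names: the first set comes from the object-type mapping and the second from the user's broadcast need, and the two are merged into the final broadcast label set. All identifiers below are hypothetical.

```python
# Assumed pre-built data (S201-S203 and the user-need catalogue).
FIRST_LABEL_SETS = {"weather": {"pause", "volume", "speed"}}
NEED_LABELS = {"rain_sound": {"background_sound"}}

def form_broadcast_label_set(object_type, user_need):
    first = FIRST_LABEL_SETS.get(object_type, set())   # S206: query mapping
    second = NEED_LABELS.get(user_need, set())         # S207-S208: user need
    return first | second                              # S209: combine sets

labels = form_broadcast_label_set("weather", "rain_sound")
print(sorted(labels))
```

With this shape, a rainy-day weather broadcast ends up with both the rule-derived labels and the user-requested background-sound label.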
To illustrate the above embodiment in detail, referring to FIG. 3, on the basis of the embodiment shown in FIG. 2, step S209 specifically includes the following sub-steps.
S301: select some broadcast labels from the first broadcast label set to form a first target broadcast label set.
It will be appreciated that the first broadcast label set mainly includes labels such as pause, stress, volume, pitch, speed, sound source, audio insertion, polyphone notation and digit-reading notation, and a given broadcast may use only some of them. Therefore, in actual use, only the broadcast labels involved in the current broadcast may be selected from the first broadcast label set to form the first target broadcast label set, which is well targeted and improves the processing efficiency of the system.
S302: select some broadcast labels from the second broadcast label set to form a second target broadcast label set.
It will be appreciated that the label set matching the user's broadcast needs may contain only a few of the labels in the second broadcast label set; for example, when weather is broadcast, the label set matching the user's broadcast needs is only the background-sound label. Therefore, some broadcast labels may be selected from the second broadcast label set to form the second target broadcast label set, which is well targeted and improves the processing efficiency of the system.
Taking a weather broadcast as an example, the background-sound label may be selected from the second broadcast label set to form the second target broadcast label set.
Taking a poetry broadcast as an example, the poetry label may be selected from the second broadcast label set to form the second target broadcast label set.
S303: form the broadcast label set using the first target broadcast label set and/or the second target broadcast label set.
In the voice broadcast method of this embodiment, some broadcast labels are selected from the first broadcast label set to form a first target broadcast label set, some broadcast labels are selected from the second broadcast label set to form a second target broadcast label set, and the broadcast label set is formed using the first target broadcast label set and/or the second target broadcast label set, achieving personalized customization of voice broadcasts with good targeting and effectively improving the processing efficiency of the system.
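Sub-steps S301-S303 can be sketched as two set intersections followed by a union: keep only the labels the current broadcast actually involves, then combine the two target sets. The names below are illustrative assumptions, not the patent's implementation.

```python
def select_targets(first_set, second_set, needed):
    # S301: first target set = labels of the first set used by this broadcast.
    first_target = first_set & needed
    # S302: second target set = labels of the second set used by this broadcast.
    second_target = second_set & needed
    # S303: form the broadcast label set from the two target sets.
    return first_target | second_target

first = {"pause", "stress", "volume", "speed"}
second = {"background_sound", "poetry"}
needed = {"pause", "stress", "poetry"}       # labels this broadcast involves
print(sorted(select_targets(first, second, needed)))
```

Filtering before combining keeps the final label set small, which is the efficiency point the embodiment makes.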
To implement the above embodiments, the present disclosure also proposes a voice broadcast apparatus.
FIG. 4 is a schematic structural diagram of a voice broadcast apparatus according to an embodiment of the present disclosure.
As shown in FIG. 4, the voice broadcast apparatus 400 includes a first acquisition module 410, an identification module 420, a second acquisition module 430 and a broadcast module 440, wherein:
the first acquisition module 410 is configured to acquire an object to be broadcast;
the identification module 420 is configured to identify a target object type to which the object to be broadcast belongs;
further, the identification module 420 is specifically configured to identify the target object type of the object to be broadcast according to key information of the object;
the second acquisition module 430 is configured to acquire, according to the target object type, a broadcast label set matching the object to be broadcast, wherein the broadcast label set is used to characterize broadcast rules for the object; and
the broadcast module 440 is configured to broadcast the object to be broadcast according to the broadcast rules characterized by the broadcast label set.
Further, in a possible implementation of the embodiments of the present disclosure, referring to FIG. 5 on the basis of FIG. 4, the voice broadcast apparatus 400 further includes:
a construction module 450, configured to acquire, for each object type, the broadcast rules under the different object types, form the broadcast label set corresponding to the object type according to the broadcast rules, and construct the mapping relationship between object types and broadcast label sets.
In a possible implementation of the embodiments of the present disclosure, the second acquisition module 430 includes:
a query acquisition unit 431, configured to query, according to the target object type, the mapping relationship between object types and broadcast label sets to obtain a first broadcast label set matching the object to be broadcast, wherein the first broadcast label set is the broadcast label set;
a need acquisition unit 432, configured to acquire the user's broadcast needs after the first broadcast label set matching the object is obtained by querying, according to the target object type, the mapping relationship between object types and broadcast label sets;
a first forming unit 433, configured to form, according to the broadcast needs, a second broadcast label set matching the object to be broadcast; and
a second forming unit 434, configured to form the broadcast label set using the first broadcast label set and the second broadcast label set.
Further, the second forming unit 434 is specifically configured to select some broadcast labels from the first broadcast label set to form a first target broadcast label set, select some broadcast labels from the second broadcast label set to form a second target broadcast label set, and form the broadcast label set using the first target broadcast label set and/or the second target broadcast label set.
It should be noted that the foregoing explanations of the voice broadcast method embodiments of FIGS. 1-3 also apply to the voice broadcast apparatus 400 of this embodiment, and are not repeated here.
In the voice broadcast apparatus of this embodiment, a broadcast label set matching the object to be broadcast is acquired according to the target object type of the object, the broadcast label set characterizing the broadcast rules for the object, and the object is then broadcast according to those rules. In this embodiment, the emotion carried by the content to be broadcast can be presented to listeners during broadcasting, so that listeners can audibly perceive it. Broadcasting an object according to broadcast labels is also an implementation of the Speech Synthesis Markup Language specification, which facilitates listening to speech through various terminal devices.
FIG. 6 shows a block diagram of an exemplary intelligent device 20 suitable for implementing embodiments of the present disclosure. The intelligent device 20 shown in FIG. 6 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the intelligent device 20 takes the form of a general-purpose computing device. Components of the intelligent device 20 may include, but are not limited to, one or more processors or processing units 21, a system memory 22, and a bus 23 connecting different system components (including the system memory 22 and the processing unit 21).
The bus 23 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The intelligent device 20 typically includes a variety of computer-system-readable media. These media may be any available media accessible by the intelligent device 20, including volatile and non-volatile media, removable and non-removable media.
The system memory 22 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The intelligent device may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 6, commonly called a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM) or other optical media), may be provided. In these cases, each drive may be connected to the bus 23 through one or more data-media interfaces. The memory 22 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the present disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the memory 22. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present disclosure.
The intelligent device 20 may also communicate with one or more external devices 50 (e.g., a keyboard, a pointing device, a display 60, etc.), with one or more devices that enable a user to interact with the intelligent device 20, and/or with any device (e.g., a network card, a modem, etc.) that enables the intelligent device 20 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 24. Moreover, the intelligent device 20 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 25. As shown in the figure, the network adapter 25 communicates with the other modules of the intelligent device 20 through the bus 23. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the intelligent device 20, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 21 executes various functional applications and data processing by running programs stored in the system memory 22, for example implementing the voice broadcast method shown in FIGS. 1-3.
Any combination of one or more computer-readable media may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
To implement the above embodiments, the present disclosure also proposes a computer program product; when instructions in the computer program product are executed by a processor, the voice broadcast method described in the foregoing embodiments is performed.
To implement the above embodiments, the present disclosure also proposes a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the voice broadcast method described in the foregoing embodiments can be implemented.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" mean that specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present disclosure. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples described in this specification, and features of different embodiments or examples, provided they do not contradict each other.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means at least two, such as two or three, unless specifically defined otherwise.
Any process or method description in the flowcharts, or otherwise described herein, may be understood to represent a module, segment or portion of code comprising one or more executable instructions for implementing the steps of a custom logic function or process, and the scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present disclosure pertain.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example, may be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise suitably processing it if necessary, and then stored in a computer memory.
It should be understood that portions of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art can be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program may be stored in a computer-readable storage medium and, when executed, includes one of or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module. The above integrated modules can be implemented in the form of hardware or in the form of software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present disclosure have been shown and described above, it should be understood that the foregoing embodiments are illustrative and are not to be construed as limiting the scope of the disclosure; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present disclosure.

Claims (14)

  1. A voice broadcast method, characterized by comprising:
    acquiring an object to be broadcast;
    identifying a target object type of the object to be broadcast;
    acquiring, according to the target object type, a broadcast label set matching the object to be broadcast, wherein the broadcast label set is used to characterize broadcast rules for the object to be broadcast; and
    broadcasting the object to be broadcast according to the broadcast rules characterized by the broadcast label set.
  2. The voice broadcast method according to claim 1, wherein the acquiring, according to the target object type, a broadcast label set matching the object to be broadcast comprises:
    querying, according to the target object type, a mapping relationship between object types and broadcast label sets to obtain a first broadcast label set matching the object to be broadcast, wherein the first broadcast label set is the broadcast label set.
  3. The voice broadcast method according to claim 2, wherein, after the querying, according to the target object type, the mapping relationship between object types and broadcast label sets to obtain the first broadcast label set matching the object to be broadcast, the method further comprises:
    acquiring a broadcast need of a user;
    forming, according to the broadcast need, a second broadcast label set matching the object to be broadcast; and
    forming the broadcast label set using the first broadcast label set and the second broadcast label set.
  4. The voice broadcast method according to claim 3, wherein the forming the broadcast label set using the first broadcast label set and the second broadcast label set comprises:
    selecting some broadcast labels from the first broadcast label set to form a first target broadcast label set;
    selecting some broadcast labels from the second broadcast label set to form a second target broadcast label set; and
    forming the broadcast label set using the first target broadcast label set and/or the second target broadcast label set.
  5. The voice broadcast method according to any one of claims 1-4, wherein, before the acquiring an object to be broadcast, the method further comprises:
    for each object type, acquiring broadcast rules under the different object types;
    forming, according to the broadcast rules, the broadcast label set corresponding to the object type; and
    constructing the mapping relationship between the object type and the broadcast label set.
  6. The voice broadcast method according to any one of claims 1-5, wherein the identifying a target object type of the object to be broadcast comprises:
    identifying the target object type of the object to be broadcast according to key information of the object to be broadcast.
  7. A voice broadcast apparatus, characterized by comprising:
    a first acquisition module configured to acquire an object to be broadcast;
    an identification module configured to identify a target object type to which the object to be broadcast belongs;
    a second acquisition module configured to acquire, according to the target object type, a broadcast label set matching the object to be broadcast, wherein the broadcast label set is used to characterize broadcast rules for the object to be broadcast; and
    a broadcast module configured to broadcast the object to be broadcast according to the broadcast rules characterized by the broadcast label set.
  8. The voice broadcast apparatus according to claim 7, wherein the second acquisition module comprises:
    a query acquisition unit configured to query, according to the target object type, a mapping relationship between object types and broadcast label sets to obtain a first broadcast label set matching the object to be broadcast, wherein the first broadcast label set is the broadcast label set.
  9. The voice broadcast apparatus according to claim 8, wherein the second acquisition module further comprises:
    a need acquisition unit configured to acquire a broadcast need of a user after the first broadcast label set matching the object to be broadcast is obtained by querying, according to the target object type, the mapping relationship between object types and broadcast label sets;
    a first forming unit configured to form, according to the broadcast need, a second broadcast label set matching the object to be broadcast; and
    a second forming unit configured to form the broadcast label set using the first broadcast label set and the second broadcast label set.
  10. The voice broadcast apparatus according to claim 9, wherein the second forming unit is specifically configured to select some broadcast labels from the first broadcast label set to form a first target broadcast label set, select some broadcast labels from the second broadcast label set to form a second target broadcast label set, and form the broadcast label set using the first target broadcast label set and/or the second target broadcast label set.
  11. The voice broadcast apparatus according to any one of claims 7-10, further comprising:
    a construction module configured to acquire, for each object type, broadcast rules under the different object types, form the broadcast label set corresponding to the object type according to the broadcast rules, and construct the mapping relationship between the object type and the broadcast label set.
  12. The voice broadcast apparatus according to any one of claims 7-11, wherein the identification module is specifically configured to identify the target object type of the object to be broadcast according to key information of the object to be broadcast.
  13. An intelligent device, characterized by comprising a memory and a processor, wherein the processor runs a program corresponding to executable program code stored in the memory by reading the executable program code, so as to implement the voice broadcast method according to any one of claims 1-6.
  14. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, the voice broadcast method according to any one of claims 1-6 is implemented.
PCT/CN2018/094116 2017-07-05 2018-07-02 语音播报方法及装置 WO2019007308A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP18828877.3A EP3651152A4 (en) 2017-07-05 2018-07-02 METHOD AND DEVICE FOR VOICE TRANSMISSION
KR1020197002335A KR102305992B1 (ko) 2017-07-05 2018-07-02 음성 플레이 방법 및 장치
JP2019503523A JP6928642B2 (ja) 2017-07-05 2018-07-02 音声放送方法及び装置
US16/616,611 US20200184948A1 (en) 2017-07-05 2018-07-02 Speech playing method, an intelligent device, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710541569.2 2017-07-05
CN201710541569.2A CN107437413B (zh) 2017-07-05 2017-07-05 语音播报方法及装置

Publications (1)

Publication Number Publication Date
WO2019007308A1 true WO2019007308A1 (zh) 2019-01-10

Family

ID=60459727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094116 WO2019007308A1 (zh) 2017-07-05 2018-07-02 语音播报方法及装置

Country Status (6)

Country Link
US (1) US20200184948A1 (zh)
EP (1) EP3651152A4 (zh)
JP (1) JP6928642B2 (zh)
KR (1) KR102305992B1 (zh)
CN (1) CN107437413B (zh)
WO (1) WO2019007308A1 (zh)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437413B (zh) * 2017-07-05 2020-09-25 百度在线网络技术(北京)有限公司 语音播报方法及装置
CN108053820A (zh) * 2017-12-13 2018-05-18 广东美的制冷设备有限公司 空气调节器的语音播报方法及装置
CN108600911B (zh) 2018-03-30 2021-05-18 联想(北京)有限公司 一种输出方法及电子设备
CN109582271B (zh) * 2018-10-26 2020-04-03 北京蓦然认知科技有限公司 一种动态设置tts播放参数的方法、装置及设备
CN109523987A (zh) * 2018-11-30 2019-03-26 广东美的制冷设备有限公司 事件语音播报方法、装置及家电设备
CN110032626B (zh) * 2019-04-19 2022-04-12 百度在线网络技术(北京)有限公司 语音播报方法和装置
CN110189742B (zh) * 2019-05-30 2021-10-08 芋头科技(杭州)有限公司 确定情感音频、情感展示、文字转语音的方法和相关装置
CN110456687A (zh) * 2019-07-19 2019-11-15 安徽亿联网络科技有限公司 一种多模式智能场景控制系统
US11380300B2 (en) 2019-10-11 2022-07-05 Samsung Electronics Company, Ltd. Automatically generating speech markup language tags for text
CN112698807B (zh) * 2020-12-29 2023-03-31 上海掌门科技有限公司 语音播报方法、设备及计算机可读介质
CN113611282B (zh) * 2021-08-09 2024-05-14 苏州市广播电视总台 广播节目智能播报系统及方法
CN115985022A (zh) * 2022-12-14 2023-04-18 江苏丰东热技术有限公司 设备情况实时语音播报方法、装置、电子设备及存储介质
CN118314901B (zh) * 2024-06-05 2024-08-20 深圳市声扬科技有限公司 语音播放方法、装置、电子设备以及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693725A (zh) * 2011-03-25 2012-09-26 通用汽车有限责任公司 依赖于文本信息语境的语音识别
US20140067397A1 (en) * 2012-08-29 2014-03-06 Nuance Communications, Inc. Using emoticons for contextual text-to-speech expressivity
CN105139848A (zh) * 2015-07-23 2015-12-09 小米科技有限责任公司 数据转换方法和装置
CN105931631A (zh) * 2016-04-15 2016-09-07 北京地平线机器人技术研发有限公司 语音合成系统和方法
CN106652995A (zh) * 2016-12-31 2017-05-10 深圳市优必选科技有限公司 文本语音播报方法及系统
US20170186418A1 (en) * 2014-06-05 2017-06-29 Nuance Communications, Inc. Systems and methods for generating speech of multiple styles from text
CN107437413A (zh) * 2017-07-05 2017-12-05 百度在线网络技术(北京)有限公司 语音播报方法及装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100724868B1 (ko) * 2005-09-07 2007-06-04 삼성전자주식회사 다수의 합성기를 제어하여 다양한 음성 합성 기능을제공하는 음성 합성 방법 및 그 시스템
US7822606B2 (en) * 2006-07-14 2010-10-26 Qualcomm Incorporated Method and apparatus for generating audio information from received synthesis information
KR101160193B1 (ko) * 2010-10-28 2012-06-26 (주)엠씨에스로직 감성적 음성합성 장치 및 그 방법
WO2015162737A1 (ja) * 2014-04-23 2015-10-29 株式会社東芝 音訳作業支援装置、音訳作業支援方法及びプログラム
JP6596891B2 (ja) * 2015-04-08 2019-10-30 ソニー株式会社 送信装置、送信方法、受信装置、及び、受信方法
CN106557298A (zh) * 2016-11-08 2017-04-05 北京光年无限科技有限公司 面向智能机器人的背景配音输出方法及装置


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3651152A4 *

Also Published As

Publication number Publication date
EP3651152A1 (en) 2020-05-13
CN107437413B (zh) 2020-09-25
EP3651152A4 (en) 2021-04-21
CN107437413A (zh) 2017-12-05
KR20190021409A (ko) 2019-03-05
KR102305992B1 (ko) 2021-09-28
JP2019533212A (ja) 2019-11-14
US20200184948A1 (en) 2020-06-11
JP6928642B2 (ja) 2021-09-01

Similar Documents

Publication Publication Date Title
WO2019007308A1 (zh) 语音播报方法及装置
US10614803B2 (en) Wake-on-voice method, terminal and storage medium
CN108831437B (zh) 一种歌声生成方法、装置、终端和存储介质
CN107423363B (zh) 基于人工智能的话术生成方法、装置、设备及存储介质
WO2020098115A1 (zh) 字幕添加方法、装置、电子设备及计算机可读存储介质
US11011175B2 (en) Speech broadcasting method, device, apparatus and computer-readable storage medium
WO2021083071A1 (zh) 语音转换、文件生成、播音、语音处理方法、设备及介质
WO2018059342A1 (zh) 一种双音源音频数据的处理方法及装置
JP6078964B2 (ja) 音声対話システム及びプログラム
CN107463700B (zh) 用于获取信息的方法、装置及设备
WO2023029904A1 (zh) 文本内容匹配方法、装置、电子设备及存储介质
US8620670B2 (en) Automatic realtime speech impairment correction
WO2014154097A1 (en) Automatic page content reading-aloud method and device thereof
CN111142667A (zh) 一种基于文本标记生成语音的系统和方法
CN112908292A (zh) 文本的语音合成方法、装置、电子设备及存储介质
CN110413834B (zh) 语音评论修饰方法、系统、介质和电子设备
WO2023287360A2 (zh) 多媒体处理方法、装置、电子设备及存储介质
WO2018120820A1 (zh) 一种演示文稿的制作方法和装置
CN112599130B (zh) 一种基于智慧屏的智能会议系统
CN110379406A (zh) 语音评论转换方法、系统、介质和电子设备
US20140297285A1 (en) Automatic page content reading-aloud method and device thereof
WO2023184266A1 (zh) 语音控制方法及装置、计算机可读存储介质、电子设备
CN113761865A (zh) 声文重对齐及信息呈现方法、装置、电子设备和存储介质
CN115312032A (zh) 语音识别训练集的生成方法及装置
WO2018224032A1 (zh) 多媒体管理方法和装置

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019503523

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20197002335

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18828877

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018828877

Country of ref document: EP

Effective date: 20200205