WO2019007308A1 - Voice broadcast method and apparatus (语音播报方法及装置) - Google Patents
- Publication number: WO2019007308A1
- Application number: PCT/CN2018/094116
- Authority: WO, WIPO (PCT)
- Prior art keywords: broadcast, label set, label, target, object type
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
- G10L2013/105—Duration
Definitions
- the present disclosure relates to the field of voice processing technologies, and in particular, to a voice broadcast method and apparatus.
- Broadcasting performed entirely by a human announcer satisfies listeners' expectations and conveys emotion.
- However, the labor cost of fully human broadcasting is high.
- TTS: text-to-speech.
- the present disclosure aims to solve at least one of the technical problems in the related art to some extent.
- The first object of the present disclosure is to provide a voice broadcast method that conveys the emotion carried by the content to the listener during the broadcast, so that the listener can feel that emotion.
- Existing TTS broadcasting cannot convey emotion, so the listener cannot feel the emotion carried by the content or information being broadcast.
- a second object of the present disclosure is to provide a voice broadcast device.
- a third object of the present disclosure is to propose a smart device.
- a fourth object of the present disclosure is to propose a computer program product.
- a fifth object of the present disclosure is to propose a computer readable storage medium.
- the first aspect of the present disclosure provides a voice broadcast method, including:
- The voice broadcast method of the embodiment of the present disclosure acquires, according to the target object type of the object to be broadcast, a broadcast label set that matches the object, where the broadcast label set represents the broadcast rule of the object, and then broadcasts the object according to the broadcast rule represented by the broadcast label set.
- Broadcasting the object according to broadcast labels is one way of implementing the Speech Synthesis Markup Language specification, and makes it convenient for people to listen to speech through various terminal devices.
- the second aspect of the present disclosure provides a voice broadcast apparatus, including:
- a first acquiring module configured to acquire an object to be broadcast;
- an identification module configured to identify the target object type to which the object to be broadcast belongs;
- a second acquiring module configured to acquire, according to the target object type, a broadcast label set that matches the object to be broadcast, where the broadcast label set represents the broadcast rule of the object to be broadcast; and
- a broadcast module configured to broadcast the object to be broadcast according to the broadcast rule represented by the broadcast label set.
- The voice broadcast apparatus of the embodiment of the present disclosure acquires, according to the target object type of the object to be broadcast, a broadcast label set that matches the object, where the broadcast label set represents the broadcast rule of the object, and then broadcasts the object according to the broadcast rule represented by the broadcast label set.
- Broadcasting the object according to broadcast labels is one way of implementing the Speech Synthesis Markup Language specification, and makes it convenient for people to listen to speech through various terminal devices.
- A third aspect of the present disclosure provides a smart device including a memory and a processor, where the processor reads executable program code stored in the memory and runs a program corresponding to that code, so as to implement the voice broadcast method according to the first aspect of the embodiments of the present disclosure.
- A fourth aspect of the present disclosure provides a computer program product that, when executed by a processor, performs the voice broadcast method described in the first aspect.
- A fifth aspect of the present disclosure provides a computer readable storage medium storing a computer program that, when executed by a processor, implements the voice broadcast method according to the first aspect.
- FIG. 1 is a schematic flowchart diagram of a voice broadcast method according to an embodiment of the present disclosure
- FIG. 2 is a schematic flowchart diagram of another voice broadcast method according to an embodiment of the present disclosure
- FIG. 3 is a schematic flowchart diagram of another voice broadcast method according to an embodiment of the present disclosure.
- FIG. 4 is a schematic structural diagram of a voice broadcast apparatus according to an embodiment of the present disclosure.
- FIG. 5 is a schematic structural diagram of another voice broadcast apparatus according to an embodiment of the present disclosure.
- FIG. 6 is a schematic structural diagram of a smart device according to an embodiment of the present disclosure.
- FIG. 1 is a schematic flowchart diagram of a voice broadcast method according to an embodiment of the present disclosure.
- the voice broadcast method includes the following steps:
- The object to be broadcast is the content or information that needs to be broadcast.
- The object to be broadcast may be obtained by a related application on the electronic device that performs the broadcast, such as the Baidu app.
- The user can enter the content or information to be broadcast by voice or text.
- The electronic device is, for example, a personal computer (PC), a cloud device, or a mobile device such as a smartphone or a tablet computer.
- As an example, suppose the related application installed on the electronic device is the Baidu app. When the user wants to feel the emotion carried by the object to be broadcast, the user can open the Baidu app interface, press and hold the button in the interface, and say "Dumi" (度秘) to enter the Dumi plug-in. The user then specifies the content or information to be broadcast by voice or text input, and the plug-in obtains that content or information, that is, the object to be broadcast.
- the broadcast rules are different for different object types. Therefore, before the object to be broadcasted is broadcasted, the target object type of the object to be broadcast needs to be identified, so that the matching broadcast rule is selected according to the target object type to broadcast the object to be broadcasted.
- the target object type of the object to be broadcasted may be identified according to key information of the object to be broadcasted, for example, the object type may be poetry, weather, time, calculation, and the like.
- the key information of the object to be broadcasted may be, for example, the source of the object to be broadcasted (application), or may be the title of the object to be broadcasted, or may be the identifier of the object to be broadcasted, which is not limited thereto.
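- The identification step above can be sketched as follows. This is only an illustration under stated assumptions: the `BroadcastObject` class, the three lookup tables, and the fallback type `"generic"` are hypothetical and do not come from the disclosure, which only says that the source, title, or identifier of the object may serve as key information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BroadcastObject:
    """Content to be broadcast plus the key information named in the text."""
    text: str
    source: Optional[str] = None      # originating application (hypothetical values)
    title: Optional[str] = None
    identifier: Optional[str] = None


# Hypothetical lookup tables; a real system would define its own.
SOURCE_TO_TYPE = {"weather_app": "weather", "clock_app": "time"}
TITLE_KEYWORDS = {"poem": "poetry", "forecast": "weather"}
ID_PREFIX_TO_TYPE = {"calc:": "calculation", "poem:": "poetry"}


def identify_object_type(obj: BroadcastObject) -> str:
    """Return the target object type, checking source, then title, then identifier."""
    if obj.source in SOURCE_TO_TYPE:
        return SOURCE_TO_TYPE[obj.source]
    if obj.title:
        lowered = obj.title.lower()
        for keyword, object_type in TITLE_KEYWORDS.items():
            if keyword in lowered:
                return object_type
    if obj.identifier:
        for prefix, object_type in ID_PREFIX_TO_TYPE.items():
            if obj.identifier.startswith(prefix):
                return object_type
    return "generic"  # assumption: unknown objects fall back to a generic type
```

- The order in which the three kinds of key information are checked is a design choice of this sketch, not something the text prescribes.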
- The broadcast label set represents the broadcast rule of the object to be broadcast.
- For each object type, a broadcast label set corresponding to that type may be formed from its broadcast rule, and the mapping between object types and broadcast label sets is established in advance.
- Once the target object type of the object to be broadcast is determined, the mapping between object types and broadcast label sets can be queried to obtain the broadcast label set that matches the object to be broadcast.
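- A minimal sketch of querying such a pre-established mapping is shown below. The type names, the label names, and the empty default for unknown types are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet

# Hypothetical mapping, established in advance, from object type to label set.
TYPE_TO_LABEL_SET: Dict[str, FrozenSet[str]] = {
    "poetry": frozenset({"pause", "accent", "rate"}),
    "weather": frozenset({"pause", "audio"}),
    "calculation": frozenset({"number_reading"}),
}

# Assumption: an object type with no entry gets an empty label set.
DEFAULT_LABEL_SET: FrozenSet[str] = frozenset()


def get_broadcast_label_set(target_type: str) -> FrozenSet[str]:
    """Query the mapping for the label set that matches the target object type."""
    return TYPE_TO_LABEL_SET.get(target_type, DEFAULT_LABEL_SET)
```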
- The broadcast label set mainly includes labels for pauses, accents, volume, pitch, speaking rate, sound source, audio insertion, polyphonic-character identification, number-reading identification, and the like.
- Pause labels: labels that implement word-level, phrase-level, short-sentence-level, and full-sentence-level pauses, as well as pauses of a specified duration.
- Accent labels: labels that implement stress of different strengths.
- Volume, pitch, speaking-rate, and voice-thickness labels: labels that adjust the corresponding broadcast property by percentage.
- Audio insertion labels: labels that insert an audio file into a piece of text.
- Polyphonic-character labels: labels that mark the correct reading of a character that has more than one pronunciation.
- Number-reading labels: labels that mark how a digit string should be read, covering numbers, integers, scores, phone numbers, zip codes, and more.
- Sound source labels: labels that select the speaker.
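- Because the disclosure presents these labels as an implementation of the Speech Synthesis Markup Language, one way to picture them is as small helpers that emit standard SSML elements. The sketch below is an assumed correspondence, not the disclosure's own label syntax: the helper names are hypothetical, and the accepted values of `interpret-as` and of voice names depend on the speech engine.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(milliseconds: int) -> str:
    """Pause label: a timed pause."""
    return f'<break time="{int(milliseconds)}ms"/>'


def accent(text: str, level: str = "strong") -> str:
    """Accent label: stress of a chosen strength."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def prosody(text: str, **attrs: str) -> str:
    """Volume, pitch, or rate label, e.g. prosody("...", rate="80%")."""
    rendered = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody{rendered}>{escape(text)}</prosody>"


def audio(src: str) -> str:
    """Audio insertion label: splice an audio file into the text."""
    return f"<audio src={quoteattr(src)}/>"


def say_as(text: str, interpret_as: str) -> str:
    """Number-reading label; interpret-as values are engine-specific."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def voice(text: str, name: str) -> str:
    """Sound source label: select the speaker."""
    return f"<voice name={quoteattr(name)}>{escape(text)}</voice>"
```

- For example, `say_as("10086", "telephone")` marks a digit string to be read as a phone number on engines that support that value.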
- Taking the five-character line "床前明月光" ("Before my bed, the bright moonlight") as an example, a speaking-rate label can be set so that the word "光" ("light") is slightly prolonged, that is, a short extension is applied on the fourth word to lengthen the broadcast time of "光".
- After "床前明月光" is marked, the rest of the five-character poem can be marked in the same way. The complete marked-up form is then output and synthesized into the broadcast label set matching five-character poems, which includes word-level pause labels, accent labels, and speaking-rate labels.
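- The marking of one line can be sketched with standard SSML elements as below. This is a hedged illustration: the function names are hypothetical, the choice of `strength="weak"` and `rate="slow"` is arbitrary, applying the same pattern to every line of a poem is an assumption, and the bare `<speak>` wrapper omits the attributes a full SSML document would carry.

```python
from typing import Iterable
from xml.sax.saxutils import escape


def mark_five_character_line(line: str) -> str:
    """Pause after the first two characters, stress the third, slow down the last."""
    if len(line) != 5:
        raise ValueError("expected exactly five characters")
    head, stressed, middle, last = line[:2], line[2], line[3], line[4]
    return (
        f"{escape(head)}"
        '<break strength="weak"/>'                                  # word-level pause
        f'<emphasis level="strong">{escape(stressed)}</emphasis>'  # accent label
        f"{escape(middle)}"
        f'<prosody rate="slow">{escape(last)}</prosody>'           # short prolongation
    )


def to_ssml(marked_lines: Iterable[str]) -> str:
    """Join marked lines with a sentence-level pause (simplified <speak> wrapper)."""
    body = '<break strength="strong"/>'.join(marked_lines)
    return f"<speak>{body}</speak>"
```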
- S104: Broadcast the object to be broadcast according to the broadcast rule represented by the broadcast label set.
- Taking the five-character poem as an example: in a specific application, when the object type of the object to be broadcast is determined to be a five-character poem, it is only necessary to add the broadcast label set matching five-character poems and broadcast the poem according to the broadcast rule that the set represents in order to achieve the effect of reading the poem aloud.
- The voice broadcast method of this embodiment acquires, according to the target object type of the object to be broadcast, a broadcast label set that matches the object, where the broadcast label set represents the broadcast rule of the object, and then broadcasts the object according to the broadcast rule represented by the broadcast label set.
- Broadcasting the object according to broadcast labels is one way of implementing the Speech Synthesis Markup Language (SSML) specification, and makes it convenient for people to listen to speech through various terminal devices.
- FIG. 2 is a schematic flowchart of another voice broadcast method according to an embodiment of the present disclosure.
- the voice broadcast method may include the following steps:
- The broadcast rule for each object type can be obtained in advance. For example, when the object type is poetry, the broadcast rule is the rule for reciting poetry.
- From the broadcast rule, a broadcast label set matching poetry can be formed.
- For example, "床前明月光" can be marked according to the recitation rules for five-character poems. A word-level pause is needed after "床前" ("before the bed"), so a pause label is set that places a pause after the second word. "明" ("bright") needs to be stressed, so an accent label is set that places the stress on the third word. "光" ("light") needs to be slightly prolonged, so a speaking-rate label is set that applies a short extension to "光".
- The mapping between object types and broadcast label sets is then determined.
- Once the target object type is identified, the mapping can be queried to obtain the broadcast label set that matches the object to be broadcast, which is simple to implement and operate.
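- Constructing the mapping from per-type broadcast rules might look like the sketch below. The rule records, their field names, and the label names are hypothetical; the disclosure does not specify how rules are stored.

```python
from typing import Dict, FrozenSet, List

# Hypothetical rule descriptions: each rule names the label kind that realizes it.
BROADCAST_RULES: Dict[str, List[dict]] = {
    "poetry": [
        {"label": "pause", "where": "after word 2"},
        {"label": "accent", "where": "word 3"},
        {"label": "rate", "where": "last word"},
    ],
    "weather": [{"label": "pause", "where": "between sentences"}],
}


def build_mapping(rules: Dict[str, List[dict]]) -> Dict[str, FrozenSet[str]]:
    """Form one label set per object type and return the type-to-set mapping."""
    return {
        object_type: frozenset(rule["label"] for rule in rule_list)
        for object_type, rule_list in rules.items()
    }
```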
- The first broadcast label set mainly includes labels for pauses, accents, volume, pitch, speaking rate, sound source, audio insertion, polyphonic-character identification, number-reading identification, and the like.
- The user's broadcast requirement may be, for example, to hear the sound of rain while rainy weather is being broadcast, which prompts the user to take an umbrella when going out.
- As another example, the requirement may be to hear the sound of hail while hail is being broadcast, which prompts the user to avoid going out.
- The second broadcast label set includes a background sound label, an English reading label, a poetry label, a voice emoji label, and the like.
- Background sound label: built on top of the audio insertion label, so that the broadcast content is combined with an audio effect.
- Poetry label: poems are classified by poetry type and poem name, rhyme and other recitation rules are marked for each category, and the higher-level poetry label for each category is generated by combining labels from the first broadcast label set.
- Voice emoji label: a library of audio files for different emotions and scenarios is created, and the corresponding resource is inserted in each scenario to produce a "voice emoji". For example, when the user asks about the weather and it is raining, the sound of rain is broadcast.
- For example, the second broadcast label set matching the object to be broadcast may be a background sound label. Adding it lets the listener hear the sound of rain or hail while the weather is broadcast.
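- The rain and hail examples can be sketched as a keyword lookup into a small audio library. Everything named here is an assumption for illustration: the file paths, the advice strings, and the keyword matching are hypothetical. Note also that the standard SSML `<audio>` element plays a clip in sequence with the speech, so this sketch plays the sound before the report; mixing a sound underneath speech depends on the engine.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

# Hypothetical audio library keyed by weather keyword; file names are placeholders.
WEATHER_SOUNDS = {"rain": "sounds/rain.wav", "hail": "sounds/hail.wav"}
WEATHER_ADVICE = {
    "rain": "Take an umbrella when you go out.",
    "hail": "Try not to go out.",
}


def find_weather_keyword(report: str) -> Optional[str]:
    """Return the first known weather keyword found in the report, if any."""
    lowered = report.lower()
    for keyword in WEATHER_SOUNDS:
        if keyword in lowered:
            return keyword
    return None


def weather_broadcast(report: str) -> str:
    """Attach the matching sound and advice to a weather report."""
    keyword = find_weather_keyword(report)
    if keyword is None:
        return f"<speak>{escape(report)}</speak>"
    sound = quoteattr(WEATHER_SOUNDS[keyword])
    advice = escape(WEATHER_ADVICE[keyword])
    return f"<speak><audio src={sound}/>{escape(report)} {advice}</speak>"
```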
- As another example, the second broadcast label set matching the object to be broadcast may be an English reading label, which is added to achieve the effect of reading English aloud.
- As another example, the second broadcast label set matching the object to be broadcast may be a poetry label, which is added to achieve the effect of reciting poetry.
- Forming a second broadcast label set that matches the object to be broadcast enables personalized customization of the voice broadcast, which effectively improves the applicability of the voice broadcast method and the user experience.
- For a poem, for example, the first broadcast label set may be formed according to the recitation rule, and the second broadcast label set matching the broadcast requirement is the poetry label. The first and second broadcast label sets are then used to form the broadcast label set.
- For the weather, for example, the first broadcast label set can be obtained from the content to be broadcast, and the second broadcast label set matching the broadcast requirement is the background sound label. The first and second broadcast label sets are then used to form the broadcast label set. Specifically, a single broadcast effect is realized by combining the background sound label with fixed broadcast content, the broadcast effects for different weather conditions are labeled in turn, and finally the weather broadcast label set is generated.
- S210: Broadcast the object to be broadcast according to the broadcast rule represented by the broadcast label set.
- Different effects can then be broadcast to meet different users' needs, according to the weather broadcast label set and the weather keyword.
- For the implementation of step S210, refer to the foregoing embodiment. Details are not repeated here.
- The voice broadcast method of this embodiment obtains the broadcast rule for each object type, forms the broadcast label set corresponding to the object type according to the broadcast rule, and constructs the mapping between object types and broadcast label sets, which is simple to implement and operate.
- After the object to be broadcast is acquired, its target object type is identified and the mapping between object types and broadcast label sets is queried according to the target object type to obtain the first broadcast label set matching the object. The user's broadcast requirement is then obtained, a second broadcast label set matching the object is formed according to that requirement, the broadcast label set is formed from the first and second broadcast label sets, and the object is broadcast according to the broadcast rule that the broadcast label set represents. This enables personalized customization of the voice broadcast, effectively improves the applicability of the method, and enhances the user experience.
- step S209 specifically includes the following sub-steps:
- The first broadcast label set mainly includes labels such as pause, accent, volume, pitch, speaking rate, sound source, audio insertion, polyphonic-character identification, and number-reading identification, but broadcasting a particular object may need only some of them. The labels needed for the broadcast can therefore be selected from the first broadcast label set to form a first target broadcast label set, which is more targeted and improves the processing efficiency of the system.
- Likewise, the labels matching the user's broadcast requirement may be only some of the labels in the second broadcast label set. For example, if only the background sound label matches the user's broadcast requirement, that label alone can be selected from the second broadcast label set to form the second target broadcast label set, which is more targeted and improves the processing efficiency of the system.
- For example, the background sound label may be selected from the second broadcast label set to form the second target broadcast label set.
- As another example, the poetry label may be selected from the second broadcast label set to form the second target broadcast label set.
- In this embodiment, some labels are selected from the first broadcast label set to form the first target broadcast label set, some labels are selected from the second broadcast label set to form the second target broadcast label set, and the broadcast label set is formed from the first target broadcast label set and/or the second target broadcast label set. This enables personalized customization of the voice broadcast, is highly targeted, and effectively improves the processing efficiency of the system.
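- The selection and combination described in these sub-steps can be sketched with set operations. The label names and the function names are hypothetical, and treating the "and/or" combination as a set union is an assumption of this sketch.

```python
from typing import FrozenSet, Iterable

# Label names below are illustrative stand-ins for the labels listed in the text.
FIRST_LABEL_SET = frozenset({
    "pause", "accent", "volume", "pitch", "rate",
    "sound_source", "audio", "polyphone", "number_reading",
})
SECOND_LABEL_SET = frozenset({
    "background_sound", "english_reading", "poetry", "voice_emoji",
})


def select_target_set(full_set: FrozenSet[str], needed: Iterable[str]) -> FrozenSet[str]:
    """Keep only the needed labels that actually exist in the full set."""
    return full_set & frozenset(needed)


def form_broadcast_label_set(
    first_needed: Iterable[str], second_needed: Iterable[str] = ()
) -> FrozenSet[str]:
    """Form the broadcast label set from the first and/or second target sets."""
    first_target = select_target_set(FIRST_LABEL_SET, first_needed)
    second_target = select_target_set(SECOND_LABEL_SET, second_needed)
    return first_target | second_target
```

- Leaving `second_needed` empty covers the case where only the first target set is used.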
- the present disclosure also proposes a voice broadcast device.
- FIG. 4 is a schematic structural diagram of a voice broadcast apparatus according to an embodiment of the present disclosure.
- The voice broadcast apparatus 400 includes a first acquiring module 410, an identification module 420, a second acquiring module 430, and a broadcast module 440.
- The first acquiring module 410 is configured to acquire an object to be broadcast.
- The identification module 420 is configured to identify the target object type to which the object to be broadcast belongs.
- The identification module 420 is specifically configured to identify the target object type of the object to be broadcast according to key information of the object.
- The second acquiring module 430 is configured to acquire, according to the target object type, a broadcast label set that matches the object to be broadcast, where the broadcast label set represents the broadcast rule of the object.
- The broadcast module 440 is configured to broadcast the object to be broadcast according to the broadcast rule represented by the broadcast label set.
- The voice broadcast apparatus 400 further includes:
- The construction module 450, configured to obtain the broadcast rule for each object type, form the broadcast label set corresponding to the object type according to the broadcast rule, and construct the mapping between object types and broadcast label sets.
- The second acquiring module 430 includes:
- The query obtaining unit 431, configured to query the mapping between object types and broadcast label sets according to the target object type and obtain a first broadcast label set that matches the object to be broadcast, where the first broadcast label set may itself serve as the broadcast label set.
- The requirement obtaining unit 432, configured to obtain the user's broadcast requirement after the first broadcast label set matching the object to be broadcast has been obtained by querying the mapping according to the target object type.
- The first forming unit 433, configured to form a second broadcast label set that matches the object to be broadcast according to the broadcast requirement.
- The second forming unit 434, configured to form the broadcast label set from the first broadcast label set and the second broadcast label set.
- The second forming unit 434 is specifically configured to select some labels from the first broadcast label set to form a first target broadcast label set, select some labels from the second broadcast label set to form a second target broadcast label set, and form the broadcast label set from the first target broadcast label set and/or the second target broadcast label set.
- The voice broadcast apparatus of this embodiment acquires, according to the target object type of the object to be broadcast, a broadcast label set that matches the object, where the broadcast label set represents the broadcast rule of the object, and then broadcasts the object according to the broadcast rule represented by the broadcast label set.
- Broadcasting the object according to broadcast labels is one way of implementing the Speech Synthesis Markup Language specification, and makes it convenient for people to listen to speech through various terminal devices.
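- The module structure of the apparatus can be pictured as a small class whose fields stand in for the numbered modules. This is a sketch under assumptions: the class and method names are hypothetical, each module is reduced to a plain callable or dictionary, and the user's requirement is passed in directly as a set of label names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable


@dataclass
class VoiceBroadcastApparatus:
    acquire: Callable[[], str]                       # first acquiring module 410
    identify: Callable[[str], str]                   # identification module 420
    mapping: Dict[str, FrozenSet[str]]               # built by construction module 450
    broadcast: Callable[[str, FrozenSet[str]], str]  # broadcast module 440

    def get_label_set(
        self, target_type: str, requirement_labels: Iterable[str] = ()
    ) -> FrozenSet[str]:
        """Second acquiring module 430: query the mapping, then merge requirement labels."""
        first = self.mapping.get(target_type, frozenset())
        return first | frozenset(requirement_labels)

    def run(self, requirement_labels: Iterable[str] = ()) -> str:
        obj = self.acquire()
        target_type = self.identify(obj)
        return self.broadcast(obj, self.get_label_set(target_type, requirement_labels))
```

- A usage example with toy stand-ins for each module:

```python
apparatus = VoiceBroadcastApparatus(
    acquire=lambda: "Light rain today",
    identify=lambda text: "weather",
    mapping={"weather": frozenset({"pause"})},
    broadcast=lambda text, labels: f"{sorted(labels)} {text}",
)
result = apparatus.run({"background_sound"})
```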
- FIG. 6 illustrates a block diagram of an exemplary smart device 20 suitable for use in implementing embodiments of the present disclosure.
- the smart device 20 shown in FIG. 6 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
- smart device 20 is represented in the form of a general purpose computing device.
- the components of smart device 20 may include, but are not limited to, one or more processors or processing units 21, system memory 22, and a bus 23 that connects different system components, including system memory 22 and processing unit 21.
- Bus 23 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
- These architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
- the smart device 20 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by smart device 20, including volatile and non-volatile media, removable and non-removable media.
- System memory 22 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32.
- the smart device may further include other removable/non-removable, volatile/non-volatile computer system storage media.
- Storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive").
- The smart device may also include a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and a drive for reading from and writing to a removable non-volatile optical disc (for example, a compact disc read-only memory, CD-ROM).
- each drive can be coupled to bus 23 via one or more data medium interfaces.
- Memory 22 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present disclosure.
- A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 22. Such program modules 42 include, but are not limited to, an operating system, one or more applications, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
- Program module 42 typically performs the functions and/or methods of the embodiments described in this disclosure.
- The smart device 20 can also communicate with one or more external devices 50 (e.g., a keyboard, a pointing device, a display 60), with one or more devices that enable the user to interact with the smart device 20, and/or with any device (e.g., a network card, a modem) that enables the smart device 20 to communicate with one or more other computing devices. This communication can take place via an input/output (I/O) interface 24.
- The smart device 20 can also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 25. As shown, the network adapter 25 communicates with the other modules of the smart device 20 over the bus 23.
- Other hardware and/or software modules may be used together with the smart device 20, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
- the processing unit 21 executes various function applications and data processing by running a program stored in the system memory 22, for example, implementing the voice broadcast method shown in Figs.
- the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
- the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
- a computer readable storage medium can be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
- the computer readable signal medium may comprise a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer, partly on the remote computer, or entirely on the remote computer or server.
- Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
- the present disclosure also proposes a computer program product that, when executed by a processor, executes a voice broadcast method as described in the foregoing embodiments.
- The present disclosure also proposes a computer readable storage medium storing a computer program that, when executed by a processor, implements the voice broadcast method described in the foregoing embodiments.
- The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
- Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature.
- the meaning of "a plurality” is at least two, such as two, three, etc., unless specifically defined otherwise.
- Any process or method description in a flowchart, or otherwise described herein, may be understood to represent a module, segment, or portion of code comprising one or more executable instructions for implementing the steps of a custom logic function or process.
- The scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order, depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present disclosure pertain.
- A "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
- Examples of computer readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read only memory (CD-ROM).
- The computer readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or, if necessary, processing it in another suitable manner, and then stored in computer memory.
- Portions of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof.
- Multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
- For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
- Each functional unit in the various embodiments of the present disclosure may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
- The above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
- The integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
- The above-mentioned storage medium may be a read only memory, a magnetic disk, an optical disk, or the like. While the embodiments of the present disclosure have been shown and described above, it is understood that the foregoing embodiments are illustrative and are not to be construed as limiting the scope of the disclosure. The embodiments are subject to changes, modifications, substitutions, and variations.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Electrically Operated Instructional Devices (AREA)
- Information Transfer Between Computers (AREA)
- Circuits Of Receivers In General (AREA)
Abstract
Description
Claims (14)
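- A voice broadcast method, characterized by comprising: acquiring an object to be broadcast; identifying a target object type of the object to be broadcast; acquiring, according to the target object type, a broadcast label set matching the object to be broadcast, wherein the broadcast label set is used to characterize a broadcast rule of the object to be broadcast; and broadcasting the object to be broadcast according to the broadcast rule characterized by the broadcast label set.
- The voice broadcast method according to claim 1, characterized in that acquiring, according to the target object type, the broadcast label set matching the object to be broadcast comprises: querying, according to the target object type, a mapping relationship between object types and broadcast label sets to obtain a first broadcast label set matching the object to be broadcast, wherein the first broadcast label set is the broadcast label set.
- The voice broadcast method according to claim 2, characterized in that, after querying, according to the target object type, the mapping relationship between object types and broadcast label sets to obtain the first broadcast label set matching the object to be broadcast, the method further comprises: acquiring a broadcast requirement of a user; forming, according to the broadcast requirement, a second broadcast label set matching the object to be broadcast; and forming the broadcast label set using the first broadcast label set and the second broadcast label set.
- The voice broadcast method according to claim 3, characterized in that forming the broadcast label set using the first broadcast label set and the second broadcast label set comprises: selecting some broadcast labels from the first broadcast label set to form a first target broadcast label set; selecting some broadcast labels from the second broadcast label set to form a second target broadcast label set; and forming the broadcast label set using the first target broadcast label set and/or the second target broadcast label set.
- The voice broadcast method according to any one of claims 1-4, characterized in that, before acquiring the object to be broadcast, the method further comprises: for each object type, acquiring the broadcast rules under the different object types; forming, according to the broadcast rules, the broadcast label set corresponding to the object type; and constructing the mapping relationship between object types and broadcast label sets.
- The voice broadcast method according to any one of claims 1-5, characterized in that identifying the target object type of the object to be broadcast comprises: identifying the target object type of the object to be broadcast according to key information of the object to be broadcast.
- A voice broadcast device, characterized by comprising: a first acquisition module, configured to acquire an object to be broadcast; an identification module, configured to identify a target object type to which the object to be broadcast belongs; a second acquisition module, configured to acquire, according to the target object type, a broadcast label set matching the object to be broadcast, wherein the broadcast label set is used to characterize a broadcast rule of the object to be broadcast; and a broadcast module, configured to broadcast the object to be broadcast according to the broadcast rule characterized by the broadcast label set.
- The voice broadcast device according to claim 7, characterized in that the second acquisition module comprises: a query acquisition unit, configured to query, according to the target object type, a mapping relationship between object types and broadcast label sets to obtain a first broadcast label set matching the object to be broadcast, wherein the first broadcast label set is the broadcast label set.
- The voice broadcast device according to claim 8, characterized in that the second acquisition module further comprises: a requirement acquisition unit, configured to acquire a broadcast requirement of a user after the mapping relationship between object types and broadcast label sets has been queried according to the target object type to obtain the first broadcast label set matching the object to be broadcast; a first forming unit, configured to form, according to the broadcast requirement, a second broadcast label set matching the object to be broadcast; and a second forming unit, configured to form the broadcast label set using the first broadcast label set and the second broadcast label set.
- The voice broadcast device according to claim 9, characterized in that the second forming unit is specifically configured to select some broadcast labels from the first broadcast label set to form a first target broadcast label set, select some broadcast labels from the second broadcast label set to form a second target broadcast label set, and form the broadcast label set using the first target broadcast label set and/or the second target broadcast label set.
- The voice broadcast device according to any one of claims 7-10, characterized by further comprising: a construction module, configured to acquire, for each object type, the broadcast rules under the different object types, form, according to the broadcast rules, the broadcast label set corresponding to the object type, and construct the mapping relationship between object types and broadcast label sets.
- The voice broadcast device according to any one of claims 7-11, characterized in that the identification module is specifically configured to identify the target object type of the object to be broadcast according to key information of the object to be broadcast.
- An intelligent device, characterized by comprising a memory and a processor, wherein the processor reads executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the voice broadcast method according to any one of claims 1-6.
- A computer readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the voice broadcast method according to any one of claims 1-6.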
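The method claims above (identify the object type from key information, look up a label set through a mapping, optionally combine it with labels derived from a user requirement, then broadcast) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and does not come from the patent: the object types, the keyword lists, the label names (`pitch`, `speed`, `emotion`), the rule that user labels override mapped labels, and the `broadcast` function, which only returns annotated text in place of a real speech synthesizer.

```python
# Hypothetical object types with keyword cues used as "key information".
TYPE_KEYWORDS = {
    "weather": ("temperature", "rain", "forecast"),
    "news": ("reported", "announced", "official"),
}

# Hypothetical mapping relationship: object type -> broadcast label set.
TYPE_TO_LABELS = {
    "weather": {"pitch": "medium", "speed": "normal", "emotion": "calm"},
    "news": {"pitch": "low", "speed": "slow", "emotion": "serious"},
    "default": {"pitch": "medium", "speed": "normal", "emotion": "neutral"},
}


def identify_type(text):
    """Identify the target object type by naive substring keyword matching."""
    lowered = text.lower()
    for object_type, keywords in TYPE_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return object_type
    return "default"


def build_label_set(object_type, user_requirement=None):
    """Combine the mapped (first) label set with a user-derived (second) one.

    Assumed policy: labels from the user requirement override mapped labels.
    """
    first = dict(TYPE_TO_LABELS.get(object_type, TYPE_TO_LABELS["default"]))
    second = dict(user_requirement or {})
    return {**first, **second}


def broadcast(text, labels):
    """Stand-in for a speech synthesizer: return the text with its labels."""
    tags = " ".join(f"{key}={value}" for key, value in sorted(labels.items()))
    return f"[{tags}] {text}"


if __name__ == "__main__":
    message = "Rain is expected tomorrow"
    labels = build_label_set(identify_type(message), {"speed": "fast"})
    print(broadcast(message, labels))
```

The override policy is only one way to read "using the first target broadcast label set and/or the second target broadcast label set"; a real system could just as well take either set alone or select a subset of each.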
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18828877.3A EP3651152A4 (en) | 2017-07-05 | 2018-07-02 | METHOD AND DEVICE FOR VOICE TRANSMISSION |
KR1020197002335A KR102305992B1 (ko) | 2017-07-05 | 2018-07-02 | Voice playing method and device |
JP2019503523A JP6928642B2 (ja) | 2017-07-05 | 2018-07-02 | Voice broadcast method and device |
US16/616,611 US20200184948A1 (en) | 2017-07-05 | 2018-07-02 | Speech playing method, an intelligent device, and computer readable storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710541569.2 | 2017-07-05 | ||
CN201710541569.2A CN107437413B (zh) | 2017-07-05 | 2017-07-05 | Voice broadcast method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019007308A1 (zh) | 2019-01-10 |
Family
ID=60459727
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/094116 WO2019007308A1 (zh) | 2017-07-05 | 2018-07-02 | Voice broadcast method and device |
Country Status (6)
Country | Link |
---|---|
US (1) | US20200184948A1 (zh) |
EP (1) | EP3651152A4 (zh) |
JP (1) | JP6928642B2 (zh) |
KR (1) | KR102305992B1 (zh) |
CN (1) | CN107437413B (zh) |
WO (1) | WO2019007308A1 (zh) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107437413B (zh) * | 2017-07-05 | 2020-09-25 | 百度在线网络技术(北京)有限公司 | Voice broadcast method and device |
CN108053820A (zh) * | 2017-12-13 | 2018-05-18 | 广东美的制冷设备有限公司 | Voice broadcast method and device for an air conditioner |
CN108600911B (zh) | 2018-03-30 | 2021-05-18 | 联想(北京)有限公司 | Output method and electronic device |
CN109582271B (zh) * | 2018-10-26 | 2020-04-03 | 北京蓦然认知科技有限公司 | Method, apparatus, and device for dynamically setting TTS playback parameters |
CN109523987A (zh) * | 2018-11-30 | 2019-03-26 | 广东美的制冷设备有限公司 | Event voice broadcast method and device, and household appliance |
CN110032626B (zh) * | 2019-04-19 | 2022-04-12 | 百度在线网络技术(北京)有限公司 | Voice broadcast method and device |
CN110189742B (zh) * | 2019-05-30 | 2021-10-08 | 芋头科技(杭州)有限公司 | Methods and related devices for determining emotional audio, displaying emotion, and converting text to speech |
CN110456687A (zh) * | 2019-07-19 | 2019-11-15 | 安徽亿联网络科技有限公司 | Multi-mode intelligent scene control system |
US11380300B2 (en) | 2019-10-11 | 2022-07-05 | Samsung Electronics Company, Ltd. | Automatically generating speech markup language tags for text |
CN112698807B (zh) * | 2020-12-29 | 2023-03-31 | 上海掌门科技有限公司 | Voice broadcast method, device, and computer readable medium |
CN113611282B (zh) * | 2021-08-09 | 2024-05-14 | 苏州市广播电视总台 | Intelligent broadcasting system and method for radio programs |
CN115985022A (zh) * | 2022-12-14 | 2023-04-18 | 江苏丰东热技术有限公司 | Method and apparatus for real-time voice broadcast of equipment status, electronic device, and storage medium |
CN118314901B (zh) * | 2024-06-05 | 2024-08-20 | 深圳市声扬科技有限公司 | Voice playback method and apparatus, electronic device, and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102693725A (zh) * | 2011-03-25 | 2012-09-26 | 通用汽车有限责任公司 | Speech recognition dependent on the context of text information |
US20140067397A1 (en) * | 2012-08-29 | 2014-03-06 | Nuance Communications, Inc. | Using emoticons for contextual text-to-speech expressivity |
CN105139848A (zh) * | 2015-07-23 | 2015-12-09 | 小米科技有限责任公司 | Data conversion method and device |
CN105931631A (zh) * | 2016-04-15 | 2016-09-07 | 北京地平线机器人技术研发有限公司 | Speech synthesis system and method |
CN106652995A (zh) * | 2016-12-31 | 2017-05-10 | 深圳市优必选科技有限公司 | Text voice broadcast method and system |
US20170186418A1 (en) * | 2014-06-05 | 2017-06-29 | Nuance Communications, Inc. | Systems and methods for generating speech of multiple styles from text |
CN107437413A (zh) * | 2017-07-05 | 2017-12-05 | 百度在线网络技术(北京)有限公司 | Voice broadcast method and device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100724868B1 (ko) * | 2005-09-07 | 2007-06-04 | 삼성전자주식회사 | Speech synthesis method and system providing various speech synthesis functions by controlling a plurality of synthesizers |
US7822606B2 (en) * | 2006-07-14 | 2010-10-26 | Qualcomm Incorporated | Method and apparatus for generating audio information from received synthesis information |
KR101160193B1 (ko) * | 2010-10-28 | 2012-06-26 | (주)엠씨에스로직 | Emotional speech synthesis apparatus and method |
WO2015162737A1 (ja) * | 2014-04-23 | 2015-10-29 | 株式会社東芝 | Transliteration work support device, transliteration work support method, and program |
JP6596891B2 (ja) * | 2015-04-08 | 2019-10-30 | ソニー株式会社 | Transmission device, transmission method, reception device, and reception method |
CN106557298A (zh) * | 2016-11-08 | 2017-04-05 | 北京光年无限科技有限公司 | Background dubbing output method and device for intelligent robots |
2017
- 2017-07-05 CN CN201710541569.2A patent/CN107437413B/zh active Active
2018
- 2018-07-02 JP JP2019503523A patent/JP6928642B2/ja active Active
- 2018-07-02 WO PCT/CN2018/094116 patent/WO2019007308A1/zh unknown
- 2018-07-02 EP EP18828877.3A patent/EP3651152A4/en not_active Withdrawn
- 2018-07-02 US US16/616,611 patent/US20200184948A1/en not_active Abandoned
- 2018-07-02 KR KR1020197002335A patent/KR102305992B1/ko active IP Right Grant
Non-Patent Citations (1)
Title |
---|
See also references of EP3651152A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP3651152A1 (en) | 2020-05-13 |
CN107437413B (zh) | 2020-09-25 |
EP3651152A4 (en) | 2021-04-21 |
CN107437413A (zh) | 2017-12-05 |
KR20190021409A (ko) | 2019-03-05 |
KR102305992B1 (ko) | 2021-09-28 |
JP2019533212A (ja) | 2019-11-14 |
US20200184948A1 (en) | 2020-06-11 |
JP6928642B2 (ja) | 2021-09-01 |
Similar Documents
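Publication | Title | Publication Date |
---|---|---|
WO2019007308A1 (zh) | Voice broadcast method and device | |
US10614803B2 (en) | Wake-on-voice method, terminal and storage medium | |
CN108831437B (zh) | Singing voice generation method, apparatus, terminal, and storage medium | |
CN107423363B (zh) | Artificial-intelligence-based dialogue script generation method, apparatus, device, and storage medium | |
WO2020098115A1 (zh) | Subtitle adding method and apparatus, electronic device, and computer readable storage medium | |
US11011175B2 (en) | Speech broadcasting method, device, apparatus and computer-readable storage medium | |
WO2021083071A1 (zh) | Voice conversion, file generation, broadcasting, and voice processing methods, device, and medium | |
WO2018059342A1 (zh) | Method and device for processing dual-source audio data | |
JP6078964B2 (ja) | Spoken dialogue system and program | |
CN107463700B (zh) | Method, apparatus, and device for acquiring information | |
WO2023029904A1 (zh) | Text content matching method and apparatus, electronic device, and storage medium | |
US8620670B2 (en) | Automatic realtime speech impairment correction | |
WO2014154097A1 (en) | Automatic page content reading-aloud method and device thereof | |
CN111142667A (zh) | System and method for generating speech based on text markup | |
CN112908292A (zh) | Text speech synthesis method and apparatus, electronic device, and storage medium | |
CN110413834B (zh) | Voice comment modification method, system, medium, and electronic device | |
WO2023287360A2 (zh) | Multimedia processing method and apparatus, electronic device, and storage medium | |
WO2018120820A1 (zh) | Method and device for producing a presentation | |
CN112599130B (zh) | Intelligent conference system based on a smart screen | |
CN110379406A (zh) | Voice comment conversion method, system, medium, and electronic device | |
US20140297285A1 (en) | Automatic page content reading-aloud method and device thereof | |
WO2023184266A1 (zh) | Voice control method and apparatus, computer readable storage medium, and electronic device | |
CN113761865A (zh) | Audio-text realignment and information presentation method and apparatus, electronic device, and storage medium | |
CN115312032A (zh) | Method and device for generating a speech recognition training set | |
WO2018224032A1 (zh) | Multimedia management method and device |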
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2019503523; Country of ref document: JP; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 20197002335; Country of ref document: KR; Kind code of ref document: A |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18828877; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2018828877; Country of ref document: EP; Effective date: 20200205 |