WO2018076664A1 - Method and device for voice broadcast - Google Patents

Method and device for voice broadcast

Info

Publication number
WO2018076664A1
Authority
WO
WIPO (PCT)
Prior art keywords
corpus
file
identification information
data packet
voice broadcast
Prior art date
Application number
PCT/CN2017/084581
Other languages
English (en)
Chinese (zh)
Inventor
王正 (Wang Zheng)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2018076664A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Definitions

  • the present application relates to, but is not limited to, the field of voice processing technologies, and in particular, to a method and apparatus for voice broadcast.
  • the voice processing technology includes a voice recognition technology and a voice broadcast technology.
  • the traditional voice broadcast technology is implemented by a voice synthesis method, that is, the voice information is synthesized according to the input text information and played to the user.
  • The voice information generated by the speech synthesis method has many shortcomings in how it sounds.
  • The broadcast voice sounds harsh, old-fashioned and lacking in emotion to the user, and it sounds the same in different environments, so the user experience is not good.
  • For example, when the user wants to listen to a joke, the desired effect is that the tone of the broadcast is light and enjoyable; when the user wants to listen to a touching love story, the desired effect is that the tone of the broadcast is emotional.
  • The voice information synthesized by the traditional voice broadcast technology cannot adapt to specific application scenarios or distinguish broadcasts that call for different voices; it can only "spit out" the words one by one, which is very old-fashioned. Therefore, the existing voice broadcast technology cannot meet the user's demand for emotion in voice information, and the user experience is poor.
  • the embodiments of the present invention are directed to providing a method and apparatus for voice broadcast, so that during voice broadcast, a voice broadcast with emotion can be provided according to different application environments to improve the user experience.
  • The embodiment of the invention provides a method for voice broadcast, which includes:
  • performing real-person recording of the text information that needs to be voice broadcast to generate a corresponding corpus file;
  • generating a data packet of the voice broadcast content, where the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information;
  • sending the data packet to the terminal, so that the terminal plays the corresponding corpus file based on the data packet.
  • Generating the data packet of the voice broadcast content includes: acquiring the text information corresponding to the voice broadcast content; acquiring, from a mapping table indicating the correspondence between text information and corpus identification information, the corpus identification information corresponding to that text information; and generating the data packet of the voice broadcast content based on the acquired text information and corpus identification information.
  • the corpus identification information in the data packet includes: a corpus label or a corpus number.
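As a rough illustration of the packet described above, the sketch below pairs the reply text with its corpus identification information. The names (`BroadcastPacket`, `make_packet`) and the dict-based mapping table are assumptions made for illustration, not structures defined by the patent.

```python
from dataclasses import dataclass

@dataclass
class BroadcastPacket:
    text: str       # text information corresponding to the voice broadcast content
    corpus_id: str  # corpus identification information: a corpus label or a corpus number

def make_packet(text: str, mapping: dict) -> BroadcastPacket:
    """Build a data packet by looking the reply text up in the mapping
    table of text information and corpus identification information."""
    return BroadcastPacket(text=text, corpus_id=mapping.get(text, ""))

packet = make_packet("joke 1", {"joke 1": "voice_tag_001"})
```

An empty `corpus_id` would stand for text that has no recorded corpus file, in which case the terminal would fall back to speech synthesis.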
  • The method further includes: after generating the corresponding corpus file, saving a corpus built from the generated corpus files.
  • The embodiment of the invention further provides another method for voice broadcast, the method comprising:
  • receiving a data packet of the voice broadcast content, where the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information;
  • parsing the data packet to obtain the corpus identification information, acquiring the corpus file corresponding to the obtained corpus identification information, and playing the corpus file.
  • the corpus identification information in the data packet includes: a corpus label or a corpus number.
  • Acquiring the corpus file corresponding to the obtained corpus identification information includes: when the corpus identification information is a corpus label, determining the corpus number corresponding to the corpus identification information according to a predetermined mapping table indicating the correspondence between corpus labels and corpus numbers, and acquiring the corpus file corresponding to the determined corpus number.
  • Acquiring the corpus file corresponding to the obtained corpus identification information includes: when the corpus file corresponding to the corpus identification information exists locally, acquiring it locally; when the corpus file corresponding to the corpus identification information does not exist locally, downloading it from the server according to the corpus identification information.
  • The method further includes: saving the acquired corpus file.
  • The method further includes: when the data size of the saved corpus files is greater than or equal to a data size threshold, deleting corpus files according to a preset corpus file deletion policy.
  • The preset corpus file deletion policy includes: deleting all local corpus files, deleting by corpus file proportion, or deleting by corpus file use frequency.
  • the embodiment of the invention further provides a device for voice broadcast, the device comprising: a first recording module, a generating module and a sending module; wherein
  • The first recording module is configured to perform real-person recording of the text information that needs to be voice broadcast, to generate a corresponding corpus file;
  • a generating module configured to generate a data packet of the voice broadcast content, where the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information;
  • a sending module configured to send the data packet to the terminal, so that the terminal plays the corresponding corpus file based on the data packet.
  • The generating module is configured to generate the data packet of the voice broadcast content by: acquiring the text information corresponding to the voice broadcast content; acquiring, from a preset mapping table indicating the correspondence between text information and corpus identification information, the corpus identification information corresponding to that text information; and generating the data packet of the voice broadcast content based on the acquired text information and corpus identification information.
  • the corpus identification information in the data packet includes: a corpus label or a corpus number.
  • the generating module is further configured to save the corpus established by using the generated corpus file after generating the corresponding corpus file.
  • The embodiment of the present invention further provides another device for voice broadcast, the device comprising: a second recording module, a receiving module and a processing module; wherein
  • the second recording module is configured to perform real-person recording of the text information that needs to be voice broadcast, to generate a corresponding corpus file;
  • a receiving module configured to receive a data packet of the voice broadcast content, where the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information;
  • the processing module is configured to parse the data packet to obtain the corpus identification information, acquire the corpus file corresponding to the obtained corpus identification information, and play the corpus file.
  • the corpus identification information in the data packet includes: a corpus label or a corpus number.
  • The processing module may be configured to acquire the corpus file corresponding to the obtained corpus identification information by: when the corpus identification information is a corpus label, determining the corpus number corresponding to the corpus identification information according to a predetermined mapping table of the correspondence between corpus labels and corpus numbers, and acquiring the corpus file corresponding to the determined corpus number.
  • The processing module may be configured to acquire the corpus file corresponding to the obtained corpus identification information by: when the corpus file corresponding to the corpus identification information exists in the device, acquiring the corpus file from the device;
  • when the corpus file corresponding to the corpus identification information does not exist in the device, downloading it from the server according to the corpus identification information.
  • the processing module is further configured to save the acquired corpus file after acquiring the corpus file corresponding to the obtained corpus identification information;
  • the processing module may be further configured to: when the data size of the saved corpus file is greater than or equal to the data size threshold, delete the corpus file according to the preset corpus file deletion policy;
  • The preset corpus file deletion policy includes: deleting all corpus files in the device, deleting by corpus file proportion, or deleting by corpus file use frequency.
  • In the embodiment of the present invention, a corpus file is generated by performing real-person recording of the text information that needs to be voice broadcast; a data packet of the voice broadcast content is generated, the data packet including the text information corresponding to the voice broadcast content and the corpus identification information corresponding to the text information; and the data packet is sent to the terminal, so that the terminal plays the corresponding corpus file based on the data packet.
  • In this way, when the voice broadcast is performed, an emotional voice broadcast can be provided according to different application environments, which enhances the user experience.
  • FIG. 1 is a flowchart of a first embodiment of a method for voice broadcast according to the present invention.
  • FIG. 2 is a schematic diagram of a mapping table of the correspondence between text information and corpus tags according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a mapping table of the correspondence among text information, corpus labels and corpus numbers according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a mapping table of the correspondence between corpus labels and corpus numbers according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a second embodiment of a method for voice announcement according to the present invention.
  • FIG. 6 is a flowchart of a third embodiment of a method for voice announcement according to the present invention.
  • FIG. 7 is a flowchart of a fourth embodiment of a method for voice announcement according to the present invention.
  • FIG. 8 is a schematic diagram of a first component structure of an apparatus for voice broadcast according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a second component structure of an apparatus for voice broadcast according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a first embodiment of a method for voice announcement according to the present invention. As shown in FIG. 1, the method includes:
  • Step 100 Perform real-person recording of the text information that needs to be voice broadcast to generate a corresponding corpus file.
  • In addition, a corpus can be built from the generated corpus files, and the corpus can be saved to the server.
  • Each corpus file needs to be numbered, and each corpus file corresponds to a unique corpus number.
  • For example, the corpus number of Joke 1 is J-001.mp3, the corpus number of Joke 2 is J-002.mp3, and the corpus number of Joke n is J-00n.mp3.
  • The corpus number of weather information m is w-00m.mp3, and the corpus number of news information i is news-00i.mp3, where n, m and i are integers greater than 0.
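Following the numbering pattern in the example (J-001.mp3 for Joke 1, w-00m.mp3 for weather, news-00i.mp3 for news), a hypothetical helper could assign corpus numbers mechanically; the three-digit zero-padding is an assumption inferred from the examples, not a rule stated by the patent.

```python
def corpus_number(prefix: str, index: int) -> str:
    """Build a corpus number such as J-001.mp3 from a category prefix
    and a 1-based index, zero-padded to three digits."""
    return f"{prefix}-{index:03d}.mp3"

# Category prefixes from the example: "J" for jokes, "w" for weather, "news" for news.
```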
  • Step 101 The server generates a data packet of the voice broadcast content, and sends the data packet to the terminal.
  • the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information.
  • The server generating a data packet of the voice broadcast content includes: the server acquires the text information corresponding to the voice broadcast content; obtains, from a mapping table indicating the correspondence between text information and corpus identification information, the corpus identification information corresponding to that text information; and generates the data packet of the voice broadcast content based on the acquired text information and the corpus identification information.
  • the corpus identification information in the data packet may be a corpus label or a corpus number.
  • The corpus tag is an intermediate identifier used to link text information and corpus numbers, which facilitates the management and downloading of corpus files by the server and the terminal.
  • In an application, the terminal first obtains the user voice information expressing the user's needs and sends it to the server; after receiving the user voice information, the server uses voice recognition technology to parse it and extract the user demand information.
  • The server then searches for the text information to reply to the user according to the extracted user demand information, and obtains the corpus identification information corresponding to that reply text from the mapping table indicating the correspondence between text information and corpus identification information.
  • For example, the user voice information obtained by the terminal is "tell me a joke"; the terminal sends this voice information to the server, the server parses it and extracts the key word "joke" as the user demand information, and the server can then search for the reply text information according to "joke".
  • The server may obtain the corpus identification information corresponding to the reply text information in either of two ways.
  • Mode 1 The server determines the corpus tag corresponding to the text information according to the preset "text information and corpus tag correspondence mapping table"; the corpus identification information is then the corpus tag.
  • Mode 2 The server directly determines the corpus number corresponding to the text information according to the preset "text information, corpus label and corpus number correspondence mapping table"; the corpus identification information is then the corpus number.
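The two lookup modes can be sketched as follows; the table contents and the function name are illustrative assumptions mirroring the mapping tables of FIG. 2 and FIG. 3.

```python
# Mode 1 table (cf. FIG. 2): text information -> corpus tag.
TEXT_TO_TAG = {"joke 1": "voice_tag_001"}

# Mode 2 table (cf. FIG. 3): text information -> (corpus tag, corpus number).
TEXT_TO_TAG_AND_NUMBER = {"joke 1": ("voice_tag_001", "J-001.mp3")}

def corpus_id_for(text: str, mode: int) -> str:
    """Return the corpus identification information for the reply text:
    a corpus tag under mode 1, a corpus number under mode 2."""
    if mode == 1:
        return TEXT_TO_TAG[text]
    _tag, number = TEXT_TO_TAG_AND_NUMBER[text]
    return number
```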
  • FIG. 2 is a schematic diagram of a mapping table of the correspondence between text information and corpus tags according to an embodiment of the present invention.
  • A mapping table between the text information of the voice broadcast content and the corresponding corpus tag is established; take Joke 1 as an example.
  • The text information of Joke 1 reads: "Since the beginning of summer I have won the favor of the mosquitoes. I advise the mosquitoes to share their favors evenly! But the mosquitoes don't listen; they just bite me! Just bite me!" A unique corpus tag voice_tag_001 is created for it; through the mapping table shown in FIG. 2, the server can send the reply text information and the corresponding corpus tag to the terminal at the same time.
  • FIG. 3 is a schematic diagram of a mapping table of the correspondence among text information, corpus labels and corpus numbers according to an embodiment of the present invention.
  • A mapping table of the correspondence among the text information, corpus label and corpus number of the voice broadcast content is established.
  • For the text information content of Joke 1, a unique corpus tag voice_tag_001 and a corpus number J-001.mp3 are created; through the mapping table shown in FIG. 3, the server can directly determine the corpus number corresponding to the corpus file that needs to be voice broadcast.
  • The correspondence mapping tables given in mode 1 and mode 2 are both stored in the server, so that the server can search for the corpus identification information corresponding to the reply text information and generate the data packet.
  • Step 102 After receiving and parsing the data packet, the terminal obtains the corpus identification information and acquires the corpus file corresponding to the obtained corpus identification information.
  • If the server obtained the corpus identification information in mode 1 in step 101, the corpus identification information in the data packet received by the terminal is a corpus label.
  • The terminal then needs to determine the corpus number corresponding to the corpus identification information according to the predetermined mapping table of the correspondence between corpus labels and corpus numbers, and the terminal acquires the corresponding corpus file according to the corpus number.
  • FIG. 4 is a schematic diagram of a correspondence table between a corpus label and a corpus number according to an embodiment of the present invention.
  • The corpus number of each corpus file corresponds to a unique corpus label; for example, the corpus number corresponding to the corpus tag voice_tag_001 is J-001.mp3, so the corresponding corpus number can be found through the corpus tag.
  • The "corpus label and corpus number correspondence mapping table" given in FIG. 4 is stored in the terminal, so that the terminal can determine the corpus number according to this table when it obtains a corpus label.
  • If the server obtained the corpus identification information in mode 2 in step 101, the corpus identification information in the data packet received by the terminal is a corpus number, and the terminal can directly acquire the corresponding corpus file according to the corpus number.
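On the terminal side, both cases can be folded into one resolution step; treating the `.mp3` suffix as the mark of a corpus number is an assumption made only for this sketch.

```python
# Mapping table 2 (cf. FIG. 4): corpus tag -> corpus number, stored on the terminal.
TAG_TO_NUMBER = {"voice_tag_001": "J-001.mp3"}

def resolve_corpus_number(corpus_id: str) -> str:
    """Turn the identifier carried by the packet into a corpus number:
    a corpus number (mode 2) is used directly, while a corpus tag
    (mode 1) is looked up in the terminal's local mapping table."""
    if corpus_id.endswith(".mp3"):
        return corpus_id
    return TAG_TO_NUMBER[corpus_id]
```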
  • The server may use mode 1 to perform corpus file playback management; that is, the server obtains the corpus identification information, namely a corpus tag, according to the mapping table shown in FIG. 2, and the terminal obtains the corresponding corpus number according to the locally stored corpus label and corpus number correspondence mapping table (as shown in FIG. 4).
  • Corpus file playback management can also be performed by means of mode 2; that is, by updating the "text information, corpus label and corpus number correspondence mapping table", the latest extended corpus files can be managed, so that when the terminal needs the latest extended corpus file, the server can directly determine the corresponding corpus number and send it to the terminal, and the terminal uses the corpus number to obtain the latest extended corpus file.
  • When an outdated corpus file needs to be deleted, only the "text information, corpus label and corpus number correspondence mapping table" in the server needs to be updated. Therefore, when mode 2 is used to manage the currently popular corpus files, updating the terminal's local application is unnecessary, and a user who refuses to upgrade the local application is no longer prevented from obtaining the latest corpus files, thereby improving the user experience.
  • If the server only uses mode 1 to perform corpus file playback management, the server needs to synchronously update the terminal's local application whenever the corpus is expanded; if the server only uses mode 2, the server has to manage a large "text information, corpus label and corpus number correspondence mapping table", which wastes resources.
  • When the terminal requires real-time updates of corpus files, the server can use a combination of mode 1 and mode 2 to manage corpus file playback. Therefore, in the embodiment of the present invention, the corpus file playback management method can be flexibly selected according to the actual needs of the terminal.
  • The terminal acquiring the corpus file corresponding to the obtained corpus identification information may include: when the corpus file corresponding to the corpus identification information exists in the terminal, acquiring the corpus file from the terminal; when the terminal does not have the corpus file corresponding to the corpus identification information, downloading it from the server according to the corpus identification information.
  • the terminal saves the corpus file downloaded from the server in the local storage.
  • When the terminal obtains the corpus number corresponding to the corpus file requested by the user, it first checks whether a corpus file matching the corpus number exists locally; if so, the terminal obtains it directly from local storage; if not, the terminal downloads it from the server.
  • In this way, corpus files that the user plays frequently can be stored locally in the terminal, which avoids downloading from the server on every search, saves operation steps and traffic, and improves the user experience.
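The local-first lookup just described could look like the following; `local_cache` and `download` are stand-ins for the terminal's storage and the server download, not interfaces defined by the patent.

```python
def get_corpus_file(number, local_cache, download):
    """Return the corpus file for a corpus number, preferring the
    terminal's local cache and falling back to a server download."""
    if number in local_cache:
        return local_cache[number]
    data = download(number)
    local_cache[number] = data  # keep it locally to save traffic next time
    return data

# Demonstration with a fake download that records each server round trip.
calls = []
def fake_download(number):
    calls.append(number)
    return b"audio:" + number.encode()

cache = {}
first = get_corpus_file("J-001.mp3", cache, fake_download)
second = get_corpus_file("J-001.mp3", cache, fake_download)  # served from cache
```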
  • Step 103 The terminal plays the corpus file.
  • In addition, the terminal may also delete corpus files; the corpus files may be deleted in the following ways:
  • Mode 1 When the data size of the corpus files saved by the terminal is greater than or equal to the data size threshold, the terminal deletes corpus files according to the preset corpus file deletion policy.
  • Mode 2 At fixed time intervals, the terminal deletes corpus files according to the preset corpus file deletion policy.
  • The preset corpus file deletion policy may include: deleting all corpus files in the terminal, deleting by corpus file proportion, or deleting by corpus file use frequency.
  • Deleting by corpus file proportion can include: deleting a number of corpus files equal to M × X% rounded up, or rounded down, where M is the total number of corpus files and X, a positive number less than or equal to 100, is the percentage of corpus files to delete. For example, X can be 50: when the total number of corpus files in the terminal is 45, the product 45 × 50% = 22.5 is rounded up or down, and the rounded value is taken as the number of files to delete.
  • The deleted corpus files can be chosen at random or specified by the terminal.
  • Deleting by corpus file use frequency may include: deleting the P corpus files with the lowest use frequency, where P is an integer greater than 0.
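The proportion and frequency policies can be sketched numerically; the helper names are hypothetical, and whether 22.5 rounds to 22 or 23 is left as a parameter since the text allows rounding either way.

```python
import math

def files_to_delete_by_ratio(total: int, percent: float, round_up: bool = True) -> int:
    """Proportion policy: delete total * percent% corpus files,
    rounded up or down as the text allows."""
    exact = total * percent / 100.0
    return math.ceil(exact) if round_up else math.floor(exact)

def least_used(usage: dict, p: int) -> list:
    """Frequency policy: the P corpus numbers with the lowest use count."""
    return sorted(usage, key=usage.get)[:p]
```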
  • A data size threshold, such as 20 MB, can be set for the corpus files saved by the terminal.
  • In mode 1, the trigger point for the terminal to delete corpus files is the completion of a text-to-speech (TTS) broadcast. After each TTS broadcast is completed, the terminal first checks the data size of the saved corpus files and determines whether it is greater than or equal to the data size threshold. If yes, the terminal deletes corpus files according to the preset corpus file deletion policy; if no, the terminal does not perform the delete operation.
  • When mode 1 is used to delete corpus files, the terminal's accessing and deleting of corpus files are performed synchronously, there is no concurrent operation on corpus files, and the processing flow is simple. However, the corpus file data size is checked after every TTS broadcast, and once a delete operation runs, the time it consumes will slow down TTS playback of corpus files and reduce the user experience.
  • In mode 2, the terminal can set a local timer, whose period can be set according to the actual situation.
  • The trigger point for the terminal to delete corpus files is then the expiry of the timer: each time the timer expires, the terminal deletes corpus files according to the preset corpus file deletion policy. The terminal can start the timer after downloading the first corpus file.
  • When mode 2 is used to delete corpus files, the terminal's accessing and deleting of corpus files are performed separately, so the original corpus playback process and the user experience are not affected. However, concurrent access and deletion of corpus files must be handled, which increases the complexity of the code; and if corpus files are not used for a long time, the terminal may clear all of them, so that when a corpus file is needed again it can only be re-downloaded from the cloud, which increases both data traffic consumption and download time.
  • In the embodiment of the present invention, real-person recording is performed on the text information that needs to be voice broadcast to generate a corresponding corpus file; the server generates a data packet of the voice broadcast content and sends the data packet to the terminal, the data packet including the text information corresponding to the voice broadcast content and the corpus identification information corresponding to the text information; after receiving and parsing the data packet, the terminal obtains the corpus identification information, acquires the corpus file corresponding to it, and plays the corpus file.
  • In this way, an emotional voice broadcast can be provided according to different application environments, which enhances the user experience.
  • The second embodiment further illustrates the method of determining the corpus identification information by using mode 1 disclosed in the first embodiment.
  • FIG. 5 is a flowchart of a second embodiment of a method for voice broadcast according to the present invention, the method includes:
  • Step 500 Perform real-person recording of the text information that needs to be voice broadcast to generate corresponding corpus files, build a corpus from the generated corpus files, and upload the corpus to the server.
  • In addition, a "text information and corpus tag correspondence mapping table" as shown in FIG. 2 (hereinafter referred to as "mapping table 1") is made, which indicates the corpus tag corresponding to the text information of the voice broadcast content to be replied by the server. The completed mapping table 1 is uploaded to the server.
  • Step 501 The terminal acquires voice information of the user and sends the voice information to the server.
  • The terminal may receive the user voice information itself, or it may obtain the voice information input by the user through an application with a voice search function.
  • Step 502 The server saves the corpus and mapping table 1, receives the user voice information, generates a data packet of the voice broadcast content, and sends the data packet to the terminal.
  • The server uses voice recognition technology to parse the user voice information and extract the user demand information, and then searches for the text information to reply to the user according to the extracted user demand information.
  • If the text information found by the server has a corresponding corpus file, the server can find the corpus tag corresponding to the text information in mapping table 1, and the server encapsulates the found text information and the corresponding corpus tag in the data packet. If the text information found by the server has no corresponding corpus file, no corpus tag exists for it, and the server encapsulates only the found text information in the data packet.
  • Step 503 The terminal receives and parses the data packet.
  • After receiving and parsing the data packet, the terminal obtains the text information corresponding to the user's requirement; the data packet may also contain the corresponding corpus label.
  • Step 504 Determine whether a corpus label is included in the data packet. If yes, go to step 505; if no, go to step 508.
  • If the data packet received by the terminal does not include a corpus label, the information required by the user has not undergone real-person emotional recording, so there is no corpus file or corpus label, and step 508 is performed; if the data packet received by the terminal contains a corpus label, a corpus file corresponding to the user demand information exists in the server corpus.
  • Step 505 The terminal acquires a corresponding corpus file according to the corpus tag.
  • The terminal cannot directly find the corresponding corpus file through the obtained corpus label; it first needs to determine the corpus number corresponding to the corpus tag according to the pre-made "corpus label and corpus number correspondence mapping table" shown in FIG. 4 (hereinafter referred to as "mapping table 2").
  • The terminal then obtains the corresponding corpus file by using the determined corpus number.
  • The terminal first searches locally for a previously downloaded and stored corpus file; if there is none, the terminal sends the corpus number to the server to download the corresponding corpus file.
  • Step 506 Determine whether the corpus file is successfully obtained. If yes, go to step 507; if no, go to step 508.
  • If the terminal obtains the corresponding corpus file locally, or successfully downloads it from the server, the corpus file is delivered to the voice playing module and step 507 is performed; if the terminal cannot obtain the corpus file locally and cannot successfully download it from the server, step 508 is performed.
  • Step 507 Play the corpus file.
  • The voice playing module plays the corpus file according to the user's needs. For example, after the terminal prompts the user, the user can choose to play the corpus file immediately or play it later.
  • Step 508 The terminal synthesizes and plays the received text information.
  • The terminal performs speech synthesis on the received text information and plays the synthesized speech.
  • Step 509 Delete the corpus file.
  • If the deletion operation uses corpus file deletion mode 1 disclosed in the first embodiment, it is triggered after step 508 completes. If the deletion operation uses corpus file deletion mode 2, the trigger point is determined by the set fixed time period, and the execution time of step 509 is then not limited.
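Taken together, steps 503-509 amount to a play-the-recording-else-synthesize decision. The sketch below is an illustration under stated assumptions: the packet field names and the acquire/play/synthesize callables are hypothetical hooks, not APIs named in the patent.

```python
def handle_broadcast_packet(packet, acquire_corpus_file, play_audio, synthesize_and_play):
    """Terminal-side handling: prefer the real-person corpus recording,
    fall back to speech synthesis when no label is present (step 504)
    or the corpus file cannot be obtained (step 506)."""
    text = packet["text"]                      # step 503: parse packet
    label = packet.get("corpus_label")         # step 504: label present?
    if label is not None:
        path = acquire_corpus_file(label)      # steps 505-506
        if path is not None:
            play_audio(path)                   # step 507: play corpus file
            return "corpus"
    synthesize_and_play(text)                  # step 508: TTS fallback
    return "tts"
```

The corpus-file deletion of step 509 is deliberately left out here, since its trigger (after step 508, or periodic) is orthogonal to the per-packet flow.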
  • A third embodiment of the present invention proposes a method for voice broadcast.
  • FIG. 6 is a flowchart of a third embodiment of a method for voice broadcast according to the present invention. As shown in FIG. 6, the method includes:
  • Step 600 Perform real-life recording on the text information that needs to be voice broadcasted to generate a corresponding corpus file.
  • A corpus established by using the generated corpus files is saved.
  • Step 601 Generate a data packet of the voice broadcast content, where the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information.
  • Generating the data packet of the voice broadcast content may include: acquiring the text information corresponding to the voice broadcast content; acquiring, from the mapping table indicating the correspondence between text information and corpus identification information, the corpus identification information corresponding to that text information; and generating the data packet of the voice broadcast content based on the acquired text information and corpus identification information.
  • the corpus identification information in the data packet may include: a corpus label or a corpus number.
  • Step 602 Send the data packet to the terminal, so that the terminal plays the corresponding corpus file based on the data packet.
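On the server side, steps 600-602 reduce to a mapping-table lookup plus packet assembly. The field names and mapping-table entries below are illustrative assumptions, not values from the patent.

```python
# Hypothetical mapping table: text information -> corpus identification info.
TEXT_TO_CORPUS_LABEL = {
    "Your recharge was successful": "label_recharge_ok",
    "Your balance is insufficient": "label_balance_low",
}

def build_broadcast_packet(text):
    """Step 601: bundle the text with its corpus identification
    information when a real-person recording exists for it."""
    packet = {"text": text}
    label = TEXT_TO_CORPUS_LABEL.get(text)
    if label is not None:
        packet["corpus_label"] = label         # a corpus label or corpus number
    return packet
```

A packet without corpus identification information signals the terminal to synthesize the text instead, matching step 508 of the second embodiment.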
  • A fourth embodiment of the present invention proposes another method for voice broadcast.
  • FIG. 7 is a flowchart of a fourth embodiment of a method for voice broadcast according to the present invention. As shown in FIG. 7, the method includes:
  • Step 700 Perform real-life recording on the text information that needs to be voice broadcasted to generate a corresponding corpus file.
  • Step 701 Receive a data packet of the voice broadcast content, where the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information.
  • the corpus identification information in the data packet may include: a corpus label or a corpus number.
  • When the corpus identification information is a corpus label, a corpus number corresponding to the corpus identification information is determined according to a predetermined mapping table indicating the correspondence between corpus labels and corpus numbers; correspondingly, obtaining the corpus file corresponding to the obtained corpus identification information includes obtaining the corpus file corresponding to the determined corpus number.
  • Step 702 Parse the data packet to obtain corpus identification information; and obtain a corpus file corresponding to the obtained corpus identification information.
  • Obtaining the corpus file corresponding to the obtained corpus identification information may include: when a corpus file corresponding to the corpus identification information exists locally, obtaining that corpus file locally; when no such corpus file exists locally, downloading the corpus file corresponding to the corpus identification information from the server according to the corpus identification information.
  • The method may also include: saving the acquired corpus file.
  • The method may further include: when the data size of the saved corpus files is greater than or equal to a data size threshold, deleting corpus files according to a preset corpus file deletion policy; or, at every fixed time period, deleting corpus files according to the preset corpus file deletion policy.
  • The preset corpus file deletion policy may include: deleting all local corpus files, deleting a proportion of the corpus files, or deleting according to the frequency of use of the corpus files.
  • Step 703 Play the corpus file.
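The threshold-triggered deletion policies described above can be sketched as follows. The half-of-the-files proportion and the (size, use-count) tuple layout are illustrative choices, not values fixed by the patent.

```python
def prune_corpus_cache(files, total_size, threshold, policy="by_usage"):
    """Apply a preset deletion policy once the saved corpus files reach
    the data size threshold. 'files' maps corpus number -> (size, uses)."""
    if total_size < threshold:
        return dict(files)                     # threshold not reached: keep all
    if policy == "all":
        return {}                              # delete every local corpus file
    if policy == "proportion":                 # e.g. drop the larger half by size
        keep = sorted(files, key=lambda n: files[n][0])[: len(files) // 2]
    else:                                      # "by_usage": keep most-used half
        keep = sorted(files, key=lambda n: files[n][1], reverse=True)[: len(files) // 2]
    return {n: files[n] for n in keep}
```

The same function covers both triggers in the text: it can be called when the size threshold is crossed or on a fixed-period timer.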
  • FIG. 8 is a first schematic structural diagram of a device for voice broadcast according to an embodiment of the present invention. As shown in FIG. 8, the device includes: a first recording module 800, a generating module 801, and a sending module 802.
  • The first recording module 800 is configured to perform real-person recording of the text information that needs to be voice broadcasted, to generate a corresponding corpus file.
  • the generating module 801 is configured to generate a data packet of the voice broadcast content, where the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information.
  • the sending module 802 is configured to send the data packet to the terminal, so that the terminal plays the corresponding corpus file based on the data packet.
  • The generating module 801 can be configured to: obtain the text information corresponding to the voice broadcast content; acquire, from a pre-made mapping table indicating the correspondence between text information and corpus identification information, the corpus identification information corresponding to that text information; and generate the data packet of the voice broadcast content based on the acquired text information and corpus identification information.
  • the corpus identification information in the data packet may include: a corpus label or a corpus number.
  • the generating module 801 may be further configured to save the corpus established by using the generated corpus file after generating the corresponding corpus file.
  • FIG. 9 is a second schematic structural diagram of a device for voice broadcast according to an embodiment of the present invention. As shown in FIG. 9, the device includes: a second recording module 900, a receiving module 901, and a processing module 902;
  • The second recording module 900 is configured to perform real-person recording of the text information that needs to be voice broadcasted, to generate a corresponding corpus file.
  • the receiving module 901 is configured to receive a data packet of the voice broadcast content, where the data packet includes: text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information.
  • The processing module 902 is configured to parse the data packet to obtain corpus identification information, acquire the corpus file corresponding to the obtained corpus identification information, and play the corpus file.
  • the corpus identification information in the data packet may include: a corpus label or a corpus number.
  • The processing module 902 may be configured to, when the corpus identification information is a corpus label, determine the corpus number corresponding to the corpus identification information according to a predetermined mapping table indicating the correspondence between corpus labels and corpus numbers.
  • the processing module 902 can be configured to acquire a corpus file corresponding to the determined corpus number.
  • The processing module 902 may be configured to acquire the corpus file corresponding to the corpus identification information from the device when such a corpus file exists in the device; when no such corpus file exists in the device, the corpus file corresponding to the corpus identification information is downloaded from the server according to the corpus identification information.
  • the processing module 902 can also be configured to save the acquired corpus file.
  • The processing module 902 may be configured to delete corpus files according to the preset corpus file deletion policy when the data size of the saved corpus files is greater than or equal to the data size threshold, or to delete corpus files according to the preset corpus file deletion policy at every fixed time period.
  • the preset corpus file deletion strategy includes: deleting all corpus files in the device, deleting according to the proportion of the corpus files, or deleting according to the frequency of use of the corpus files.
  • The first recording module 800, the generating module 801, the sending module 802, the second recording module 900, the receiving module 901, and the processing module 902 can all be implemented by a central processing unit (CPU), a micro processor unit (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA) located in the device.
  • An embodiment of the invention further provides a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the method described in the foregoing embodiments.
  • Embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable voice broadcast device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable voice broadcast device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The above technical solution makes it possible, during voice broadcast, to provide an emotional voice broadcast adapted to different application environments, thereby improving the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed are a voice broadcast method and device, the method comprising: performing real-person recording of text information that needs to be voice broadcasted, to generate a corresponding corpus file (600); generating a data packet of voice broadcast content, the data packet comprising text information corresponding to the voice broadcast content and corpus identification information corresponding to the text information (601); and sending the data packet to a terminal, such that the terminal plays, on the basis of the data packet, the corresponding corpus file (602).
PCT/CN2017/084581 2016-10-27 2017-05-16 Procédé et dispositif de diffusion vocale WO2018076664A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610969867.7A CN107995249B (zh) 2016-10-27 2016-10-27 一种语音播报的方法和装置
CN201610969867.7 2016-10-27

Publications (1)

Publication Number Publication Date
WO2018076664A1 true WO2018076664A1 (fr) 2018-05-03

Family

ID=62023030

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/084581 WO2018076664A1 (fr) 2016-10-27 2017-05-16 Procédé et dispositif de diffusion vocale

Country Status (2)

Country Link
CN (1) CN107995249B (fr)
WO (1) WO2018076664A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110797014A (zh) * 2018-07-17 2020-02-14 中兴通讯股份有限公司 一种语音识别方法、装置及计算机存储介质
CN113110819A (zh) * 2019-04-15 2021-07-13 创新先进技术有限公司 语音播报方法及装置
CN116405801A (zh) * 2023-05-31 2023-07-07 中瑞科技术有限公司 一种可预警播报的对讲机系统

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110930999A (zh) * 2018-09-19 2020-03-27 上海博泰悦臻电子设备制造有限公司 语音互动方法、装置及车辆
CN109448694A (zh) * 2018-12-27 2019-03-08 苏州思必驰信息科技有限公司 一种快速合成tts语音的方法及装置
CN110017847B (zh) * 2019-03-21 2021-03-16 腾讯大地通途(北京)科技有限公司 一种自适应导航语音播报方法、装置及系统
US10990939B2 (en) 2019-04-15 2021-04-27 Advanced New Technologies Co., Ltd. Method and device for voice broadcast

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101094445A (zh) * 2007-06-29 2007-12-26 中兴通讯股份有限公司 一种实现文本短信语音播放的系统及方法
CN101110861A (zh) * 2006-07-18 2008-01-23 中兴通讯股份有限公司 一种在智能网中播放文本语音的系统和方法
US20090313022A1 (en) * 2008-06-12 2009-12-17 Chi Mei Communication Systems, Inc. System and method for audibly outputting text messages
CN101763878A (zh) * 2008-11-21 2010-06-30 北京搜狗科技发展有限公司 语音文件插播方法及装置
CN102055923A (zh) * 2009-11-06 2011-05-11 深圳Tcl新技术有限公司 具备语音播报功能的电视机及其实现方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750365B (zh) * 2012-06-14 2014-09-03 华为软件技术有限公司 即时语音消息的检索方法和系统,以及用户设备和服务器
JP5753869B2 (ja) * 2013-03-26 2015-07-22 富士ソフト株式会社 音声認識端末およびコンピュータ端末を用いる音声認識方法
CN103581857A (zh) * 2013-11-05 2014-02-12 华为终端有限公司 一种语音提示的方法、语音合成服务器及终端
CN104899002A (zh) * 2015-05-29 2015-09-09 深圳市锐曼智能装备有限公司 机器人基于对话预测的在线与离线的识别切换方法及系统
CN104882143A (zh) * 2015-05-31 2015-09-02 深圳市启明创新科技开发有限公司 一种云智能学习机器人
CN105551493A (zh) * 2015-11-30 2016-05-04 北京光年无限科技有限公司 儿童语音机器人数据处理方法、装置及儿童语音机器人

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110797014A (zh) * 2018-07-17 2020-02-14 中兴通讯股份有限公司 一种语音识别方法、装置及计算机存储介质
CN110797014B (zh) * 2018-07-17 2024-06-07 中兴通讯股份有限公司 一种语音识别方法、装置及计算机存储介质
CN113110819A (zh) * 2019-04-15 2021-07-13 创新先进技术有限公司 语音播报方法及装置
CN113110819B (zh) * 2019-04-15 2024-04-19 创新先进技术有限公司 语音播报方法及装置
CN116405801A (zh) * 2023-05-31 2023-07-07 中瑞科技术有限公司 一种可预警播报的对讲机系统
CN116405801B (zh) * 2023-05-31 2023-09-08 中瑞科技术有限公司 一种可预警播报的对讲机系统

Also Published As

Publication number Publication date
CN107995249A (zh) 2018-05-04
CN107995249B (zh) 2021-01-26

Similar Documents

Publication Publication Date Title
WO2018076664A1 (fr) Procédé et dispositif de diffusion vocale
US10097884B2 (en) Media playback method, client and system
CN104025548B (zh) 用于被递送媒体的灵活缓存的方法和设备
US20160240195A1 (en) Information processing method and electronic device
CN105681821B (zh) 一种音频的播放方法、播放系统及服务器
CN109147802B (zh) 一种播放语速调节方法及装置
WO2019007308A1 (fr) Procédé et dispositif de diffusion vocale
US9454342B2 (en) Generating a playlist based on a data generation attribute
US20150317699A1 (en) Method, apparatus, device and system for inserting audio advertisement
CN107248415A (zh) 一种闹钟铃声生成的方法、装置及用户终端
CN104732989B (zh) 智能音乐播放控制系统及其控制方法
US20150255055A1 (en) Personalized News Program
CN105354293A (zh) 一种移动终端上进行播放对象推送的辅助实现方法及装置
US20210200933A1 (en) Computing device and corresponding method for generating data representing text
CN110347848A (zh) 一种演示文稿管理方法及装置
CN105592232B (zh) 一种歌词的同步方法及装置
CN105897854A (zh) 移动终端闹钟响应方法、装置及系统
WO2020024508A1 (fr) Procédé et appareil d'obtention d'informations vocales
CN110381097B (zh) 一种语音分享音频的方法、系统及车载终端
US10957304B1 (en) Extracting content from audio files using text files
CN115331695A (zh) 音频处理方法、计算机设备、存储介质和程序产品
CN115230724A (zh) 交互方法、电子设备及计算机存储介质
CN118471208A (zh) 语义解析方法、装置和可读存储介质
CN111339348A (zh) 信息服务方法、装置和系统
CN112217854A (zh) 一种支持云端快速响应的点读系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17865496

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17865496

Country of ref document: EP

Kind code of ref document: A1