CN111459445A - Webpage end audio generation method and device and storage medium - Google Patents

Webpage end audio generation method and device and storage medium

Info

Publication number
CN111459445A
Authority
CN
China
Prior art keywords
audio
text
segmented
stream
output stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010127254.5A
Other languages
Chinese (zh)
Inventor
郁霖
雷欣
李志飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenwen Intelligent Information Technology Co ltd
Original Assignee
Wenwen Intelligent Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenwen Intelligent Information Technology Co ltd
Priority to CN202010127254.5A
Publication of CN111459445A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16: Sound input; Sound output
    • G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of audio processing and discloses a webpage-end audio generation method for converting text from a webpage end into audio that can be played at the webpage end. The method comprises the following steps: receiving text information and sending it to a text-to-speech server; receiving a plurality of segmented audio streams that correspond to the text information and are returned segment by segment by the text-to-speech server; constructing an audio output stream; and inputting waveform audio file format (wav) header information into the audio output stream, then sequentially inputting the plurality of segmented audio streams. According to the invention, after the text information from the webpage end is received, a new audio output stream is created, the wav header information is input into it, and the segmented audio streams converted by the text-to-speech server are input in sequence. The audio output stream can therefore be played directly at the webpage end as wav-format audio, the webpage-end user hears high-quality audio, the waiting time for audio generation is reduced, and no dedicated PCM player has to be deployed at the webpage end.

Description

Webpage end audio generation method and device and storage medium
Technical Field
The invention relates to the technical field of audio processing, in particular to a webpage end audio generation method, a webpage end audio generation device and a webpage end audio generation storage medium.
Background
TTS (Text To Speech) technology is widely used for online speech generation and playback, and demand for it now ranges from phrase generation to article reading, for example converting webpage-end text into audio for playback. The application of TTS to phrase generation is mature. For a long article, however, the TTS must finish converting the whole article from text to audio before the generated audio can be transmitted to the webpage end, so the time a webpage-end user spends waiting online for the audio to be generated needs to be considered.
Disclosure of Invention
In order to solve or at least partially solve the technical problem, embodiments of the present invention provide a method and an apparatus for generating a webpage-side audio.
According to a first aspect of the embodiments of the present invention, there is provided a method for generating a webpage-side audio, which is used to convert a text of a webpage side into an audio that can be played at the webpage side, the method including: receiving text information and sending the text information to a text-to-speech server; receiving a plurality of segmented audio streams corresponding to the text information and fed back by the text-to-speech server in a segmented manner; constructing an audio output stream; and inputting waveform audio file format wav header information into the audio output stream, and sequentially inputting a plurality of the segmented audio streams.
Preferably, the text information is received and sent to the text-to-speech server, both of which are transmitted using the hypertext transfer protocol HTTP.
Preferably, said inputting waveform audio file format wav header information in said audio output stream includes: monitoring whether the segmented audio stream is received for the first time; and inputting waveform audio file format wav header information in the audio output stream when the segmented audio stream is received for the first time.
Preferably, the monitoring whether the segmented audio stream is received for the first time includes: monitoring whether the current state of the audio output stream is empty; and confirming that the segmented audio stream is received for the first time when the current state of the audio output stream is empty.
Preferably, the segmented audio stream is an audio stream in a pulse code modulation, PCM, format.
According to a second aspect of the embodiments of the present invention, there is also provided a web page end audio generating apparatus, including: the text transmission module is used for receiving text information and sending the text information to the text-to-speech server; the receiving module is used for receiving a plurality of segmented audio streams which are returned by the text-to-speech server in a segmented mode and correspond to the text information; the building module is used for building an audio output stream; and the audio transmission module is used for inputting waveform audio file format wav header information in the audio output stream and sequentially inputting a plurality of the segmented audio streams.
Preferably, the text transmission module includes: the text receiving submodule is used for receiving text information by adopting a hypertext transfer protocol (HTTP); and the text sending submodule is used for sending the text information to a text-to-speech server by adopting a hypertext transfer protocol (HTTP).
Preferably, the audio transmission module includes: a monitoring submodule for monitoring whether the segmented audio stream is received for the first time; and the transmission submodule is used for inputting waveform audio file format wav header information into the audio output stream when the segmented audio stream is received for the first time.
According to a third aspect of the embodiments of the present invention, there is further provided a machine-readable storage medium on which instructions are stored, the instructions being configured to cause a machine to execute the above-mentioned webpage-side audio generating method.
According to a fourth aspect of the embodiments of the present invention, there is also provided an apparatus, including at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory complete mutual communication through the bus; the processor is configured to call program instructions in the memory to perform the web page side audio generation method of any one of claims 1-5.
Through the above technical scheme, after the text information from the webpage end is received, a new audio output stream is created, the wav header information is input into the audio output stream, and the segmented audio streams converted by the text-to-speech server are input in sequence. The audio output stream can therefore be played directly at the webpage end as wav-format audio, the webpage-end user hears high-quality audio, the waiting time for audio generation is reduced, and no dedicated PCM player has to be deployed at the webpage end.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the figure:
fig. 1 is a flowchart illustrating a method for generating webpage-side audio according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a specific application example of the method for generating webpage-side audio according to the embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a component structure of a web page-side audio generating apparatus according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a composition structure of a text transmission module according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating a structure of an audio transmission module according to an embodiment of the present invention.
Description of the reference numerals
301: text transmission module; 302: receiving module
303: construction module; 304: audio transmission module
3011: text receiving submodule; 3012: text sending submodule
3041: monitoring submodule; 3042: transmission submodule
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given only to enable those skilled in the art to better understand and to implement the present invention, and do not limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.
Fig. 1 shows a flowchart of a method for generating webpage-side audio according to an embodiment of the present invention.
Referring to fig. 1, a method for generating a webpage end audio according to an embodiment of the present invention is used to convert a text of a webpage end into an audio that can be played at the webpage end, and may include the following steps:
S100, receiving the text information and sending the text information to a text-to-speech server.
Specifically, the web page back-end server receives the Text information sent by the web page and sends the received Text information To a TTS (Text To Speech) server.
In the embodiment of the invention, both transfers (receiving the text information and sending it to the text-to-speech server) use the Hypertext Transfer Protocol (HTTP). HTTP is a stateless protocol: the client sends a request, the server receives and processes it and returns a response to the client, and the connection between the client and the server is then closed. For example, when the webpage client sends the text information to the web page back-end server over HTTP, the webpage client waits for the back-end server's response to that text information. The web page back-end server in turn sends the text information to the TTS server and waits for the TTS server's response.
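As a minimal sketch of the back-end-to-TTS hop described above, the following Python builds (without sending) an HTTP POST request carrying the text. The URL `http://tts.example.internal/synthesize`, the JSON field name `text`, and the function name `build_tts_request` are all hypothetical; the source does not specify the TTS server's interface.

```python
import json
import urllib.request

# Hypothetical endpoint: the source does not name the TTS server's URL or API.
TTS_URL = "http://tts.example.internal/synthesize"


def build_tts_request(text: str, url: str = TTS_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST request that carries the text."""
    # The JSON body layout with a "text" field is an assumption for illustration.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


req = build_tts_request("Hello, world.")
```

Passing such a request to `urllib.request.urlopen` and calling `read(n)` repeatedly on the response would be one way to consume the reply piece by piece rather than all at once.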
S200, receiving a plurality of segmented audio streams corresponding to the text information, which are segmented and returned by the text-to-speech server.
After the web page back-end server sends the text information to the TTS server, the TTS server converts the text information into audio. If the entire text were converted before the audio was transmitted to the web page back-end server and then sent on to the webpage end for playback, the webpage-end user would face a long wait. Therefore, after receiving the text information, the TTS server returns the converted audio to the web page back-end server each time it finishes converting a portion of text of a set length.
For example, for a long article, the TTS server may transmit the converted audio to the web page back-end server each time it finishes converting one paragraph of the article. Alternatively, it may transmit the converted audio each time a set number of characters has been converted into audio.
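The two segmentation rules mentioned above (by paragraph, or by a set number of characters) can be combined in a short sketch. In the source this splitting happens inside the TTS server and is not specified further; the function name `split_text`, the newline-based paragraph rule, and the default of 200 characters are assumptions chosen for illustration.

```python
from typing import List


def split_text(text: str, max_chars: int = 200) -> List[str]:
    """Split text into paragraphs, then cap each piece at max_chars characters."""
    segments: List[str] = []
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue  # skip blank lines between paragraphs
        # Cut over-long paragraphs into fixed-size pieces.
        for start in range(0, len(paragraph), max_chars):
            segments.append(paragraph[start:start + max_chars])
    return segments


segments = split_text("First paragraph.\n\nSecond paragraph is longer.", max_chars=10)
```

Each resulting segment would be converted and returned on its own, so the first audio can be sent back before the rest of the article has been processed.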
S300, constructing an audio output stream.
Specifically, the web page back-end server may construct the audio output stream while sending the text information to the TTS server. The audio output stream may also be constructed when the segmented audio stream returned by the TTS server is received for the first time.
S400, inputting waveform audio file format wav header information in the audio output stream.
Specifically, the webpage end currently supports playback of only mp3 and wav audio. However, every wav audio file carries its own header information, so multiple wav audio files cannot be streamed back to back as one continuous stream. If multiple audio segments are played back to back in mp3 format, extra silence is introduced, which causes jitter at the joins between the audio streams and degrades playback. For the best user experience, PCM audio playback would therefore be preferable. However, the webpage end does not support direct playback of PCM audio, so a dedicated PCM audio player would have to be deployed. To overcome these problems, the invention inputs wav header information into the newly built audio output stream, so that the webpage end plays the audio output stream as wav-format audio.
In one embodiment of the invention, inputting waveform audio file format wav header information in said audio output stream is achieved by: first, it is monitored whether a segmented audio stream is received from a TTS server for the first time, and waveform audio file format wav header information is input in an audio output stream when the segmented audio stream is received for the first time.
In particular, it is possible to monitor whether the current status of the audio output stream is empty; and when the current state of the audio output stream is empty, confirming that the webpage back-end server receives the segmented audio stream returned by the TTS server for the first time.
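The emptiness check described above can be sketched as follows, assuming the audio output stream is an in-memory, append-only `io.BytesIO`. The name `append_segment` and the placeholder header bytes are hypothetical; a real header would carry the actual 44 bytes of format information.

```python
import io

# Stand-in for the real 44-byte wav header (illustration only).
WAV_HEADER_PLACEHOLDER = b"RIFF" + b"\x00" * 40


def append_segment(output: io.BytesIO, pcm_segment: bytes) -> None:
    """Write the header only if nothing has been written yet, then the segment."""
    # The stream is append-only here, so position 0 means it is still empty,
    # which in turn means this is the first segment received.
    if output.tell() == 0:
        output.write(WAV_HEADER_PLACEHOLDER)
    output.write(pcm_segment)


out = io.BytesIO()
append_segment(out, b"\x01\x02")  # first segment: header is written first
append_segment(out, b"\x03\x04")  # later segment: PCM bytes only
```

Using the stream's own state as the "first time" signal avoids keeping a separate flag alongside the stream.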
For example, the wav header information consists of a fixed 44 bytes of data, described in detail below:
1:
Bytes 00-03 (4 bytes): 'RIFF', the resource interchange file marker
header[0]='R';
header[1]='I';
header[2]='F';
header[3]='F';
2:
Bytes 04-07 (4 bytes): file size minus 8 bytes (the total number of bytes from the next byte to the end of the file)
header[4]=(char)((file_size-8)&0xff);
header[5]=(char)(((file_size-8)>>8)&0xff);
header[6]=(char)(((file_size-8)>>16)&0xff);
header[7]=(char)(((file_size-8)>>24)&0xff);
3:
Bytes 08-11 (4 bytes): 'WAVE', the wav file marker
header[8]='W';
header[9]='A';
header[10]='V';
header[11]='E';
4:
Bytes 12-15 (4 bytes): 'fmt ', the waveform format marker; the last character is a space
header[12]='f';
header[13]='m';
header[14]='t';
header[15]=' ';
5:
Bytes 16-19 (4 bytes): size of the format block that follows (generally 00000010H, i.e. 16)
header[16]=16;
header[17]=0;
header[18]=0;
header[19]=0;
6:
Bytes 20-21 (2 bytes): format type (a value of 1 means the data is linear PCM)
header[20]=1;
header[21]=0;
7:
Bytes 22-23 (2 bytes): number of channels; 1 for a single channel, 2 for two channels
header[22]=(char)channel;
header[23]=0;
8:
Bytes 24-27 (4 bytes): sampling rate
header[24]=(char)(sample_rate&0xff);
header[25]=(char)((sample_rate>>8)&0xff);
header[26]=(char)((sample_rate>>16)&0xff);
header[27]=(char)((sample_rate>>24)&0xff);
9:
Bytes 28-31 (4 bytes): byte rate (sampling rate × number of channels × bits per sample / 8), held here in the variable bit_rate
header[28]=(char)(bit_rate&0xff);
header[29]=(char)((bit_rate>>8)&0xff);
header[30]=(char)((bit_rate>>16)&0xff);
header[31]=(char)((bit_rate>>24)&0xff);
10:
Bytes 32-33 (2 bytes): data block length (bytes per sample frame = number of channels × bits per sample / 8)
header[32]=(char)(channel*sample_bit/8);
header[33]=0;
11:
Bytes 34-35 (2 bytes): bits per sample
header[34]=(char)sample_bit;
header[35]=0;
12:
Bytes 36-39 (4 bytes): 'data', the data marker
header[36]='d';
header[37]='a';
header[38]='t';
header[39]='a';
13:
Bytes 40-43 (4 bytes): size of the PCM audio data
header[40]=(char)(data_size&0xff);
header[41]=(char)((data_size>>8)&0xff);
header[42]=(char)((data_size>>16)&0xff);
header[43]=(char)((data_size>>24)&0xff);
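The same 44-byte layout can be produced in one call with Python's `struct` module, which may be easier to check than byte-by-byte assignment. This is a sketch under stated assumptions: the function name `build_wav_header` and the example parameters (16 kHz, mono, 16-bit) are illustrative, and the source does not say how the two size fields are filled when the total length is not yet known, so `data_size` is left as a caller-supplied parameter.

```python
import struct


def build_wav_header(sample_rate: int, channels: int, bits_per_sample: int,
                     data_size: int) -> bytes:
    """Return a 44-byte PCM wav header; all integers are little-endian."""
    byte_rate = sample_rate * channels * bits_per_sample // 8
    block_align = channels * bits_per_sample // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size,       # bytes 0-7: marker, file size minus 8
        b"WAVE",                       # bytes 8-11: wav file marker
        b"fmt ", 16,                   # bytes 12-19: format marker, format block size
        1, channels,                   # bytes 20-23: 1 = linear PCM, channel count
        sample_rate, byte_rate,        # bytes 24-31: sampling rate, byte rate
        block_align, bits_per_sample,  # bytes 32-35: block length, bits per sample
        b"data", data_size,            # bytes 36-43: data marker, PCM data size
    )


# Illustrative parameters: one second of 16 kHz mono 16-bit audio is 32000 bytes.
header = build_wav_header(sample_rate=16000, channels=1,
                          bits_per_sample=16, data_size=32000)
```

The value `36 + data_size` is the file size (44 + data_size) minus 8, matching item 2 of the listing.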
S500, sequentially inputting a plurality of segmented audio streams.
Specifically, the TTS server converts the text information into a plurality of segmented audio streams, and then sequentially transmits the audio streams to the web page backend server, and the web page backend server sequentially inputs the received audio streams into an audio output stream to be played at the web page.
The TTS server converts text information into audio far faster than the webpage end plays that audio. Therefore, once the wav header information and the first segmented audio stream have been input into the audio output stream, the webpage end can begin playing the first segment of audio, and before the first segment finishes playing, the TTS server can finish converting the second segment and transmit it to the web page back-end server. By analogy, before each segment finishes playing, the next adjacent segment has been converted and transmitted to the web page back-end server. The webpage end can therefore play the audio output stream as one complete wav-format audio, without jitter, discontinuity or similar artifacts.
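The timing argument above can be checked with a small calculation: playback stays gap-free as long as each segment arrives before the audio already received runs out. The sketch below assumes segments are synthesized and delivered one after another and that playback starts when the first segment arrives; the function names and all numbers are hypothetical, not measurements from the source.

```python
from typing import Sequence


def pcm_duration_seconds(num_bytes: int, sample_rate: int, channels: int,
                         bits_per_sample: int) -> float:
    """Playback time of raw PCM data: bytes divided by the byte rate."""
    byte_rate = sample_rate * channels * bits_per_sample // 8
    return num_bytes / byte_rate


def playback_keeps_up(durations: Sequence[float],
                      synth_times: Sequence[float]) -> bool:
    """True if every later segment arrives before the buffered audio runs out."""
    ready = 0.0     # arrival time of the next segment, measured from playback start
    buffered = 0.0  # seconds of audio received before that segment
    for i in range(1, len(durations)):
        ready += synth_times[i]
        buffered += durations[i - 1]
        if ready > buffered:
            return False  # the player would run dry before segment i arrives
    return True


# 160000 bytes of 16 kHz mono 16-bit PCM last 5 seconds.
seg = pcm_duration_seconds(160000, sample_rate=16000, channels=1, bits_per_sample=16)
fast = playback_keeps_up([seg, seg, seg], [0.5, 0.5, 0.5])
slow = playback_keeps_up([seg, seg, seg], [0.5, 6.0, 0.5])
```

In the first case each segment takes 0.5 s to produce and 5 s to play, so the stream never stalls; in the second, a 6 s conversion exceeds the 5 s of buffered audio and a gap would occur.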
In the embodiment of the present invention, the segmented audio stream uses an audio stream in a pulse code modulation PCM format, but the present invention is not limited thereto, and the segmented audio stream may use any suitable audio format.
It should be noted that, in the embodiment of the present invention, the web page back-end server may implement the webpage-side audio generation method in one or more of the following programming languages: Java, Python, C++, C#, PHP. However, the present invention is not limited to these, and the web page back-end server may implement the webpage-side audio generation method in any other suitable language.
Fig. 2 is a flowchart illustrating a specific application example of the method for generating webpage-side audio according to the embodiment of the present invention.
Referring to fig. 2, the method for generating webpage-side audio in this application example may include the following steps: S1, the webpage end sends text information to the web page back-end server; S2, the web page back-end server sends the text information to the TTS server; S3, the web page back-end server constructs an audio output stream; S4, the TTS server returns the PCM audio stream in segments; S5, the web page back-end server monitors whether the PCM audio stream is being returned for the first time; S6, when the segmented audio stream returned by the TTS server is received for the first time, waveform audio file format wav header information is input into the audio output stream; S7, the segmented PCM audio continues to be input into the audio output stream.
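Steps S3 to S7 can be sketched as a Python generator that emits the wav header once, on the first segment, and then passes each PCM segment through in arrival order. Everything named here is hypothetical: `fake_tts` stands in for the TTS server, the 16 kHz mono 16-bit format is an example, and filling the size fields with the maximum 32-bit value is a placeholder convention assumed for illustration, since the source does not state which size is written while the total length is still unknown.

```python
import struct
from typing import Iterable, Iterator

# Assumption: placeholder size used because the total length is unknown while streaming.
UNKNOWN_SIZE = 0xFFFFFFFF


def streaming_wav_header(sample_rate: int, channels: int, bits_per_sample: int) -> bytes:
    """44-byte PCM wav header with placeholder size fields."""
    block_align = channels * bits_per_sample // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", UNKNOWN_SIZE, b"WAVE", b"fmt ", 16, 1, channels,
        sample_rate, sample_rate * block_align, block_align, bits_per_sample,
        b"data", UNKNOWN_SIZE,
    )


def wav_stream(pcm_segments: Iterable[bytes], sample_rate: int = 16000,
               channels: int = 1, bits_per_sample: int = 16) -> Iterator[bytes]:
    """Yield the header once, then every PCM segment in arrival order."""
    first = True
    for segment in pcm_segments:  # S4: segments arrive one at a time
        if first:                 # S5: is this the first segment?
            yield streaming_wav_header(sample_rate, channels, bits_per_sample)  # S6
            first = False
        yield segment             # S7: append the PCM data


def fake_tts() -> Iterator[bytes]:
    """Stand-in for the TTS server: two short PCM segments."""
    yield b"\x00\x01" * 4
    yield b"\x02\x03" * 4


chunks = list(wav_stream(fake_tts()))
```

A back-end could hand such a generator to whatever streaming-response mechanism its web framework offers, so that the webpage end starts receiving wav data as soon as the first segment is ready.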
Through the above technical scheme, after the text information from the webpage end is received, a new audio output stream is created, the wav header information is input into the audio output stream, and the segmented audio streams converted by the text-to-speech server are input in sequence. The audio output stream can therefore be played directly at the webpage end as wav-format audio, the webpage-end user hears high-quality audio, the waiting time for audio generation is reduced, and no dedicated PCM player has to be deployed at the webpage end.
Based on the foregoing method for generating a webpage-side audio, an embodiment of the present invention further provides a device for generating a webpage-side audio, and fig. 3 shows a schematic diagram of a structure of the device for generating a webpage-side audio according to the embodiment of the present invention, and as shown in fig. 3, the device for generating a webpage-side audio may include: the text transmission module 301 is configured to receive text information and send the text information to a text-to-speech server; a receiving module 302, configured to receive a plurality of segmented audio streams corresponding to text information and returned by a text-to-speech server in a segmented manner; a construction module 303 for constructing an audio output stream; and an audio transmission module 304, configured to input waveform audio file format wav header information in an audio output stream, and sequentially input a plurality of segmented audio streams.
Fig. 4 is a schematic diagram illustrating a composition structure of a text transmission module according to an embodiment of the present invention, and referring to fig. 4, according to an embodiment of the present invention, a text transmission module 301 according to an embodiment of the present invention includes: the text receiving submodule 3011 is configured to receive text information using a hypertext transfer protocol HTTP; and a text sending sub-module 3012, configured to send the text information to the text-to-speech server by using a hypertext transfer protocol HTTP.
Fig. 5 is a schematic diagram illustrating a composition structure of an audio transmission module according to an embodiment of the present invention, and referring to fig. 5, according to an embodiment of the present invention, an audio transmission module 304 according to an embodiment of the present invention includes: a monitoring submodule 3041 for monitoring whether the segmented audio stream is received for the first time; and a transmitting submodule 3042 for inputting waveform audio file format wav header information in the audio output stream when the segmented audio stream is received for the first time.
For other specific implementation details and beneficial effects of the webpage-side audio generating device, reference is made to the above-mentioned webpage-side audio generation method. For technical details not disclosed in the device embodiment of the present invention, refer to the description of the method embodiment shown in figs. 1 to 2; they are not repeated here for brevity.
Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solutions of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, the embodiments of the present invention do not describe every possible combination.
The webpage-side audio generating device comprises a processor and a memory, wherein the text transmission module, the receiving module, the constructing module, the audio transmission module and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. The kernel can be set to be one or more, and the technical problem to be solved by the application is solved by adjusting the kernel parameters.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, where the program, when executed by a processor, implements a web page side audio generation method.
The embodiment of the invention provides equipment, which comprises a processor, a memory and a program which is stored on the memory and can run on the processor, wherein the processor realizes the webpage end audio generation method when executing the program. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A webpage-side audio generating method, configured to convert a text of a webpage side into audio that can be played on the webpage side, the method comprising:
receiving text information and sending the text information to a text-to-speech server;
receiving a plurality of segmented audio streams corresponding to the text information and fed back by the text-to-speech server in a segmented manner;
constructing an audio output stream; and
inputting waveform audio file format (WAV) header information into the audio output stream, and then sequentially inputting the plurality of segmented audio streams.
2. The webpage-side audio generating method according to claim 1, wherein the receiving of the text information and the sending of the text information to the text-to-speech server are both performed using the Hypertext Transfer Protocol (HTTP).
3. The webpage-side audio generating method according to claim 1, wherein the inputting of waveform audio file format (WAV) header information into the audio output stream comprises:
monitoring whether the segmented audio stream is received for the first time; and
inputting the waveform audio file format (WAV) header information into the audio output stream when the segmented audio stream is received for the first time.
4. The webpage-side audio generating method according to claim 3, wherein the monitoring of whether the segmented audio stream is received for the first time comprises:
monitoring whether the current state of the audio output stream is empty; and
when the current state of the audio output stream is empty, confirming that the segmented audio stream is received for the first time.
5. The webpage-side audio generating method according to claim 1, wherein each segmented audio stream is an audio stream in pulse code modulation (PCM) format.
6. A webpage-side audio generating device, comprising:
a text transmission module, configured to receive text information and send the text information to a text-to-speech server;
a receiving module, configured to receive a plurality of segmented audio streams that correspond to the text information and are returned by the text-to-speech server in a segmented manner;
a building module, configured to construct an audio output stream; and
an audio transmission module, configured to input waveform audio file format (WAV) header information into the audio output stream and then sequentially input the plurality of segmented audio streams.
7. The webpage-side audio generating device according to claim 5, wherein the text transmission module comprises:
a text receiving submodule, configured to receive the text information using the Hypertext Transfer Protocol (HTTP); and
a text sending submodule, configured to send the text information to the text-to-speech server using the Hypertext Transfer Protocol (HTTP).
8. The webpage-side audio generating device according to claim 5, wherein the audio transmission module comprises:
a monitoring submodule, configured to monitor whether the segmented audio stream is received for the first time; and
a transmission submodule, configured to input the waveform audio file format (WAV) header information into the audio output stream when the segmented audio stream is received for the first time.
9. A machine-readable storage medium having instructions stored thereon, the instructions being used to cause a machine to execute the webpage-side audio generating method according to any one of claims 1-5.
10. A device comprising at least one processor, and at least one memory and a bus connected to the processor, wherein the processor and the memory communicate with each other through the bus, and the processor is configured to call program instructions in the memory to perform the webpage-side audio generating method according to any one of claims 1-5.
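
The stream-assembly steps recited in claims 1 and 3-5 can be illustrated with a minimal sketch. It is not the applicant's implementation. The function names (wav_header, append_segment, build_audio_stream) are hypothetical, and the audio parameters (16 kHz, mono, 16-bit) are assumptions, since the claims do not specify them. Because the total length is unknown while segments are still arriving, the sketch writes 0xFFFFFFFF as a placeholder size, which is a common streaming convention rather than something stated in the source.

```python
import io
import struct
from typing import Iterable


def wav_header(sample_rate: int = 16000, channels: int = 1, bits: int = 16,
               data_size: int = 0xFFFFFFFF) -> bytes:
    """Build a 44-byte canonical PCM WAV header.

    The default audio parameters and the placeholder data_size are
    assumptions made for illustration only.
    """
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    riff_size = min(36 + data_size, 0xFFFFFFFF)  # clamp to the 32-bit field
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", riff_size, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_size,
    )


def append_segment(out: io.BytesIO, pcm_segment: bytes) -> None:
    """Append one PCM segment, writing the WAV header first if needed."""
    # The stream is only ever appended to, so position 0 means it is empty,
    # which in turn means this is the first segment received (claims 3-4).
    if out.tell() == 0:
        out.write(wav_header())
    out.write(pcm_segment)


def build_audio_stream(segments: Iterable[bytes]) -> bytes:
    """Construct the output stream and feed it the segments in order."""
    out = io.BytesIO()
    for segment in segments:
        append_segment(out, segment)
    return out.getvalue()
```

In this sketch the header is written exactly once, ahead of the first PCM segment, so a player that accepts WAV data can begin decoding before the remaining segments arrive.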
CN202010127254.5A 2020-02-28 2020-02-28 Webpage end audio generation method and device and storage medium Pending CN111459445A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010127254.5A CN111459445A (en) 2020-02-28 2020-02-28 Webpage end audio generation method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010127254.5A CN111459445A (en) 2020-02-28 2020-02-28 Webpage end audio generation method and device and storage medium

Publications (1)

Publication Number Publication Date
CN111459445A (en) 2020-07-28

Family

ID=71684204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010127254.5A Pending CN111459445A (en) 2020-02-28 2020-02-28 Webpage end audio generation method and device and storage medium

Country Status (1)

Country Link
CN (1) CN111459445A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634857A (en) * 2020-12-15 2021-04-09 京东数字科技控股股份有限公司 Voice synthesis method and device, electronic equipment and computer readable medium
CN112765397A (en) * 2021-01-29 2021-05-07 北京字节跳动网络技术有限公司 Audio conversion method, audio playing method and device
CN114900505A (en) * 2022-04-18 2022-08-12 广州市迪士普音响科技有限公司 WEB-based audio scene timing switching method, device and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101729827A (en) * 2009-12-14 2010-06-09 深圳市同洲电子股份有限公司 Voice service method, system, digital television receiving terminal and front-end device
EP2447940A1 (en) * 2010-10-29 2012-05-02 France Telecom Method of and apparatus for providing audio data corresponding to a text
CN102487461A (en) * 2010-12-02 2012-06-06 康佳集团股份有限公司 Method for reading aloud webpage on web television and device thereof

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634857A (en) * 2020-12-15 2021-04-09 京东数字科技控股股份有限公司 Voice synthesis method and device, electronic equipment and computer readable medium
CN112765397A (en) * 2021-01-29 2021-05-07 北京字节跳动网络技术有限公司 Audio conversion method, audio playing method and device
WO2022160990A1 (en) * 2021-01-29 2022-08-04 北京字节跳动网络技术有限公司 Audio conversion method and apparatus, and audio playback method and apparatus
CN114900505A (en) * 2022-04-18 2022-08-12 广州市迪士普音响科技有限公司 WEB-based audio scene timing switching method, device and medium
CN114900505B (en) * 2022-04-18 2024-01-30 广州市迪士普音响科技有限公司 Audio scene timing switching method, device and medium based on WEB

Similar Documents

Publication Publication Date Title
CN111459445A (en) Webpage end audio generation method and device and storage medium
JP6981257B2 (en) Information processing equipment and information processing method
US20020103646A1 (en) Method and apparatus for performing text-to-speech conversion in a client/server environment
JP2012514381A5 (en)
JPS60256255A (en) Buffer device for digital transmission network
US20030083105A1 (en) Method and apparatus for performing text to speech synthesis
WO2017101327A1 (en) Method and device for collective playback of high-fidelity sound by several players
CN106303754A (en) A kind of audio data play method and device
CN105024764A (en) Audio-format-based file transmission method and system
CN110995577B (en) Multi-channel adaptation method and device for message and storage medium
CN108234479A (en) For handling the method and apparatus of information
CN115278456A (en) Sound equipment and audio signal processing method
CN111147359B (en) Message conversion method, device and storage medium suitable for multiple channels
CN115132213A (en) Audio data transmission method, device, chip, electronic equipment and storage medium
EP1571647A1 (en) Apparatus and method for processing bell sound
JP5533503B2 (en) COMMUNICATION DEVICE, COMMUNICATION METHOD, AND COMMUNICATION PROGRAM
JP5049310B2 (en) Speech learning / synthesis system and speech learning / synthesis method
CN112581934A (en) Voice synthesis method, device and system
CN104754400A (en) Envelope information sharing method and device based on mobile terminal
Gabrielli et al. Advancements and performance analysis on the wireless music studio (WeMUST) framework
CN107580152B (en) Voice value-added service system and communication method thereof
CN118474281A (en) Conference record generation method and device, electronic equipment and storage medium
US20080225941A1 (en) Moving picture converting apparatus, moving picture transmitting apparatus, and methods of controlling same
JP2002304196A (en) Method, program and recording medium for controlling audio signal recording, method, program and recording medium for controlling audio signal reproduction, and method, program and recording medium for controlling audio signal input
CN117577092A (en) Voice broadcasting method, mobile payment device, storage medium and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination