EP1073036B1 - Analysis of downloaded documents for a browser equipped with a speech synthesizer - Google Patents
Analysis of downloaded documents for a browser equipped with a speech synthesizer
- Publication number
- EP1073036B1 (application EP00306355A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- speech
- text data
- data file
- control
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
Definitions
- The present invention relates to a speech processing method and apparatus.
- The invention has particular, although not exclusive, relevance to the retrieval of data including speech data, which comprises control data and phoneme data identifying phoneme symbols, from a remote server over a communications channel.
- Web browsing software exists for allowing users to browse web pages which are stored on servers located throughout the Internet. There are currently two techniques which allow the browsing software to output speech relevant to the web page.
- The remote server can store and send the user recorded voice data which is relevant to the web page, or the user's browser may include a text-to-speech synthesizer which converts the hypertext in the web page into speech; see, for example, US-A-5899975.
- The present invention provides a web browser which is operable to process web pages received from the Internet in order to determine whether or not there is an associated speech data file stored for the web page in a server on the Internet and, if there is, means for retrieving the speech data file from that server.
- The invention also provides a data retrieval system which can process received text data to determine whether or not there is an associated speech data file and, if there is, means for retrieving the speech data file from a remote storage location.
- FIG. 1 is a block diagram showing an Internet browsing system embodying the present invention.
- The invention may be used in any hypertext system which includes a first and a second apparatus which are connected together via a network or other communications channel, and in which data from the second apparatus can be retrieved by the first apparatus and output as speech.
- The system comprises a client 1, a server 3 and a communications network 5 for linking the client 1 to the server 3.
- Such a system is generally called a client-server system.
- Both the client 1 and the server 3 comprise the modules illustrated in Figure 1 and operate under control of a stored program in order to carry out the process steps shown in Figure 2.
- The control program for each of the client and the server may be stored in a memory (not shown) at the time of manufacture, read out from a storage medium detachable from the client or server, or downloaded from a remote terminal connected to the client or server via a computer network.
- In step s101, an address for a web page (which is a hypertext data file) is input by a user of the client 1.
- The input web address is stored in the address memory 110 and identifies the server which stores the web page and the file name of the web page that the client wishes to download.
- The input web address is then transmitted, in step s102, by a web address transmitting unit 101 to the server 3, where it is received, in step s151, by a web page communicating unit 102.
- The web page communicating unit 102 then reads out, from the storage unit 103, the web page corresponding to the received web page address and transmits it, in step s152, back to the client 1.
- The web page receiving unit 104 in the client 1 receives, in step s103, the web page transmitted from the server 3, and a web page display unit 105 in the client 1 develops image and character data on the basis of the received web page and displays it on a display (not shown) in step s104. Then, in step s105, a speech data require unit 106 in the client 1 processes the received web page and determines whether it comprises a requirement for speech data. In particular, the speech data require unit 106 processes the received web page to determine whether or not it includes a predetermined speech command which identifies that a speech data file exists which is associated with the received web page.
- Figures 2b, 2c and 2d illustrate the form of different types of web pages which may be received in step s103 by the client 1.
- Each of the web pages includes a header portion 11 which defines, among other things, the size and position of a window in which the web page will be displayed; a text portion 13 which includes the text to be displayed; and a speech command portion 15 which identifies that there is speech data associated with the current web page.
- The speech command portion 15 may occur before or after the text portion 13 and will comprise at least the file name 17 of the file holding the associated speech data.
- The speech command also includes an address portion 19 which identifies where the speech data file is stored.
- Typically, the address 19 will identify the same server 3 which transmitted the current web page being viewed by the user. However, this is not essential and the address portion 19 may identify a different server. Further, as illustrated in Figure 2d, the speech command may include the file names 17a and 17b of more than one speech data file and the corresponding address portions 19a and 19b. In this embodiment, the address portion 19 also identifies where the speech data file 17 is to be found within the identified server.
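By way of illustration, the short sketch below scans a received page for such a speech command; the <SPEECH> tag name, its attribute names and the regular-expression syntax are assumptions made for this example only, since the description does not fix a concrete markup encoding.

```python
import re
from dataclasses import dataclass

@dataclass
class SpeechCommand:
    file_name: str  # file name 17 of the speech data file
    address: str    # address portion 19 identifying where the file is stored

# Hypothetical encoding of the speech command portion 15; the patent does
# not specify the actual markup.
_SPEECH_TAG = re.compile(
    r'<SPEECH\s+src="(?P<file>[^"]+)"\s+addr="(?P<addr>[^"]+)"\s*/>')

def find_speech_commands(page: str) -> list[SpeechCommand]:
    """Step s105: scan a received web page for speech commands.

    Returns one SpeechCommand per file-name/address pair, so a page like
    that of Figure 2d, with several speech data files, yields several
    entries.
    """
    return [SpeechCommand(m.group("file"), m.group("addr"))
            for m in _SPEECH_TAG.finditer(page)]

page = '<HEAD>...</HEAD>Welcome!<SPEECH src="welcome.spd" addr="www.example.com/speech"/>'
print(find_speech_commands(page))  # one SpeechCommand with file name and address
```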
- If, in step s105, the client 1 determines that there is a requirement for speech data, then the processing proceeds to step s106; otherwise it proceeds to step s109.
- In step s106, the speech data require unit 106 transmits a request for the speech data. This request includes the file name 17 of the speech data file and its address 19.
- A speech data communicating unit 107 located within the server 3 receives, in step s153, the speech data request transmitted from the client 1, retrieves the appropriate speech data file from the speech data storage unit 108 and then transmits it back to the client 1 in step s154.
- A speech synthesizer 109 located within the client 1 receives, in step s107, the speech data file transmitted from the server 3, transforms the speech data into a synthesized speech signal in step s108 and outputs the synthesized speech signal to a loudspeaker (not shown).
- In step s109, the client 1 determines whether or not the browsing process is finished. If it is, then the processing ends; otherwise the processing returns to step s101, where the process is continued for a new input address.
- The browser is a computer program run on a conventional computer system and is displayed as a window on the display of the computer system. Therefore, in order to finish the browsing process, the user of the client 1 can simply close the window currently running the browser program.
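Putting steps s101 to s109 together, the client-side loop of this first embodiment can be sketched as follows; the client and server objects and their method names are illustrative stand-ins for the modules of Figure 1, not an API defined by the patent.

```python
def browse(client, server):
    """Sketch of the Figure 2 client-side loop (steps s101 to s109).

    `client` and `server` are assumed objects standing in for the modules
    of Figure 1; `find_speech_commands` is the sketch given earlier.
    """
    while True:
        address = client.read_input_address()        # s101
        if address is None:                          # browser window closed
            break                                    # s109: browsing finished
        page = server.get_web_page(address)          # s102, s151, s152, s103
        client.display(page)                         # s104
        for cmd in find_speech_commands(page):       # s105
            speech = server.get_speech_file(cmd.address, cmd.file_name)  # s106, s153, s154
            client.speech_synthesizer.speak(speech)  # s107, s108
```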
- The speech data within the speech data file transmitted from the server 3 to the client 1 includes phoneme data (identifying phoneme symbols) which defines the sound signals to be output by the speech synthesizing unit 109; prosody control data which is used to control the stress and intonation of the sounds defined by the phoneme data; and speaker control data which controls the pitch, speed and type of speaker (male or female) of the speech signals generated by the speech synthesizing unit 109.
- Figure 3 illustrates the phoneme symbols used in this embodiment which are transmitted from the server 3 to the client 1. Figure 3 also gives, next to each symbol, an example of how the corresponding phoneme is used.
- As Figure 3 shows, the phoneme symbols are split into three groups: those relating to simple vowels 25, those relating to diphthongs 27 and those relating to consonants 29.
- Figure 4 illustrates the prosody control data used in this embodiment, which, in use, is embedded within the transmitted phoneme data.
- The prosody control data includes symbols which identify boundaries between syllables and words, symbols for controlling the stress applied to vowels, and other symbols which control the pronunciation of the phonemes.
- Figure 5 illustrates the form of the speaker control data used in this embodiment.
- The top half 31 of Figure 5 illustrates the way in which the speaker control data is used, whilst the lower half 33 of Figure 5 illustrates the parameters that can be set.
- The parameters include the speed at which the speech is generated (which can be set to a value of 1 to 4); the pitch of the generated speech (which can be set to a value indicative of a shift from a predetermined standard speaker's pitch); and the type of speaker (which can be set to either male or female).
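A minimal in-memory representation of this speaker control data might look like the sketch below; only the parameter ranges (a speed of 1 to 4, a pitch shift relative to a standard speaker, and a male or female speaker) come from the description, while the field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SpeakerControl:
    """Speaker control data in the spirit of Figure 5 (field names assumed)."""
    speed: int = 2         # speaking rate, settable to a value of 1 to 4
    pitch_shift: int = 0   # shift from the predetermined standard speaker's pitch
    speaker: str = "male"  # type of speaker: "male" or "female"

    def __post_init__(self) -> None:
        if not 1 <= self.speed <= 4:
            raise ValueError("speed must be between 1 and 4")
        if self.speaker not in ("male", "female"):
            raise ValueError("speaker type must be 'male' or 'female'")
```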
- Figure 6 illustrates the phoneme data, together with the embedded prosody control data, generated for the example text "New York City." and "This is a text-to-speech system."
- The prosody control data used in these examples includes a primary stress control symbol "1", which causes the speech synthesizer 109 to place more stress on the subsequent vowel, and a period control symbol ".", which causes the speech synthesizer 109 to add a pause.
- Figure 7 illustrates a further example which shows text data for display 39 together with corresponding speech data 41 which includes the use of both the above described prosody control data and speaker control data.
- The speaker control data may be inserted within the phoneme data so that, for example, the type of speaker outputting the speech may be changed during the speech synthesizing operation.
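The sketch below separates the embedded control symbols from the phoneme symbols in a token stream of the kind shown in Figure 6; only the "1" stress symbol and the "." pause symbol are taken from the description, and the phoneme spelling in the example is illustrative rather than copied from Figure 3.

```python
def classify_speech_tokens(tokens: list[str]) -> list[tuple[str, str]]:
    """Label each token of speech data containing embedded prosody control.

    "1" marks primary stress on the subsequent vowel and "." adds a pause,
    as in the description; every other token is treated as a phoneme
    symbol. The tokenisation itself is an assumed convention.
    """
    labelled = []
    for tok in tokens:
        if tok == "1":
            labelled.append(("stress", tok))
        elif tok == ".":
            labelled.append(("pause", tok))
        else:
            labelled.append(("phoneme", tok))
    return labelled

# An illustrative rendering of "New York City."
print(classify_speech_tokens(
    ["n", "1", "u", "j", "1", "O", "r", "k", "s", "1", "I", "t", "i", "."]))
```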
- Since the speech data file which is transmitted from the server 3 to the client 1 only includes phoneme data and appropriate control data, it will be much smaller than a corresponding recorded speech waveform file and can therefore be retrieved from the server 3 more quickly.
- Further, since the data for controlling the way in which the speech synthesizer 109 synthesizes the speech is also transmitted from the server 3, it can be set in advance by the owner of the web site. The web site owner can therefore set the control data so that the client's speech synthesizer 109 will output the speech in a predetermined manner.
- FIG. 8 is a block diagram illustrating the client-server system of a second embodiment, and Figure 9 is a flow diagram illustrating the processing steps performed by this client-server system during a browsing operation.
- The main difference between this embodiment and the first embodiment is the provision of a speech synthesizer user interface 209 within the client 1.
- The description of this second embodiment will therefore be restricted to the interaction of this speech synthesizer user interface unit 209 with the other modules of the system.
- When the speech data require unit 106 determines that the received web page includes a speech command identifying that a speech data file exists which is associated with the current web page being viewed, it starts the speech synthesizer user interface unit 209 as well as transmitting the requirement for speech data to the server 3.
- After the speech data communicating unit 107 located within the server 3 retrieves the appropriate speech data file and transmits it back to the client 1, the speech synthesizer 109 synthesizes a corresponding speech signal under the control of the speech synthesizer user interface unit 209.
- Figure 11a shows an example of a graphical user interface which the speech synthesizer user interface 209 outputs to the display (not shown) of the client 1, and which allows the user of the client to set various control parameters and input various commands used to control the operation of the speech synthesizer 109.
- The graphical user interface illustrated in Figure 11a allows the user to start, stop, pause and restart the operation of the speech synthesizer 109, to change the pitch and speed of the synthesized speech signal and to change the type of speaker, etc.
- FIG. 10 is a flow chart illustrating the way in which commands and settings input via the speech synthesizer user interface 209 control the operation of the speech synthesizer 109.
- In step s20901, the speech synthesizer 109 receives a command input by the user via the user interface 209.
- The speech synthesizer 109 then determines, in step s20902, whether the input command is a start command. If it is, then the processing proceeds to step s20903 where the speech synthesizer 109 begins to output a synthesized speech signal from the received speech data, and the processing returns to step s20901 where the speech synthesizer 109 awaits the next input command.
- If, in step s20902, the input command is not the start command, then the processing proceeds to step s20904 where the speech synthesizer 109 determines whether the input command is a stop command. If it is, then the processing proceeds to step s20905 where the speech synthesizer 109 stops outputting the synthesized speech signal, and the processing returns to step s20901 where the speech synthesizer 109 awaits the next input command.
- If, in step s20904, the input command is not the stop command, then the processing proceeds to step s20906 where the speech synthesizer 109 determines whether or not the input command is a pause command. If it is, then the processing proceeds to step s20907 where the speech synthesizer 109 pauses the outputting of the synthesized speech signal at the current location in the speech data file, and the processing returns to step s20901 where the speech synthesizer 109 awaits the next input command.
- If, in step s20906, the speech synthesizer 109 determines that the input command is not the pause command, then the processing proceeds to step s20908 where the speech synthesizer 109 determines whether or not the input command is a restart command. If it is, then the processing proceeds to step s20909 where the speech synthesizer 109 restarts outputting the synthesized speech signal from the current location (corresponding to the location where the outputting of the synthesized speech was paused) in the speech data file, and the processing returns to step s20901 where the speech synthesizer awaits the next input command.
- If, in step s20908, the speech synthesizer 109 determines that the input command is not the restart command, then the processing proceeds to step s20910 where the speech synthesizer 109 determines if the input command is a command to change the pitch of the synthesized speech. If it is, then the processing proceeds to step s20911 where the pitch of the generated speech signal is changed on the basis of the pitch level set by the user. The processing then returns to step s20901 where the speech synthesizer 109 awaits the next input command.
- If, in step s20910, the speech synthesizer 109 determines that the input command is not a pitch change command, then the processing proceeds to step s20912 where the speech synthesizer 109 determines whether the input command is a speed change command. If it is, then the processing proceeds to step s20913 where the speech synthesizer 109 changes the speed at which the speech is being synthesized on the basis of the speed level set by the user. The processing then returns to step s20901 where the speech synthesizer awaits the next input command.
- If, in step s20912, the speech synthesizer 109 determines that the input command is not a speed change command, then the processing proceeds to step s20914 where the speech synthesizer 109 determines if the input command is a command to change the type of speaker. If it is, then the processing proceeds to step s20915 where the settings of the speech synthesizer 109 are changed in accordance with the type of speaker set by the user. The processing then returns to step s20901 where the next input command is awaited.
- If, in step s20914, the speech synthesizer 109 determines that the input command is not a command to change the type of speaker, then the processing proceeds to step s20916 where the speech synthesizer 109 determines whether or not the input command is a command to end the synthesizing operation. If it is, then the processing ends; otherwise the processing returns to step s20901 where the speech synthesizer 109 awaits the next input command.
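Taken together, Figure 10 is simply a command dispatch loop. A compact sketch is given below; the method names on the synthesizer object and the command representation are assumptions, while the command set itself mirrors the steps above.

```python
def command_loop(synth, ui):
    """Sketch of the Figure 10 dispatch loop (steps s20901 to s20916).

    `synth` and `ui` are assumed objects; each command is assumed to carry
    a `name` and, for the setting commands, a user-chosen `value`.
    """
    handlers = {
        "start":   lambda c: synth.start(),              # s20903
        "stop":    lambda c: synth.stop(),               # s20905
        "pause":   lambda c: synth.pause(),              # s20907
        "restart": lambda c: synth.restart(),            # s20909
        "pitch":   lambda c: synth.set_pitch(c.value),   # s20911
        "speed":   lambda c: synth.set_speed(c.value),   # s20913
        "speaker": lambda c: synth.set_speaker(c.value), # s20915
    }
    while True:
        command = ui.next_command()                      # s20901
        if command.name == "end":                        # s20916
            return
        handler = handlers.get(command.name)
        if handler is not None:
            handler(command)
```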
- The speech synthesizer user interface 209 is a graphical user interface such as the two example interfaces shown in Figure 11.
- The interfaces include a control button 10 for inputting a command to start a synthesizing operation; a control button 11 for stopping a synthesizing operation; and a control button 12 for controlling the pausing and restarting of a synthesizing operation, each of which is activated by a user controlled cursor (not shown) in a known manner.
- The interfaces also include a menu select button 13 for changing the type of speaker from a male speaker to a female speaker and vice versa, and two sliders 14 and 15 for controlling the pitch and speed of the synthesized speech respectively.
- A progress slider 16 is also provided to show the user the progress of the speech synthesizer in generating synthesized speech signals for the received speech data file.
- Alternatively, the interface may be provided through a numeric or general keyboard forming part of the client, provided the relation between the keys and the commands is defined in advance.
- Figure 12 is a table illustrating the correlation between the commands and the keys of a numeric or general keyboard, which may be programmed in advance to allow the user to control the speech synthesizer 109 using the keys of the keyboards.
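A key-to-command table of this kind reduces to a small lookup; the particular key bindings below are assumptions, since the contents of Figure 12 are not reproduced here.

```python
# Assumed key bindings in the spirit of Figure 12; the actual keys in the
# patent's table are not reproduced here.
KEY_COMMANDS = {
    "s": "start",
    "x": "stop",
    "p": "pause",
    "r": "restart",
    "+": "pitch up",
    "-": "pitch down",
    "f": "speed up",
    "d": "slow down",
    "m": "toggle speaker",  # male <-> female
}

def command_for_key(key: str) -> str | None:
    """Map a pressed key to a synthesizer command, if one is bound."""
    return KEY_COMMANDS.get(key.lower())
```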
- In this way, the user can change the default settings, set by the web site owner, of the control parameters used to control the speech synthesizing operation.
- The user of the client can thus customize the way in which the speech synthesizer synthesizes the speech in accordance with the user's preferences.
- Figure 13 is a block diagram illustrating the client-server system of a third embodiment, and Figure 14 is a flow diagram illustrating the processing steps performed by this client-server system during a browsing operation.
- The main difference between this embodiment and the second embodiment is the connection, shown in Figure 13 by the arrow from the speech synthesizer 109 to the address transmitting unit 101.
- The description of this third embodiment will therefore be restricted to the processing which involves this connection.
- In step s310, the speech synthesizer 109 determines whether or not a hypertext address has been found in, and output from, the received speech data. If no address is output, then the processing proceeds to step s109 as before. If, on the other hand, an address is output, then the processing returns to step s101, where the output address is input to the address transmitting unit 101 so that the linked web page is accessed.
- In this embodiment, in addition to generating a synthesized speech signal corresponding to the received speech data on the basis of commands within the speech data and/or commands input by the user via the user interface 209, the speech synthesizer 109 also responds to input commands designating a move to another part of the received speech data or to a linked web page.
- Figure 15 is a flow chart illustrating the processing steps performed by the speech synthesizer 109 in determining whether or not to move to another part of the speech data or to linked data.
- Figure 16 shows an example of speech data which includes link data 162.
- Figure 16 also shows the corresponding text which is displayed to the user and which illustrates to the user that there is a link (by virtue of the underlined text).
- The link data 162 includes an address 162-1, which identifies the location of related information (which may be a related web site or a further file stored in the same web site), and speech data 162-2 of a message for the user (which in this example is: "For Canon-group companies pages, push M key now").
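A link entry of this kind can be modelled as below; the field names are assumptions, and the address is an illustrative placeholder, while the spoken message echoes the example in the description.

```python
from dataclasses import dataclass

@dataclass
class SpeechLink:
    """Link data 162 embedded in a speech data file (field names assumed)."""
    address: str      # address 162-1 of the related information
    speech_data: str  # speech data 162-2 of the message for the user

link = SpeechLink(
    address="www.example.com/canon-group",  # illustrative address only
    speech_data="For Canon-group companies pages, push M key now",
)
```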
- The process shown in Figure 15 starts when the user inputs, in step s30901, one of the commands shown in Figure 17 whilst the speech synthesizer 109 is synthesizing speech.
- The commands include "next link", "previous link" and "go to the link".
- In step s30902, the speech synthesizer 109 determines whether or not the input command is the command to move to the next link. If it is, then the processing proceeds to step s30903 where the speech synthesizer 109 searches forward, from its current location within the speech data file, for the next link 162 and restarts the synthesizing operation from the speech data 162-2 within that link 162.
- The processing then returns to step s30901 where the next input command is awaited. In this way, the user can cause the speech synthesizer to skip portions of the speech data being synthesized.
- If, in step s30902, the speech synthesizer 109 determines that the input command is not a command to go to the next link, then the processing proceeds to step s30904 where the speech synthesizer 109 determines whether or not the input command is a command to return to the previous link. If it is, then the processing proceeds to step s30905 where the speech synthesizer 109 searches backward from its current location for the previous link and then restarts the synthesizing operation from the speech data 162-2 within that link 162. The processing then returns to step s30901 where the next input command is awaited.
- If, in step s30904, the speech synthesizer 109 determines that the input command is not the command to return to the previous link, then the processing proceeds to step s30906 where the speech synthesizer 109 determines whether or not the input command was to go to the link. If it is, then the speech synthesizer retrieves the hypertext address 162-1 from the link data and outputs the address, in step s30907, to the address transmitting unit 101. The processing then ends. If, however, at step s30906 the speech synthesizer 109 determines that the input command is not to go to the link, then the processing returns to step s30901 where the next input command is awaited.
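The three link commands of Figure 15 can be sketched as a single handler; the method names on the synthesizer object are assumptions, and the synthesizer is assumed to track its current position within the speech data file.

```python
def handle_link_command(synth, command: str) -> None:
    """Sketch of Figure 15 (steps s30901 to s30907); method names assumed."""
    if command == "next link":                    # s30902
        link = synth.find_link(forward=True)      # s30903: search forward
        if link is not None:
            synth.restart_from(link.speech_data)
    elif command == "previous link":              # s30904
        link = synth.find_link(forward=False)     # s30905: search backward
        if link is not None:
            synth.restart_from(link.speech_data)
    elif command == "go to the link":             # s30906
        link = synth.current_link()
        if link is not None:
            synth.output_address(link.address)    # s30907: to address unit 101
```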
- In this way, the user can control the synthesizing of the current speech data file and can cause the client to access a linked hypertext address to retrieve further text data and/or further speech data.
- The speech synthesizer 109 can also be configured so that speech data 162-2 which forms part of a link is synthesized with a different speed or pitch or with a different type of speaker. If the speaker type is to change, then it can simply be set to be the opposite of the speaker type used for speech generated from speech data which is not part of a link.
- FIG. 18 is a block diagram illustrating the client-server system of a fourth embodiment, and Figure 19 is a flow diagram illustrating the processing steps initially performed during a browsing operation.
- The client 1 in this embodiment does not include a speech synthesizer. Therefore, prior to, or at the same time as, transmitting a request for the speech data, the client 1 transmits a request to the server 3 to send it a speech synthesizer. The way in which this is performed will now be described in more detail.
- In step s405, the client 1 determines whether or not there is a speech synthesizer module in the client. If there is, then the processing shown in Figure 19 ends and the processing returns to step s105 shown in Figure 2, where the downloading of the speech data is performed in the manner described above.
- If, however, step s405 determines that there is no speech synthesizer, then the processing proceeds to step s406 where the speech data require unit 406 transmits a request for a speech synthesizer module to the server 3.
- This request is received, at step s453, by the speech synthesizer module transmitting unit 407.
- The speech synthesizer module transmitting unit 407 then retrieves an appropriate speech synthesizer module from the storage unit 408 and transmits it, in step s454, back to the client 1.
- The transmitted speech synthesizer is received, in step s407, by a speech synthesizer receiving unit 409.
- The processing then proceeds to step s408 where the received speech synthesizer is initialized and set into a working condition.
- The processing shown in Figure 19 then ends and the processing returns to step s105 shown in Figure 2.
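Steps s405 to s408 amount to a small bootstrap routine; a sketch under assumed method names follows.

```python
def ensure_speech_synthesizer(client, server):
    """Sketch of Figure 19 (steps s405 to s408); method names assumed.

    If no synthesizer module is present in the client, one is requested
    from the server (s406/s453/s454), received (s407) and initialized into
    a working condition (s408).
    """
    if client.speech_synthesizer is not None:            # s405
        return client.speech_synthesizer
    module = server.get_speech_synthesizer_module()      # s406, s453, s454
    client.speech_synthesizer = module                   # s407
    client.speech_synthesizer.initialize()               # s408
    return client.speech_synthesizer
```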
- Although the client may have a speech synthesizer stored on, for example, a hard disk, it may not currently be running. In this case, rather than downloading the speech synthesizer from the server 3, the client may simply load the stored speech synthesizer into the working memory and set the synthesizer into a working condition.
- In the embodiments described above, the web page received from the server included both text data and a speech command which identifies that a speech data file exists for the received web page.
- This speech command included both a file name and an address for the speech data file.
- Alternatively, the speech command may include only a file name for the speech data file.
- In this case, the client may be programmed to assume that the speech data file is stored in the server from which it downloaded the web page. Therefore, in such an embodiment, the request for the speech data file would be transmitted to the server using the address stored in the address memory 110.
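Resolving where to send the speech data request then reduces to a one-line fallback; the attribute names below are assumptions.

```python
def speech_file_server(command, address_memory: str) -> str:
    """Choose the server for the speech data request.

    Falls back to the page's own server (the address stored in the address
    memory 110) when the speech command carries no address portion 19, as
    in the alternative embodiment described above.
    """
    return command.address if command.address else address_memory
```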
- In the embodiments described above, the user interface allowed a user to move to different portions of a speech data file being synthesized and to retrieve a further web page and/or a further speech data file.
- In particular, the user interface allows the user to input a command to move from the current point in the speech data file to a next or to a previous link.
- The links were therefore used as control characters within the speech data files, in addition to providing a link to another web page or another speech data file.
- Alternatively, the speech data file may be arranged in paragraphs or sections, with each paragraph or section having a control header which can be used to control movement of the speech synthesizing operation through the speech data file.
- In this case, the user may input a command to move to a next or to a previous paragraph, as in the sketch below.
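A paragraph-navigation sketch under this alternative arrangement is given below; the section representation and the method names are assumptions.

```python
def jump_paragraph(synth, direction: str) -> None:
    """Sketch of moving between paragraphs via their control headers.

    The speech data file is assumed to be held as a list `synth.sections`,
    each section beginning with a control header, with `synth.section`
    indexing the section currently being synthesized.
    """
    if direction == "next" and synth.section + 1 < len(synth.sections):
        synth.section += 1
    elif direction == "previous" and synth.section > 0:
        synth.section -= 1
    synth.restart_from(synth.sections[synth.section].speech_data)
```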
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Information Transfer Between Computers (AREA)
Claims (55)
- Data retrieval apparatus (1) comprising: means (104) for receiving text data; means (106) for processing the received text data to determine whether or not it includes data identifying a speech data file which is associated with the received text data, characterised in that the speech data file comprises phoneme data and control data for use by a speech synthesizer in synthesizing a speech signal corresponding to the phoneme data under the control of the control data; and means (109), responsive to said processing means, for retrieving the speech data file.
- Apparatus according to claim 1, further comprising means for transmitting a request for the text data to a remote terminal which stores the text data.
- Apparatus according to claim 1 or 2, comprising storage means for storing one or more predetermined commands identifying speech data, and wherein said processing means is operable to compare said stored predetermined commands with the received text data.
- Apparatus according to any preceding claim, wherein said retrieving means comprises means for transmitting a request for said speech data file to a storage location, and means for receiving said speech data file from said storage location.
- Apparatus according to any preceding claim, further comprising a user interface for allowing a user of said apparatus to set one or more control parameters of said control data.
- Apparatus according to any preceding claim, wherein said control data comprises prosody control data for controlling the pronunciation of said phoneme data.
- Apparatus according to any preceding claim, wherein said control data comprises speaker control data for controlling speaker related parameters of said speech.
- Apparatus according to claim 7, wherein said speaker related parameters include the speed at which the speech is synthesized.
- Apparatus according to claim 7 or 8, wherein said speaker related parameters include the pitch at which said speech is synthesized.
- Apparatus according to any of claims 7 to 9, wherein said speaker related parameters include whether the speech is synthesized as a male or a female voice.
- Apparatus according to any preceding claim, wherein said speech data file includes one or more links to other data, and wherein said apparatus comprises means, responsive to a command input by a user, for retrieving said other data.
- Apparatus according to claim 11, wherein said one or more links comprise an address portion which identifies the storage location of said other data, and speech data associated with said link.
- Apparatus according to any preceding claim, wherein said retrieving means is operable to retrieve, from a remote terminal, a speech synthesizer for synthesizing speech signals corresponding to said speech data.
- Apparatus according to any of claims 1 to 12, further comprising a speech synthesizer operable to receive said phoneme data and said control data, and operable to output a speech signal corresponding to the phoneme data under the control of said control data.
- Apparatus according to claim 14, further comprising a loudspeaker for generating speech sounds corresponding to the speech signal synthesized by said speech synthesizer.
- Apparatus according to claim 14 or 15, wherein said retrieving means is operable to retrieve set-up data for controlling a set-up procedure for said synthesizer before the generation of said speech signal corresponding to said speech data.
- Apparatus according to any of claims 14 to 16, further comprising a user interface for allowing a user to move to different portions of a speech data file to be synthesized.
- Apparatus according to claim 17 when dependent upon claim 11, wherein said user interface allows a user to move to a next or to a previous link within said speech data file.
- Apparatus according to any preceding claim, further comprising output means for outputting the received text data to a user.
- Apparatus according to claim 19, wherein said output means comprises a display.
- Apparatus according to claim 19 or 20, wherein said processing means is operable to process said text data after said output means has output the text data to said user.
- Apparatus according to any preceding claim, wherein said text data forms part of a hypertext data file.
- Apparatus according to any preceding claim, wherein said text data forms part of a web page.
- Apparatus according to any preceding claim, wherein said text data includes data identifying a storage location of said speech data file, and wherein said retrieving means is operable to retrieve said speech data file from said storage location.
- A browser comprising a data retrieval apparatus according to any preceding claim, a display for displaying the received text data, and a loudspeaker for outputting synthesized speech.
- A data retrieval system comprising one or more computer terminals storing at least text data, wherein at least one of said terminals also stores speech data corresponding to some of the stored text data, a data retrieval apparatus according to any preceding claim for retrieving text data and speech data from said terminals, and a communications network for connecting said retrieval apparatus to said one or more terminals and through which the retrieved text data and speech data pass.
- A system according to claim 26, wherein said one or more computer terminals are servers, said communications network forms part of the Internet and said data retrieval apparatus comprises a browser.
- A data retrieval method comprising the steps of: receiving (S103) text data; processing (S105) the received text data to determine whether or not it includes data identifying a speech data file which is associated with the received text data, the speech data file comprising phoneme data and control data for use by a speech synthesizer in synthesizing a speech signal corresponding to the phoneme data under the control of the control data; and retrieving (S106; S107) the speech data file.
- A method according to claim 28, further comprising the step of transmitting a request for the text data to a remote terminal which stores the text data.
- A method according to claim 28 or 29, comprising the step of storing one or more predetermined commands identifying speech data, and wherein said processing step compares said stored predetermined commands with the received text data.
- A method according to any of claims 28 to 30, wherein said retrieving step comprises the steps of transmitting a request for said speech data file to a storage location and receiving said speech data file from said storage location.
- A method according to any of claims 28 to 31, further comprising the step of receiving control parameters from a user interface.
- A method according to any of claims 28 to 32, wherein said control data comprises prosody control data for controlling the pronunciation of said phoneme data.
- A method according to any of claims 28 to 33, wherein said control data comprises speaker control data for controlling speaker related parameters of said speech.
- A method according to claim 34, wherein said speaker related parameters include the speed at which the speech is synthesized.
- A method according to claim 34 or 35, wherein said speaker related parameters include the pitch at which said speech is synthesized.
- A method according to any of claims 34 to 36, wherein said speaker related parameters include whether the speech is synthesized as a male or a female voice.
- A method according to any of claims 28 to 37, wherein said speech data file includes one or more links to other data, and wherein said method comprises the step of retrieving said other data in response to a command input by a user.
- A method according to claim 38, wherein said one or more links comprise an address portion which identifies the storage location of said other data, and said speech data associated with said link.
- A method according to any of claims 28 to 39, wherein said retrieving step retrieves, from a remote terminal, a speech synthesizer for synthesizing speech signals corresponding to said speech data.
- A method according to any of claims 28 to 39, further comprising the steps of synthesizing and outputting a speech signal corresponding to the phoneme data under the control of said control data.
- A method according to claim 41, further comprising the step of outputting said synthesized speech signal to a loudspeaker.
- A method according to claim 41 or 42, wherein said retrieving step retrieves set-up data for controlling a set-up procedure for a speech synthesizer before the generation of said speech signal corresponding to the speech data.
- A method according to any of claims 41 to 43, further comprising the step of receiving an input command from a user to move to different portions of a speech data file to be synthesized.
- A method according to claim 44 when dependent upon claim 37, wherein said input command is a command to move to a next or to a previous link within said speech data file.
- A method according to any of claims 28 to 45, further comprising the step of outputting the received text data to a user.
- A method according to claim 46, wherein said outputting step outputs said text data to a display.
- A method according to claim 46 or 47, wherein said processing step is performed after said outputting step has output said text data to said user.
- A method according to any of claims 28 to 48, wherein said text data forms part of a hypertext data file.
- A method according to any of claims 28 to 49, wherein said text data forms part of a web page.
- A method according to any of claims 28 to 50, wherein said text data includes data identifying a storage location of said speech data file, and wherein said retrieving step retrieves said speech data file from said storage location.
- A data retrieval method comprising the steps of: at a first computer terminal: receiving text data; processing the received text data to determine whether or not it includes data identifying a speech data file which is associated with the received text data and which comprises phoneme data and control data for use by a speech synthesizer in synthesizing a speech signal corresponding to the phoneme data under the control of the control data; and requesting a remote second computer terminal to send said speech data file to said first computer terminal; and at said remote second computer terminal: receiving said request for said speech data file; retrieving said speech data file in accordance with said request; and transmitting the retrieved speech data file to said first computer terminal.
- A method according to claim 52, which is carried out over the Internet.
- A storage medium storing processor implementable instructions for controlling a processor to carry out the method of any one of claims 28 to 53.
- Processor implementable instructions for controlling a processor to carry out the method of any one of claims 28 to 53.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP11217219A JP2001043064A (ja) | 1999-07-30 | 1999-07-30 | Speech information processing method, apparatus and storage medium
JP21721999 | 1999-07-30 |
Publications (3)
Publication Number | Publication Date |
---|---|
EP1073036A2 EP1073036A2 (fr) | 2001-01-31 |
EP1073036A3 EP1073036A3 (fr) | 2003-12-17 |
EP1073036B1 true EP1073036B1 (fr) | 2005-12-14 |
Family
ID=16700731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP00306355A Expired - Lifetime EP1073036B1 (fr) | 1999-07-30 | 2000-07-26 | Analysis of downloaded documents for a browser equipped with a speech synthesizer
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1073036B1 (fr) |
JP (1) | JP2001043064A (fr) |
DE (1) | DE60024727T2 (fr) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4014361B2 (ja) * | 2001-01-31 | 2007-11-28 | Sharp Corporation | Speech synthesis device, speech synthesis method, and computer-readable recording medium recording a speech synthesis program |
JP4225703B2 (ja) * | 2001-04-27 | 2009-02-18 | International Business Machines Corporation | Information access method, information access system and program |
JP2002358092A (ja) * | 2001-06-01 | 2002-12-13 | Sony Corp | Speech synthesis system |
US20030187656A1 (en) * | 2001-12-20 | 2003-10-02 | Stuart Goose | Method for the computer-supported transformation of structured documents |
JP4462901B2 (ja) * | 2003-11-11 | 2010-05-12 | Fujitsu Limited | Modal synchronization control method and multimodal interface system |
JP4653572B2 (ja) * | 2005-06-17 | 2011-03-16 | Nippon Telegraph and Telephone Corporation | Client terminal, speech synthesis information processing server, client terminal program and speech synthesis information processing program |
WO2008132533A1 (fr) * | 2007-04-26 | 2008-11-06 | Nokia Corporation | Method, apparatus and system for text-to-speech conversion |
JP2013097033A (ja) * | 2011-10-28 | 2013-05-20 | Hitachi Government & Public Corporation System Engineering Ltd | Text data providing device with speech synthesis information and text data providing method |
US9117451B2 (en) | 2013-02-20 | 2015-08-25 | Google Inc. | Methods and systems for sharing of adapted voice profiles |
US10978060B2 (en) | 2014-01-31 | 2021-04-13 | Hewlett-Packard Development Company, L.P. | Voice input command |
CN110737817A (zh) * | 2018-07-02 | 2020-01-31 | ZTE Corporation | Browser information processing method and apparatus, smart device and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5673362A (en) * | 1991-11-12 | 1997-09-30 | Fujitsu Limited | Speech synthesis system in which a plurality of clients and at least one voice synthesizing server are connected to a local area network |
AU1566497A (en) * | 1995-12-22 | 1997-07-17 | Rutgers University | Method and system for audio access to information in a wide area computer network |
US5915001A (en) * | 1996-11-14 | 1999-06-22 | Vois Corporation | System and method for providing and using universally accessible voice and speech data files |
US6018710A (en) * | 1996-12-13 | 2000-01-25 | Siemens Corporate Research, Inc. | Web-based interactive radio environment: WIRE |
US5899975A (en) * | 1997-04-03 | 1999-05-04 | Sun Microsystems, Inc. | Style sheets for speech-based presentation of web pages |
US6246672B1 (en) * | 1998-04-28 | 2001-06-12 | International Business Machines Corp. | Singlecast interactive radio system |
1999
- 1999-07-30 JP JP11217219A patent/JP2001043064A/ja active Pending
2000
- 2000-07-26 DE DE60024727T patent/DE60024727T2/de not_active Expired - Lifetime
- 2000-07-26 EP EP00306355A patent/EP1073036B1/fr not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
EP1073036A2 (fr) | 2001-01-31 |
DE60024727T2 (de) | 2006-07-20 |
EP1073036A3 (fr) | 2003-12-17 |
DE60024727D1 (de) | 2006-01-19 |
JP2001043064A (ja) | 2001-02-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5899975A (en) | Style sheets for speech-based presentation of web pages | |
US7062437B2 (en) | Audio renderings for expressing non-audio nuances | |
EP1490861B1 (fr) | Method, apparatus and program for speech synthesis | |
EP2112650B1 (fr) | Speech synthesis device, speech synthesis method, speech synthesis program, portable information terminal and speech synthesis system | |
JP2001521194A (ja) | System and method for audibly representing pages of HTML data | |
KR20030040486A (ko) | Method and system for synchronizing audio and visual presentation in a multi-modal content renderer | |
JPH10274997A (ja) | Document reading-aloud device | |
JPH06214741A (ja) | Graphics user interface for controlling text-to-speech conversion | |
EP1073036B1 (fr) | Analysis of downloaded documents for a browser equipped with a speech synthesizer | |
US6732078B1 (en) | Audio control method and audio controlled device | |
WO2004097656A1 (fr) | Content creation system, content creation method, computer-executable program for executing the content creation method, computer-readable recording medium containing the program, graphical user interface system and display control method | |
JP2741833B2 (ja) | System and method for using audio search patterns within a multimedia presentation | |
JP2000231475A (ja) | Speech read-aloud method in a multimedia information browsing system | |
JP2001268669A (ja) | Device control apparatus and method using a mobile telephone terminal, and recording medium | |
JP2001306601A (ja) | Document processing apparatus and method, and storage medium storing a program therefor | |
US20020026314A1 (en) | Document read-out apparatus and method and storage medium | |
US6246984B1 (en) | Device having functionality means supported by ancillary message reproduction means | |
JP5338298B2 (ja) | Page browsing device and program | |
KR102020341B1 (ko) | System and method for realizing musical scores and reproducing sound sources | |
JP4311710B2 (ja) | Speech synthesis control device | |
JPH09311775A (ja) | Speech output device and method | |
WO2013061719A1 (fr) | Device for providing text data to which speech synthesis information is attached, and method for providing text data | |
JP2005181358A (ja) | Speech recognition and synthesis system | |
JPH08272388A (ja) | Speech synthesis device and method | |
JP2002023781A (ja) | Speech synthesizer, phrase-unit correction method in a speech synthesizer, prosody pattern editing method in a speech synthesizer, sound setting method in a speech synthesizer, and computer-readable recording medium recording a speech synthesis program | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
|
AX | Request for extension of the european patent |
Free format text: AL;LT;LV;MK;RO;SI |
|
PUAL | Search report despatched |
Free format text: ORIGINAL CODE: 0009013 |
|
AK | Designated contracting states |
Kind code of ref document: A3 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
|
AX | Request for extension of the european patent |
Extension state: AL LT LV MK RO SI |
|
17P | Request for examination filed |
Effective date: 20040510 |
|
AKX | Designation fees paid |
Designated state(s): DE FR GB |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE FR GB |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REF | Corresponds to: |
Ref document number: 60024727 Country of ref document: DE Date of ref document: 20060119 Kind code of ref document: P |
|
ET | Fr: translation filed | ||
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20060915 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20130731 Year of fee payment: 14 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20130726 Year of fee payment: 14 Ref country code: GB Payment date: 20130712 Year of fee payment: 14 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 60024727 Country of ref document: DE |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20140726 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: ST Effective date: 20150331 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20150203 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 60024727 Country of ref document: DE Effective date: 20150203 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20140726 Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20140731 |