DE102012219852A1

DE102012219852A1 - Method for manipulating text-to-speech output to operator, involves detecting gesture of operator in gesture information and evaluating gesture information to detect operator command, where parameter of text-to-speech output is adjusted

Info

Publication number: DE102012219852A1
Application number: DE201210219852
Authority: DE
Inventors: Jens Heimsoth
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2012-10-30
Filing date: 2012-10-30
Publication date: 2014-04-30

Abstract

The method involves detecting a gesture of the operator in the gesture information and evaluating the gesture information to detect a command of the operator. A parameter of text-to-speech output is adjusted using the operator command in order to manipulate the text-to-speech output. The gesture is detected as a spatial sequence of movements of a body part (118) of the operator within a detection area (116) of a detection device (102). Independent claims are included for the following: (1) a device for executing the method; and (2) a computer program product for executing the method.

Description

State of the art

The present invention relates to a method for influencing a text-to-speech output, to a method for outputting text as speech, to a corresponding device, and to a corresponding computer program product.

DE 10 2008 051 757 A1 describes a multimodal user interface of a driver assistance system for entering and presenting information.

Disclosure of the invention

Against this background, the approach presented here provides a method for influencing a text-to-speech output, a method for outputting text as speech, a device that uses one of these methods, and finally a corresponding computer program product according to the main claims. Advantageous embodiments emerge from the respective dependent claims and the following description.

Information can reach an information recipient via different transmission paths. In particular, the information recipient can take in the information through his sensory organs. To relieve the eyes, which are generally regarded as the most important sensory organs, information present as text can be read aloud. The eyes are then free to take in other information.

The invention is also particularly suitable for a so-called driver information system in a vehicle, which outputs information to the driver.

If the reading aloud, that is, the output of text as speech, is to be controlled by the information recipient, a means is needed for transmitting a command from the information recipient to the output device.

The approach presented here is based on the insight that the reading aloud of text can be controlled by signs or gestures of the information recipient without, for example, operating real or virtual buttons. The gestures can be performed by the information recipient freely in space. A movement sequence of a gesture can represent a control command and/or a sequence of control commands. The gestures can be self-explanatory. The gestures can also be simple movements that are logically linked to an abstract control command.

A method for influencing a text-to-speech output to an operator comprises the following steps:
Detecting a gesture of the operator in gesture information;
Evaluating the gesture information in order to recognize an instruction of the operator;
Adjusting at least one parameter of the text-to-speech output using the instruction in order to influence the text-to-speech output.
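
The three steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the claimed implementation: the names `detect_gesture`, `evaluate`, `adjust`, `Instruction` and `TtsParameters`, the coordinate convention, the thresholds, and the rule that a movement in the positive x direction means "faster" are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple

# Hypothetical (x, y, z) sample of a body part inside the detection space.
Point = Tuple[float, float, float]


class Instruction(Enum):
    FASTER = auto()
    SLOWER = auto()


@dataclass
class TtsParameters:
    speech_rate: float = 1.0  # assumed unit: multiple of the normal rate


def detect_gesture(samples: List[Point]) -> Optional[List[Point]]:
    """Step 1: provide the trajectory as gesture information (None if too short)."""
    return samples if len(samples) >= 2 else None


def evaluate(gesture_info: List[Point]) -> Optional[Instruction]:
    """Step 2: recognize an instruction (toy rule: net movement along x)."""
    dx = gesture_info[-1][0] - gesture_info[0][0]
    if dx > 0.1:
        return Instruction.FASTER
    if dx < -0.1:
        return Instruction.SLOWER
    return None


def adjust(params: TtsParameters, instruction: Instruction) -> TtsParameters:
    """Step 3: adjust at least one TTS parameter using the instruction."""
    factor = 1.25 if instruction is Instruction.FASTER else 0.8
    return TtsParameters(speech_rate=params.speech_rate * factor)
```

A caller would chain the steps, for example `adjust(TtsParameters(), evaluate(detect_gesture(samples)))`, after checking that neither intermediate result is `None`.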

A method for outputting text as speech comprises the following step:
Providing a speech signal using at least one parameter that is influenced by a method according to the approach presented here.

A text-to-speech output can be understood to mean a method for outputting text as speech. In text-to-speech output, continuous text can be converted into an acoustic speech output. For example, the text-to-speech output can read aloud a text that is available in electronic form. A gesture can be a distinguishable sign with an associated meaning. The term gesture here covers both signs (Gebärden) and gestures in the narrower sense (Gesten). The gesture can in particular be a movement of the operator. The meaning can, for example, be derived from the movement. Likewise, the meaning can be assigned arbitrarily to a movement. The gesture can, for example, be based on a sign language. Gesture information can be, for example, an electronic signal. The gesture information can be an image of the gesture. The gesture information can, for example, represent a trajectory of the movement of a body part of the person performing the gesture. An instruction can be a command for controlling an output of speech from a text segment. The instruction can relate to a content of the text. Likewise, the instruction can relate to the output of the text. A parameter can be, for example, a speech rate, a volume, a reference to a text position or a hierarchical level of the text. A speech signal can be an electrical signal for a loudspeaker or an acoustic signal from a loudspeaker. The speech signal can be created, for example, by synthesizing letters and/or syllables of words of the text into a signal representing speech. The parameter can influence the synthesis itself and/or a content of the synthesized text.

The gesture can be detected as a spatial movement sequence of at least one body part of the operator within a detection space of a detection device. For example, a head movement and/or a hand movement and/or a torso movement can be detected. The detection space can be located freely in space. The detection space can have a spatial extent in height, width and depth. A detection device can detect the gesture, for example, via ultrasound, via a video or stereo video recording, via electromagnetic waves and/or fields and/or via lidar.

The gesture information and a voice command of the operator can be evaluated in response to the voice command in order to recognize the instruction of the operator. A voice command can be an acoustic command from the operator. The speech output can be controlled via both input options. Control via gestures and speech allows fast and/or precise control.

The method can include a step of comparing the instruction with a previously recognized instruction of the operator in order to recognize an instruction sequence, wherein in the step of adjusting, the parameter is additionally adjusted using the instruction sequence. An instruction sequence can be, for example, a combination of instructions. The instructions can be accumulated. For example, the instruction can refine and/or supplement the previously recognized instruction.

The instruction can be recognized as an acceleration command, a deceleration command, a start command, a pause command, a forward command, a backward command, an up command and/or a down command. Because the number of commands is discrete, each gesture can be assigned a unique command. The different gestures can be, for example, linear and/or circular movements of a body part, in particular of a hand of the operator.

In response to the acceleration command, a text can be output faster than before the acceleration command. In response to the deceleration command, the text can be output more slowly than before the deceleration command. In response to the start command, the text-to-speech output can be started. In response to the pause command, the text-to-speech output can be interrupted. In response to the forward command, a text element that follows in time and/or in sequence can be output. In response to the backward command, a text element that precedes in time and/or in sequence can be output. In response to the up command, a text element of a higher hierarchical level than before the up command can be output. In response to the down command, a text element of a lower hierarchical level than before the down command can be output.
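
The effect of the eight commands on the output can be illustrated with a small state-update function. This is a minimal sketch under assumptions: the `ReadingState` fields, the step size of the speech rate, and the convention that level 0 is the highest hierarchical level are chosen for the example and are not specified in the description.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class Command(Enum):
    ACCELERATE = auto()
    DECELERATE = auto()
    START = auto()
    PAUSE = auto()
    FORWARD = auto()
    BACKWARD = auto()
    UP = auto()
    DOWN = auto()


@dataclass(frozen=True)
class ReadingState:
    rate: float = 1.0      # assumed: relative speech rate
    playing: bool = False  # whether text-to-speech output is running
    element: int = 0       # index of the current text element
    level: int = 0         # assumed: 0 = highest level, larger = deeper


def apply(state: ReadingState, cmd: Command, rate_step: float = 0.25) -> ReadingState:
    """Return the new state after one recognized command."""
    if cmd is Command.ACCELERATE:
        return replace(state, rate=state.rate + rate_step)
    if cmd is Command.DECELERATE:
        # Never drop below one step, so the output cannot stall at rate 0.
        return replace(state, rate=max(rate_step, state.rate - rate_step))
    if cmd is Command.START:
        return replace(state, playing=True)
    if cmd is Command.PAUSE:
        return replace(state, playing=False)
    if cmd is Command.FORWARD:
        return replace(state, element=state.element + 1)
    if cmd is Command.BACKWARD:
        return replace(state, element=max(0, state.element - 1))
    if cmd is Command.UP:
        return replace(state, level=max(0, state.level - 1))
    return replace(state, level=state.level + 1)  # Command.DOWN
```

Keeping the state immutable makes it easy to log or undo individual commands; a real system would additionally clamp `element` and `level` to the structure of the text being read.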

The method can include a step of outputting a confirmation that the instruction has been recognized. The confirmation can be given optically and/or acoustically. For example, the confirmation can be given via a speech output or a signal tone. For example, one or more words that reflect a context of the recognized instruction can be output. Likewise, the confirmation can be given via a signal lamp or a visual display.

The method can include a step of providing a name of a hierarchical level of a currently selected text segment, the name being provided in response to a beginning of the current text segment. A hierarchical level can be understood to mean a structural element of a structuring of the text. For example, a text can have a title, a subject, a reference to an author, a summary and/or a plurality of paragraphs.

The present invention further provides a device that is designed to carry out or implement the steps of one of the methods according to the invention in corresponding units. This embodiment variant of the invention in the form of a device also allows the object underlying the invention to be achieved quickly and efficiently.

In the present case, a device can be understood to mean an electrical device that processes sensor signals and outputs control and/or data signals as a function thereof. The device can have an interface, which can be implemented in hardware and/or software. In a hardware implementation, the interfaces can, for example, be part of a so-called system ASIC that contains a wide variety of functions of the device. However, it is also possible for the interfaces to be separate integrated circuits or to consist at least partially of discrete components. In a software implementation, the interfaces can be software modules that are present, for example, on a microcontroller alongside other software modules.

Also advantageous is a computer program product with program code that can be stored on a machine-readable carrier such as a semiconductor memory, a hard disk memory or an optical memory and that is used to carry out one of the methods according to one of the embodiments described above when the program product is executed on a computer or a device.

The invention is explained in more detail below by way of example with reference to the accompanying drawings. The figures show:

Fig. 1 a block diagram of a device for influencing a text-to-speech output according to an embodiment of the present invention;

Fig. 2 a flowchart of a method for influencing a text-to-speech output according to an embodiment of the present invention;

Fig. 3 a representation of a first gesture of an operator of a device for influencing a text-to-speech output according to an embodiment of the present invention;

Fig. 4 a representation of a second gesture of an operator of a device for influencing a text-to-speech output according to an embodiment of the present invention;

Fig. 5 a representation of a third gesture of an operator of a device for influencing a text-to-speech output according to an embodiment of the present invention;

Fig. 6 a representation of a fourth gesture of an operator of a device for influencing a text-to-speech output according to an embodiment of the present invention;

Fig. 7 a representation of various hierarchical levels of a text for a text-to-speech output according to an embodiment of the present invention; and

Fig. 8 a representation of various gestures of an operator for influencing a text-to-speech output according to an embodiment of the present invention.

In the following description of preferred embodiments of the present invention, the same or similar reference numerals are used for elements that are shown in the various figures and act similarly, and a repeated description of these elements is omitted.

Fig. 1 shows a block diagram of a device 100 for influencing a text-to-speech output according to an embodiment of the present invention. The device 100 has a unit 102 for detecting, a unit 104 for evaluating and a unit 106 for adjusting. The device 100 is arranged in a vehicle 108. The vehicle 108 has a device 110 for outputting text as speech according to an embodiment of the present invention. The output device 110 is designed to perform the text-to-speech output. The output device 110 is connected to the device 100 for influencing. Furthermore, the device 110 is connected to a loudspeaker 112, which is arranged in the vehicle 108. The output device 110 has an interface 114 for receiving text to be output.

In this embodiment, the unit 102 for detecting is a camera 102 that has a detection area 116 in which objects can be imaged. The detection area 116 is located at a minimum distance from the camera 102. When an operator of the device 100 moves a body part 118, such as a hand, within the detection area 116, the unit 102 for detecting captures the movement. If the movement of the body part 118 meets predetermined criteria, the unit 102 for detecting recognizes a gesture of the operator and provides a trajectory of the movement as gesture information. For example, the gesture information comprises from where to where the body part 118 has been moved by the operator within the detection area 116. The unit 104 for evaluating receives the gesture information and evaluates it. The gesture information is compared with a plurality of stored gesture schemata in order to recognize an instruction of the operator that is logically linked to the gesture. The unit 106 for adjusting receives the instruction and adjusts a parameter of the device 110 for outputting text as speech according to the instruction of the operator in order to influence the text-to-speech output.
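
The comparison of gesture information with stored gesture schemata could, for instance, be done by matching the net direction of the trajectory against reference directions. The description does not specify a matching technique; the sketch below swaps in a simple cosine-similarity comparison, and the schema table `SCHEMATA`, the function `match_gesture`, the axis convention and both thresholds are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]

# Hypothetical stored gesture schemata: one unit direction per instruction.
SCHEMATA: Dict[str, Vec] = {
    "forward": (1.0, 0.0, 0.0),
    "backward": (-1.0, 0.0, 0.0),
    "up": (0.0, 1.0, 0.0),
    "down": (0.0, -1.0, 0.0),
}


def match_gesture(trajectory: Sequence[Vec], min_length: float = 0.1,
                  min_similarity: float = 0.8) -> Optional[str]:
    """Return the instruction whose schema best matches the net movement."""
    if len(trajectory) < 2:
        return None
    start, end = trajectory[0], trajectory[-1]
    delta = tuple(e - s for s, e in zip(start, end))
    length = math.sqrt(sum(c * c for c in delta))
    if length < min_length:  # movement too small to count as a gesture
        return None
    unit = tuple(c / length for c in delta)
    best, score = None, min_similarity
    for name, schema in SCHEMATA.items():
        # Dot product of two unit vectors is their cosine similarity.
        similarity = sum(u * s for u, s in zip(unit, schema))
        if similarity >= score:
            best, score = name, similarity
    return best
```

Only the start and end points of the trajectory are used here, which is enough for linear gestures; circular movements, which the description also mentions, would need a matcher that looks at the whole path.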

In other words, Fig. 1 shows a device 100 for a gesture-controlled speech zoom. A speech output and speech dialogue system 110 in the vehicle 108 has a TTS (text-to-speech) function by which any text can be read aloud. The approach presented here makes use in the vehicle 108 very convenient and efficient. For example, the reading speed can be made variable, so that only a short time is needed to reach a relevant part of the text. Longer text content can thus be taken in more quickly instead of only being read aloud word by word. Navigating through various elements (e.g. e-mails) and branching into an individual mail is possible without looking at the display for orientation and without the associated distraction. Input takes place primarily without display-based approaches such as a rotary push controller, a touchscreen or a steering wheel remote control. In the speech dialogue system 110, the selection is controlled via gestures, which is fast and therefore efficient for longer elements (e.g. subject lines). The gesture-controlled speech zoom is very well suited to navigating within the text to be read aloud.

Free-space gesture control enables intuitive, low-distraction input commands, e.g. by a hand movement. The gestures can in particular be detected using electric-field, radar and/or ultrasound-based sensing.

The intuitive gestures are used in the device 100 to remedy the weaknesses of navigating through texts when speech output is used, while keeping distraction to a minimum. To maximize operating efficiency, several aspects can be combined. The text output itself can be controlled via the gestures, for example faster or slower and/or louder or quieter. Likewise, the content that is output can be controlled via the gestures. For example, it is possible to jump from one text to another text. Within a text, a passage can be selected for output. The term "speech zoom" stems from the fact that the approach presented here makes it possible to "zoom" very quickly from the top level (e.g. newsfeeds) through the various levels (feed, article, abstract, text, paragraph, fast reading) down to the desired text passage.
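
The "speech zoom" through hierarchical levels can be pictured as a cursor moving through a tree of text elements. The following is a minimal sketch, assuming a simple tree structure: the classes `Node` and `ZoomCursor` and the sample content are hypothetical, and the mapping of gestures to `zoom_in`, `zoom_out` and `step` is left open.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


class ZoomCursor:
    """Tracks the current position as a path of child indices from the root."""

    def __init__(self, root: Node) -> None:
        self.root = root
        self.path: List[int] = []

    def current(self) -> Node:
        node = self.root
        for i in self.path:
            node = node.children[i]
        return node

    def zoom_in(self) -> None:
        """Move to the first element of the next lower level, if there is one."""
        if self.current().children:
            self.path.append(0)

    def zoom_out(self) -> None:
        """Move back to the enclosing element of the next higher level."""
        if self.path:
            self.path.pop()

    def step(self, offset: int) -> None:
        """Move to a following (+1) or preceding (-1) element on the same level."""
        if not self.path:
            return
        parent = self.root
        for i in self.path[:-1]:
            parent = parent.children[i]
        last = len(parent.children) - 1
        self.path[-1] = min(max(self.path[-1] + offset, 0), last)


# Hypothetical sample content following the levels named in the text.
root = Node("Newsfeeds", [
    Node("Feed A", [
        Node("Article 1", [Node("Paragraph 1"), Node("Paragraph 2")]),
        Node("Article 2"),
    ]),
])
```

With this structure, three `zoom_in()` calls from the root reach "Paragraph 1", and `current().label` could be passed to the speech output so that the operator is told where he is.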

The approach presented here can be used, for example, in head units. In principle, however, instrument clusters or center stacks, as well as applications with any kind of tablet, industrial operating console or vending machine, are also conceivable.

Fig. 2 shows a flowchart of a method 200 for influencing a text-to-speech output according to an exemplary embodiment of the present invention. The method 200 has a step 202 of detecting, a step 204 of evaluating and a step 206 of adjusting. In step 202 of detecting, a gesture of an operator is detected in order to obtain gesture information. In step 204 of evaluating, the gesture information is evaluated in order to recognize an instruction of the operator. In step 206 of adjusting, at least one parameter of the text-to-speech output is adjusted using the instruction in order to influence the text-to-speech output.
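The three steps of the method 200 can be sketched as a small pipeline. The following Python sketch is purely illustrative: all names (`TtsParameters`, `detect`, `evaluate`, `adjust`, the motion labels) and the doubling or halving of the rate are assumptions made for the example and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsParameters:
    """Hypothetical parameter set of a text-to-speech output."""
    rate: float = 1.0     # relative speaking rate
    playing: bool = True  # whether the output is currently running


def detect(raw_motion: str) -> dict:
    """Step 202: capture a gesture and return gesture information."""
    return {"motion": raw_motion}


def evaluate(gesture_info: dict) -> Optional[str]:
    """Step 204: recognize the operator's instruction, if there is one."""
    table = {
        "circle_cw": "faster",
        "circle_ccw": "slower",
        "push": "stop",
        "pull": "start",
    }
    return table.get(gesture_info["motion"])


def adjust(params: TtsParameters, instruction: Optional[str]) -> TtsParameters:
    """Step 206: adjust at least one parameter using the instruction."""
    if instruction == "faster":
        params.rate *= 2.0  # assumed step size
    elif instruction == "slower":
        params.rate /= 2.0  # assumed step size
    elif instruction == "stop":
        params.playing = False
    elif instruction == "start":
        params.playing = True
    return params  # unrecognized gestures leave the output unchanged


def method_200(params: TtsParameters, raw_motion: str) -> TtsParameters:
    """Run steps 202, 204 and 206 in sequence."""
    return adjust(params, evaluate(detect(raw_motion)))
```

In this sketch an unrecognized movement yields no instruction, so the parameters remain as they were.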

In other words, Fig. 2 shows a flowchart of an interaction concept for controlling speech output by means of intuitive gestures. The interaction concept combines control of the output with control of the content and, supplemented by a start and stop function, enables complete, efficient and low-distraction navigation through text elements and text by means of free-space gesture control. Simple, intuitive gestures can be used for this. The approach presented here describes a control scheme that requires no eye contact at all. It may therefore be necessary to remind the user repeatedly of where he or she currently is. It may likewise be important to give feedback as to whether a command has been recognized. Both can be achieved together by announcing the result via speech output after a gesture that represents a command, e.g. "subscribed feeds" / "articles in feed" / "read text content" / "next paragraph" / "faster/slower". In addition, a voice-controlled start of the method 200 may be useful, which can be performed with conventional speech input via commands, e.g. "Read SMS"; "Read e-mail"; "Read news feeds".

Figs. 3, 4, 5 and 6 use exemplary gestures to show how the navigation and the speech output can be controlled.

Fig. 3 shows a representation of a first gesture 300 of an operator of a device for influencing a text-to-speech output according to an exemplary embodiment of the present invention. The first gesture 300 is shown as it is detected by the detection device shown in Fig. 1. The body part shown is a hand 302, which is moved vertically within the detection area 118. The first gesture 300 can comprise at least one upward movement and/or at least one downward movement of the hand 302.

Fig. 3 is used to describe, by way of example, navigation through a structure of hierarchically organized text elements. Gestures 300 in the vertical direction are proposed for navigating through the structure (levels), since this direction is associated with higher and lower. An upward movement represents a command to change to a higher structural level; a downward movement represents a command to change one level down (e.g. feeds => titles => paragraphs or, for e-mail, sender => subject => paragraphs).

Fig. 4 shows a representation of a second gesture 400 of an operator of a device for influencing a text-to-speech output according to an exemplary embodiment of the present invention. The representation in Fig. 4 corresponds to that in Fig. 3. The hand 302 is moved horizontally within the detection area 118, transversely to a detection direction of the detection device. The second gesture 400 can comprise at least one movement of the hand 302 in one direction and/or at least one movement back in the opposite direction.

Fig. 4 is used to describe, by way of example, navigation through the elements of a level. A horizontal gesture 400 is used. This leftward or rightward movement can be used, analogously to a swipe on a touchscreen, to page through entries or to select the next entries.
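The combination of vertical gestures (changing level) and horizontal gestures (moving within a level) can be pictured as a cursor on a tree of text elements. The following Python sketch is an illustration only; the classes `Node` and `Cursor`, the direction labels and the clamping at the first and last element are assumptions and are not prescribed by the application.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical text element with optional sub-elements."""
    label: str
    children: list = field(default_factory=list)


class Cursor:
    """Tracks the current text element as a path of child indices."""

    def __init__(self, root: Node):
        self.root = root
        self.path = []  # empty path means the cursor is on the root

    def _node(self, path) -> Node:
        node = self.root
        for index in path:
            node = node.children[index]
        return node

    @property
    def current(self) -> Node:
        return self._node(self.path)

    def move(self, direction: str) -> str:
        """Apply one gesture and return the label to announce."""
        if direction == "down" and self.current.children:
            self.path.append(0)  # one level lower, first element
        elif direction == "up" and self.path:
            self.path.pop()      # one level higher
        elif direction in ("left", "right") and self.path:
            siblings = self._node(self.path[:-1]).children
            step = 1 if direction == "right" else -1
            # Assumption: stay on the first/last element at the edges.
            self.path[-1] = max(0, min(len(siblings) - 1, self.path[-1] + step))
        return self.current.label
```

The returned label corresponds to the spoken feedback that tells the operator where he or she currently is.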

Fig. 5 shows a representation of a third gesture 500 of an operator of a device for influencing a text-to-speech output according to an exemplary embodiment of the present invention. The representation in Fig. 5 corresponds to that in Figs. 3 and 4. The hand 302 is moved in a circle within the detection area 118, transversely to the detection direction of the detection device. The third gesture 500 can comprise at least one movement of the hand 302 in one direction and/or at least one movement in the opposite direction.

Fig. 5 is used to describe, by way of example, navigation within a text. A circling movement 500 appears intuitive here. It corresponds to the operating movement of a rotary knob such as a jog shuttle. Analogously, circling can be used here to switch from one playback speed to the next.

Fig. 6 shows a representation of a fourth gesture 600 of an operator of a device for influencing a text-to-speech output according to an exemplary embodiment of the present invention. In contrast to Figs. 3, 4 and 5, the fourth gesture 600 is shown from a viewpoint transverse to the detection direction of the detection device 102. The hand 302 is moved within the detection area 118 toward the detection device 102 and/or away from the detection device 102. The gesture 600 can be performed horizontally, for example. The fourth gesture 600 can comprise at least one forward movement and/or at least one backward movement of the hand 302.

Fig. 6 is used to describe, by way of example, starting and stopping the output. To stop speech playback, a push gesture 600 (along a Z axis pointing into the plane of the drawing) can be used. To resume or start playback, a pull gesture 600 (along a Z axis pointing out of the plane of the drawing) can be used. The stop gesture 600 in particular is a natural gesture that is also used in human communication.
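One conceivable way to distinguish the linear gestures (vertical, horizontal, push and pull) is to look at the dominant axis of the hand's displacement. The following Python sketch makes several assumptions that are not part of the application: the function name `classify_linear_gesture`, a coordinate system seen from the operator with x to the right, y upward and z pointing from the detection device toward the operator, positions in metres, and a minimum travel of 5 cm.

```python
def classify_linear_gesture(start, end, min_travel=0.05):
    """Classify a straight hand movement by its dominant axis.

    start, end: (x, y, z) hand positions in metres (assumed units).
    Returns "left", "right", "up", "down", "push", "pull" or None.
    """
    dx, dy, dz = (e - s for s, e in zip(start, end))
    # Pick the axis with the largest absolute displacement.
    axis, delta = max((("x", dx), ("y", dy), ("z", dz)), key=lambda p: abs(p[1]))
    if abs(delta) < min_travel:
        return None  # movement too small to count as a gesture
    names = {
        "x": ("left", "right"),
        "y": ("down", "up"),
        "z": ("push", "pull"),  # toward the device = decreasing z = push
    }
    negative, positive = names[axis]
    return positive if delta > 0 else negative
```

The circling gesture is deliberately left out of this sketch, since it cannot be recognized from a start and end position alone.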

Fig. 7 shows a representation of different hierarchy levels 700, 702, 704 of a text for a text-to-speech output according to an exemplary embodiment of the present invention. The hierarchy levels 700, 702, 704 are shown in a table, by way of example for three different types of text. The hierarchy levels 700, 702, 704 are divided into a first level 700, a second level 702 and a third level 704 and are shown as rows of the table. A first column of the table lists the hierarchy levels 700, 702, 704 for texts, for example news texts 706 (news feed). The first level 700 of the news texts 706 is given as the subject area or section 708 (feed name). The second level 702 of the news texts 706 is given as the title and/or summary 710 (title + abstract). The third level 704 of the news texts 706 is given as a paragraph in the text 712. A second column of the table lists the hierarchy levels 700, 702, 704 for electronic mail or e-mail 714. The first level 700 of the e-mail 714 is given as the sender 716. The second level 702 of the e-mail 714 is given as the subject 718. The third level 704 of the e-mail 714 is given as a paragraph in the text 720. A third column of the table lists the hierarchy levels 700, 702, 704 for short messages or SMS 722. The first level 700 of the SMS 722 is given as the sender 724. The second level 702 of the SMS 722 is given as a sentence in the text 726. For the SMS 722, the third level 704 is not used. The hierarchy levels 700, 702, 704 listed by way of example in Fig. 7 can be selected for reading aloud via gestures of the operator. It is likewise possible to switch between the individual text categories 706, 714, 722 by means of gestures. For example, the switch can take place in the manner of a sequential menu.
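The assignment of hierarchy levels to text types described for the table can be written down as a small lookup structure. The following Python sketch is illustrative only; the dictionary `HIERARCHY`, its keys and the helper functions are hypothetical names, while the level names follow the description above.

```python
# Level names per text type: (first level 700, second level 702, third level 704).
# None marks a level that is not used for that text type.
HIERARCHY = {
    "newsfeed": ("feed name", "title + abstract", "paragraph in text"),
    "email": ("sender", "subject", "paragraph in text"),
    "sms": ("sender", "sentence in text", None),
}


def level_name(text_type: str, level: int):
    """Return the name of hierarchy level 1, 2 or 3 for a text type."""
    return HIERARCHY[text_type][level - 1]


def depth(text_type: str) -> int:
    """Return how many hierarchy levels a text type actually uses."""
    return sum(1 for name in HIERARCHY[text_type] if name is not None)
```

Such a lookup could also supply the level name that is announced as confirmation after a change of level.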

In other words, Fig. 7 shows an example of navigation through text elements by means of a hierarchical structure. The structure can be used for faster navigation through larger texts during speech output. Thanks to the structure, it is possible to navigate quickly through text elements and texts to the relevant text passage. It is proposed to structure the text elements so that the operator can move quickly within the structure via gestures. Three levels 700, 702, 704 appear sufficient for the structure, for example for e-mails and news texts.

Fig. 8 shows a representation of various actions 800 for influencing a text-to-speech output according to an exemplary embodiment of the present invention. The actions 800 are listed in a first column of a table. A second column contains descriptions 802 of gestures. Illustrations 804 of the gestures are given in a third column. A fourth column lists, by way of example, confirmations 806 that are assigned to the gestures and are to be output via the text-to-speech output. The illustrations 804 are drawn as a detection device such as that shown in Fig. 1 detects the gesture. The illustrations 804 are therefore mirror images of the descriptions 802 of the gestures in the second column.

In a first row 808 of the table, "read faster" is entered as the first of the actions 800. Faster reading proceeds from command to command in an ascending sequence, from normal via faster and very fast to scanning. The text is read aloud at the output speed assigned to each level. The description 802 of the gesture for the first action reads: circling clockwise, next level after approx. 0.8 s. The illustration 804 is a circular movement of a body part, as shown in Fig. 5. For the first action, a counterclockwise circular movement of a hand is shown. "Faster" is entered as the confirmation text 806. Faster reading can be used for faster navigation through larger texts during speech output.

In TTS, a reading speed that can be described as rather slow is usually preset. This is advisable for good intelligibility of, e.g., menu items, and in that case does not demand too much listening time either. For longer texts (e.g. e-mails or news feeds), however, a faster and/or variable reading speed is advantageous. For example, three speed levels can be introduced for the speech output, plus a scan mode as a fourth level. Normal speed denotes a normal (moderate) reading speed. Fast reading denotes output at double speed. Very fast reading denotes output at three to four times the speed (depending on the clarity of the speech engine). In scan mode (skipping), only the first five words of a sentence may be read, quickly (at double speed). Furthermore, only the first three sentences of a paragraph may be read, after which playback can jump to the next paragraph. Here too, gestures can be used to switch between the levels.
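The scan mode can be illustrated with a short text-processing sketch. The following Python example is only one possible reading of the description: the function name `scan_excerpt`, the splitting of paragraphs at blank lines and of sentences at ".", "!" or "?" followed by whitespace, and the value 3.0 chosen for "three to four times" are assumptions.

```python
import re

# Relative output speeds per level; 3.0 is an assumed value within the
# "three to four times" range, scan mode reads at double speed.
RATE = {"normal": 1.0, "fast": 2.0, "very_fast": 3.0, "scan": 2.0}


def scan_excerpt(text, words_per_sentence=5, sentences_per_paragraph=3):
    """Return the text fragments that scan mode would read aloud."""
    excerpts = []
    for paragraph in text.split("\n\n"):  # assumed paragraph separator
        # Naive sentence split after ., ! or ? (an assumption).
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        for sentence in sentences[:sentences_per_paragraph]:
            excerpts.append(" ".join(sentence.split()[:words_per_sentence]))
        # Remaining sentences are skipped; continue with the next paragraph.
    return excerpts
```

A real speech engine would need more robust sentence segmentation, for example for abbreviations such as "e.g.".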

In a second row 810 of the table, "read slower" is entered as the second of the actions 800. Slower reading proceeds in a descending sequence from scanning via very fast, faster and normal to stop. The description 802 for the second action reads: circling counterclockwise, next level after approx. 0.8 s. The illustration 804 is a circular movement of a body part, as shown in Fig. 5. For the second action, a clockwise circular movement of a hand is shown. "Slower" is entered as the confirmation text 806.
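The ascending and descending speed sequences, with one step for roughly every 0.8 s of circling, can be sketched as a position on a ladder. The following Python sketch is illustrative; the names `LADDER`, `STEP_MS` and `step_speed`, the use of milliseconds, and the behaviour of clockwise circling from "stop" (resuming at normal speed) are assumptions that the description does not settle.

```python
# Speed levels in ascending order, as seen from the operator.
LADDER = ["stop", "normal", "faster", "very fast", "scan"]
STEP_MS = 800  # one level change per approx. 0.8 s of circling


def step_speed(current: str, direction: str, circling_ms: int) -> str:
    """Return the speed level after circling for circling_ms milliseconds.

    direction: "clockwise" (faster) or "counterclockwise" (slower),
    as seen from the operator.
    """
    steps = circling_ms // STEP_MS
    index = LADDER.index(current)
    if direction == "clockwise":
        index = min(index + steps, len(LADDER) - 1)  # cap at scan mode
    else:
        index = max(index - steps, 0)                # cap at stop
    return LADDER[index]
```

Integer milliseconds are used here so that the step count does not depend on floating-point rounding.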

In a third row 812 of the table, "stop reading" is entered as the third of the actions 800. The description 802 for the third action reads: Z axis toward the surface (push). The illustration 804 is a linear movement of a body part, as shown in Fig. 6. For the third action, a straight-line movement of a hand toward the detection device is shown. "Pause" is entered as the confirmation text 806.

In a fourth row 814 of the table, "start reading" is entered as the fourth of the actions 800. The description 802 for the fourth action reads: Z axis away from the surface (pull). The illustration 804 is a linear movement of a body part, as shown in Fig. 6. For the fourth action, a straight-line movement of a hand away from the detection device is shown. No confirmation text 806 is entered. Instead, the text is read (or continues to be read) at normal speed.

In a fifth row 816 of the table, a change to another text element on the same hierarchy level as the text element currently being read aloud is entered as the fifth of the actions 800. The other text element precedes the current text element in the sequence (one element back on the same level). The description 802 for the fifth action reads: horizontal movement to the left. The illustration 804 is a linear movement of a body part, as shown in Fig. 4. For the fifth action, a straight-line movement of a hand from right to left is shown. "Back" is entered as the confirmation text 806.

In a sixth row 818 of the table, a change to another text element on the same hierarchy level as the text element currently being read aloud is entered as the sixth of the actions 800. The other text element follows the current text element in the sequence (jump to the next element on the same level). The description 802 for the sixth action reads: horizontal movement to the right. The illustration 804 is a linear movement of a body part, as shown in Fig. 4. For the sixth action, a straight-line movement of a hand from left to right is shown. "Forward" is entered as the confirmation text 806.

In a seventh row 820 of the table, a change to another text element on a lower hierarchy level is entered as the seventh of the actions 800. The other text element is on a lower hierarchy level than the current text element (one level down). The description 802 for the seventh action reads: vertical movement downward. The illustration 804 is a linear movement of a body part, as shown in Fig. 3. For the seventh action, a straight-line movement of a hand from top to bottom is shown. The name of the hierarchy level is entered as the confirmation text 806.

In an eighth row 822 of the table, a change to another text element on a higher hierarchy level is entered as the eighth of the actions 800. The other text element is on a higher hierarchy level than the current text element (one level up). The description 802 for the eighth action reads: vertical movement upward. The illustration 804 is a linear movement of a body part, as shown in Fig. 3. For the eighth action, a straight-line movement of a hand from bottom to top is shown. The name of the hierarchy level is entered as the confirmation text 806.
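The eight rows of the table can be summarized as a mapping from gesture to action and spoken confirmation. The following Python sketch is illustrative only; the gesture keys, the marker `LEVEL_NAME` and the function `confirmation` are hypothetical, while the actions and confirmation texts follow the rows described above (gestures named as seen from the operator).

```python
# Marker meaning "announce the name of the hierarchy level reached".
LEVEL_NAME = object()

# gesture -> (action, spoken confirmation); rows 808 to 822 of the table.
ACTIONS = {
    "circle clockwise": ("read faster", "faster"),
    "circle counterclockwise": ("read slower", "slower"),
    "push": ("stop reading", "pause"),
    "pull": ("start reading", None),  # no confirmation, reading resumes
    "left": ("one element back, same level", "back"),
    "right": ("next element, same level", "forward"),
    "down": ("one level lower", LEVEL_NAME),
    "up": ("one level higher", LEVEL_NAME),
}


def confirmation(gesture: str, level_name: str = ""):
    """Return the text to speak after a recognized gesture, or None."""
    _action, spoken = ACTIONS[gesture]
    return level_name if spoken is LEVEL_NAME else spoken
```

For example, `confirmation("down", "subject")` yields the level name, whereas `confirmation("pull")` yields nothing because reading simply continues.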

In other words, Fig. 8 shows an interaction concept for controlling speech output by means of intuitive gestures. An overview of the free-space gesture control is shown. The gestures allow the speech output to be controlled free of visual distraction. The gesture control enables navigation through structured text elements by means of intuitive and therefore easily learned gestures. With the approach presented here, reading aloud e-mails, extensive news feeds and SMS by TTS (text-to-speech) while driving becomes efficiently usable in the vehicle, owing to the improved navigation through the elements and texts.

With the approach presented here, using for example eight simple, intuitive gestures, comprehensive (complete) control of a speech output through structured text elements and within text is possible, which considerably increases the efficiency of the speech output. This makes practical use of speech output possible for the first time for applications such as e-mail and news feeds, and also brings a clear improvement for SMS.

The exemplary embodiments described and shown in the figures are chosen only by way of example. Different exemplary embodiments can be combined with one another in their entirety or with respect to individual features, or also multimodally with other input/output channels such as a display, speech input or rotary push-button input. An exemplary embodiment can also be supplemented by features of a further exemplary embodiment.

Furthermore, method steps according to the invention can be repeated and carried out in an order other than that described.

If an exemplary embodiment comprises an "and/or" link between a first feature and a second feature, this is to be read as meaning that the exemplary embodiment has both the first feature and the second feature according to one embodiment, and either only the first feature or only the second feature according to a further embodiment.

REFERENCES CITED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

DE 102008051757 A1 [0002]

Claims

Method (200) for influencing a text-to-speech output to an operator, the method (200) comprising the following steps: capturing (202) a gesture (300, 400, 500, 600) of the operator in gesture information; evaluating (204) the gesture information in order to recognize an instruction of the operator; and adjusting (206) at least one parameter of the text-to-speech output using the instruction in order to influence the text-to-speech output.

Method (200) according to claim 1, wherein, in the step (202) of capturing, the gesture (300, 400, 500, 600) is captured as a spatial sequence of movements of at least one body part (118) of the operator within a detection area (116) of a detection device (102).

Method (200) according to one of the preceding claims, wherein, in the step (204) of evaluating, in response to a voice command of the operator, the gesture information and the voice command are evaluated in order to recognize the instruction of the operator.

Method (200) according to one of the preceding claims, comprising a step of comparing the instruction with a previously recognized instruction of the operator in order to detect a sequence of instructions, wherein, in the step (206) of adjusting, the parameter is further adjusted using the sequence of instructions.

Method (200) according to one of the preceding claims, wherein, in the step (204) of evaluating, the instruction is recognized as an acceleration command, a deceleration command, a start command, a pause command, a forward command, a backward command, an up command and/or a down command.

Method (200) according to claim 5, wherein, in the step (206) of adjusting, the text is output more slowly in response to the deceleration command, the text-to-speech output is started in response to the start command, the text-to-speech output is paused in response to the pause command, a subsequent text element is output in response to the forward command, a previous text element is output in response to the backward command, a text element of a higher hierarchy level is output in response to the up command, and/or a text element of a lower hierarchy level is output in response to the down command.

Method (200) according to one of the preceding claims, comprising a step of outputting an acknowledgment that the instruction has been recognized.

Method (200) according to one of the preceding claims, comprising a step of providing a name of a hierarchy level of a currently selected text segment, the name being provided in response to a beginning of the current text segment.

Method for outputting text as speech, the method comprising the following step: providing a speech signal using at least one parameter that is influenced by a method (200) according to one of claims 1 to 8.

An apparatus comprising units adapted to perform or drive the steps of a method according to any one of claims 1 to 9.

Computer program product with program code for carrying out the method according to one of claims 1 to 9, when the program product is executed on a device.