DE102023112337A1

DE102023112337A1 - METHOD FOR CONTROLLING PROCESSES USING VOICE COMMAND INPUT

Info

Publication number: DE102023112337A1
Application number: DE102023112337.8A
Authority: DE
Inventors: Jörg Jonas-Kops
Original assignee: Individual
Current assignee: Individual
Priority date: 2022-05-10
Filing date: 2023-05-10
Publication date: 2023-11-16

Abstract

The invention relates to a method for controlling processes by means of voice command input, comprising the method steps of detecting a voice input pause, detecting a voice input, identifying the voice input as a voice command for executing a process step, assigning the detected voice input to a process step, and starting the process step assigned to the voice input, wherein the voice input is identified as a voice command only if the voice input pause is detected immediately adjacent in time to the voice input.

Description

State of the art

Electronic devices that understand and act on voice input are known. Such devices include navigation devices, smartphones, smartwatches, head-mounted devices (HMD) and augmented reality (AR) systems. Such systems are used, for example, in industrial settings (in particular Industry 4.0) for predictive maintenance, where users work through processes such as test plans, quality checks in the form of target/actual comparisons, or step-by-step instructions, and document and analyze the results.

The devices mentioned have various applications (apps) and/or have access, usually via a wireless connection, to apps whose functions can be selected by voice input. However, if a device has access to many different apps, it is very difficult for a user to memorize the voice commands needed to access a specific function.

It is therefore the object of the invention to provide a method for controlling processes by means of voice command input with which both the user and the application can unambiguously distinguish a voice input, confusion between the voice input options is avoided, and the individual process steps of a process can therefore be controlled more reliably.

It is a further object of the invention to provide a software program for controlling processes by means of voice command input with which both the user and the application can unambiguously distinguish a voice input, confusion between the voice input options is avoided, and the individual process steps of a process can therefore be controlled more reliably.

It is also an object of the invention to provide data glasses for controlling processes by means of voice command input with which both the user and the application can unambiguously distinguish a voice input, confusion between the voice input options is avoided, and the individual process steps of a process can therefore be controlled more reliably.

The object is achieved by the method for controlling processes by means of voice command input according to claim 1. Advantageous embodiments of the invention are set out in the dependent claims.

The method according to the invention for controlling processes by means of voice command input has five method steps. In the first method step, a voice input pause is detected. In the second method step, a voice input is detected. Detecting comprises acoustically recording the voice input and storing it. To be recognized as such, the voice input must have a minimum sound pressure, in other words it must be spoken by a user at a minimum volume and/or in close proximity to the microphone. On the one hand, this ensures that the voice input is intended by the user; on the other hand, the voice input is clearly distinguished from any background noise that may be present.

In the third method step, the voice input is identified as a voice command for executing a process step. In the fourth method step, the detected voice input is assigned to a process step. A process step can be, for example, the next step in a decision tree in which several process steps follow one another, as in a computer program. In the fifth method step, the process step assigned to the voice input is started, optionally triggered by a further voice input from the user.
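The assignment of a detected voice input to the next process step in a decision tree can be sketched as follows. This is a minimal illustration only, not the implementation described in the application; the names `ProcessStep`, `assign_step` and the example steps "start", "inspect" and "document" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ProcessStep:
    """One node of a decision tree (hypothetical structure)."""
    name: str
    action: Callable[[], str]
    # Voice commands valid at this step, mapped to the name of the next step.
    transitions: Dict[str, str] = field(default_factory=dict)


def assign_step(current: ProcessStep, utterance: str,
                steps: Dict[str, ProcessStep]) -> Optional[ProcessStep]:
    """Assign a detected voice input to a process step (fourth method step).

    Returns None if the utterance is not a valid command at the current step.
    """
    next_name = current.transitions.get(utterance.strip().lower())
    return steps.get(next_name) if next_name else None


# Hypothetical three-step process used only for illustration.
steps = {
    "start": ProcessStep("start", lambda: "started", {"inspect": "inspect"}),
    "inspect": ProcessStep("inspect", lambda: "inspection running",
                           {"document": "document"}),
    "document": ProcessStep("document", lambda: "result documented"),
}

assigned = assign_step(steps["start"], " Inspect ", steps)
if assigned is not None:
    result = assigned.action()  # fifth method step: start the assigned step
```

Because each step carries only the commands valid at that point, an utterance that is valid elsewhere in the tree is simply not assigned.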

When the voice input is identified as a voice command, it is identified as such only if the voice input pause is detected immediately adjacent in time to the voice input. The voice command is thus defined by the occurrence of a voice input pause. The user deliberately inserts a voice input pause in order to mark the voice command clearly as such. This marking makes a voice command unambiguously recognizable as such to the voice recognition system, which considerably shortens the recognition of the voice input. Complex and therefore costly voice recognition systems are not necessary either; voice recognition systems such as those available in commercially available smartphones are sufficient.

In the context of this document, a voice input pause is a time interval, usually of variable duration, without a voice signal, voice input and/or voice command from a user. The occurrence of background noise without a voice signal, voice input and/or voice command from a user is also regarded as a voice input pause.

In a further embodiment of the invention, the sound pressure of the detected voice input is greater than 40 dB, preferably greater than 45 dB and particularly preferably greater than 55 dB. Normal room volume is around 55 dB. On the one hand, this ensures that the voice input is intended; on the other hand, the voice input is clearly distinguished from any background noise that may be present, even above room volume. The method according to the invention can therefore also be used in environments with a high ambient noise level.
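The threshold test can be illustrated with the standard sound pressure level formula L_p = 20 · log10(p_rms / p_0). The sketch below assumes that the dB values refer to dB SPL with the usual reference pressure p_0 = 20 µPa; the application does not state the reference, and the function names are hypothetical.

```python
import math

P_REF_PA = 20e-6  # assumed reference pressure (dB SPL convention), in pascals


def spl_db(p_rms_pa: float) -> float:
    """Sound pressure level in dB for an RMS sound pressure in pascals."""
    if p_rms_pa <= 0.0:
        return float("-inf")
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


def is_voice_input(p_rms_pa: float, threshold_db: float = 40.0) -> bool:
    """True if the level exceeds the minimum (40 dB; 45 or 55 dB preferred)."""
    return spl_db(p_rms_pa) > threshold_db
```

For example, an RMS pressure of 0.02 Pa corresponds to about 60 dB and passes all three thresholds, while 0.001 Pa corresponds to about 34 dB and is rejected.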

In a further embodiment of the invention, the voice input pause is detected immediately before the voice input. The user deliberately inserts a voice input pause immediately before the voice command in order to mark the voice command clearly as such. This marking makes a voice command unambiguously recognizable as such to the voice recognition system.

In a further development of the invention, the voice input pause is detected immediately after the voice input. The user deliberately inserts a voice input pause immediately after the voice command in order to mark the voice command clearly as such. This marking makes a voice command unambiguously recognizable as such to the voice recognition system. A voice input pause both immediately before and immediately after the voice command is also possible.

In a further embodiment of the invention, the voice input pause has a duration t of at least t ≥ 5 s, preferably t ≥ 2 s and particularly preferably t ≥ 1 s. This duration is sufficient to mark the input of a voice command adequately within a user's normal flow of speech. So as not to prolong the voice input unnecessarily for the user, a voice input pause of little more than 1 s is preferred. The duration of the voice input pause can also be adjusted individually by a user.
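The pause rule described in the preceding paragraphs can be sketched as follows. The sketch assumes that an upstream stage has already split the audio into contiguous segments labeled as speech or non-speech; the names `Segment` and `find_commands` and the `mode` parameter are hypothetical and not taken from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float      # seconds
    end: float        # seconds
    is_speech: bool   # False covers silence and background noise without speech

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_commands(segments: List[Segment], min_pause_s: float = 1.0,
                  mode: str = "before") -> List[int]:
    """Return indices of speech segments that count as voice commands.

    mode selects where the pause must lie: "before", "after" or "both".
    """
    def is_pause(i: int) -> bool:
        return (0 <= i < len(segments)
                and not segments[i].is_speech
                and segments[i].duration >= min_pause_s)

    hits = []
    for i, seg in enumerate(segments):
        if not seg.is_speech:
            continue
        before, after = is_pause(i - 1), is_pause(i + 1)
        if {"before": before, "after": after, "both": before and after}[mode]:
            hits.append(i)
    return hits


# Speech, a 1.2 s pause, speech, a 0.3 s gap, speech.
timeline = [
    Segment(0.0, 2.0, True),
    Segment(2.0, 3.2, False),
    Segment(3.2, 4.0, True),
    Segment(4.0, 4.3, False),
    Segment(4.3, 6.0, True),
]
commands_before = find_commands(timeline, mode="before")
```

With a minimum pause of 1 s, only the second speech segment is preceded by a sufficiently long pause; the 0.3 s gap is treated as part of normal speech flow.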

In a further embodiment of the invention, the voice input pause only has to be observed by an authorized voice input source. Whether a voice input source is authorized to give a voice command can be determined, for example, from characteristic physical features of its voice. An alternative or additional criterion can be the position of the voice input source, for example its distance from the voice recording device. Unauthorized voice input sources are ignored by the voice recognition system. This enables reliable voice recognition in noisy environments, for example, and also increases the security of the method according to the invention.
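A minimal sketch of filtering out unauthorized voice input sources follows. It assumes that an upstream stage supplies a speaker label and a distance estimate for each utterance; how these are obtained is outside the sketch. The fields `speaker_id` and `distance_m`, the 0.5 m limit and the choice to require both criteria at once are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Utterance:
    text: str
    speaker_id: str    # hypothetical label from a speaker-recognition stage
    distance_m: float  # hypothetical estimated distance to the microphone


def authorized_only(utterances: Iterable[Utterance],
                    allowed_speakers: Set[str],
                    max_distance_m: float = 0.5) -> List[Utterance]:
    """Keep utterances from authorized sources; all others are ignored."""
    return [u for u in utterances
            if u.speaker_id in allowed_speakers
            and u.distance_m <= max_distance_m]


heard = [
    Utterance("next step", "wearer", 0.1),
    Utterance("stop", "bystander", 2.0),
    Utterance("confirm", "wearer", 3.0),
]
accepted = authorized_only(heard, {"wearer"})
```

Only the first utterance is accepted here: the second comes from an unknown speaker, and the third is attributed to the wearer but at an implausible distance.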

In a further embodiment of the invention, the voice input command is output on an output device as a voice input option for executing a process step. The voice input options are adapted to the respective process. Each process step requires different voice input options, from which a user selects one and performs the voice command input.

In a further development of the invention, the voice input option is output visually on a visual display device. The voice input options are adapted to the respective process. Each process step requires different voice input options, from which a user selects one and performs the voice command input. The voice input options are shown on a display device, for example that of data glasses.

In a further embodiment of the invention, the process assigned to the voice input option is started after the voice input option has been detected and assigned, provided the voice input option is detected on its own. The voice command can thus also be given by a user as a stand-alone command; the user then speaks only the voice command.

In a further embodiment of the invention, the process step assigned to the voice command is executed only if the voice input detected as a voice command is confirmed by a voice command. The user can thus make sure that the voice command has been recognized correctly and confirm it explicitly by voice input.

In an advantageous embodiment of the invention, an output is produced for confirming the voice input detected as a voice command. The detected voice command is output. The user can thus make sure that the voice input has been recognized correctly and, if necessary, confirm or cancel it.

In a further development of the invention, the output is visual and/or acoustic. The detected voice command is shown on the screen of a display device, for example in written form and/or as an icon, and/or output acoustically via the audio output.

In a further embodiment of the invention, the voice input detected as a voice command is repeated in the output, for example visually and/or acoustically. This safeguard for the user's voice command input is particularly advantageous in adverse environments, for example when lighting conditions make the screen hard to read and the acoustic repetition can be relied on instead. In environments with a high noise level, confirmation on the display device likewise provides high reliability.

In an advantageous embodiment of the invention, the output includes a voice input option for revoking the voice input detected as a voice command. The user receives feedback as to whether the voice command given actually corresponds to the intended voice command, and has the option of canceling or changing it.

In a further embodiment of the invention, the voice input detected as a voice command is confirmed by the elapse of a latency time. After a user has entered a voice command, a period of time elapses within which the user can cancel or change the command. If the latency time elapses without the user canceling or changing the voice command, the command is considered confirmed and the process step assigned to it is executed.
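Confirmation by elapsed latency time can be sketched with explicit timestamps, which keeps the logic testable without real waiting. The class name `PendingCommand` and the default latency of 3 s are assumptions; the application does not specify a value.

```python
from dataclasses import dataclass


@dataclass
class PendingCommand:
    text: str
    issued_at: float        # seconds, e.g. taken from time.monotonic()
    latency_s: float = 3.0  # assumed value; not specified in the text
    cancelled: bool = False

    def cancel(self, now: float) -> bool:
        """Revoke the command; only effective while the latency time runs."""
        if now - self.issued_at < self.latency_s:
            self.cancelled = True
        return self.cancelled

    def is_confirmed(self, now: float) -> bool:
        """True once the latency time has elapsed without a cancellation."""
        return not self.cancelled and now - self.issued_at >= self.latency_s


kept = PendingCommand("next step", issued_at=0.0)
revoked = PendingCommand("delete entry", issued_at=0.0)
revoked.cancel(now=1.0)  # user revokes within the latency time
```

The process step assigned to `kept` would be executed once `kept.is_confirmed(now)` returns True, whereas `revoked` never reaches that state.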

In a further embodiment of the invention, only hardware resources of the data glasses are used to receive a voice command input. The main limiting factors are the storage space and computing power available on the data glasses themselves for executing a suitable computer program, and the control and/or input options, which are limited to voice input. During the method, the data glasses are connected to a computer that has suitable and sufficient hardware. The aim is to execute the method entirely on the data glasses in order to keep the amount of additional hardware required as low as possible. In particular, the hardware of the data glasses is used to receive a voice command input from a user; for this purpose the data glasses have an acoustic recording device (microphone). Further hardware can be coupled to the data glasses via suitable communication interfaces. Such hardware is counted as belonging to the data glasses if a driver program assigned to it is executed on the data glasses.

The object is further achieved by the software program according to claim 18 for carrying out the method according to the invention.

The software program according to the invention is suitable for carrying out the method for controlling processes by means of voice command input. The software program uses the hardware of the data glasses for entering voice commands and for outputting the confirmation or revocation of the entered voice commands.

The object is also achieved by the data glasses according to claim 19.

The data glasses according to the invention for carrying out the method according to the invention have a display device for displaying voice input options. The display device is arranged permanently in the user's field of vision, for example by means of an AR system.

The system also has a microphone for detecting spoken voice input options. The microphone can be arranged permanently within the user's speaking range, for example by means of an AR system.

In addition, the system has a computer unit for executing a software program for carrying out the method for controlling processes by means of voice command input. The computer unit can be a wearable such as a smartphone or smartwatch, or it can be arranged within an AR system. A stationary computer unit to which a wearable is connected by cable or wirelessly is also possible.

In a further development of the invention, the data glasses have only a microphone for command input by a user. The microphone is arranged on the data glasses in such a way that a user can give a voice input command comfortably and reliably. The system can detect voice input commands with a sound pressure of at least 10 dB, preferably at least 40 dB and particularly preferably at least 55 dB.

Exemplary embodiments of the system according to the invention and of the method according to the invention for controlling processes by means of voice command input are shown schematically and in simplified form in the drawings and are explained in more detail in the following description.

The figures show:

Fig. 1: View of an exemplary embodiment of the system according to the invention
Fig. 2: Application of the method according to the invention to a multi-part voice input
Fig. 3a: Exemplary embodiment of the execution of the method according to the invention, display of the execution of the voice command with latency time
Fig. 3b: Exemplary embodiment of the execution of the method according to the invention, optional execution of a second voice command
Fig. 4: Exemplary embodiment of the arrangement of voice input pauses
Fig. 5: Flow chart of the method according to the invention

Fig. 1 shows a view of an exemplary embodiment of the system according to the invention for carrying out the method 400 for controlling processes. The system has data glasses 100, by means of which the voice input options are displayed in a user's field of vision. In this exemplary embodiment, the data glasses 100 are worn by the user like conventional glasses and have a correspondingly designed frame 170 with temples 180 and lenses 190. The data glasses 100 have the projection device 110 with screen 120 for displaying the voice input options directly in front of the user's eye. Such data glasses 100 speed up the handling of processes because the user has both arms free. For voice input and output, the data glasses 100 have a communication unit 160 with microphone 130 and audio output 140. The data glasses 100 are controlled by the control unit 150. Besides data glasses 100, the method 400 according to the invention can also be executed on other devices, advantageously wearables (devices worn on the body), for example a smartphone. The system 1 also has a computer (not shown) to which the control unit 150 is connected. The computer is preferably a commercially available PC or notebook that provides sufficient computing power to run a computer program with which the method 400 according to the invention is carried out. The system can detect voice inputs 250 with a sound pressure of at least 40 dB.

The method 400 according to the invention for controlling processes has five method steps. In the first method step 410, a voice input 250 is received and detected. The voice input 250 is received exclusively via the microphone 130 of the data glasses 100.

To be recognized as such, a voice input 250 must have a minimum sound pressure; in other words, it must be spoken by a user at a minimum volume and/or in close proximity to the microphone 130. On the one hand, this ensures that the voice input 250 is intended by the user; on the other hand, the voice input 250 is clearly distinguished from any background noise that may be present. The method according to the invention can therefore also be used in environments with a high ambient noise level. In this exemplary embodiment, the sound pressure of the detected voice input 250 is 45 dB.
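The minimum sound pressure criterion described above could be sketched as follows. This is only an illustration under stated assumptions: the samples are taken to be calibrated sound pressure values in pascals, the level is computed as dB relative to 20 µPa, and the function names `sound_pressure_level_db` and `is_voice_input` are hypothetical and not part of the patent. The default threshold of 40 dB is the value named for the system in this embodiment.

```python
import math

# Assumption: levels are expressed in dB relative to 20 micropascals.
P_REF_PA = 20e-6


def sound_pressure_level_db(samples_pa):
    """Return the level in dB of a block of pressure samples given in pascals."""
    if not samples_pa:
        return float("-inf")
    rms = math.sqrt(sum(p * p for p in samples_pa) / len(samples_pa))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms / P_REF_PA)


def is_voice_input(samples_pa, min_level_db=40.0):
    """Treat a block as voice input only if it reaches the minimum level."""
    return sound_pressure_level_db(samples_pa) >= min_level_db
```

A block that stays below the threshold would simply not be passed on to the later method steps in such a sketch.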

Detecting the voice input 250 also includes decomposing 420 the voice input 250 into its phonetic features, for example into its syllables. A voice input pause t_n in the voice input 250 is then detected. The voice input 250 is then identified 430, 435 as a voice command for executing a process step, with the voice input 250 being identified as a voice command only if the voice input pause t_n is detected immediately adjacent in time to the voice input 250. Finally, the detected voice input 250 is assigned 440, 445 to a process step, and the assigned process step is started 450.

Fig. 2 shows an exemplary embodiment of the application of the method 400 according to the invention to a multi-part voice input 250 from a user. The user speaks the voice input 250 “Please start a voice memo and save it in the ‘Project’ directory” into the microphone 130 of the data glasses 100. The voice input 250 is received and detected by the voice recognition system. The voice input 250 is then decomposed 420 into its individual syllables. Decomposing a voice input 250 into its syllables has proven useful.

In Fig. 2, the voice input 250 is shown in phonetic transcription below the voice input 250, and beneath it the voice input pauses t_n that a user usually makes during the voice input 250. In this exemplary embodiment, the voice input pauses t₂ and t₄ before and after the voice command “Voice Mail” are significantly longer than the other voice input pauses t₁, t₃, t₅, t₆, t₇, t₈, t₉, and in particular longer than the voice input pauses t₆ and t₇ before and after the voice command “save”. In this exemplary embodiment, the voice input pauses t₂ and t₄ each have a length of 2 s. According to the invention, a length of at least 1 s is provided for the voice input pauses t_n. This time is sufficient to mark the input of a voice command clearly within a user's normal flow of speech. The length of the voice input pauses t_n can also be adjusted individually by a user, usually within a range from 1 s to more than 5 s. Both voice input pauses t₂ and t₄ before and after the voice command “Voice Mail” are recognized by means of the phonetic speech recognition method P. Because of the voice input pauses t₂ and t₄, the voice input “Voice Mail” is recognized as a voice command and assigned to the process step “Voice Mail” by means of the semantic speech recognition method S.

The voice command “Voice Mail” recognized in this exemplary embodiment is not only executed; its execution is additionally displayed 405 on the screen 120 of the display device 110 in written form and/or as an icon, and/or output acoustically via the audio output 140. The user can thus make sure that the voice input 250 has been recognized correctly and, if necessary, confirm or cancel the voice input 250. All other voice input pauses t₁, t₃, t₅, t₆, t₇, t₈, t₉, in particular the voice input pauses t₆ and t₇ before and after the voice command “save”, each have a length of less than 1 s in this exemplary embodiment. According to the invention, this time is too short for the voice input pauses t₆ and t₇ to be identified as such and to mark a voice command. The voice command “save” is therefore not recognized as such and consequently not executed.

The following definitions from phonetics are used in this document: A word comprises one or more syllables. A syllable comprises one or more phonemes (sounds). A phoneme is the smallest unit of sound in a language that distinguishes meaning. The main task and function of speech sounds is to serve the identification of linguistic units. To be identifiable, these units must be distinguishable from one another, and this distinguishability (distinctive features) is ensured by speech sounds. Examples of distinctive features of phones in phonetics are nasal, lateral, voiced, sonorant, syllabic, consonantal, coronal, anterior, high, low, back, round, occlusive, fortis and sibilant. A phone is each individual concrete occurrence of a sound.

The voice input 250 mentioned is therefore multi-part within the meaning of the invention. The voice input 250 not only has several (11) words; some words also contain several syllables and a large number of phones. The voice input 250 also contains two different voice commands, namely “Voice Mail” and “save”.

In this document, a voice input pause t_n is a time interval, usually of variable duration, without a voice signal, a voice input and/or a voice command from a user. The occurrence of background noise is also regarded as a voice input pause t_n. Likewise, a voice signal, a voice input and/or a voice command given by a user who is not authorized to use the data glasses 100 is a voice input pause t_n.
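This definition can be read as follows: everything that is not speech from an authorized user counts as a pause. The sketch below illustrates that reading, assuming an upstream stage has already labeled audio segments as coming from an authorized user, an unauthorized speaker or background noise. The `Segment` type, the label strings and the function `pause_intervals` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    kind: str     # "authorized", "unauthorized" or "noise" (assumed labels)


def pause_intervals(segments, total_s):
    """Return (start, end) intervals that count as voice input pauses."""
    speech = sorted((s.start, s.end) for s in segments if s.kind == "authorized")
    pauses = []
    cursor = 0.0
    for start, end in speech:
        if start > cursor:
            pauses.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_s:
        pauses.append((cursor, total_s))
    return pauses


recording = [
    Segment(0.0, 1.0, "authorized"),
    Segment(1.0, 2.5, "unauthorized"),  # another person speaking
    Segment(2.5, 3.0, "noise"),         # background noise
    Segment(3.0, 4.0, "authorized"),
]
pauses = pause_intervals(recording, total_s=5.0)
```

Here the unauthorized speech and the noise merge into a single pause from 1.0 s to 3.0 s, followed by the silence from 4.0 s to 5.0 s.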

Fig. 3 shows an exemplary embodiment of the application of the method 400 according to the invention. A chatbot, i.e. a text-based dialogue system with which a user controls a process, is advantageously used to control the method 400. The user again speaks the voice input 250 “Please start a voice memo and save it in the ‘Project’ directory” into the microphone 130 of the data glasses 100 (Fig. 3 a). The voice input 250 is received and detected by the voice recognition system. In this exemplary embodiment, the user has made voice input pauses t_n before and after the voice command “Voice Mail”. The voice input pauses t_n are also detected. Because of the voice input pauses t_n, the voice input “Voice Mail” is recognized as a voice command and assigned to the process step “Voice Mail”. The process step “Voice Mail” is then started.

The detection of the voice command “Voice Mail” is displayed 224 to the user on the screen 120 of the display device 110 and can optionally also be output acoustically via the audio output 140. The detection of the voice command can be displayed on the screen 120 both in written form and as a graphical representation, for example by means of icons.

In addition, a latency time 214 is displayed to the user on the screen 120 of the display device 110, in this exemplary embodiment 5 s. The latency time 214 can be freely chosen by a user. The latency time 214 is the period within which the given voice command can still be revoked or changed by the user giving the corresponding voice command and/or a revocation command by voice input, for example “Cancel”. If the latency time 214 elapses without a voice command and/or revocation command, the original voice command is executed; if a revocation command is given, it is not executed. In this exemplary embodiment, the user can also repeat the voice command within the latency time 214 in order to have the process step executed. Another possibility is that the latency time 214 is the period within which a user must explicitly confirm the given voice command with a voice command, for example “Execute”. If the latency time 214 elapses without confirmation by the user, the original voice command is not executed; if confirmation is given, it is executed.
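The two variants of the latency time 214 can be contrasted in a small sketch. It assumes that replies spoken after the command was recognized arrive as pairs of elapsed seconds and recognized text; the function name `should_execute`, the mode names and the parameter names are hypothetical, while “cancel”, “execute” and the 5 s default follow the examples in the text.

```python
def should_execute(replies, latency_s=5.0, mode="revoke",
                   revoke_word="cancel", confirm_word="execute"):
    """Decide whether a recognized voice command is carried out.

    replies: list of (seconds_since_recognition, text) voice inputs.
    mode "revoke":  execute unless revoked within the latency time.
    mode "confirm": execute only if confirmed within the latency time.
    """
    in_window = [text.lower() for t, text in replies if 0.0 <= t <= latency_s]
    if mode == "revoke":
        return revoke_word not in in_window
    if mode == "confirm":
        return confirm_word in in_window
    raise ValueError(f"unknown mode: {mode!r}")
```

In the revoke variant a command with no reply is executed once the latency time has passed, whereas in the confirm variant the same silence means the command is dropped. Repeating the command within the latency time is not modeled separately here.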

After the process step “Voice Mail” has ended (Fig. 3 b), the user receives a corresponding message 216 (“finished”). The voice recognition system now presents the user with a voice input option 215 (“save?”). A plurality of voice input options 215 can also be output, depending on the process step just executed. In this document, such voice input options 215 are initial words. The voice input options are displayed on the screen 120 both in written form 215 and as a graphical representation. Optionally, the voice input options 215 can also be output acoustically via the audio output 140. The user can now give the voice command “save” by a further voice input 250 of the voice input option 215. The user has likewise made voice input pauses t_n before and after the voice command “save”. The voice input pauses t_n are also detected. Because of the voice input pauses t_n, the voice input 250 “save” is recognized as a voice command and assigned to the process step “save”. The process step “save” is then started.

Fig. 4 shows an exemplary embodiment of the arrangement of voice input pauses t_n. The method 400 according to the invention can be carried out by searching for and recognizing initial words. The initial words have semantically characteristic and, above all, distinctive phonetic features. The user speaks the voice input 250 “save, please” (Fig. 4 a) into the microphone 130 of the data glasses 100. In this exemplary embodiment, the initial word is “save”.

According to the invention, voice input pauses t_n can be made immediately before (Fig. 4 a) or after (Fig. 4 b) the voice input 250. It is also possible to make voice input pauses t_n both immediately before and immediately after the voice input 250. All initial words available to the system and the method 400 can be stored in a database. The user can add further initial words to the database, for example by giving the initial word “initial word” (Fig. 4 c) as a voice command 250. Using initial words reduces the time needed to recognize the voice input.

According to the invention, voice input pauses t_n only have to be observed by one or more authorized voice input sources. Whether a voice input source is authorized to give a voice command 250 can be recognized, for example, from characteristic physical features of the voice of the voice input source. An alternative or additional feature can be the position of the voice input source, for example its distance. Unauthorized voice input sources are ignored by the voice recognition system. This makes reliable voice recognition possible, for example, in noisy environments.

Fig. 5 shows an exemplary embodiment of the method 400 according to the invention for controlling processes. The method 400 has five method steps. In the first method step 410, a voice input 250 is received and detected. The voice input 250 is received exclusively via the microphone 130 of the data glasses 100. In the next method step, the voice input 250 is decomposed 420 on the basis of its phonetic features. In particular, voice commands in the voice input 250 that are marked by voice input pauses t_n are detected. The voice input 250 is thus decomposed into initial words and filler words. The initial words, which represent the relevant voice commands, are then recognized 430. The voice command or commands are assigned 440 to a process step, which is executed in the last method step 450. Optionally, the filler words are also recognized 435 and assigned 445 to a process step.
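The later steps of this flow (recognizing initial words, assigning them to a process step and executing it) could be sketched as a simple lookup. The sketch assumes that steps 410 and 420 have already produced text segments together with a flag saying whether each segment was marked by a voice input pause; the function `control_process`, the dictionary of process steps and the sample data are hypothetical, and the optional handling of filler words (435, 445) is left out.

```python
def control_process(segments, process_steps):
    """Run the process steps whose initial words were marked by a pause.

    segments:      list of (text, marked_by_pause) pairs from steps 410/420.
    process_steps: dict mapping an initial word to a callable process step.
    """
    results = []
    for text, marked_by_pause in segments:
        if not marked_by_pause:
            continue                      # filler words are skipped here
        step = process_steps.get(text.strip().lower())  # steps 430 and 440
        if step is not None:
            results.append(step())        # step 450
    return results


steps = {
    "voice mail": lambda: "voice mail started",
    "save": lambda: "saved",
}
decomposed = [("please a", False), ("Voice Mail", True), ("save", False)]
outcome = control_process(decomposed, steps)
```

Because only “Voice Mail” is marked by a pause in this sample, only that process step runs.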

REFERENCE SYMBOL LIST

P: Phonetic speech recognition
S: Semantic speech recognition
100: Data glasses
110: Projection device
120: Screen
130: Microphone
140: Audio output
150: Control unit
160: Communication unit
170: Frame
180: Temple
190: Lens
214: Latency time
215: Display of a voice input option
216: Status message for a process step
224: Symbol
250: Voice input commands
t, t₁, t₂, t₃, t₄, t₅, t₆, t₇, t₈, t₉, t_n: Voice input pause
400: Method for controlling processes
405: Display of voice input options
410: Receiving a multi-part voice input
420: Decomposing a voice input
430: Recognizing (phonetically) the detected voice input
435: Recognizing (semantically) the detected voice input
440: Assigning the process associated with the detected voice input
445: Assigning the process associated with the detected voice input
450: Executing the process associated with the detected voice input

Claims

Method (400) for controlling processes by means of a voice command input (250) with the method steps:
• Detecting a voice input pause (t_n)
• Detecting a voice input (250), the detected voice input (250) having a sound pressure greater than 10 dB
• Identifying the voice input (250) as a voice command for executing a process step, wherein the voice input (250) is only identified as a voice command if the voice input pause (t_n) is detected immediately adjacent in time to the voice input (250)
• Assigning the detected voice input (250) to a process step
• Starting the process step assigned to the voice input (250)

Method (400) for controlling processes by means of a voice command input (250) according to claim 1, characterized in that the voice input (250) has a sound pressure greater than 40 dB, preferably greater than 45 dB and particularly preferably greater than 55 dB.

Method (400) for controlling processes by means of a voice command input (250) according to claim 1 or 2, characterized in that the voice input pause (t_n) is detected immediately before the voice input (250).

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the preceding claims, characterized in that the voice input pause (t_n) is detected immediately after the voice input (250).

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the preceding claims, characterized in that the voice input pause (t_n) has at least a duration t with t >= 5 s, preferably with t >= 2 s and particularly preferably with t >= 1 s.

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the preceding claims, characterized in that the voice input pause (t_n) only has to be observed by an authorized voice input source.

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the preceding claims, characterized in that the voice input command is output as a voice input option (215) for executing a process step on an output device (120).

Method (400) for controlling processes by means of a voice command input (250). Claim 7 , characterized in that the voice input option (215) is output visually on a visual display device (120).

Method (400) for controlling processes by means of a voice command input (250). Claim 8 , characterized in that the output of the voice input option (215) together with other voice input options (215) takes place visually on a visual display device (120).

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the preceding claims, characterized in that the process assigned to the voice input option (311) is started after detecting and assigning the voice input option (215) when voice input option (215 ) is recorded solitary

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the preceding claims, characterized in that the process step assigned to the voice command is only carried out if a confirmation (224) of the voice input (250) recorded as a voice command is received. he follows.

Method (400) for controlling processes by means of a voice command input (250). Claim 11 , characterized in that an output is made to confirm (224) the voice input (250) recorded as a voice command.

Method (400) for controlling processes by means of a voice command input (250). Claim 12 , characterized in that the output is visual and/or acoustic.

Method (400) for controlling processes by means of a voice command input (250). Claim 12 or 13 , characterized in that the voice input (250) recorded as a voice command is repeated in the output.

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the Claims 12 until 14 , characterized in that the output includes a voice input option for revoking the voice input (250) recorded as a voice command.

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the Claims 11 until 15 , characterized in that the confirmation (224) of the voice input (250) recorded as a voice command occurs through the elapse of a latency time (244).

Method (400) for controlling processes by means of a voice command input (250) according to one or more of the preceding claims, characterized in that the method (400) exclusively uses the resources of data glasses (100) to receive a voice input (250).

Software program for carrying out the method (400) according to one or more of claims 1 to 17.

Data glasses (100) for carrying out the method (400) according to one or more of claims 1 to 17, comprising
• a display device (120) for displaying voice input options (210, 211, 212, 213, 214, 215, 216)
• a microphone (130) for detecting spoken voice input options (210, 211, 212, 213, 214, 215, 216)
• a computer unit for executing a software program

Data glasses (100) for carrying out the method (400) according to claim 19, characterized in that the data glasses (100) have only a microphone (130) for command input, wherein voice input options (210, 211, 212, 213, 214, 215, 216) with a sound pressure of at least 10 dB, preferably at least 40 dB and particularly preferably at least 55 dB, can be detected by the system.