SE523066C2

SE523066C2 - Reducing information retrieval time from memory, especially for voice recognition device for mobile phone, by transferring most often retrieved information to faster part of memory

Info

Publication number: SE523066C2
Application number: SE9902050A
Authority: SE
Inventors: Katarina Ekelund; Jim Rasmusson
Original assignee: Ericsson Telefon Ab L M
Priority date: 1999-06-03
Filing date: 1999-06-03
Publication date: 2004-03-23
Also published as: SE9902050L; SE9902050D0

Abstract

The components (e.g. words) in the slower first part (3) of the memory (2) that have the highest usage frequency are gathered into a group (6) whose size is limited by the size of the faster second part (4) of the memory. This group is downloaded into the second part of the memory and searched before any of the other components. An independent claim is also included for a device with a processor, memory and application that uses this information retrieval method.

Description

For many people, the voice is the most natural and easiest way to communicate.

Controlling a telephone, or other consumer products, by voice will become increasingly common in the future.

Voice control applications include some kind of speech/voice recognition system. The part of the speech/voice recognition system that analyzes the spoken input word and tries to match it against a reference vocabulary is hereinafter referred to as the speech recognizer. The sampled and digitized representations of the reference list of words recognized by the speech recognizer, hereinafter referred to as the "reference vocabulary", must be stored in a fast memory that is accessible to the speech recognizer in the device in question. The capacity of this memory must be large enough to give the user a reasonably sized reference vocabulary of recognizable words.

The response time of the speech recognizer must be relatively short to be acceptable to the user.

Memory size and memory access speed are therefore two very important design criteria for a speech recognition application. Choosing these "parameters" appropriately is especially important in applications such as a mobile phone or similar devices. Such devices are subject to power-consumption limitations and are switched off when not in use.

In prior-art speech recognition solutions for similar applications, the entire reference vocabulary of the application was downloaded when the device was switched on. It was copied from a permanent memory with slower access (such as an EEPROM or flash memory) to the non-permanent memory with faster access (such as a (D)RAM) in the digital signal processor (DSP), which houses, among other things, the speech recognizer. The limited size of the DSP RAM significantly limited the size of the reference vocabulary. A typical DSP RAM size of, e.g., 16 kilobits allows storage of a vocabulary with a total duration of approximately 10 seconds. This corresponds, for example, to 10 words, each one second long.
The permanent memory, on the other hand, can be dimensioned to store a vocabulary of suitable size. In typical applications, however, its access time is too long for the speech recognizer to access it directly, at least for the most frequently used words. For less frequently used words, a longer access time is acceptable in many applications.

To increase the possible size of the reference vocabulary, a larger RAM has been included outside the DSP. However, a larger RAM increases the power consumption of the application.

In applications where power consumption is a primary concern, the RAM must be used efficiently.

JP-A-07/311591 describes a compact speech recognition apparatus in which the so-called object vocabulary used by the speech recognizer is stored in a memory.

The object vocabulary currently stored in the memory is selected from a large dictionary. Speech recognition is performed by changing the vocabulary stored in the memory, using the entire vocabulary registered in the dictionary as virtual objects.

Summary

In the prior art, the relatively fast part of the memory has a limited size, e.g. because of power-consumption constraints. This in turn limits the size of the group of elements that can be searched within a reasonable search time.

The object of the present invention is to achieve a shorter average search time for the elements in a given reference set of elements, for a given size of the relatively fast memory.

This is achieved according to the invention, as described in claim 1, in three steps:
1. The elements in the first part of the memory that have the highest frequency of use are collected into a group whose size is limited by the size of the second part of the memory.
2. This group is downloaded to the second part of the memory each time the device leaves the switched-off state.
3. The group is searched first.
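The selection step of claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The `Element` type, its `size` and `frequency` fields, and the greedy fill rule are hypothetical. The two memory parts are simulated as ordinary Python lists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    label: str        # hypothetical identifier of a stored word template
    size: int         # storage the template occupies (unit is an assumption)
    frequency: float  # frequency of use (expected, observed, or a blend)


def select_fast_group(elements: List[Element], capacity: int) -> List[Element]:
    """Collect the most frequently used elements, limited by the fast-memory capacity."""
    group: List[Element] = []
    used = 0
    for element in sorted(elements, key=lambda e: e.frequency, reverse=True):
        if used + element.size > capacity:
            break  # keep the group a strict top-by-frequency prefix
        group.append(element)
        used += element.size
    return group


def find(label: str, fast_group: List[Element], all_elements: List[Element]) -> Optional[str]:
    """Search the fast group first, then fall back to the slow (permanent) memory."""
    for element in fast_group:
        if element.label == label:
            return "fast"
    fast_labels = {element.label for element in fast_group}
    for element in all_elements:
        if element.label not in fast_labels and element.label == label:
            return "slow"
    return None
```

Consider four equally sized words and room for two. `select_fast_group` keeps `call` and `home`, so a search for `redial` falls through to the slow memory. Stopping at the first element that does not fit keeps the group strictly "the most frequent words". A packing strategy that skips ahead to smaller words would be a different design choice.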

In this context, the term "frequency of use" can mean any of the following:
- "the expected frequency of use", based on expectations or priorities (e.g. as applied in a start-up situation);
- "the actual frequency of use", i.e. the historically observed, possibly dynamically updated, frequency of use;
- a combination of the two.

The term "average frequency of use of a group" means the mean of the frequencies of use of the elements that make up the group.
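The definitions above can be made concrete with two small functions. The text only says that expected and actual frequencies may be combined. The linear blend with a `weight` parameter below is a hypothetical choice made for illustration. The group average is simply the arithmetic mean, as defined.

```python
from typing import Sequence


def blended_frequency(expected: float, observed: float, weight: float) -> float:
    """Hypothetical linear blend of expected and observed frequency of use.

    weight = 1.0 uses only the expected (start-up) value,
    weight = 0.0 uses only the historically observed value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * expected + (1.0 - weight) * observed


def average_frequency(frequencies: Sequence[float]) -> float:
    """Average frequency of use of a group: the mean over the elements in the group."""
    if not frequencies:
        raise ValueError("a group must contain at least one element")
    return sum(frequencies) / len(frequencies)
```

One plausible schedule, again an assumption, starts with a weight near 1.0 on a new device. The weight can then be lowered as observed usage accumulates.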

Claim 2 describes the following procedure:
1. All elements in the first part of the memory are divided into groups according to their frequency of use, and the size of each group is limited by the size of the second part of the memory.
2. The groups are downloaded to the second part of the memory in order of decreasing average frequency of use.
3. The search continues until the relevant element has been found or until all groups that make up the reference set have been searched.

This ensures that the average access time for the whole reference set of elements is minimized on a group-by-group basis.
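The grouping and group-by-group search of claim 2 might look like this in Python. This is a sketch that assumes each word has a known storage `size` and that "downloading" a group can be simulated by copying a list. The `Word` type and both function names are hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Word(NamedTuple):
    label: str
    size: int         # storage needed in the fast memory (unit is an assumption)
    frequency: float  # frequency of use


def partition_into_groups(words: List[Word], capacity: int) -> List[List[Word]]:
    """Split the vocabulary into groups that each fit in the fast memory.

    Words are taken in order of decreasing frequency, so every word in group k
    is used at least as often as every word in group k + 1; the groups therefore
    come out already ordered by decreasing average frequency of use.
    """
    groups: List[List[Word]] = []
    current: List[Word] = []
    used = 0
    for word in sorted(words, key=lambda w: w.frequency, reverse=True):
        if word.size > capacity:
            raise ValueError(f"{word.label!r} does not fit in the fast memory")
        if used + word.size > capacity:
            groups.append(current)
            current, used = [], 0
        current.append(word)
        used += word.size
    if current:
        groups.append(current)
    return groups


def search_groups(
    groups: List[List[Word]], matches: Callable[[Word], bool]
) -> Optional[Tuple[int, Word]]:
    """Download and scan one group at a time until a match is found or all are searched."""
    for index, group in enumerate(groups):
        fast_memory = list(group)  # stands in for copying the group into RAM
        for word in fast_memory:
            if matches(word):
                return index, word
    return None
```

The `matches` predicate stands in for whatever comparison the application performs. In a speech recognizer it would compare the spoken input against a stored template.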

Claim 3 adds two measures:
- The search in the relatively smaller and faster second part of the memory is performed in a given order.
- The individual elements in each group are ordered by frequency of use. The element with the highest frequency of use is searched first, the element with the second-highest frequency next, and so on.

This ensures that the elements in a given group are searched in order of decreasing frequency of use. The average search time is thus minimized for a given size of the relatively faster memory and a given reference set of elements, provided that the applied usage frequencies reflect the actual ones within reasonable limits.
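Why the ordering of claim 3 minimizes the mean search time can be checked numerically. The sketch below assumes three things: the spoken word is in the group, each word is the target with probability proportional to its frequency of use, and a linear scan stops at the first match. The function names are illustrative only.

```python
from itertools import permutations
from typing import Sequence


def expected_comparisons(frequencies: Sequence[float]) -> float:
    """Mean number of comparisons for a linear scan of one group.

    Assumes the spoken word is in the group, that the chance of each word
    being the target is proportional to its frequency of use, and that the
    scan stops at the first (correct) match.
    """
    total = sum(frequencies)
    return sum(position * f for position, f in enumerate(frequencies, start=1)) / total


def order_by_frequency(frequencies: Sequence[float]) -> list:
    """Order used by claim 3: most frequently used word first."""
    return sorted(frequencies, reverse=True)


def brute_force_minimum(frequencies: Sequence[float]) -> float:
    """Smallest expected cost over every possible ordering (small groups only)."""
    return min(expected_comparisons(p) for p in permutations(frequencies))
```

Take usage counts of 1, 6 and 3. Scanning in that stored order costs 2.2 comparisons on average, while the descending order costs 1.5, and the brute-force check confirms that 1.5 is the minimum. This is an instance of the rearrangement inequality: pairing the smallest positions with the largest frequencies minimizes the weighted sum.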

A side effect of the faster search is a correspondingly lower power consumption, which is important, e.g., for portable devices. Put differently, for a given size of the relatively faster memory, more elements of the reference set can be accessed within a reasonable time than with a more random traversal of the reference set.

Claim 4 dynamically updates the frequency of use of the individual elements in the memory. This keeps the search time short even if usage habits change.
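The dynamic update of claim 4 can be sketched as a usage counter that is seeded with expected frequencies and incremented after each recognition. The text does not say when the groups are re-formed. Regrouping on demand, for example before the groups are written back at power-off, is an assumption here, as are the class and function names. The sketch also assumes every word needs the same amount of fast memory.

```python
from collections import Counter
from typing import Dict, List


class UsageTracker:
    """Hypothetical usage counter for dynamically updated frequencies of use."""

    def __init__(self, prior_counts: Dict[str, int]) -> None:
        # Expected frequencies (e.g. set at start-up) serve as initial counts.
        self.counts: Counter = Counter(prior_counts)

    def record(self, label: str) -> None:
        """Call after each successful recognition of `label`."""
        self.counts[label] += 1

    def ranking(self) -> List[str]:
        """Labels ordered by decreasing frequency of use (ties keep insertion order)."""
        return [label for label, _ in self.counts.most_common()]


def regroup(ranking: List[str], words_per_group: int) -> List[List[str]]:
    """Re-form the groups, assuming every word needs the same amount of fast memory."""
    if words_per_group < 1:
        raise ValueError("words_per_group must be at least 1")
    return [ranking[i:i + words_per_group] for i in range(0, len(ranking), words_per_group)]
```

Suppose "office" starts as the least used word and is then recognized three times. It moves to the top of the ranking, and after regrouping it is loaded with the first group.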

Claim 5 copies selected elements into two or more groups of elements. This combines a short search time for the most frequently used words with a minimized search time for certain "high-priority words". These special words can be frequently used words or other high-priority words.
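Copying high-priority words into several groups (claims 5 and 10) could be sketched as below. Each group has a fixed number of word `slots`, which is a simplifying assumption, and the priority words take some of those slots in every group. Putting them at the front of each group, so they are always searched first, is also an assumption about how the shortest access time is obtained. The function name is hypothetical.

```python
from typing import List


def groups_with_priority(ranked: List[str], priority: List[str], slots: int) -> List[List[str]]:
    """Build groups of `slots` words, each starting with copies of the priority words.

    The priority words occupy part of every group, so fewer slots remain for
    the other words; they are placed first so that they are searched first.
    """
    free = slots - len(priority)
    if free <= 0:
        raise ValueError("the priority words leave no room for other words")
    others = [word for word in ranked if word not in priority]
    if not others:
        return [list(priority)]
    return [list(priority) + others[i:i + free] for i in range(0, len(others), free)]
```

The trade-off is that each copy of a priority word leaves less room in every group for the remaining vocabulary. The number of groups, and hence the worst-case search time for rare words, grows accordingly.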

Claim 6 implements the relatively larger and slower first part of the memory as a permanent memory, and the relatively smaller and faster second part as a non-permanent memory. This ensures that the reference set of elements is retained while the device is switched off. It also allows the faster part of the memory to be implemented in a variety of sizes and access times adapted to the application.

Claim 7 loads the groups of elements with the highest frequency of use into the non-permanent memory at power-on. The application is then immediately ready for use when the device housing it is switched on after having been switched off. This is especially advantageous for applications whose power is switched off when they are not in use (e.g. in a mobile phone).
Claim 8 applies the method to speech recognition, where:
- the reference set of elements consists of the sampled and digitized representations of the speech signals for a reference set of recognizable words (the reference vocabulary) in a speech recognition system;
- the application is a speech recognizer that searches the reference set for a match with the sampled and digitized representation of a given spoken input word.

For a given size of the non-permanent memory, this minimizes both the average search time for the most frequently used words and the average search time for the entire reference vocabulary.
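The matching step of claim 8 can be illustrated with dynamic time warping (DTW). The patent does not specify the matching algorithm. DTW is swapped in here as a classic technique for comparing a spoken input with stored word templates. The sketch makes several simplifying assumptions:
- utterances and templates are one-dimensional feature sequences;
- each group is a dictionary from a word label to its template;
- `threshold` is a hypothetical acceptance limit.

```python
import math
from typing import Dict, List, Optional, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-time-warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(
    utterance: Sequence[float],
    groups: List[Dict[str, List[float]]],
    threshold: float,
) -> Optional[str]:
    """Scan groups in order of decreasing average frequency of use.

    Within a group the closest template wins; it is accepted only if its
    distance is within `threshold`, otherwise the next group is loaded.
    """
    for group in groups:
        best_label, best_distance = None, math.inf
        for label, template in group.items():
            distance = dtw_distance(utterance, template)
            if distance < best_distance:
                best_label, best_distance = label, distance
        if best_distance <= threshold:
            return best_label
    return None
```

Accepting the best match in the first group that falls within the threshold is what lets frequent words be recognized without loading the later groups. A real recognizer would use multi-dimensional acoustic features and a tuned rejection threshold.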

Claim 9 associates each element in the reference set with a digital label. The label comprises one or more characters and is in turn associated with, for example, an address in a register, a command or a telephone number. This ensures that each element can be identified simply and unambiguously, and that all the elements can be used to control different functions or processes.
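A digital label of the kind described in claim 9 might be modelled as a small record that maps a recognized element to an action. The `kind` categories, the element ids and the simulated actions are hypothetical; they simply mirror the examples in the text (register address, command, telephone number).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DigitalLabel:
    characters: str  # one or more characters identifying the element
    kind: str        # hypothetical categories: "phone_number", "command", "register_address"
    value: str       # what the label is associated with


def dispatch(label: DigitalLabel) -> str:
    """Turn a recognized element's label into an action (simulated as a string)."""
    if label.kind == "phone_number":
        return f"dial {label.value}"
    if label.kind == "command":
        return f"execute {label.value}"
    if label.kind == "register_address":
        return f"look up entry {label.value}"
    raise ValueError(f"unknown label kind: {label.kind!r}")


# Each stored element (keyed here by a hypothetical element id) carries exactly one label.
LABELS: Dict[int, DigitalLabel] = {
    0: DigitalLabel("H", "phone_number", "0123456"),
    1: DigitalLabel("R", "command", "redial"),
}
```

Keeping the label separate from the speech template means the same recognition path can drive dialing, commands or register look-ups.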

Claim 10 builds one or more special words, for which rapid access is of paramount importance, into every group that makes up the reference vocabulary. These high-priority words then get the shortest possible access time, regardless of which group of words is currently loaded in the non-permanent memory.

Claim 11 requires that the reference set of recognizable words stored in the permanent memory, and the input words to be matched against it, are spoken by the same voice. This reduces the demands on the complexity of the speech recognition algorithm.

Under claim 12, the user stores the reference set of recognizable words in the permanent memory, and assigns digital labels and expected frequencies of use to each word, before the speech recognizer is used. This adapts the system to the user's needs.

Under claim 13, the reference set of recognizable words stored in the permanent memory is selected from two or more sets, depending on the user or the subject to be handled. This allows the application to be optimized for the particular user or situation.
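The user-driven set-up of claims 12 and 13 can be sketched as enrolling words into one of several vocabulary sets and then selecting a set by user or subject. The class, the string keys and the omission of the sampled templates are all simplifying assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class VocabularySet:
    """One reference vocabulary; the sampled templates themselves are omitted here."""
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def enroll(self, word: str, label: str, expected_frequency: float) -> None:
        """User-driven enrollment: store a word with its digital label and expected frequency."""
        if expected_frequency < 0:
            raise ValueError("expected frequency of use cannot be negative")
        self.entries[word] = (label, expected_frequency)


def select_vocabulary(sets: Dict[str, VocabularySet], key: str) -> VocabularySet:
    """Pick the reference set for a given user or subject (hypothetical string key)."""
    if key not in sets:
        raise KeyError(f"no reference vocabulary for {key!r}")
    return sets[key]
```

The expected frequencies entered at enrollment can serve as the start-up values that later usage statistics refine.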

As described in claim 14, the present invention further provides an apparatus with the following features:
- It comprises a processor that houses an application, and a memory.
- The memory consists of a relatively larger and slower first part and a relatively smaller and faster second part. The second part is directly accessible to the application for searching.
- The apparatus is adapted to reduce the average access time for a reference set of elements stored in the memory, the elements being accessed with different frequencies.
- The reference set of elements is stored in the relatively larger and slower first part. Some of the elements stored in the first part are downloaded to the second part at the request of the application.

The apparatus collects the elements in the first part of the memory that have the highest frequency of use into a group whose size is limited by the size of the second part of the memory. It downloads this group to the second part of the memory each time the device leaves the switched-off state, and searches it first. This reduces the average search time for the elements in a given reference set of elements, for a given size of the relatively fast memory.

Under claim 15, the apparatus is adapted to:
1. divide the elements of the first part of the memory into groups according to their frequency of use, with the size of each group limited by the size of the second part of the memory;
2. download the groups to the second part of the memory in order of decreasing average frequency of use;
3. continue the search until the relevant element has been found or until all groups that make up the reference set of elements have been searched.

This ensures that the average access time for the entire reference set of elements is minimized on a group-by-group basis.

Claim 16 places the relatively larger and slower first part of the memory and the relatively smaller and faster second part in physically different units that can exchange information. This gives the application of the invention a higher level of flexibility.

Claim 17 implements the relatively larger and slower first part of the memory as a permanent memory, and the relatively smaller and faster second part as a non-permanent memory. This ensures that the reference set of elements is retained while the device is switched off. It also allows the faster part of the memory to be implemented in a variety of sizes and access times adapted to the application.

Under claim 18, the apparatus loads the group of elements with the highest frequency of use into the non-permanent memory at power-on. The application is then immediately ready for use when the device housing it is switched on after having been switched off. This is especially advantageous for applications whose power is switched off when they are not in use (e.g. in a mobile phone).

Claim 19 covers an apparatus adapted to sample, digitize and store in a memory the speech signals of a reference set of recognizable words. These words constitute the reference set of elements in a speech recognition system. The application is a speech recognizer that searches the reference set for a match with the sampled and digitized representation of a given spoken input word. For a given size of the non-permanent memory, this minimizes both the average search time for the most frequently used words and the average search time for the entire reference vocabulary.

Claim 20 integrates at least the non-permanent memory and the speech recognizer in a device whose supply of electrical power is limited, e.g. a cellular telephone. This ensures that the advantages of the invention are fully exploited.

The invention can be used in any application where a limit on the maximum allowable power consumption restricts the size of the relatively faster memory that a search algorithm must search. Examples include speech recognition for voice control of various mobile applications, such as mobile phones, pagers and portable gaming applications. They also include applications in general where hands-free operation is advantageous, e.g. in a car or the like.

Brief description of the drawing: A preferred embodiment of the invention in a speech recognition system for voice control of a mobile telephone, for example one used in a car for hands-free operation of the telephone, will be described below with reference to the drawing, in which:
Fig. 1 shows the division of the reference vocabulary stored in a permanent memory and the loading of a group of words into a non-permanent memory in a speech recognition system according to the invention;
Fig. 2 shows a speech recognition system according to the invention with the individual words in each group indicated;
Fig. 3 shows a speech recognition system according to the invention in which the individual words in each group are indicated and a high-priority word is included in all groups;
Fig. 4 shows a speech recognition system according to the invention in which digital labels are assigned to the individual words in each group;
Fig. 5 shows the division of the reference vocabulary into groups of words, and the assigned parameters, according to the invention; and
Fig. 6 shows a cellular telephone according to the invention.

Detailed description of the embodiments: Fig. 1 shows the memory 2 (divided into a relatively slower part 3 and a relatively faster part 4) and the processor 9 of a speech recognition system for voice control of a mobile telephone. It shows a relatively slower, permanent memory 3 (an EEPROM), in which groups 6, 7, 20 (G#1, G#2, ..., G#N) of sampled and digitized representations of spoken words, which together constitute a reference vocabulary 1 (REF-VOC) of a speech recognition system, are stored; a relatively faster, non-permanent memory 4 (hereinafter referred to as RAM), into which selected groups of words are downloaded from the permanent memory 3; and a digital signal processor 9 (DSP) housing the speech recognition application 5 (SR), which has access to RAM 4 in order to search among the stored elements for a match with a spoken input word. In the drawing, RAM 4, which holds the downloaded group of words 8, is shown located entirely outside DSP 9. It may, however, equally be located entirely inside 10 DSP 9, or be divided between the two. In Fig. 1a, group 6 (G#1), which represents the most frequently used words (identified by parameter 30 (FG1)), is downloaded from EEPROM 3 to RAM 4. In Fig. 1b, group 7 (G#2), which represents the second most frequently used words (identified by parameter 31 (FG2)), is downloaded from EEPROM 3 to RAM 4.

In this context, the term "reference vocabulary", or simply "vocabulary", means a set of sampled and digitized representations of the spoken words that the speech recognition system is intended to recognize. The sampled and digitized representation of each spoken word (Wij in Figs. 2-5), stored in the permanent memory 3, is associated with a corresponding digital "label" (Lwij in Figs. 4-5), which can be used as a control input for further processing. In the following, the term "word" is generally used to mean the sampled and digitized representation of the corresponding spoken word.

According to the invention, a user-specific reference vocabulary is stored in a permanent memory and divided into N groups of words (N ≥ 2) according to the usage frequency assigned to the words. Each group of words is identified by the average usage frequency (FGi in Figs. 1 and 5) of its individual words.

In this context, the term "usage frequency" means either "the expected usage frequency", based on expectations or priorities, as applied e.g. at start-up, or "the actual usage frequency" (i.e. the historically experienced, possibly dynamically updated, usage frequency), or a combination thereof. The term "average usage frequency of a group" (FGi in Figs. 1 and 5) is generally used to mean the mean of the usage frequencies (Fwij in Fig. 5) of the elements that make up the group, $FG_i = \frac{1}{n}\sum_{j=1}^{n} Fw_{ij}$, where n is the number of elements in the group. In a particular embodiment of the invention, however, the assignment of the average usage frequencies 30, 31, 32 (FG1, FG2, ..., FGN) to the individual groups 6, 7, 20 (G#1, G#2, ..., G#N) is based on a "best guess" or on "user priority".

The maximum size of each group of words 6, 7, 20 is determined by the size of the fast non-permanent memory 4 to which the speech recognition algorithm 5, housed in a processor 9 (e.g. a DSP), has access. In the drawing, the maximum number of words is denoted n.

The groups (G#1, G#2, ..., G#N), which together constitute the reference vocabulary 1, are arranged in order of decreasing usage frequency of the words in each group; i.e. the first group of words 6 (G#1) has the highest average usage frequency 30 (FG1), the second group of words 7 (G#2) has the second highest average usage frequency 31 (FG2), and so on.
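The grouping rule above can be illustrated with a minimal Python sketch. It assumes each word is reduced to a label and a numeric usage frequency, and that the group capacity n is simply the fast-memory size divided by a fixed per-word template size; the names `Word`, `Group`, `max_group_size` and `partition_vocabulary` are hypothetical and not taken from the patent. Sorting the whole vocabulary once by decreasing usage frequency and cutting it into chunks of n gives both the required order between groups and the order within each group.

```python
from dataclasses import dataclass


@dataclass
class Word:
    label: str          # digital label Lwij (hypothetical string form)
    usage_freq: float   # usage frequency Fwij


@dataclass
class Group:
    words: list[Word]

    @property
    def avg_freq(self) -> float:
        """FGi = (1/n) * sum of Fwij over the n words actually in the group."""
        return sum(w.usage_freq for w in self.words) / len(self.words)


def max_group_size(ram_bytes: int, template_bytes: int) -> int:
    """Hypothetical sizing rule: how many word templates fit in fast memory."""
    return ram_bytes // template_bytes


def partition_vocabulary(vocab: list[Word], n: int) -> list[Group]:
    """Split the vocabulary into G#1..G#N of at most n words each, ordered by
    decreasing usage frequency both across groups and within each group."""
    ranked = sorted(vocab, key=lambda w: w.usage_freq, reverse=True)
    return [Group(ranked[i:i + n]) for i in range(0, len(ranked), n)]


if __name__ == "__main__":
    vocab = [Word("home", 12), Word("office", 3), Word("alarm", 1),
             Word("mom", 8), Word("radio", 5)]
    for g in partition_vocabulary(vocab, n=2):
        print([w.label for w in g.words], g.avg_freq)
```

With n = 2 the example yields G#1 = [home, mom] (FG1 = 10.0), G#2 = [radio, office] (FG2 = 4.0) and a partly filled G#3 = [alarm].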

In a particular embodiment of the invention, illustrated in Fig. 2, the words 11, 12, 18; 14, 15, 19 (W11, W12, ..., W1n and W21, W22, ..., W2n in groups G#1 and G#2, respectively) within each group are further arranged in order of decreasing usage frequency, to ensure that the words in a given group are searched in descending order of usage frequency (cf. Fig. 5). In Fig. 2a, group 6 (G#1), which represents the most frequently used words, is downloaded from EEPROM 3 to RAM 4. In Fig. 2b, group 7 (G#2), which represents the second most frequently used words, is downloaded from EEPROM 3 to RAM 4.

In a particular embodiment of the invention, shown in Fig. 3, certain high-priority words, indicated by 17; 171 (), representing e.g. an emergency telephone number corresponding to the word "alarm" in a mobile telephone application, are given a special ranking that does not correspond to their expected usage frequency. As shown in Fig. 3, such high-priority words can be included in all groups as the first element(s) to be searched, to ensure a minimal search time for them. In Fig. 3a, group 6 (G#1), which represents the most frequently used words, is downloaded from EEPROM 3 to RAM 4. In Fig. 3b, group 7 (G#2), which represents the second most frequently used words, is downloaded from EEPROM 3 to RAM 4.

When the application is switched on, the group of words with the highest (expected) usage frequency is loaded into the fast memory 4 (RAM). At the first speech recognition event after the application has been switched on, the speech recognizer 5 (SR) searches for a match with the current input "word" among the words 8 currently stored in RAM 4.

The selection of the relevant group of words to download to the fast memory 4, in the event that no match is found between the current spoken input word and the group of words 8 currently stored in the fast memory 4, is unambiguous if the reference vocabulary 1 is divided into only two groups of words 6, 7 (G#1, G#2, N = 2): the group of words not currently in RAM 4 is downloaded from EEPROM 3. If there are more than two groups, the procedure can be carried out in several ways. In a preferred embodiment, each group of words is assigned a "priority number" corresponding to the group's average usage frequency, and the speech recognition system is adapted to search the groups in order of decreasing average usage frequency. If no match is found in the group of words 6 with the highest usage frequency, the group of words 7 with the second highest usage frequency is loaded into RAM 4 from EEPROM 3, and the search continues in this group of words, and so on, until either a match has been found or the entire reference vocabulary 1 has been searched.

More generally: if no match is found in the group of words currently stored in the fast memory, the group of words with the next priority number relative to the current group is downloaded. If no match is found in the next group of words either, the procedure continues until either a match has been found or all groups of words have in turn been downloaded to the fast memory, so that all words in the vocabulary have been searched.
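The search procedure described above might be sketched as follows. This is an illustrative model only: slow and fast memory are both Python lists, the download is modelled by copying a group, and `recognize` and `matches` are hypothetical names, `matches` standing in for the acoustic comparison performed by the speech recognizer 5. G#1 is assumed to be in RAM already, as after power-on.

```python
from typing import Callable, Optional, Sequence


def recognize(
    groups: Sequence[Sequence[str]],
    matches: Callable[[str], bool],
) -> tuple[Optional[str], int, int]:
    """Search G#1..G#N (given in decreasing FGi order) for the input word.

    G#1 is assumed to be in RAM already; every later group costs one
    download from the permanent memory. Returns
    (matched word or None, index of the group left in RAM, downloads).
    """
    downloads = 0
    for i, group in enumerate(groups):
        if i > 0:
            downloads += 1       # download group i from EEPROM 3 to RAM 4
        ram = list(group)        # copying models the transfer into fast memory
        for word in ram:         # words are already in decreasing Fwij order
            if matches(word):
                return word, i, downloads
    return None, len(groups) - 1, downloads


if __name__ == "__main__":
    groups = [["home", "mom"], ["radio", "office"], ["alarm"]]
    print(recognize(groups, lambda w: w == "office"))
```

The returned group index tells the caller which group is left in fast memory when recognition ends, which is what any strategy for the next recognition step has to work from.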

Om ett överensstärnmande hittas bland de mest användarfrekventa orden (G#1), blir söktiden minimal. För de mindre användarfrekventa orden, som ingår i andra grupper av ord än gruppen av ord med den högsta medelanvändningsfrekvensen, blir responstiden längre, och desto längre ju lägre medel-(förväntad)- användningsfrekvens för gruppen som de tillhör. Detta är troligtvis en acceptabel procedur för de mest relevanta tillämpningama. Den faktiska responstiden i detta fall beror, förutom på åtkomsttiden i det snabba minnet 4, även på åtkomsttiden för det permanenta minnet 3 och på tiden för nedladdning av den relevanta gruppen av ord till det snabba minnet 4. 10 15 20 25 30 u - a o en 14 Om ett överensstämrnande har hittats i en grupp av ord, som inte är den grupp som har de mest användarfrekventa orden, förblir denna särskilda grupp av ord lagrad i det snabba minnet 4 när igenkänningsprocessen har avslutats. Återigen kan olika strategier användas för att förbereda nästa tal-igenkänningssteg utan en mellanliggande frånslagning. En strategi kan vara att alltid avsluta en lyckad eller misslyckad igenkänningsprocess genom att ladda ned gruppen av ord 6 (G#1) med den högsta medelanvändningsfrekvensen. En annan strategi kan vara att avsluta en misslyckad igenkänningsprocess genom nedladdning av gruppen av ord 6 (G#1) med den högsta medelanvändningsﬁekvensen och att avsluta en lyckad igenkänningsprocess genom att behålla den aktuella gruppen av ord 8 i det snabba minnet 4 och därefter, om nästa igenkänningsprocess inte hittar någon överensstämmande i den aktuella gruppen av ord 8, att ladda den grupp av ord med den högsta medelanvändningsfrekvensen annan än den aktuella gruppen, och därefter fortsätta genomsökningen i enlighet med den procedur som normalt används efter påslagning. Andra enklare eller mer sofistikerade scheman kan också användas.If a match is found among the most user-frequent words (G # 1), the search time will be minimal. 
For the less user-frequent words, which are part of groups of words other than the group of words with the highest average usage frequency, the response time becomes longer, and the longer the lower the average (expected) usage frequency of the group to which they belong. This is probably an acceptable procedure for the most relevant applications. The actual response time in this case depends, in addition to the access time in the fast memory 4, also on the access time of the permanent memory 3 and on the time for downloading the relevant group of words to the fast memory 4. 10 15 20 25 30 u - ao en 14 If a match has been found in a group of words, which is not the group that has the most user-frequent words, this particular group of words remains stored in the fast memory 4 when the recognition process is completed. Again, different strategies can be used to prepare for the next speech recognition step without an intermediate shutdown. One strategy may be to always end a successful or unsuccessful recognition process by downloading the group of words 6 (G # 1) with the highest average usage frequency. Another strategy may be to end a failed recognition process by downloading the group of words 6 (G # 1) with the highest average usage sequence and to end a successful recognition process by keeping the current group of words 8 in the fast memory 4 and then, if the next recognition process does not find any match in the current group of words 8, to load the group of words with the highest average usage frequency other than the current group, and then continue the scan according to the procedure normally used after switching on. Other simpler or more sophisticated schemes can also be used.
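The two strategies for preparing the next recognition step can be expressed as small policy functions. The strategy names, the function names and the 0-based group indices (0 = G#1) are assumptions for illustration; `search_order` encodes the second strategy's rule of searching the group already in RAM first, then the highest-priority other group, then the remaining groups in normal priority order.

```python
def group_after_recognition(strategy: str, current: int, matched: bool) -> int:
    """Index of the group to leave in RAM before the next recognition step.

    "always_reset":    always reload G#1 (index 0), whatever the outcome.
    "keep_on_success": keep the current group after a match, reload G#1
                       after a failed recognition.
    """
    if strategy == "always_reset":
        return 0
    if strategy == "keep_on_success":
        return current if matched else 0
    raise ValueError(f"unknown strategy: {strategy}")


def search_order(current: int, n_groups: int) -> list[int]:
    """Group order for the next recognition: the group already in RAM first,
    then all other groups in decreasing-FGi (index) order, skipping it."""
    return [current] + [i for i in range(n_groups) if i != current]


if __name__ == "__main__":
    kept = group_after_recognition("keep_on_success", current=2, matched=True)
    print(kept, search_order(kept, 4))
```

With four groups and G#3 kept in RAM after a successful match, the next search visits G#3, G#1, G#2 and then G#4.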

It is important to note that each word in the group of words 8 currently in the fast memory 4, and thus the primary search object of the speech recognition algorithm 5, need not be searched in its entirety to determine whether or not the spoken input word matches the word currently being examined. Using known techniques, the search algorithm 5 can limit its search to only parts of the word being compared.

This is the case if the probability of a match, after comparison with a certain part of the word, is lower than a predetermined value. This helps keep the average search time short and thus keeps power consumption down.
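Partial comparison with early abandonment can be sketched as below. The patent does not specify the comparison method, so this sketch assumes time-aligned feature frames and a squared Euclidean distance, and treats an accumulated distance above `threshold` as equivalent to a match probability below the predetermined value; a practical recognizer would more likely use dynamic time warping or HMM scoring. The function name is hypothetical.

```python
from typing import Optional, Sequence

Frame = Sequence[float]  # one feature vector per analysis frame (assumed)


def early_abandon_distance(
    template: Sequence[Frame],
    utterance: Sequence[Frame],
    threshold: float,
) -> tuple[Optional[float], int]:
    """Compare a stored word template with the input frame by frame and stop
    as soon as the running distance exceeds `threshold`.

    Returns (total distance, or None if abandoned; frames actually compared).
    """
    total = 0.0
    frames = min(len(template), len(utterance))  # assumes aligned frames
    for k in range(frames):
        total += sum((a - b) ** 2 for a, b in zip(template[k], utterance[k]))
        if total > threshold:
            return None, k + 1   # the remaining frames are never compared
    return total, frames


if __name__ == "__main__":
    t = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    u = [[5.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(early_abandon_distance(t, u, threshold=10.0))
```

Here the input differs sharply from the template in the first frame, so the comparison stops after one frame instead of three.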

Before use, the speech recognition system must first be "adapted" or "trained" by the user, who speaks the words chosen as desired recognizable words 1 into a microphone; after being sampled and digitized, they are stored in a permanent memory 3. The permanent memory 3 may, if practical, be part of the unit that houses the non-permanent memory 4 and the speech recognizer 5, or it may be physically located in another unit with which the system can exchange information, e.g. a suitably adapted PC, laptop computer, personal digital assistant (PDA), charger or the like. In a preferred embodiment of the invention, the permanent memory 3, the non-permanent memory 4 and the speech recognizer 5 are integrated in a cellular telephone, and the system is further adapted to load a reference vocabulary from other sources, such as a PC, laptop computer, PDA, charger or the like. The information can be exchanged over a local cable, a wireless link (e.g. using the Bluetooth standard) or a switched (i.e. network-based) remote connection between the two units in question. This initial "training" of the system can be performed either on the system housing the speech recognizer 5 or on any other suitably adapted unit, such as those mentioned above.

The stored vocabulary 1 is thus user-specific both in its voice characteristics and in the actual words that make up the vocabulary. The fact that the vocabulary of words 1 to be recognized by the speech recognition system is stored using the user's own voice (as opposed to a "third-party" voice) greatly reduces the complexity required of the speech recognition algorithm 5. System performance can be improved further by giving the user some instructions on how the words should be pronounced, both while training the system and during normal use.

As part of the initial procedure, the stored spoken words must be associated with their corresponding digital labels 21, 22, 23, 24; 211, 25, 26, 27, etc. (exemplified by (), Lw11, Lw12, ..., Lw1(n-1) and (), Lw21, Lw22, ..., Lw2(n-1) for groups 6 (G#1) and 7 (G#2), respectively). This is illustrated in Fig. 4, where label Lwij is assigned to the j-th word Wij in group G#i. Label 21; 211 () is assigned to the special high-priority word included in all groups. In Fig. 4a, group 6 (G#1), which contains the words with the highest usage frequency, is downloaded to RAM 4. In Fig. 4b, group 7 (G#2), which contains the words with the second highest usage frequency, is downloaded to RAM 4. Although the term "word" is used, the more general term "sound" is meant, since the acoustic elements stored in memory could be any combination of acoustic signals, e.g. a whistle, a car horn, a dog barking, etc.

The digital label for a given word can take many forms, e.g.:
• a number, if the spoken word is a number, which may be displayed and/or dialed in a communication device (e.g. the spoken word "one" is translated into the digit "1");
• an address in a register, or a telephone number associated with the spoken word (e.g. the spoken word "home" is translated into the digits that make up the user's home telephone number); or
• the spoken word itself, which may be shown on a display (e.g. the spoken word "radio" is translated into the letters "r a d i o").
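One way to represent the three kinds of label listed above is a small tagged union. The class names and the action strings returned by `handle` are hypothetical; the sketch only shows that a recognized word's label, rather than its acoustic data, drives what the device does next.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DigitLabel:   # spoken number -> digit to display and/or dial
    digit: str


@dataclass
class PhoneLabel:   # spoken name -> register entry / telephone number
    number: str


@dataclass
class TextLabel:    # spoken word -> text shown on the display
    text: str


Label = Union[DigitLabel, PhoneLabel, TextLabel]


def handle(label: Label) -> str:
    """Map a recognized word's label to a (hypothetical) device action."""
    if isinstance(label, DigitLabel):
        return f"enter {label.digit}"
    if isinstance(label, PhoneLabel):
        return f"dial {label.number}"
    if isinstance(label, TextLabel):
        return f"display {' '.join(label.text)}"   # "radio" -> "r a d i o"
    raise TypeError(f"unsupported label: {label!r}")


if __name__ == "__main__":
    for lab in (DigitLabel("1"), PhoneLabel("5550100"), TextLabel("radio")):
        print(handle(lab))
```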

Fig. 5 shows an example of the division of the reference vocabulary 1, stored in the permanent memory 3, into groups of words 6, 7, 20 (G#1, G#2, ..., G#N), together with the parameters assigned according to the invention. The reference vocabulary 1 (REF-VOC) is divided into N groups of words (G#i, i = 1, 2, ..., N), each consisting of n words (Wij, i = 1, 2, ..., N; j = 1, 2, ..., n) (or empty slots, if a group is not fully used), to which labels Lwij and usage frequencies 40, 41, 42; 43, 44, 45; 46, 47, 48 (Fwij in groups G#1, G#2 and G#N, respectively) are assigned. An average usage frequency 30, 31, 32 (FGi) is assigned to each of the groups of words 6, 7, 20, respectively.

In the embodiment of Fig. 5, the special high-priority word, with corresponding label Lpw1, is included in all groups (leaving only n-1 "ordinary" words in each group). No usage frequency is assigned to such words, i.e. their presence does not affect the average usage frequency of the group(s) in question.
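The Fig. 5 arrangement with a high-priority word in every group can be sketched as follows, assuming ordinary words are given as (word, Fwij) pairs. The priority words occupy the first slots of each group, so only n minus the number of priority words remain for ordinary words, and, as stated above, the priority words are excluded from the average FGi. The function name `build_groups` is hypothetical.

```python
def build_groups(
    priority: list[str],
    ranked: list[tuple[str, float]],
    n: int,
) -> list[tuple[list[str], float]]:
    """Return (words in search order, FGi) for each group.

    priority: high-priority words placed first in every group; they carry
              no usage frequency and do not enter the average.
    ranked:   (word, Fwij) pairs for the ordinary words.
    n:        group capacity, priority words included.
    """
    slots = n - len(priority)
    if slots <= 0:
        raise ValueError("priority words leave no room for ordinary words")
    ordered = sorted(ranked, key=lambda p: p[1], reverse=True)
    groups = []
    for i in range(0, len(ordered), slots):
        chunk = ordered[i:i + slots]
        avg = sum(f for _, f in chunk) / len(chunk)   # ordinary words only
        groups.append((priority + [w for w, _ in chunk], avg))
    return groups


if __name__ == "__main__":
    pairs = [("home", 12), ("mom", 8), ("radio", 5), ("office", 3)]
    for words, fg in build_groups(["alarm"], pairs, n=3):
        print(words, fg)
```

With n = 3, "alarm" heads both groups, and FG1 = 10.0 and FG2 = 4.0 are computed from the two ordinary words in each group.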

The system can be adapted to recognize a specific voice, a "general" voice, or both. In a particular embodiment of the invention, a standard vocabulary with corresponding digital dummy labels is preloaded into the permanent memory 3, to serve as inspiration for the user and to be modified by the user.

The content of the groups of words into which the vocabulary is divided can be changed, i.e., the order in which the individual words are selected by the speech recognizer 5 for comparison with the current spoken input word can be changed, or individual words can be replaced with new ones. This can of course be done manually during training of the system, by assigning different frequencies of use to the existing words or by replacing the vocabulary entirely. However, the order of priority of the individual words in the existing vocabulary can also be changed dynamically as a function of the actually observed frequency of use of the words. One strategy may be to move a word that actually occurs as input one step up in the “search hierarchy” (i.e., in the order in which the memory is searched). Another strategy may be to assign a frequency-of-use label to each word and to increment said label each time the word is used. This information can then be used to update the search order of the individual groups and of the reference vocabulary as such. Yet another strategy could be to move words down in the “search hierarchy” according to a date stamp of their most recent occurrence. Other, simpler or more sophisticated schemes for dynamically updating the order of priority of the individual words in the vocabulary can also be used.
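The three update strategies described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the patented implementation: the `Word` record, the function names and the integer "date stamp" are hypothetical. The helper `mean_comparisons` shows why reordering helps, by computing the mean number of comparisons of a linear scan when each word is spoken with a known relative frequency.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List


@dataclass
class Word:
    label: str          # digital label, e.g. "HOME"
    count: int = 0      # frequency-of-use label (second strategy)
    last_used: int = 0  # hypothetical integer date stamp of the latest use (third strategy)


def move_up_one(order: List[Word], i: int) -> None:
    """First strategy: swap the recognized word with the word searched before it."""
    if i > 0:
        order[i - 1], order[i] = order[i], order[i - 1]


def count_use(order: List[Word], i: int) -> None:
    """Second strategy: increment the word's label, then re-sort by it.
    Python's sort is stable, so words with equal counts keep their relative order."""
    order[i].count += 1
    order.sort(key=lambda w: w.count, reverse=True)


def stamp_use(order: List[Word], i: int, now: int) -> None:
    """Third strategy: stamp the word with the current date; words not used recently sink."""
    order[i].last_used = now
    order.sort(key=lambda w: w.last_used, reverse=True)


def mean_comparisons(order: List[Word], freq: Dict[str, int]) -> Fraction:
    """Mean number of comparisons of a linear scan of `order`, assuming word w
    is spoken with probability freq[w] / total and is always present."""
    total = sum(freq[w.label] for w in order)
    return sum((Fraction(freq[w.label], total) * rank
                for rank, w in enumerate(order, start=1)), Fraction(0))
```

With frequencies of 6, 3 and 1 for three words, scanning them in decreasing order of frequency costs 3/2 comparisons on average, while the reverse order costs 5/2. Sorting by decreasing frequency of use is the order that minimizes this mean for a linear scan, which is the effect the priority ordering aims at.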

Different sets of vocabularies 1, i.e., sets of sampled and digitized images of spoken words and their corresponding digital labels, can be stored in a permanent memory 3. A first input to the system, indicating the relevant vocabulary to be used, is supplied by the user, e.g., by selecting from a menu or by providing a login ID, a special keyword or the like. Alternatively, the system can be adapted to choose between different vocabularies on the basis of a speech input, e.g., by distinguishing between the voices of different users or on the basis of different words from the same user.

The groups of words can be selected according to the subject matter to be handled (i.e., different words or groups of words have the highest expected frequency of occurrence depending on the “application”, e.g., on the user in question).
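A sketch of how a vocabulary could be split into groups from a table of expected frequencies of use. For simplicity it assumes that every stored word occupies the same amount of fast memory, so that a group is limited to `slots` words. The function names and the two user profiles are hypothetical example data; the point is only that a different frequency profile for the same words yields different groups.

```python
from typing import Dict, List


def partition(freq: Dict[str, int], slots: int) -> List[List[str]]:
    """Split the vocabulary into groups of at most `slots` words, ordered by
    decreasing expected frequency of use (equal-size templates assumed)."""
    if slots < 1:
        raise ValueError("the fast memory must hold at least one word")
    ranked = sorted(freq, key=lambda w: (-freq[w], w))  # ties broken alphabetically
    return [ranked[i:i + slots] for i in range(0, len(ranked), slots)]


def group_averages(freq: Dict[str, int], groups: List[List[str]]) -> List[float]:
    """Average frequency of use per group; non-increasing because the groups
    are consecutive slices of a list sorted by decreasing frequency."""
    return [sum(freq[w] for w in g) / len(g) for g in groups]


# Hypothetical expected frequencies for two users of the same telephone.
profiles = {
    "user_a": {"HOME": 9, "WORK": 7, "MOM": 5, "TAXI": 3, "GYM": 1},
    "user_b": {"HOME": 2, "WORK": 1, "MOM": 8, "TAXI": 6, "GYM": 9},
}
```

With two slots, `user_a` gets HOME and WORK in the first group, while `user_b` gets GYM and MOM, so the group loaded first differs per user even though the words are the same.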

Fig. 6 shows a cellular telephone 50 according to the invention. The telephone is adapted for voice control of the dialing process. The permanent part 3 and the non-permanent part 4 of the memory 2 are integrated in the telephone 50 together with the speech recognizer 5, which is housed in the processor 9. The user 45 pronounces an input word 40 (in the example in Fig. 6: “HOME!”), which is picked up by the microphone 51 of the cellular telephone 50. The word is sampled and digitized and used in the matching process by the speech recognizer, in that the group of “words” 8 currently loaded in the non-permanent memory 4 is compared, one by one, with the input “word” 40. If a match is found among the words 8 in the non-permanent memory 4, the corresponding digital label (in the example in Fig. 6: “HOME” and the corresponding telephone number “8765432”) is loaded to the screen 52 of the cellular telephone 50 for confirmation by the user 45. In a particular embodiment, the cellular telephone is adapted to perform the “action”, i.e., to dial the telephone number 53 belonging to the proposed matching word, if the proposal is accepted by the user through a spoken “YES” (or any other user-defined word of approval), and to continue the search if the proposed matching word is rejected by the user 45 through a spoken “NO” (or any other user-defined word of rejection). Assume that the group of words currently stored in the fast memory 4 is the group of words 6 with the highest frequency of use. If no match is found in this group, the group of words 7 with the second-highest frequency of use is loaded into the RAM 4 from the EEPROM 3, and the search continues in this group of words, and so on, until either a matching word has been found or the entire reference vocabulary 1 has been searched.
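The search loop of Fig. 6 can be modeled roughly as below. This is an illustrative sketch under stated assumptions: templates are plain strings standing in for the sampled and digitized images, `matches` is a hypothetical predicate in place of the actual pattern matching of the speech recognizer 5, and `confirm` stands for the user's spoken “YES” or “NO”. The list of groups is assumed to be ordered by decreasing frequency of use, with the first group already resident in the fast memory 4.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple


class Entry(NamedTuple):
    template: str  # stands in for the sampled and digitized image of the word
    label: str     # digital label shown on the screen, e.g. "HOME"
    number: str    # telephone number to dial if the user accepts


def voice_dial(groups: Sequence[List[Entry]],
               matches: Callable[[str], bool],
               confirm: Callable[[Entry], bool]) -> Tuple[Optional[Entry], int]:
    """Scan the groups in order of decreasing frequency of use.

    Returns (accepted entry or None, number of groups scanned). groups[0] is the
    group already resident in fast memory; each further group is copied in only
    when the previous ones yielded no accepted match.
    """
    scanned = 0
    for group in groups:
        fast_memory = list(group)  # models loading the group from EEPROM into RAM
        scanned += 1
        for entry in fast_memory:
            if matches(entry.template) and confirm(entry):
                return entry, scanned  # user said "YES": caller dials entry.number
            # no match, or the user said "NO": keep scanning
    return None, scanned  # the whole reference vocabulary has been searched
```

For example, with `matches=lambda t: t == "home"` and a `confirm` that always accepts, the call returns the HOME entry after scanning only the first group; if the user rejects a proposal, the scan simply continues and may load the next group.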

In the various embodiments described above, it has been assumed that the relatively larger and slower part of the memory is a permanent memory and that the relatively smaller and faster part is a non-permanent memory. Depending on the circumstances, however, the reverse may also apply, in that the relatively larger and slower part is a non-permanent memory and the relatively smaller and faster part is a permanent memory. Furthermore, both parts of the memory may be permanent, or both may be non-permanent. Although the invention has been described in connection with voice control of a mobile telephone, the same concept can be used in other applications where the use of a relatively faster memory is restricted by power consumption, price or other constraints.

Claims

1. A method for reducing the mean access time of a reference set of elements (1) in a memory (2), wherein said elements (1) are accessed in the memory (2) at different frequencies, wherein said memory (2) comprises a relatively larger and slower first part (3), in which all elements in the reference set of elements (1) are stored, and a relatively smaller and faster second part (4), which is directly accessible for scanning by an application (5) and to which second part (4) a part of the elements stored in the first part (3) is downloaded at the request of the application (5), and which application (5) is occasionally in a switched-off state, characterized in that the elements in the first part of the memory with the highest frequency of use are collected in a group (6), the size of which is limited by the size of the second part (4) of the memory, and that said group of elements (6) is downloaded to the second part (4) of the memory after each end of a switched-off state and is searched first.

2. Method according to claim 1, characterized in that the elements (1) in the first part (3) of the memory are divided into groups (6, 7, 20) according to their frequency of use, the size of each group of elements being limited by the size of the second part (4) of the memory, that the groups (6, 7, 20) of elements are downloaded to the second part (4) of the memory in order of decreasing average frequency of use (30, 31, 32), and that the scanning is continued until the relevant element has been found or until all groups of elements (6, 7, 20) that make up the reference set of elements (1) have been searched.

3. Method according to claim 1 or 2, characterized in that the scanning in the relatively smaller and faster second part (4) of the memory is performed in a given order, where the individual elements (11, 12, 13; 14, 15, 16) in each group (6; 7) are arranged according to their frequency of use (40, 41, 42; 43, 44, 45), so that the element with the highest frequency of use is searched first, the element with the second-highest frequency of use is searched next, and so on.

4. Method according to any one of claims 1 to 3, characterized in that the frequency of use (40, 41, 42, 43, 44, 45, 46, 47, 48) of the individual elements in the memory (2) is dynamically updated.

5. Method according to any one of claims 1 to 4, characterized in that selected elements (17) are copied into two or more groups (6, 7, 20) of elements.

6. Method according to any one of claims 1 to 5, characterized in that the relatively larger and slower first part of the memory is a permanent memory (3), and the relatively smaller and faster second part of the memory is a non-permanent memory (4).

7. Method according to claim 6, characterized in that the group of elements (6) with the highest frequency of use is loaded into the non-permanent memory (4) at switch-on.

8. Method according to any one of claims 1 to 7, characterized in that the reference set of elements consists of the sampled and digitized images of the speech signals in the reference set of recognizable words (1) in a speech recognition system, and that the application is a speech recognizer (5) which scans the reference set of elements (1) for a match with the sampled and digitized image of a given spoken input word.

9. Method according to claim 8, characterized in that each element in the reference set of elements (1) is associated with a digital label (21, 22, 23, 24, 25, 26, 27) consisting of one or more characters, to be linked to an address in a register, a command, a telephone number, etc.

10. Method according to claim 8 or 9, characterized in that the same one or more special words (17), for which rapid availability is of the utmost importance, are incorporated in each group (6, 7, 20) making up the reference set of elements (1).

11. Method according to any one of claims 8 to 10, characterized in that the reference set of recognizable words (1) stored in the permanent memory (3) and the input word to be matched are based on words pronounced by the same voice.

12. Method according to claim 11, characterized in that the storage of the reference set of recognizable words (1) in the permanent memory (3) and the assignment of digital labels (22, 23, 23, 25, 26, 27) and expected frequencies of use (40, 41, 42, 43, 44, 45) for each word (11, 12, 13, 14, 15, 16) are performed by the user before the speech recognizer (5) is used.

13. Method according to any one of claims 8 to 12, characterized in that the reference set of recognizable words (1) stored in the permanent memory (3) can be selected from two or more sets, depending on the user or the subject matter to be handled.

14. Apparatus comprising a processor (9) housing an application (5), and a memory (2), the memory consisting of a relatively larger and slower first part (3) and a relatively smaller and faster second part (4), which second part (4) is directly accessible for scanning by the application (5), wherein said apparatus is adapted to reduce the average access time for a reference set of elements (1) stored in the memory (2), said elements (1) being accessed in the memory (2) with different frequencies, where said reference set of elements (1) is stored in said relatively larger and slower first part (3), and a part of the elements stored in the first part (3) of the memory is downloaded to the second part (4) at the request of the application (5), and which application (5) is occasionally in a switched-off state, characterized in that the apparatus is adapted to collect in a group (6) the elements in the first part (3) of the memory which have the highest frequency of use, where the size of said group (6) is limited by the size of the second part (4) of the memory, and in that the apparatus is adapted to download said group of elements (6) to the second part (4) of the memory after each end of a switched-off state, and to scan it first.

15. Apparatus according to claim 14, characterized in that the apparatus is adapted to divide the elements (1) in the first part (3) of the memory into groups (6, 7, 20) according to their frequency of use, where the size of each group of elements is limited by the size of the second part (4) of the memory, to download the groups (6, 7, 20) of elements to the second part (4) of the memory in order of decreasing average frequency of use (30, 31, 32), and to continue the search until the relevant element has been found or until all groups of elements (6, 7, 20) that make up the reference set of elements (1) have been searched.

16. Apparatus according to claim 14 or 15, characterized in that the relatively larger and slower first part (3) of the memory and the relatively smaller and faster second part (4) of the memory are physically located in different units, where adjacent units can exchange information.

17. Apparatus according to any one of claims 14 to 16, characterized in that the relatively larger and slower first part of the memory is a permanent memory (3), and the relatively smaller and faster second part of the memory is a non-permanent memory (4).

18. Apparatus according to any one of claims 14 to 17, characterized in that the apparatus is adapted to load the group of elements (6) with the highest frequency of use into the non-permanent memory (4) when switched on.

19. Apparatus according to any one of claims 14 to 18, characterized in that the apparatus is adapted to sample, digitize and store in the memory (2) the speech signals in a reference set of recognizable words (1), which constitutes the reference set of elements in a speech recognition system, and in that the application is a speech recognizer (5) which scans the reference set of recognizable words (1) for a match with the sampled and digitized image of a given spoken input word (40).

20. Apparatus according to any one of claims 17 to 19, characterized in that at least the non-permanent memory (4) and the speech recognizer (5) are integrated in a unit whose access to electrical power is limited, e.g. a cellular telephone (50).
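The grouping recited in claims 1, 2 and 10 can be sketched as follows, assuming that each stored word has a known size in bytes and that the fast second part (4) of the memory holds `capacity` bytes. The function name and the greedy packing rule are assumptions of this example, not part of the claims: words are taken strictly in decreasing order of frequency of use, and a new group is started as soon as the next word does not fit. Each group therefore stays within the fast memory, and the first group, the one to download after each switched-off state, holds the most frequently used words. Special words (17), such as a hypothetical emergency entry, are copied into every group.

```python
from typing import Dict, List, Sequence


def build_groups(freq: Dict[str, int], size: Dict[str, int], capacity: int,
                 special: Sequence[str] = ()) -> List[List[str]]:
    """Split the reference set into groups that each fit in `capacity` bytes.

    Every group starts with a copy of the `special` words; the remaining room is
    filled with the other words in order of decreasing frequency of use.
    groups[0] is the group to download to the fast memory after power-on.
    """
    reserved = sum(size[w] for w in special)
    others = sorted((w for w in freq if w not in special),
                    key=lambda w: (-freq[w], w))  # ties broken alphabetically
    if reserved > capacity or any(reserved + size[w] > capacity for w in others):
        raise ValueError("a word does not fit in the fast memory next to the special words")
    groups: List[List[str]] = []
    current, used = list(special), reserved
    for word in others:
        if used + size[word] > capacity:  # current group is full: start the next one
            groups.append(current)
            current, used = list(special), reserved
        current.append(word)
        used += size[word]
    groups.append(current)
    return groups
```

Sequential packing keeps the frequency ranking intact at the cost of sometimes leaving unused room in a group; a bin-packing strategy could fill the memory more tightly but would no longer guarantee that earlier groups contain only more frequently used words.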