FI20216113A1 - Speech recognition optimisation for service centres - Google Patents
- Publication number
- FI20216113A1
- Authority
- FI
- Finland
- Prior art keywords
- speech recognition
- information
- recognition model
- domain specific
- spoken
- Prior art date
- 2021-10-28
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
- G06Q30/015—Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
- G06Q30/016—After-sales
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/32—Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
Abstract
A method, apparatus, and computer program for speech recognition optimisation for service centres and for producing same, including: receiving (600) spoken information from a person; and automatically: converting (601) the spoken information to textual information using a generic speech recognition model; determining (602) a contact reason based on the textual information; selecting (603) a domain specific speech recognition model according to the determined contact reason; and using (604) the selected domain specific speech recognition model for obtaining information spoken by the person.
Description
SPEECH RECOGNITION OPTIMISATION FOR SERVICE CENTRES
The present disclosure generally relates to speech recognition optimisation for service centres.
This section illustrates useful background information without admission that any technique described herein is representative of the state of the art.
Man-machine interfaces were first implemented using switches and buttons. Modern speech recognition technology has also enabled inputting information from the speech of human users. Hence, the user need not physically interface with a particular control element. The resulting hands-free operation is technically beneficial in various situations, such as when driving a car, but also in numerous other cases. Service centres may receive information from users through written contact forms, but also through phone calls. In fact, the spoken form is often the most convenient for the user because of its ease of delivery. The transition from keyboard-equipped computers to smartphones as a platform accentuates this need: conveying information by speech is far more efficient and far more ubiquitous.
Moreover, in speech delivery, there is no need to adapt a keyboard to provide language-dependent characters, such as the ä or ö umlauts that are missing from English.
Unfortunately, speech recognition is not equally easy for the receiving end. Different people have different vocal tracts, and different people speak using different dialects and slang.
There are also often different background noises. Mobile communication channels may cause intermittent loss of audio, so some utterances get distorted or lost. Hence, the accuracy of speech recognition tends to vary both temporally and by individual.
One approach to improved accuracy is the use of individual speech recognition models. For example, the user may train a speech recognition system by reading given texts. The improvement in accuracy may be significant, but the user perception may be poor. Moreover, before the user has trained her speech recognition system, no individual model is available and a generic one must be used. Furthermore, some users might not want their data to be used for improving a speech recognition engine.
Some speech recognition systems may have models for specific subject matter such as medical terminology. A speech recognition system with a medical terminology specific model may be useful in recognising the dictation of doctors and surgeons, for example. A surgeon may dictate notes during an operation. However, such specialised systems are difficult to deploy, as each field of enterprise or human activity must be provided with a tailored system. Moreover, if the surgeon were to use such a tailored speech recognition system for any other purpose, the speech recognition would be prone to fail.
It is desirable to provide or improve a man-machine interface capable of receiving information from arbitrary people without prior individual training and without restriction to a particular terminology specific model, as such needs are particularly present in speech recognition for service centres. Alternatively, it is desirable to provide a new technical alternative or alternatives to existing technology.
The appended claims define the scope of protection. Any examples and technical descriptions of apparatuses, products and/or methods in the description and/or drawings not covered by the claims are presented not as embodiments of the invention but as background art or examples useful for understanding the invention.
According to a first example aspect there is provided a method for speech recognition optimisation for service centres, comprising: receiving spoken information from a person; and automatically: converting the spoken information to textual information using a generic speech recognition model; determining a contact reason based on the textual information; selecting a domain specific speech recognition model according to the determined contact reason; and using the selected domain specific speech recognition model for obtaining information spoken by the person.
Advantageously, by determining the contact reason and accordingly selecting and using the domain specific speech recognition model, the man-machine interface can be optimised for information acquisition from a human being. Further advantageously, by using domain specific speech recognition models, the accuracy of speech recognition can be improved using earlier acquired topical knowledge. Since the speech recognition model need not be user specific, deployment of the system can be significantly facilitated and / or a broader base of training material may be acquired than with individual speech recognition models. Further advantageously, the person in question does not bind the speech recognition to a given narrow model. Hence, the method may provide improved accuracy without restriction to a particular terminology specific model.
Further advantageously, by improving the accuracy of the speech recognition in service centres, the hardware use efficiency may be improved. By reducing errors, the person may successfully deliver information with fewer attempts to rephrase or to correct the automatic speech recognition. This may result in improved service capacity and so reduce the hardware required to obtain a desired man-machine information delivery throughput.
The determining of the contact reason may be performed based on the textual information and contact reason history data obtained from a plurality of persons.
The using of the selected domain specific speech recognition model for obtaining information spoken by the person may comprise converting the spoken information to textual information using the domain specific speech recognition model. Alternatively, the using of the selected domain specific speech recognition model for obtaining information spoken by the person may comprise converting spoken information subsequently received from the person to textual information using the domain specific speech recognition model.
The spoken information may be received as one submission and processed in the method.
Alternatively, first spoken information may be received from the person and processed according to the first example aspect, and thereafter second spoken information may be received from the person and used as a source for obtaining the information spoken by the person using the selected domain specific speech recognition model.
The method may further comprise maintaining a plurality of domain specific speech recognition models.
The method may further comprise monitoring the current contact reason and updating the contact reason and the contact reason specific speech recognition model accordingly if a change of the contact reason is identified during the monitoring.
Advantageously, by updating the contact reason, the speech recognition accuracy may be optimised even if the person changed the contact reason in the spoken information.
According to a second example aspect there is provided an apparatus for speech recognition optimisation for service centres, comprising:
an input for receiving spoken information from a person;
means for automatically converting the spoken information to textual information using a generic speech recognition model;
means for automatically determining a contact reason based on the textual information;
means for automatically selecting a domain specific speech recognition model according to the determined contact reason; and
means for automatically using the selected domain specific speech recognition model for obtaining information spoken by the person.
According to a third example aspect there is provided a computer program comprising computer executable program code which when executed by at least one processor causes an apparatus at least to perform the method of the first example aspect.
According to a fourth example aspect there is provided a computer program product comprising a non-transitory computer readable medium having the computer program of the third example aspect stored thereon.
According to a fifth example aspect there is provided a method for producing a speech recognition optimisation system for service centres, comprising:
forming in the system an input for receiving spoken information from a person;
forming in the system an automated process of converting the spoken information to textual information using a generic speech recognition model;
forming in the system an automated process of determining a contact reason based on the textual information;
forming in the system an automated process of selecting a domain specific speech recognition model according to the determined contact reason; and
forming in the system an automated process of using the selected domain specific speech recognition model for obtaining information spoken by the person.
The method may further comprise forming in the system the domain specific speech recognition model. The domain specific speech recognition model may be formed from earlier interaction data obtained from different persons using a contact reason classification.
The contact reason classification may comprise classifying the earlier interaction data items by contact reason; and accordingly building domain specific speech recognition models for different contact reasons.
According to a sixth example aspect there is provided an apparatus for producing a speech recognition optimisation system for service centres, comprising:
means for forming in the system an input for receiving spoken information from a person;
means for forming in the system an automated process of converting the spoken information to textual information using a generic speech recognition model;
means for forming in the system an automated process of determining a contact reason based on the textual information;
means for forming in the system an automated process of selecting a domain specific speech recognition model according to the determined contact reason; and
means for forming in the system an automated process of using the selected domain specific speech recognition model for obtaining information spoken by the person.
According to a seventh example aspect there is provided a computer program comprising computer executable program code which when executed by at least one processor causes an apparatus at least to perform the method of the fifth example aspect.
According to an eighth example aspect there is provided a computer program product comprising a non-transitory computer readable medium having the computer program of the seventh example aspect stored thereon.
Any foregoing memory medium may comprise a digital data storage such as a data disc or diskette; optical storage; magnetic storage; holographic storage; opto-magnetic storage; phase-change memory; resistive random-access memory; magnetic random-access — memory; solid-electrolyte memory; ferroelectric random-access memory; organic memory; or polymer memory. The memory medium may be formed into a device without other substantial functions than storing memory or it may be formed as part of a device with other functions, including but not limited to a memory of a computer; a chip set; and a sub assembly of an electronic device.
According to a ninth example aspect there is provided an apparatus comprising at least one memory and at least one processor configured to at least perform the method of any preceding aspect.
Different non-binding example aspects and embodiments have been illustrated in the foregoing. The embodiments in the foregoing are used merely to explain selected aspects or steps that may be utilized in different implementations. Some embodiments may be presented only with reference to certain example aspects. It should be appreciated that corresponding embodiments may apply to other example aspects as well.
Some example embodiments will be described with reference to the accompanying figures, in which:
O Fig. 1 shows a process of performing speech recognition of an example embodiment;
Fig. 2 shows a schematic illustration of an automatic speech recognition system 200 according to an example embodiment;
Fig. 3 schematically illustrates producing domain specific speech recognition models according to an example embodiment;
Fig. 4 shows a block diagram of an ASR optimised service centre;
Fig. 5 shows a block diagram of an apparatus suited for performing operation of the ASR system and / or producing the ASR system;
Fig. 6 illustrates a process for speech recognition optimisation for service centres; and
Fig. 7 illustrates a process for producing a speech recognition optimisation system for service centres.
In the following description, like reference signs denote like elements or steps.
In an example embodiment, a solution is provided that improves the performance of speech recognition systems by using data from a Customer Relationship Management (CRM) system or from some other suitable system(s) to select domain specific data for training automatic speech recognition systems and to classify incoming contact messages based on contact reason information provided in the CRM data. This process is performed with automatic speech recognition (ASR) and optionally using Natural Language Processing (NLP), such as Bidirectional Encoder Representations from Transformers (BERT).
The solution of this example embodiment may be capable of:
- learning to identify contact reasons from textual contact inquiries available in the CRM;
- scraping all possible text materials from a company's customer-interaction data in order to build domain specific Automatic Speech Recognition (ASR) systems, or domain specific speech recognition models, by fine-tuning a language model component; and / or
- selecting the corresponding domain specific speech recognition model based on the contact reason.
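As an illustration of the first and third capabilities, the contact reason identification could be built on a fine-tuned BERT classifier, since the disclosure names BERT as an optional NLP component. The sketch below is a non-authoritative example: the model checkpoint, label set, and function name are assumptions, and in practice the classification head would first be fine-tuned on CRM inquiries labelled with their contact reasons.

```python
# Hypothetical sketch of contact reason classification with BERT.
# The checkpoint and labels are illustrative assumptions, not from the patent.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CONTACT_REASONS = ["billing", "technical_support", "orders"]  # assumed labels

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=len(CONTACT_REASONS)
)  # in deployment this head would be fine-tuned on labelled CRM inquiries

def classify_contact_reason(transcript: str) -> str:
    """Map a first-pass ASR transcript to the most likely contact reason."""
    inputs = tokenizer(transcript, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return CONTACT_REASONS[int(logits.argmax(dim=-1))]
```

The returned label can then serve as the key for selecting the corresponding domain specific speech recognition model.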
S 25 Fig. 1 shows a process of performing speech recognition of an example embodiment with > further reference to Fig. 2 that shows a schematic illustration of an automatic speech o recognition system 200 according to an example embodiment. In the process, 110: the ASR - system 200 takes as input a speech signal 210 and produces the corresponding text 280 a as an output. Depending on implementation or case, the speech signal 210 may be a real-
O 30 time signal or a not, such as a recorded audio message. The process further comprises © 120: extracting speech features 230 by a feature extractor 220 from the input signal. For
O example, this may involve turning a speech signal into a numerical representation. The process continues by 130: passing the extracted speech features 230 through an Acoustic
Model (AM) 240 to a decoder 270. In an example embodiment, the AM 240 contains statistical representations of phonemes. The process further comprises 140: the decoder 270 keeps track of the produced phonemes until reaching a silence phoneme; 150: the decoder 270 looks for identified phonemes are looked in a lexicon 260, such as a pronunciation dictionary containing a mapping from phonemes to words; 160: the decoder 270 scores by a language model 250 that how likely a series of words are to occur with each other. Advantageously, the scoring may allow using statistical knowledge of the language to separate words which might sound similar. The language model 250 may allow deciding between competing interpretations of the acoustic information.
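To make the decoder's role concrete, the following toy sketch mirrors steps 140 to 160: phonemes are accumulated until a silence marker, the sequence is looked up in the lexicon, and the language model score separates similar-sounding candidates. Everything in it, from the lexicon entries to the bigram scores, is an invented stand-in rather than the disclosed models.

```python
# Toy decoder illustrating steps 140-160 of Fig. 1; all data are made up.
from typing import Dict, List, Tuple

LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("b", "i", "l"): ["bill", "Bill"],  # homophones the language model separates
    ("p", "ei"): ["pay"],
}

def lm_score(history: Tuple[str, ...], word: str) -> float:
    """Step 160: toy bigram language model scoring a word given its history."""
    bigram_scores = {(("my",), "bill"): 0.9, (("my",), "Bill"): 0.1}
    return bigram_scores.get((history[-1:], word), 0.5)

def decode(phonemes: List[str], history: Tuple[str, ...] = ("my",)) -> List[str]:
    words: List[str] = []
    buffer: List[str] = []
    for p in phonemes:
        if p != "sil":
            buffer.append(p)            # step 140: track phonemes until silence
            continue
        candidates = LEXICON.get(tuple(buffer), [])  # step 150: lexicon lookup
        if candidates:                               # step 160: LM decides
            best = max(candidates,
                       key=lambda w: lm_score(history + tuple(words), w))
            words.append(best)
        buffer = []
    return words

print(decode(["b", "i", "l", "sil"]))  # -> ['bill'], as "my bill" scores higher
```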
In an example embodiment, the ASR is implemented to produce as accurate text as possible given an input speech. The ASR is used, e.g., in enterprise customer interactions through voice-bots. The voice-bots may be used to input information and / or to automatically answer questions, for example about different topics such as billing or different products and services.
In an example embodiment, the ASR system is trained using transcribed audio data, i.e., audio data together with a corresponding text transcript. There are open source audio data suited for training such systems, especially for common languages such as English.
However, such public data are not available for all use cases. For example, an ASR system trained on audio from general radio programs might perform poorly in a medical domain.
While it is also possible to manually annotate data for specific use-cases, such annotation work is costly and time consuming. It is thus desirable to improve the language modelling of an ASR system to better select word sequences based on prior knowledge corresponding to the very domain concerned. An advantage of improving the language model only is the capability to leverage large amounts of textual data available within a company and online, compared to the smaller and more costly amounts of audio-text pairs.
Fig. 3 schematically illustrates producing domain specific speech recognition models according to an example embodiment. A contact reason analysis system 310 makes use of Customer Relationship Management (CRM) text data 320 and / or customer interaction data (text) 330 from similar system(s), such as a ticketing system or a service centre, that contains contact reasons for a set of historical customer interactions. The contact-reason classification tool or analysis system produces and outputs domain specific speech recognition model training data 340 for language models. Different domain specific speech recognition models 350-1, 350-2, ... 350-n are formed for each contact reason. The language models may be machine learning models, and they may be trained using the domain specific speech recognition model training data 340.
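For illustration, the grouping performed by the contact reason analysis system 310 might resemble the sketch below, producing one language-model training corpus per contact reason. The record field names are assumptions made for the example, not taken from the disclosure.

```python
# Hedged sketch of the Fig. 3 data flow: per-contact-reason training corpora.
from collections import defaultdict
from typing import Dict, Iterable, List

def build_training_corpora(
    interactions: Iterable[Dict[str, str]]
) -> Dict[str, List[str]]:
    """Group interaction texts by contact reason into training data (340)."""
    corpora: Dict[str, List[str]] = defaultdict(list)
    for record in interactions:
        corpora[record["contact_reason"]].append(record["text"])
    return corpora

crm_data = [
    {"contact_reason": "billing", "text": "I was charged twice for my plan."},
    {"contact_reason": "technical_support", "text": "My router keeps rebooting."},
]
corpora = build_training_corpora(crm_data)
# Each corpus would then be used to fine-tune the language model component
# of one domain specific model 350-1 ... 350-n.
```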
Fig. 4 shows a block diagram of an ASR optimised service centre 400. A person, such as a customer 410, provides a speech signal to a generic ASR model 420, e.g., by a voice call or by sending a voice message. A contact reason classifying system 430 receives a transcript of the speech signal from the generic ASR model and determines a contact reason. Then, a domain specific speech recognition model selection 440 chooses a suitable one of the available domain specific speech recognition models 450, 460, 470. ASR is then performed using the chosen domain specific speech recognition model to decode the speech signal or subsequently received speech into text.
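The runtime flow of Fig. 4 could be realised, for example, as a simple two-pass routine. The sketch below is illustrative only: the callables stand in for the generic ASR model 420, the classifying system 430, and the model selection 440 over the models 450 to 470, and none of the names come from the disclosure.

```python
# Illustrative two-pass routine mirroring Fig. 4; all callables are stand-ins.
from typing import Callable, Dict

def handle_contact(
    audio: bytes,
    generic_asr: Callable[[bytes], str],
    classify_reason: Callable[[str], str],
    domain_models: Dict[str, Callable[[bytes], str]],
) -> str:
    draft = generic_asr(audio)          # 420: generic first-pass transcript
    reason = classify_reason(draft)     # 430: determine the contact reason
    domain_asr = domain_models.get(reason, generic_asr)  # 440: selection
    return domain_asr(audio)            # 450-470: domain specific decoding
```

Falling back to the generic model when no domain specific model exists for the determined contact reason is a design choice of this sketch, not a requirement of the disclosure.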
Fig. 5 shows a block diagram of an apparatus 500 suited for performing operation of the
ASR system 200 and / or producing the ASR system 200. It shall be appreciated that the apparatus 500 may be a dedicated device, or a logical apparatus implemented by one or more apparatuses or equipment with further uses. Moreover, the apparatus 500 may be virtualised and / or implemented by cloud computing.
The apparatus 500 comprises a communication interface 510; a processor 520; a user — interface 530; and a memory 540.
The communication interface 510 comprises in an embodiment a wired and/or wireless communication circuitry, such as Ethernet; Wireless LAN; Bluetooth; GSM; CDMA;
WCDMA; LTE; and/or 5G circuitry. The communication interface can be integrated in the apparatus 500 or provided as a part of an adapter, card, or the like, that is attachable to the apparatus 500. The communication interface 510 may support one or more different communication technologies. The apparatus 500 may also or alternatively comprise more than one of the communication interfaces 510.
In this document, a processor may refer to a central processing unit (CPU); a microprocessor; a digital signal processor (DSP); a graphics processing unit; an application-specific integrated circuit (ASIC); a field programmable gate array; a microcontroller; or a combination of such elements.
The user interface 530 may comprise circuitry for receiving input from a user of the apparatus 500, e.g., via a keyboard; a graphical user interface shown on the display of the apparatus 500; speech recognition circuitry; or an accessory device, such as a headset; and for providing output to the user via, e.g., a graphical user interface or a loudspeaker.
The memory 540 comprises a work memory 542 and a persistent memory 544 configured to store computer program code 546 and data 548. The memory 540 may comprise any one or more of: a read-only memory (ROM); a programmable read-only memory (PROM); an erasable programmable read-only memory (EPROM); a random-access memory (RAM); a flash memory; a data disk; an optical storage; a magnetic storage; a smart card; a solid-state drive (SSD); or the like. The apparatus 500 may comprise a plurality of the memories 540. The memory 540 may be constructed as a part of the apparatus 500 or as an attachment to be inserted into a slot; port; or the like of the apparatus 500 by a user or by another person or by a robot. The memory 540 may serve the sole purpose of storing data or be constructed as a part of an apparatus 500 serving other purposes, such as processing data.
A skilled person appreciates that in addition to the elements shown in Figure 5, the apparatus 500 may comprise other elements, such as microphones; displays; as well as additional circuitry such as input/output (I/O) circuitry; memory chips; application-specific integrated circuits (ASIC); processing circuitry for specific purposes such as source coding/decoding circuitry; channel coding/decoding circuitry; ciphering/deciphering circuitry; and the like. Additionally, the apparatus 500 may comprise a disposable or rechargeable battery (not shown) for powering the apparatus 500 if external power supply — is not available.
Fig. 6 illustrates a process for speech recognition optimisation for service centres, comprising various possible steps, including some optional steps, while further steps can also be included and/or some of the steps can be performed more than once:
600: receiving spoken information from a person; and automatically:
601: converting the spoken information to textual information using a generic speech recognition model;
602: determining a contact reason based on the textual information;
603: selecting a domain specific speech recognition model according to the determined contact reason;
604: using the selected domain specific speech recognition model for obtaining information spoken by the person;
605: performing the determining of the contact reason based on the textual information and contact reason history data obtained from a plurality of persons;
606: in the using of the selected domain specific speech recognition model for obtaining information spoken by the person, converting the spoken information to textual information using the domain specific speech recognition model;
607: in the using of the selected domain specific speech recognition model for obtaining information spoken by the person, converting spoken information subsequently received from the person to textual information using the domain specific speech recognition model;
608: receiving the spoken information as one submission to be processed in the method;
609: receiving first spoken information from the person and processing same according to the first example aspect, and thereafter receiving second spoken information from the person, used as a source for obtaining the information spoken by the person using the selected domain specific speech recognition model;
610: maintaining a plurality of domain specific speech recognition models;
611: monitoring the current contact reason; and / or
612: updating the contact reason and the contact reason specific speech recognition model accordingly if a change of the contact reason is identified during the monitoring.
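Steps 611 and 612 could be realised, for instance, along the lines of the following sketch, which re-classifies the accumulated transcript during a conversation and swaps the domain specific model when the contact reason appears to change. The helper names are hypothetical.

```python
# Hypothetical sketch of steps 611-612: monitor and update the contact reason.
from typing import Callable, Dict, Tuple

def monitor_contact_reason(
    transcript_so_far: str,
    current_reason: str,
    classify_reason: Callable[[str], str],
    domain_models: Dict[str, object],
) -> Tuple[str, object]:
    detected = classify_reason(transcript_so_far)    # 611: monitor the reason
    if detected != current_reason and detected in domain_models:
        return detected, domain_models[detected]     # 612: update the model
    return current_reason, domain_models[current_reason]
```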
Fig. 7 illustrates a process for producing a speech recognition optimisation system for service centres, comprising various possible steps, including some optional steps, while further steps can also be included and/or some of the steps can be performed more than once:
700: forming in the system an input for receiving spoken information from a person;
701: forming in the system an automated process of converting the spoken information to textual information using a generic speech recognition model;
702: forming in the system an automated process of determining a contact reason based on the textual information;
703: forming in the system an automated process of selecting a domain specific speech recognition model according to the determined contact reason;
704: forming in the system an automated process of using the selected domain specific speech recognition model for obtaining information spoken by the person;
705: forming in the system the domain specific speech recognition model;
706: forming the domain specific speech recognition model from earlier interaction data obtained from different persons using a contact reason classification;
707: in the contact reason classification, classifying the earlier interaction data items by contact reason; and accordingly building domain specific speech recognition models for different contact reasons; and / or
708: the domain specific speech recognition models comprise machine learning based language models, and the building of the domain specific speech recognition models comprises training the machine learning based language models using the earlier interaction data.
Any of the afore described methods, method steps, or combinations thereof, may be controlled or performed using hardware; software; firmware; or any combination thereof. The software and/or hardware may be local; distributed; centralised; virtualised; or any combination thereof. Moreover, any form of computing, including computational intelligence, may be used for controlling or performing any of the afore described methods, method steps, or combinations thereof. Computational intelligence may refer to, for example, any of artificial intelligence; neural networks; fuzzy logics; machine learning; genetic algorithms; evolutionary computation; or any combination thereof.
Various embodiments have been presented. It should be appreciated that in this document, the words comprise, include, and contain are each used as open-ended expressions with no intended exclusivity.
The foregoing description has provided by way of non-limiting examples of particular implementations and embodiments a full and informative description of the best mode presently contemplated by the inventors for carrying out the invention. It is however clear to a person skilled in the art that the invention is not restricted to details of the embodiments presented in the foregoing. For example, some of the features of the afore-disclosed example embodiments may be used to advantage without the corresponding use of other features. As such, the foregoing description shall be considered as merely illustrative of the — principles of the claimed invention, and not in limitation thereof. The scope of the invention is only restricted by the appended patent claims.
Claims (12)
1. A method for speech recognition optimisation for service centres, comprising:
receiving (600) spoken information from a person; and automatically:
converting (601) the spoken information to textual information using a generic speech recognition model;
determining (602) a contact reason based on the textual information;
selecting (603) a domain specific speech recognition model according to the determined contact reason; and
using (604) the selected domain specific speech recognition model for obtaining information spoken by the person.
2. The method of claim 1, further comprising performing (605) the determining of the contact reason based on the textual information and contact reason history data obtained from a plurality of persons.
3. The method of claim 1 or 2, further comprising in the using of the selected domain specific speech recognition model for obtaining information spoken by the person, converting (606) spoken information subsequently received from the person to textual information using the domain specific speech recognition model.
4. The method of claim 1 or 2, further comprising monitoring (611) current contact reason and updating (612) the contact reason and contact reason specific speech recognition model accordingly if a change of the contact reason is identified during the monitoring.
5. A method for producing a speech recognition optimisation system for service centres, comprising:
forming (700) in the system an input for receiving spoken information from a person;
forming (701) in the system an automated process of converting the spoken information to textual information using a generic speech recognition model;
forming (702) in the system an automated process of determining a contact reason based on the textual information;
forming (703) in the system an automated process of selecting a domain specific speech recognition model according to the determined contact reason; and
forming (704) in the system an automated process of using the selected domain specific speech recognition model for obtaining information spoken by the person.
N 6. The method of claim 5, further comprising forming (705) in the system the domain specific speech recognition model.
7. The method of claim 6, wherein the domain specific speech recognition model is formed (706) from earlier interaction data obtained from different persons using a contact reason classification.
8. The method of claim 7, wherein the contact reason classification comprises classifying (707) the earlier interaction data items by contact reason; and accordingly building domain specific speech recognition models for different contact reasons.
9. The method of claim 8, wherein the domain specific speech recognition models comprise machine learning based language models; and the building of the domain specific speech recognition models comprises training (708) the machine learning based language models using the earlier interaction data.
10. An apparatus for speech recognition optimisation for service centres, comprising means for performing the method of any one of preceding claims.
11. A computer program comprising computer executable program code which when executed by at least one processor causes an apparatus at least to perform the method of any one of claims 1 to 9.
12. A computer program product comprising a non-transitory computer readable medium having the computer program of claim 11 stored thereon.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FI20216113A FI20216113A1 (en) | 2021-10-28 | 2021-10-28 | Speech recognition optimisation for service centres |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FI20216113A FI20216113A1 (en) | 2021-10-28 | 2021-10-28 | Speech recognition optimisation for service centres |
Publications (1)
Publication Number | Publication Date |
---|---|
FI20216113A1 true FI20216113A1 (en) | 2023-04-29 |
Family
ID=86144598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
FI20216113A FI20216113A1 (en) | 2021-10-28 | 2021-10-28 | Speech recognition optimisation for service centres |
Country Status (1)
Country | Link |
---|---|
FI (1) | FI20216113A1 (en) |