FI20225480A1 - Computer-implemented method for automated call processing - Google Patents

Computer-implemented method for automated call processing Download PDF

Info

Publication number
FI20225480A1
FI20225480A1
Authority
FI
Finland
Prior art keywords
call
environment
identified
automated
processing system
Prior art date
Application number
FI20225480A
Other languages
Finnish (fi)
Swedish (sv)
Inventor
Honain Derrar
Ville Ruutu
Jussi Ruutu
Original Assignee
Elisa Oyj
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Elisa Oyj filed Critical Elisa Oyj
Priority to FI20225480A priority Critical patent/FI20225480A1/en
Priority to PCT/FI2023/050248 priority patent/WO2023233068A1/en
Publication of FI20225480A1 publication Critical patent/FI20225480A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

According to an embodiment, a computer-implemented method for automated call processing comprises: receiving a call from a user; identifying an environment of the user during the call using acoustic scene classification; configuring at least one property of an automated call processing system according to the identified environment; and processing the call at least partially using the automated call processing system.

Description

COMPUTER-IMPLEMENTED METHOD FOR AUTOMATED CALL PROCESSING
TECHNICAL FIELD
[0001] The present disclosure relates to call processing, and more particularly to a computer-implemented method for automated call processing, a computing device, and a computer program product.
BACKGROUND
[0002] Automated call processing can utilise various technologies, such as machine learning and automatic speech recognition, to improve the efficiency of call processing. However, there can be various situations in which a user calling a service utilising automated call processing is in a less than ideal environment or situation to interact with the service. This can prevent the automated call processing from functioning properly and reduce the efficiency of the call processing.
SUMMARY
[0003] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
[0004] It is an objective to provide a computer-implemented method for automated call processing, a computing device, and a computer program product. The foregoing and other objectives are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
[0005] According to a first aspect, a computer-implemented method for automated call processing comprises: receiving a call from a user; identifying an environment of the user during the call using acoustic scene classification; configuring at least one property of an automated call processing system according to the identified environment; and processing the call at least partially using the automated call processing system. The method can, for example, improve the functionality of the automated call processing system in various different call conditions.
[0006] In an implementation form of the first aspect, the automated call processing system comprises an automatic speech recognition system and the configuring the at least one property of the automated call processing system comprises configuring the automatic speech recognition system according to the identified environment; and the processing the call using the automated call processing system comprises interpreting speech data received from the user during the call using the automatic speech recognition system. The method can, for example, improve the functionality of the automatic speech recognition system in various different call conditions.
[0007] In another implementation form of the first aspect, the configuring the automatic speech recognition system according to the identified environment comprises at least one of: configuring the automatic speech recognition system according to an amount of noise identified in the environment; and/or configuring a filtering of the automatic speech recognition system according to the identified environment. The method can, for example, improve the functionality of the automatic speech recognition system in varying call noise conditions.
[0008] In another implementation form of the first aspect, the configuring the automatic speech recognition system according to the identified environment comprises selecting an automatic speech recognition instance used by the automatic speech recognition system according to the identified environment. The method can, for example, efficiently choose an appropriate automatic speech recognition instance for the call.
[0009] In another implementation form of the first aspect, processing the call using the automated call processing system comprises at least one of: configuring whether the automated call processing system repeats information during the call according to the identified environment; configuring whether the automated call processing system sends information to the user using at least one communication channel other than the call according to the identified environment; configuring whether the automated call processing system provides a call-back option to the user according to the identified environment; configuring whether the automated call processing system forwards the call to a human according to the identified environment; adjusting dialogue provided by the automated call processing system during the call according to the identified environment; adjusting a priority of the call according to the identified environment; and/or adjusting at least one property of a speaking voice provided by the automated call processing system during the call according to the identified environment. The method can, for example, efficiently configure at least some of the aforementioned properties of the automated call processing system.
[0010] In another implementation form of the first aspect, the at least one communication channel other than the call comprises at least one of: email, chat messaging, and/or text messaging. The method can, for example, efficiently send information to the user using alternative communication channels.
[0011] In another implementation form of the first aspect, the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment comprises at least one of: selecting a recording or speech-synthesis voice according to the identified environment; adjusting a speed of the speaking voice according to the identified environment; and/or adjusting a volume of the speaking voice according to the identified environment. The method can, for example, improve the functionality of the automatic speech recognition system in varying call conditions.
[0012] In another implementation form of the first aspect, the method further comprises transmitting information about the identified environment to another system. The method can, for example, allow other systems to utilise the information about the identified environment.
[0013] According to a second aspect, a computing device comprises at least one processor and at least one memory including computer program code, the at least one memory and the computer program code being configured to, with the at least one processor, cause the computing device to perform the method according to the first aspect.
[0014] According to a third aspect, a computer program product comprises program code configured to perform the method according to the first aspect when the computer program product is executed on a computer.
[0015] Many of the attendant features will be more readily appreciated as they become better understood by reference to the following detailed description considered in connection with the accompanying drawings.

DESCRIPTION OF THE DRAWINGS

[0016] In the following, example embodiments are described in more detail with reference to the attached figures and drawings, in which:
[0017] Fig. 1 illustrates a flow chart representation of a method according to an embodiment;
[0018] Fig. 2 illustrates a signalling diagram according to an embodiment;
[0019] Fig. 3 illustrates a schematic representation of system modules according to an embodiment;
[0020] Fig. 4 illustrates a schematic representation of modules of an automatic speech recognition system according to an embodiment;
[0021] Fig. 5 illustrates a schematic representation of acoustic scene classification according to an embodiment; and
[0022] Fig. 6 illustrates a schematic representation of a computing device according to an embodiment.
[0023] In the following, like reference numerals are used to designate like parts in the accompanying drawings.
DETAILED DESCRIPTION
[0024] In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, specific aspects in which the present disclosure may be placed. It is understood that other aspects may be utilised, and structural or logical changes may be made without departing from the scope of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the present disclosure is defined by the appended claims.
[0025] For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures. On the other hand, for example, if a specific apparatus is described based on functional units, a corresponding method may include a step performing the described functionality, even if such step is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various example aspects described herein may be combined with each other, unless specifically noted otherwise.
[0026] Fig. 1 illustrates a flow chart representation of a method according to an embodiment.
[0027] According to an embodiment, a computer-implemented method 100 for automated call processing comprises receiving 101 a call from a user.
[0028] The call may also be referred to as a phone call, a voice call, or similar.
[0029] The method 100 may further comprise identifying 102 an environment of the user during the call using acoustic scene classification (ASC).
[0030] The environment may also be referred to as an
ASC class, a context, an acoustic scene, a location, or similar.
[0031] The method 100 may further comprise configuring 103 at least one property of an automated call processing system according to the identified environment.
[0032] The at least one property of an automated call processing system may comprise, for example, one or more of the properties disclosed herein. The at least one property may also be referred to as at least one configuration or similar.
[0033] The automated call processing system may also be referred to as a call processing system, an automatic call processing system, a voicebot, a voicebot system, a call processing device, or similar.
[0034] The method 100 may further comprise processing 104 the call at least partially using the automated call processing system.
[0035] The automated call processing system may comprise, for example, a script according to which the system processes the call. The script may comprise, for example, questions that the automated call processing system asks the user and how the system should respond to different answers provided by the user. The automated call processing system can comprise definitions for various environments according to which the behaviour of the automated call processing system can change. For example, the script can comprise definitions of the form "if environment is X then perform action Y", as illustrated in the sketch below.
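For illustration, the rule-table pattern described in [0035] might look like the following minimal sketch. All names, environment labels, and property values here are hypothetical assumptions made for the example, not details taken from this disclosure.

```python
# Minimal sketch of an environment-conditioned call script: a default
# configuration plus per-environment overrides of the form
# "if environment is X then perform action Y".
from dataclasses import dataclass, field


@dataclass
class CallScript:
    """Maps identified environments to behaviour overrides (illustrative)."""
    # Default behaviour used when no environment-specific rule matches.
    defaults: dict = field(default_factory=lambda: {
        "repeat_information": False,
        "offer_callback": False,
        "speaking_rate": 1.0,
    })
    # Hypothetical environment-specific overrides.
    rules: dict = field(default_factory=lambda: {
        "street": {"repeat_information": True, "speaking_rate": 0.8},
        "train_station": {"offer_callback": True},
        "office": {},  # quiet environment: keep defaults
    })

    def properties_for(self, environment: str) -> dict:
        """Return the effective configuration for an identified environment."""
        config = dict(self.defaults)
        config.update(self.rules.get(environment, {}))
        return config


script = CallScript()
print(script.properties_for("street"))
# {'repeat_information': True, 'offer_callback': False, 'speaking_rate': 0.8}
```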
[0036] There are various situations where a user calling a service provider using an automated call processing system may be in a less than ideal environment or situation to interact with the automated call processing system. For example, the user may be in a noisy environment and may have difficulties hearing clearly, the background noise may negatively impact the speech recognition capabilities of the automated call processing system, the user may be on the move, the user may be driving, and/or the user may not have access to a computer.
[0037] Also, the circumstances and the situation of the user calling the service provider may vary and the automated call processing system may not function in an optimal way in all cases. For example, one user may call and want to reserve train tickets well in advance while at home, whereas another user may want to reserve tickets while already rushing to the train station.
[0038] A service provider may correspond to, for example, an entity utilising the method. For example, the service provider may be a company and the user may be a customer of that company.
[0039] According to an embodiment, the automated call processing system comprises an automatic speech recognition (ASR) system and the configuring the at least one property of the automated call processing system comprises configuring the automatic speech recognition system according to the identified environment, and the processing the call using the automated call processing system comprises interpreting speech data received from the user during the call using the automatic speech recognition system.
[0040] The automated call processing system may comprise the ASR system and/or the ASC system, or the ASR system and/or the ASC system can be separate systems from the automated call processing system. The ASC system may also be referred to as an ASC, an ASC module, an ASC function, or similar. The ASR system may also be referred to as an ASR, an ASR module, an ASR function, or similar.
[0041] If the call is forwarded to a human, the human can, for example, assess the situation and process the call at least partially manually.
[0042] According to an embodiment, the configuring the automatic speech recognition system according to the identified environment comprises at least one of: configuring the automatic speech recognition system according to an amount of noise identified in the environment and/or configuring a filtering of the automatic speech recognition system according to the identified environment.
[0043] The automated call processing system may also apply filtering to the audio signal of the call or inform the ASR to apply filtering based on the identified environment.
[0044] According to an embodiment, the configuring the automatic speech recognition system according to the identified environment comprises selecting an automatic speech recognition instance used by the automatic speech recognition system according to the identified environment.
[0045] An ASR instance may refer to a specific way of configuring the ASR system. For example, each ASR instance may be optimized for specific audio characteristics, such as noise level. The ASR system may comprise a plurality of ASR instances according to which the ASR system can be configured. An ASR instance may also be referred to as an ASR configuration or similar.
[0046] The identified environment can impact the selection of the ASR instance that the automated call processing system utilizes. For example, the automated call processing system may select an ASR instance that has been optimized for noisy speech when the identified environment comprises, for example, a street, a public place, the outdoors, or similar.
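As an illustration of paragraphs [0042] to [0046], the sketch below selects a hypothetical ASR instance and a pre-filtering flag from the identified environment. The instance names and the environment-to-noise mapping are assumptions made for this example.

```python
# Sketch of selecting an ASR instance and a pre-filter from the identified
# environment, assuming instances are keyed by expected noise level.
NOISE_LEVEL_BY_ENVIRONMENT = {
    "office": "low",
    "home": "low",
    "street": "high",
    "public_place": "high",
    "outdoors": "high",
}

ASR_INSTANCE_BY_NOISE = {
    "low": "asr-clean-speech",   # hypothetical instance tuned on clean audio
    "high": "asr-noisy-speech",  # hypothetical instance tuned on noisy audio
}


def select_asr_instance(environment: str) -> tuple[str, bool]:
    """Return (instance_name, apply_noise_filtering) for an environment."""
    noise = NOISE_LEVEL_BY_ENVIRONMENT.get(environment, "low")
    # Only pre-filter the audio when the scene suggests heavy background noise.
    return ASR_INSTANCE_BY_NOISE[noise], noise == "high"


instance, use_filter = select_asr_instance("street")
print(instance, use_filter)  # asr-noisy-speech True
```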
[0047] According to an embodiment, processing the call using the automated call processing system comprises configuring whether the automated call processing system repeats information during the call according to the identified environment.
[0048] The automated call processing system may repeat information and/or ask for confirmation if, for example, the identified environment comprises a noisy environment where the user may have difficulties hearing the voice provided by the automated call processing system. In silent environments, such repetition or requesting confirmation may not be needed and may be omitted in order to make the dialogue between the user and the automated call processing system more fluent.
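One possible reading of [0048] as code, under the assumption that identified environments can be partitioned into noisy and silent ones; the set of noisy environments below is illustrative.

```python
# Decide whether to repeat prompts and ask for confirmation based on the
# identified environment (hypothetical mapping, not from the disclosure).
NOISY_ENVIRONMENTS = {"street", "train_station", "airport", "factory"}


def should_repeat_information(environment: str) -> bool:
    """Repeat information and confirm only in noisy scenes."""
    return environment in NOISY_ENVIRONMENTS


print(should_repeat_information("office"))   # False: keep the dialogue fluent
print(should_repeat_information("street"))   # True: repeat and confirm
```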
[0049] Additionally or alternatively, the processing the call using the automated call processing system may comprise configuring whether the automated call processing system sends information to the user using at least one communication channel other than the call according to the identified environment.
[0050] Additionally or alternatively, the processing the call using the automated call processing system may comprise configuring whether the automated call processing system provides a call-back option to the user according to the identified environment.
[0051] Additionally or alternatively, the processing the call using the automated call processing system may comprise configuring whether the automated call processing system forwards the call to a human according to the identified environment.
[0052] Additionally or alternatively, the processing the call using the automated call processing system may comprise adjusting dialogue provided by the automated call processing system during the call according to the identified environment.
[0053] Additionally or alternatively, the processing the call using the automated call processing system may comprise adjusting a priority of the call according to the identified environment.
[0054] The automated call processing system may adjust the priority of the user's issue. For example, a user making a train ticket reservation in a train station may be prioritized, since the issue is probably urgent.
[0055] Additionally or alternatively, the processing the call using the automated call processing system comprises adjusting at least one property of a speaking voice provided by the automated call processing system during the call according to the identified environment.
[0056] According to an embodiment, the at least one communication channel other than the call comprises at least one of: email, chat messaging, and/or text messaging.
[0057] For example, the automated call processing system may send information via email, chat messaging, and/or text messaging to a user on the move.
[0058] According to an embodiment, the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment comprises selecting a recording or speech-synthesis voice according to the identified environment.
[0059] The automated call processing system may adjust the speaking voice provided to the user during the call according to the identified environment. The adjusting may comprise, for example, selecting recordings and/or a speech-synthesis voice so that it is, for example, clearer or slower for environments with background noise.
[0060] Additionally or alternatively, the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment may comprise adjusting a speed of the speaking voice according to the identified environment.
[0061] Additionally or alternatively, the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment may comprise adjusting a volume of the speaking voice according to the identified environment.
[0062] The volume of speech provided by the automated call processing system, such as announcements, prompts, etc., can be adjusted according to the background noise volume in the call.
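As a sketch of the voice adjustments described in [0058] to [0062], the function below derives a volume gain and speaking rate from an estimated background-noise level. The decibel thresholds and returned values are assumptions for illustration only, not values from this disclosure.

```python
# Pick speaking-voice properties from the estimated background noise floor.
def speaking_voice_properties(background_noise_db: float) -> dict:
    """Return illustrative volume gain (dB) and speaking rate."""
    if background_noise_db < 40.0:     # quiet room
        return {"volume_gain_db": 0.0, "rate": 1.0}
    if background_noise_db < 60.0:     # moderate noise, e.g. office chatter
        return {"volume_gain_db": 3.0, "rate": 0.95}
    # loud environment, e.g. street or station: speak louder and slower
    return {"volume_gain_db": 6.0, "rate": 0.85}


print(speaking_voice_properties(65.0))  # {'volume_gain_db': 6.0, 'rate': 0.85}
```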
[0063] At least some embodiments disclosed herein can improve user experience with the automated call processing system.
[0064] At least some embodiments disclosed herein can improve efficiency of the automated call processing system. Thus, more calls may be handled automatically by the automated call processing system.
[0065] At least some embodiments disclosed herein can improve ASR accuracy.
[0066] The various processing operations disclosed herein may be performed in various different orders. For example, the embodiment of Fig. 1 only illustrates an exemplary order of operations. Furthermore, at least some operations may be performed at least partially in parallel.
[0067] Fig. 2 illustrates a signalling diagram according to an embodiment.
[0068] A user 201 can call 205 the automated call processing system 202.
[0069] The automated call processing system 202 can forward 206 the audio of the call to an ASC system 203 for analysis.
[0070] The ASC system 203 can return information about the identified environment 207 to the automated call processing system 202.
[0071] The automated call processing system 202 can respond 208 to the user 201 adjusting its response based on the identified environment.
[0072] The user 201 can respond 209 to the automated call processing system 202.
[0073] The automated call processing system 202 can select the ASR 204 to be used based on the identified environment and forward 210 audio of the call to the selected ASR 204.
[0074] The ASR 204 can transcribe the speech of the call and return the transcript 211 to the automated call processing system 202. The transcript 211 may comprise text data corresponding to the audio of the call.
[0075] The automated call processing system 202 can use the transcript 211 to respond 212 to the user 201 and may adjust the response based on the identified environment.
[0076] Operations 209 - 212 may be repeated as needed during the call. In some situations, operations 206 and 207 may also be repeated, for example, periodically in order to detect if the environment changes.
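The signalling of Fig. 2 can be summarised as a control loop. The following sketch stubs out the ASC, ASR, and response steps; the function names, return values, and environment-to-instance mapping are hypothetical.

```python
# Sketch of the Fig. 2 dialogue loop: classify the scene once (steps 206-207),
# then route each utterance through an environment-appropriate ASR (209-212).
def classify_scene(audio: bytes) -> str:
    return "street"                    # stub for the ASC system (203)

def transcribe(audio: bytes, instance: str) -> str:
    return "two tickets please"        # stub for the selected ASR (204)

def respond(transcript: str, environment: str) -> str:
    return f"[{environment}] You said: {transcript}"  # stub response (208/212)


def handle_call(utterances: list[bytes]) -> None:
    environment = classify_scene(utterances[0])       # steps 206-207
    instance = ("asr-noisy-speech" if environment == "street"
                else "asr-clean-speech")
    for audio in utterances:                          # steps 209-212, repeated
        transcript = transcribe(audio, instance)      # steps 210-211
        print(respond(transcript, environment))       # step 212


handle_call([b"\x00" * 160, b"\x00" * 160])
```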
[0077] Fig. 3 illustrates a schematic representation of system modules according to an embodiment.
[0078] The call 205 may be provided via, for example, a telephony network 301, such as a public switched telephone network (PSTN) or a mobile telephone network, voice over IP (VoIP) network, or similar.
[0079] A script 302 can define what the automated call processing system 202 should do and how to behave during the call. The script 302 can also comprise information about how to deal with various identified environments.
[0080] The automated call processing system 202 can be responsible for the overall control and orchestration of the dialogue with the user 201. The automated call processing system 202 can perform the defined script 302.
[0081] The ASR system 204 can perform the actual speech-to-text conversion. There can be multiple ASR instances that the automated call processing system 202 may utilize.
[0082] The ASC system 203 can receive audio from the user 201 via the automated call processing system 202 and return the identified environment.
[0083] According to an embodiment, the method further comprises transmitting information about the identified environment to another system.
[0084] Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) 303 are examples of other systems that may receive information about the identified environment. The automated call processing system 202 may send information about the identified environment to an external system such as a CRM or an ERP 303.
[0086] ASR systems can utilize principles from several different fields, such as signal processing, artificial intelligence, and linguistics, in order to automatically convert an audio signal comprising speech into text that corresponds to the content of the speech in the system’s input signal. An embodiment of an ASR system is illus- trated in Fig. 4.
[0087] An ASR system can perform feature extraction 401 on speech data 410. The extracted features can be provided to an acoustic model 402. The acoustic model 402 can comprise a statistical model that identifies sound units from an input speech signal 410 after rel-
N evant features have been extracted from it.
O [0088] A decoder 405 can deduce the text based on
O information from various components, such as the acous- = tic model 402, a language model 403, and a lexicon 404. z 25 The language model 403 can comprise a statistical model > that scores how likely words are to occur with each s other in a given language. The lexicon 404 can comprise
N a pronunciation dictionary that indicates how words are
N constructed from sound units.
[0089] The acoustic model 402 may be produced using audio-text pairs where the text part corresponds to the speech in the audio signal. The language model 403 can be produced using textual resources for the target lan- guage, like English, while the lexicon 404 can be cre- ated with the help of linguists.
[0090] The embodiment of Fig. 4 is only an example of an ASR system. Alternatively, the ASR system can be implemented in various alternative ways.
[0091] Fig. 5 illustrates a schematic representation of acoustic scene classification according to an embod- iment.
[0092] ASC can classify audio samples to categories corresponding to the environment in which the audio sam- ple was recorded. For instance, an audio sample could be recorded in an office, in an airport, or in a factory.
One of the objectives of ASC can be to provide context- awareness to automated audio systems.
[0093] The input of an ASC pipeline is an audio signal 501. Before performing the ASC operation per se, a fea-
N ture extraction step 502 can be applied to the audio
S signal 501. This step can transform the audio signal 501
S into a format that contains relevant information for the o actual classification operation and could involve, for
E 25 example, signal processing algorithms such as the Fast
S Fourier Transform (FFT) algorithm.
D [0094] Once the feature extraction step 502 has been
S performed, the extracted features are used to perform the actual ASC operation 503. The ASC module can be a statistical model that assigns the most likely category (environment/scene) based on the input features. In the embodiment of Fig. 5, the selected acoustic scene 504 is “Office”. The statistical module is typically pro- duced using a training step by using audio-context pairs that can be collected, for instance, via human annota- tors.
[0095] In some embodiments, the method 100 may be im- plemented using various modules/components such as those disclosed herein. Alternatively, the method 100 may also be implemented using various other systems.
[0096] Fig. 6 illustrates a schematic representation of a computing device according to an embodiment.
[0097] According to an embodiment, a computing device 600 comprises at least one processor 601 and at least one memory 602 including computer program code, the at least one memory 602 and the computer program code con- figured to, with the at least one processor 601, cause the computing device to perform the method 100.
[0098] The computing device 600 may comprise at least
N one processor 601. The at least one processor 601 may
S comprise, for example, one or more of various processing
O devices, such as a co-processor, a microprocessor, a = digital signal processor (DSP), a processing circuitry z 25 with or without an accompanying DSP, or various other > processing devices including integrated circuits such s as, for example, an application specific integrated cir-
N cuit (ASIC), a field programmable gate array (FPGA), a
N microprocessor unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
[0099] The computing device 600 may further comprise a memory 602. The memory 602 may be configured to store, for example, computer programs and the like. The memory 602 may comprise one or more volatile memory devices, one or more non-volatile memory devices, and/or a com- bination of one or more volatile memory devices and non- volatile memory devices. For example, the memory 602 may be embodied as magnetic storage devices (such as hard disk drives, magnetic tapes, etc.), optical magnetic storage devices, and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable
PROM), flash ROM, RAM (random access memory), etc.).
[0100] The computing device 600 may further comprise other components not illustrated in the embodiment of
Fig. 6. The computing device 600 may comprise, for ex- ample, an input/output bus for connecting the computing device 600 to other devices. Further, a user may control the computing device 600 via the input/output bus.
N [0101] When the computing device 600 is configured to
O implement some functionality, some component and/or com-
O ponents of the computing device 600, such as the at = least one processor 601 and/or the memory 602, may be
I 25 configured to implement this functionality. Further- - more, when the at least one processor 601 is configured s to implement some functionality, this functionality may
N be implemented using program code comprised, for exam-
N ple, in the memory.
[0102] The computing device 600 may be implemented at least partially using, for example, a computer, some other computing device, or similar.
[0103] The method 100 and/or the computing device 600 may be utilised in, for example, in a so-called voice- bot. A voicebot may be configured to obtain information from users by, for example, phone and convert the voice information into text information using ASR. The method 100 may be used to improve functionality of the ASR. The voicebot may further be configured to further process, such as classify, the text information. The voicebot can, for example, ask questions about, for example, basic information from a customer in a customer service situation over the phone, obtain the answers using ASR and the method 100, and save the information in a system.
Thus, the customer service situation can be made more efficient and user experience can be improved.
[0104] Any range or device value given herein may be extended or altered without losing the effect sought.
Also any embodiment may be combined with another embod-
A iment unless explicitly disallowed.
O [0105] Although the subject matter has been described
O in language specific to structural features and/or acts, = it is to be understood that the subject matter defined z 25 in the appended claims is not necessarily limited to the > specific features or acts described above. Rather, the s specific features and acts described above are disclosed as examples of implementing the claims and other equiv- alent features and acts are intended to be within the scope of the claims.
[0106] It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be un- derstood that reference to 'an' item may refer to one or more of those items.
[0107] The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter de- scribed herein. Aspects of any of the embodiments de- scribed above may be combined with aspects of any of the other embodiments described to form further embodiments without losing the effect sought.
N [0108] The term 'comprising' is used herein to mean
O including the method, blocks or elements identified, but
O that such blocks or elements do not comprise an exclu- = sive list and a method or apparatus may contain addi-
I 25 tional blocks or elements. - [0109] It will be understood that the above descrip- s tion is given by way of example only and that various
N modifications may be made by those skilled in the art.
N The above specification, examples and data provide a complete description of the structure and use of exem- plary embodiments. Although various embodiments have been described above with a certain degree of particu- larity, or with reference to one or more individual embodiments, those skilled in the art could make numer- ous alterations to the disclosed embodiments without departing from the spirit or scope of this specifica- tion.

Claims (10)

CLAIMS:
1. A computer-implemented method (100) for automated call processing, the method comprising:
receiving (101) a call from a user;
identifying (102) an environment of the user during the call using acoustic scene classification;
configuring (103) at least one property of an automated call processing system according to the identified environment; and
processing (104) the call at least partially using the automated call processing system.
2. The computer-implemented method (100) according to claim 1, wherein:
the automated call processing system comprises an automatic speech recognition system and the configuring the at least one property of the automated call processing system comprises configuring the automatic speech recognition system according to the identified environment; and
the processing the call using the automated call processing system comprises interpreting speech data received from the user during the call using the automatic speech recognition system.

3. The computer-implemented method (100) according to claim 2, wherein the configuring the automatic speech recognition system according to the identified environment comprises at least one of:
configuring the automatic speech recognition system according to an amount of noise identified in the environment; and/or
configuring a filtering of the automatic speech recognition system according to the identified environment.
4. The computer-implemented method (100) according to claim 2 or claim 3, wherein the configuring the automatic speech recognition system according to the identified environment comprises selecting an automatic speech recognition instance used by the automatic speech recognition system according to the identified environment.
5. The computer-implemented method (100) according to any preceding claim, wherein processing the call using the automated call processing system comprises at least one of:
configuring whether the automated call processing system repeats information during the call according to the identified environment;
configuring whether the automated call processing system sends information to the user using at least one communication channel other than the call according to the identified environment;
configuring whether the automated call processing system provides a call-back option to the user according to the identified environment;
configuring whether the automated call processing system forwards the call to a human according to the identified environment;
adjusting dialogue provided by the automated call processing system during the call according to the identified environment;
adjusting a priority of the call according to the identified environment; and/or
adjusting at least one property of a speaking voice provided by the automated call processing system during the call according to the identified environment.
6. The computer-implemented method (100) according to claim 5, wherein the at least one communication channel other than the call comprises at least one of: email, chat messaging, and/or text messaging.
7. The computer-implemented method (100) according to claim 5 or claim 6, wherein the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment comprises at least one of:
selecting a recording or speech-synthesis voice according to the identified environment;
adjusting a speed of the speaking voice according to the identified environment; and/or
adjusting a volume of the speaking voice according to the identified environment.
8. The computer-implemented method (100) according to any preceding claim, the method further comprising:
transmitting information about the identified environment to another system.
9. A computing device, comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the computing device to perform the method according to any preceding claim.
10. A computer program product comprising program code configured to perform the method according to any of claims 1 - 8 when the computer program product is executed on a computer.
FI20225480A 2022-06-01 2022-06-01 Computer-implemented method for automated call processing FI20225480A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
FI20225480A FI20225480A1 (en) 2022-06-01 2022-06-01 Computer-implemented method for automated call processing
PCT/FI2023/050248 WO2023233068A1 (en) 2022-06-01 2023-05-08 Computer-implemented method for automated call processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
FI20225480A FI20225480A1 (en) 2022-06-01 2022-06-01 Computer-implemented method for automated call processing

Publications (1)

Publication Number Publication Date
FI20225480A1 true FI20225480A1 (en) 2023-12-02

Family

ID=86386698

Family Applications (1)

Application Number Title Priority Date Filing Date
FI20225480A FI20225480A1 (en) 2022-06-01 2022-06-01 Computer-implemented method for automated call processing

Country Status (2)

Country Link
FI (1) FI20225480A1 (en)
WO (1) WO2023233068A1 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7490042B2 (en) * 2005-03-29 2009-02-10 International Business Machines Corporation Methods and apparatus for adapting output speech in accordance with context of communication
US8175007B2 (en) * 2007-06-14 2012-05-08 Cisco Technology, Inc. Call priority based on audio stream analysis
US8699674B2 (en) * 2010-04-21 2014-04-15 Angel.Com Incorporated Dynamic speech resource allocation
US9270798B2 (en) * 2010-12-03 2016-02-23 International Business Machines Corporation Ring-tone detection in a VoIP call
US9495960B1 (en) * 2014-03-26 2016-11-15 West Corporation IVR engagements and upfront background noise
US20160284349A1 (en) * 2015-03-26 2016-09-29 Binuraj Ravindran Method and system of environment sensitive automatic speech recognition
US20170331949A1 (en) * 2016-05-11 2017-11-16 International Business Machines Corporation Automated call handling based on context of call
JP2020120170A (en) * 2019-01-18 2020-08-06 株式会社東芝 Automatic response device and program

Also Published As

Publication number Publication date
WO2023233068A1 (en) 2023-12-07

Similar Documents

Publication Publication Date Title
KR102509464B1 (en) Utterance classifier
US10810997B2 (en) Automated recognition system for natural language understanding
US9368116B2 (en) Speaker separation in diarization
CN111797632B (en) Information processing method and device and electronic equipment
CN111178081B (en) Semantic recognition method, server, electronic device and computer storage medium
CN113205803B (en) Voice recognition method and device with self-adaptive noise reduction capability
Gupta et al. Speech feature extraction and recognition using genetic algorithm
CN112992147A (en) Voice processing method, device, computer equipment and storage medium
US7689414B2 (en) Speech recognition device and method
CN116420188A (en) Speech filtering of other speakers from call and audio messages
CN112087726B (en) Method and system for identifying polyphonic ringtone, electronic equipment and storage medium
FI20225480A1 (en) Computer-implemented method for automated call processing
CN109841216B (en) Voice data processing method and device and intelligent terminal
CN113658596A (en) Semantic identification method and semantic identification device
CN117496984A (en) Interaction method, device and equipment of target object and readable storage medium
KR102389995B1 (en) Method for generating spontaneous speech, and computer program recorded on record-medium for executing method therefor
KR102408455B1 (en) Voice data synthesis method for speech recognition learning, and computer program recorded on record-medium for executing method therefor
KR102395399B1 (en) Voice data disassemble method for speech recognition learning, and computer program recorded on record-medium for executing method therefor
CN114328867A (en) Intelligent interruption method and device in man-machine conversation
Ranzenberger et al. Integration of a Kaldi speech recognizer into a speech dialog system for automotive infotainment applications
KR20210010133A (en) Speech recognition method, learning method for speech recognition and apparatus thereof
US20230298609A1 (en) Generalized Automatic Speech Recognition for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation
KR102378895B1 (en) Method for learning wake-word for speech recognition, and computer program recorded on record-medium for executing method therefor
KR102378885B1 (en) Method for generating metadata using face of speaker, and computer program recorded on record-medium for executing method therefor
US20230038982A1 (en) Joint Acoustic Echo Cancelation, Speech Enhancement, and Voice Separation for Automatic Speech Recognition