WO2023233068A1 - Computer-implemented method for automated call processing - Google Patents

Computer-implemented method for automated call processing

Info

Publication number
WO2023233068A1
Authority
WO
WIPO (PCT)
Prior art keywords
call
environment
processing system
speech recognition
call processing
Prior art date
Application number
PCT/FI2023/050248
Other languages
English (en)
Inventor
Ville Ruutu
Jussi Ruutu
Honain DERRAR
Original Assignee
Elisa Oyj
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Elisa Oyj filed Critical Elisa Oyj
Publication of WO2023233068A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present disclosure relates to call processing, and more particularly to a computer-implemented method for automated call processing, a computing device, and a computer program product.
  • Automated call processing can utilise various technologies, such as machine learning and automatic speech recognition, to improve the efficiency of call processing.
  • a computer-implemented method for automated call processing comprises: receiving a call from a user; identifying an environment of the user during the call using acoustic scene classification; configuring at least one property of an automated call processing system according to the identified environment; and processing the call at least partially using the automated call processing system.
  • the method can, for example, improve the functionality of the automated call processing system in various different call conditions.
  • the automated call processing system comprises an automatic speech recognition system, and the configuring the at least one property of the automated call processing system comprises configuring the automatic speech recognition system according to the identified environment; and the processing the call using the automated call processing system comprises interpreting speech data received from the user during the call using the automatic speech recognition system.
  • the method can, for example, improve the functionality of the automatic speech recognition system in various different call conditions.
  • the configuring the automatic speech recognition system according to the identified environment comprises at least one of: configuring the automatic speech recognition system according to an amount of noise identified in the environment; and/or configuring a filtering of the automatic speech recognition system according to the identified environment.
  • the method can, for example, improve the functionality of the automatic speech recognition system in varying call noise conditions.
  • the configuring the automatic speech recognition system according to the identified environment comprises selecting an automatic speech recognition instance used by the automatic speech recognition system according to the identified environment.
  • the method can, for example, efficiently choose an appropriate automatic speech recognition instance for the call.
  • processing the call using the automated call processing system comprises at least one of: configuring whether the automated call processing system repeats information during the call according to the identified environment; configuring whether the automated call processing system sends information to the user using at least one communication channel other than the call according to the identified environment; configuring whether the automated call processing system provides a call-back option to the user according to the identified environment; configuring whether the automated call processing system forwards the call to a human according to the identified environment; adjusting dialogue provided by the automated call processing system during the call according to the identified environment; adjusting a priority of the call according to the identified environment; and/or adjusting at least one property of a speaking voice provided by the automated call processing system during the call according to the identified environment.
  • the method can, for example, efficiently configure at least some of the aforementioned properties of the automated call processing system.
  • the at least one communication channel other than the call comprises at least one of: email, chat messaging, and/or text messaging.
  • the method can, for example, efficiently send information to the user using alternative communication channels.
  • the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment comprises at least one of: selecting a recording or speech-synthesis voice according to the identified environment; adjusting a speed of the speaking voice according to the identified environment; and/or adjusting a volume of the speaking voice according to the identified environment.
  • the method can, for example, improve the functionality of the automatic speech recognition system in varying call conditions.
  • the method further comprises transmitting information about the identified environment to another system.
  • the method can, for example, allow other systems to utilise the information about the identified environment.
  • the identifying the environment of the user during the call using acoustic scene classification comprises identifying the environment of the user during the call using acoustic scene classification a plurality of times.
  • the selecting the automatic speech recognition instance used by the automatic speech recognition system according to the identified environment comprises selecting an automatic speech recognition instance used by the automatic speech recognition system according to the identified environment a plurality of times.
  • a computing device comprises at least one processor and at least one memory including computer program code, the at least one memory and the computer program code being configured to, with the at least one processor, cause the computing device to perform the method according to the first aspect.
  • a computer program product comprises program code configured to perform the method according to the first aspect when the computer program product is executed on a computer.
  • Fig. 1 illustrates a flow chart representation of a method according to an embodiment
  • Fig. 2 illustrates a signalling diagram according to an embodiment
  • Fig. 3 illustrates a schematic representation of system modules according to an embodiment
  • Fig. 4 illustrates a schematic representation of modules of an automatic speech recognition system according to an embodiment
  • Fig. 5 illustrates a schematic representation of acoustic scene classification according to an embodiment
  • Fig. 6 illustrates a schematic representation of an automatic speech recognition system according to an embodiment
  • Fig. 7 illustrates a schematic representation of a computing device according to an embodiment.
  • a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method, and vice versa.
  • a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures.
  • a corresponding method may include a step performing the described functionality, even if such step is not explicitly described or illustrated in the figures.
  • the features of the various example aspects described herein may be combined with each other, unless specifically noted otherwise.
  • Fig. 1 illustrates a flow chart representation of a method according to an embodiment.
  • a computer-implemented method 100 for automated call processing comprises receiving 101 a call from a user.
  • the call may also be referred to as a phone call, a voice call, or similar.
  • the method 100 may further comprise identifying 102 an environment of the user during the call using acoustic scene classification (ASC).
  • the environment may also be referred to as an ASC class, a context, an acoustic scene, a location, or similar.
  • the method 100 may further comprise configuring 103 at least one property of an automated call processing system according to the identified environment.
  • the at least one property of an automated call processing system may comprise, for example, one or more of the properties disclosed herein.
  • the at least one property may also be referred to as at least one configuration or similar.
  • the automated call processing system may also be referred to as a call processing system, an automatic call processing system, a voicebot, a voicebot system, a call processing device, or similar.
  • the method 100 may further comprise processing 104 the call at least partially using the automated call processing system.
  • the automated call processing system may comprise, for example, a script according to which the system processes the call.
  • the script may comprise, for example, questions that the automated call processing system asks the user and how the system should respond to different answers provided by the user.
  • the automated call processing system can comprise definitions for various environments according to which the behaviour of the automated call processing system can change.
  • the script can comprise definitions of the form "if environment is X then perform action Y".
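  • a minimal sketch of such a rule table follows; the environment labels and action names are hypothetical illustrations, not taken from the application:

```python
# Sketch of environment-conditioned script rules of the form
# "if environment is X then perform action Y".
# The environment labels and action names are hypothetical examples.
SCRIPT_RULES = {
    "street": ["repeat_information", "offer_text_message_summary"],
    "car":    ["offer_call_back", "slow_down_speaking_voice"],
    "office": [],  # quiet environment: no extra measures assumed
}

def actions_for(environment: str) -> list[str]:
    """Return the script actions configured for an identified environment."""
    return SCRIPT_RULES.get(environment, [])

print(actions_for("street"))  # ['repeat_information', 'offer_text_message_summary']
```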
  • a user calling a service provider using an automated call processing system may be in a less than ideal environment or situation to interact with the automated call processing system.
  • the user may be in a noisy environment and may have difficulty hearing clearly, the background noise may negatively impact the speech recognition capabilities of the automated call processing system, the user may be on the move, the user may be driving, and/or the user may not have access to a computer.
  • the circumstances and the situation of the user calling the service provider may vary, and the automated call processing system may not function in an optimal way in all cases.
  • one user may call and want to reserve train tickets well in advance while at home, whereas another user may want to reserve tickets while already rushing to the train station.
  • a service provider may correspond to, for example, an entity utilising the method.
  • the service provider may be a company and the user may be a customer of that company.
  • the automated call processing system comprises an automatic speech recognition (ASR) system, and the configuring the at least one property of the automated call processing system comprises configuring the automatic speech recognition system according to the identified environment, and the processing the call using the automated call processing system comprises interpreting speech data received from the user during the call using the automatic speech recognition system.
  • the automated call processing system may comprise the ASR system and/or the ASC system, or the ASR system and/or the ASC system can be separate systems from the automated call processing system.
  • the ASC system may also be referred to as an ASC, an ASC module, an ASC function, or similar.
  • the ASR system may also be referred to as an ASR, an ASR module, an ASR function, or similar.
  • If the call is forwarded to a human, the human can, for example, assess the situation and process the call at least partially manually.
  • the configuring the automatic speech recognition system according to the identified environment comprises at least one of: configuring the automatic speech recognition system according to an amount of noise identified in the environment and/or configuring a filtering of the automatic speech recognition system according to the identified environment.
  • the automated call processing system may also apply filtering to the audio signal of the call or inform the ASR to apply filtering based on the identified environment.
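  • one plausible form of such environment-dependent filtering is a high-pass filter whose cutoff follows the identified environment; the per-environment cutoff values below are illustrative assumptions, not values from the application:

```python
import numpy as np
from scipy.signal import butter, lfilter

# Hypothetical per-environment high-pass cutoffs (Hz) for attenuating
# low-frequency background noise before recognition.
CUTOFF_HZ = {"street": 200.0, "car": 300.0, "office": 80.0}

def filter_for_environment(audio: np.ndarray, fs: int, environment: str) -> np.ndarray:
    """High-pass filter the call audio with an environment-dependent cutoff."""
    cutoff = CUTOFF_HZ.get(environment, 100.0)     # fallback cutoff
    b, a = butter(4, cutoff, btype="high", fs=fs)  # 4th-order Butterworth
    return lfilter(b, a, audio)

# Example: filter one second of 8 kHz call audio classified as "street".
audio = np.random.randn(8000)
filtered = filter_for_environment(audio, fs=8000, environment="street")
```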
  • the configuring the automatic speech recognition system according to the identified environment comprises selecting an automatic speech recognition instance used by the automatic speech recognition system according to the identified environment.
  • An ASR instance may refer to a specific way of configuring the ASR system.
  • each ASR instance may be optimized for specific audio characteristics, such as noise level.
  • the ASR system may comprise a plurality of ASR instances according to which the ASR system can be configured.
  • An ASR instance may also be referred to as an ASR configuration, an ASR, or similar.
  • the identified environment can impact the selection of the ASR instance that the automated call processing system utilizes.
  • the automated call processing system may select an ASR instance that has been optimized for noisy speech when the identified environment comprises, for example, a street, a public place, the outdoors, or similar.
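  • a sketch of such instance selection, assuming hypothetical instance names and a hypothetical set of noisy environment labels:

```python
# Hypothetical ASR instances, e.g. one tuned for noisy speech and one
# for clean speech; the names and mapping are illustrative assumptions.
ASR_INSTANCES = {"noisy": "asr-noise-robust", "quiet": "asr-clean-speech"}
NOISY_ENVIRONMENTS = {"street", "public_place", "outdoors"}

def select_asr_instance(environment: str) -> str:
    """Pick the ASR instance according to the identified environment."""
    condition = "noisy" if environment in NOISY_ENVIRONMENTS else "quiet"
    return ASR_INSTANCES[condition]

print(select_asr_instance("street"))  # asr-noise-robust
```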
  • the identifying the environment of the user during the call using acoustic scene classification comprises identifying the environment of the user during the call using acoustic scene classification a plurality of times.
  • acoustic scene classification may be performed for each time period in a plurality of time periods.
  • the selecting the automatic speech recognition instance used by the automatic speech recognition system according to the identified environment comprises selecting an automatic speech recognition instance used by the automatic speech recognition system according to the identified environment a plurality of times.
  • an automatic speech recognition instance used by the automatic speech recognition system may be selected for each time period in a plurality of time periods.
  • the ASC may perform scene classification multiple times or substantially continuously, so that if the acoustic scene changes during the call due to, for example, the caller moving from outdoors to indoors, the ASR may be changed accordingly.
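  • a sketch of per-time-window re-classification; classify_scene, select_asr_instance, and transcribe are assumed placeholders standing in for the ASC system, the instance selection, and the ASR system:

```python
from typing import Callable, Iterable, Iterator

def transcribe_call(
    windows: Iterable[bytes],
    classify_scene: Callable[[bytes], str],
    select_asr_instance: Callable[[str], str],
    transcribe: Callable[[str, bytes], str],
) -> Iterator[str]:
    """Re-run ASC for each time window and switch the ASR instance
    whenever the identified environment implies a different instance."""
    current = None
    for window in windows:
        environment = classify_scene(window)         # ASC per time period
        instance = select_asr_instance(environment)  # may differ from before
        if instance != current:                      # e.g. caller moved indoors
            current = instance
        yield transcribe(current, window)
```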
  • the configuring the automatic speech recognition system according to the identified environment comprises selecting an ASR used by the automatic speech recognition system from a plurality of available ASRs according to the identified environment.
  • the selected ASR can comprise, for example, an optimal ASR for the identified environment.
  • the ASC may be applied to the call or to other voice samples that need to be transcribed.
  • processing the call using the automated call processing system comprises configuring whether the automated call processing system repeats information during the call according to the identified environment.
  • the automated call processing system may repeat information and/or ask for confirmation if, for example, the identified environment comprises a noisy environment where the user may have difficulty hearing the voice provided by the automated call processing system. In silent environments, such repetition or requesting confirmation may not be needed and may be omitted in order to make the dialogue between the user and the automated call processing system more fluent.
  • the processing the call using the automated call processing system may comprise configuring whether the automated call processing system sends information to the user using at least one communication channel other than the call according to the identified environment.
  • the processing the call using the automated call processing system may comprise configuring whether the automated call processing system provides a call-back option to the user according to the identified environment.
  • the processing the call using the automated call processing system may comprise configuring whether the automated call processing system forwards the call to a human according to the identified environment.
  • the processing the call using the automated call processing system may comprise adjusting dialogue provided by the automated call processing system during the call according to the identified environment.
  • the processing the call using the automated call processing system may comprise adjusting a priority of the call according to the identified environment.
  • the automated call processing system may adjust the priority of the user's issue. For example, a user making a train ticket reservation in a train station may be prioritized, since the issue is probably urgent.
  • the processing the call using the automated call processing system comprises adjusting at least one property of a speaking voice provided by the automated call processing system during the call according to the identified environment.
  • the at least one communication channel other than the call comprises at least one of: email, chat messaging, and/or text messaging.
  • the automated call processing system may send information via email, chat messaging, and/or text messaging to a user on the move.
  • the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment comprises selecting a recording or speech-synthesis voice according to the identified environment.
  • the automated call processing system may adjust the speaking voice provided to the user during the call according to the identified environment.
  • the adjusting may comprise, for example, selecting a recording and/or speech-synthesis voice that is, for example, clearer or slower for environments with background noise.
  • the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment may comprise adjusting a speed of the speaking voice according to the identified environment.
  • the adjusting the at least one property of the speaking voice provided by the automated call processing system during the call according to the identified environment may comprise adjusting a volume of the speaking voice according to the identified environment.
  • the volume of speech provided by the automated call processing system can be adjusted according to the background noise volume in the call.
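  • a sketch of such environment-driven voice adjustments; the voice names, rate factors, and volume formula are illustrative assumptions:

```python
# Choose speaking-voice settings from the identified environment and a
# measured background noise level (dB). All numbers are assumptions.
def voice_settings(environment: str, noise_level_db: float) -> dict:
    settings = {"voice": "default", "rate": 1.0, "volume": 1.0}
    if environment in {"street", "public_place", "train_station"}:
        settings["voice"] = "clear"  # clearer recording/synthesis voice
        settings["rate"] = 0.85      # speak more slowly in background noise
    # Raise playback volume with the background noise level, capped at 2x.
    settings["volume"] = min(2.0, 1.0 + max(0.0, noise_level_db - 40.0) / 40.0)
    return settings

print(voice_settings("street", noise_level_db=65.0))
# {'voice': 'clear', 'rate': 0.85, 'volume': 1.625}
```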
  • At least some embodiments disclosed herein can improve user experience with the automated call processing system.
  • At least some embodiments disclosed herein can improve efficiency of the automated call processing system.
  • more calls may be handled automatically by the automated call processing system.
  • At least some embodiments disclosed herein can improve ASR accuracy.
  • Fig. 2 illustrates a signalling diagram according to an embodiment.
  • a user 201 can call 205 the automated call processing system 202.
  • the automated call processing system 202 can forward 206 the audio of the call to an ASC system 203 for analysis.
  • the ASC system 203 can return information about the identified environment 207 to the automated call processing system 202.
  • the automated call processing system 202 can respond 208 to the user 201, adjusting its response based on the identified environment.
  • the user 201 can respond 209 to the automated call processing system 202.
  • the automated call processing system 202 can select the ASR 204 to be used based on the identified environment and forward 210 audio of the call to the selected ASR 204.
  • the ASR 204 can transcribe the speech of the call and return the transcript 211 to the automated call processing system 202.
  • the transcript 211 may comprise text data corresponding to the audio of the call.
  • the automated call processing system 202 can use the transcript 211 to respond 212 to the user 201 and may adjust the response based on the identified environment.
  • Operations 209-212 may be repeated as needed during the call. In some situations, operations 206 and 207 may also be repeated, for example, periodically in order to detect if the environment changes.
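  • the loop below condenses operations 205-212 into code; every component interface (asc.classify, asr_pool.select, respond) is an assumed placeholder, not an API from the application:

```python
# Sketch of the Fig. 2 signalling flow: the ASC output steers both the
# dialogue responses and the ASR instance selection.
def handle_call(audio_turns, asc, asr_pool, respond):
    audio_turns = iter(audio_turns)
    first_turn = next(audio_turns)
    environment = asc.classify(first_turn)             # 206, 207
    respond(text="greeting", environment=environment)  # 208
    for turn in audio_turns:                           # 209..212, repeated
        environment = asc.classify(turn)               # optional periodic re-run
        asr = asr_pool.select(environment)             # pick ASR instance (210)
        transcript = asr.transcribe(turn)              # 211
        respond(text=transcript, environment=environment)  # 212
```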
  • Fig. 3 illustrates a schematic representation of system modules according to an embodiment.
  • the call 205 may be provided via, for example, a telephony network 301, such as a public switched telephone network (PSTN), a mobile telephone network, a voice over IP (VoIP) network, or similar.
  • a script 302 can define what the automated call processing system 202 should do and how to behave during the call.
  • the script 302 can also comprise information about how to deal with various identified environments.
  • the automated call processing system 202 can be responsible for the overall control and orchestration of the dialogue with the user 201.
  • the automated call processing system 202 can perform the defined script 302.
  • the ASR system 204 can perform the actual speech-to-text conversion. There can be multiple ASR instances that the automated call processing system 202 may utilize.
  • the ASC system 203 can receive audio from the user 201 via the automated call processing system 202 and return the identified environment. Additionally, the ASC system 203 can return a proposal for the best ASR to be used. The system 202 can select the ASR to be used based on input from the ASC 203.
  • the method further comprises transmitting information about the identified environment to another system.
  • Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems 303 are examples of other systems that may receive information about the identified environment.
  • the automated call processing system 202 may send information about the identified environment to an external system such as a CRM or an ERP 303.
  • Fig. 4 illustrates a schematic representation of modules of an automatic speech recognition system according to an embodiment.
  • ASR systems can utilize principles from several different fields, such as signal processing, artificial intelligence, and linguistics, in order to automatically convert an audio signal comprising speech into text that corresponds to the content of the speech in the system's input signal.
  • An embodiment of an ASR system is illustrated in Fig. 4.
  • An ASR system can perform feature extraction on an input speech signal 410.
  • the extracted features can be provided to an acoustic model 402.
  • the acoustic model 402 can comprise a statistical model that identifies sound units from an input speech signal 410 after relevant features have been extracted from it.
  • a decoder 405 can deduce the text based on information from various components, such as the acoustic model 402, a language model 403, and a lexicon 404.
  • the language model 403 can comprise a statistical model that scores how likely words are to occur with each other in a given language.
  • the lexicon 404 can comprise a pronunciation dictionary that indicates how words are constructed from sound units.
  • the acoustic model 402 may be produced using audio-text pairs where the text part corresponds to the speech in the audio signal.
  • the language model 403 can be produced using textual resources for the target language, like English, while the lexicon 404 can be created with the help of linguists.
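  • in such a pipeline the decoder typically searches for the word sequence W* = argmax over W of P(X|W)·P(W), where the acoustic model supplies P(X|W) and the language model supplies P(W); the toy scorer below illustrates this rule with fabricated candidates and probabilities:

```python
import math

# Toy decoding rule: combine acoustic-model and language-model
# log-probabilities over a fabricated candidate list.
def decode(candidates, am_logprob, lm_logprob, lm_weight=0.8):
    """Return argmax_W [ log P(X|W) + lm_weight * log P(W) ]."""
    return max(candidates,
               key=lambda w: am_logprob[w] + lm_weight * lm_logprob[w])

candidates = ["recognise speech", "wreck a nice beach"]
am = {"recognise speech": math.log(0.3), "wreck a nice beach": math.log(0.4)}
lm = {"recognise speech": math.log(0.1), "wreck a nice beach": math.log(0.001)}
print(decode(candidates, am, lm))  # recognise speech: the LM breaks the tie
```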
  • The embodiment of Fig. 4 is only an example of an ASR system.
  • the ASR system can be implemented in various alternative ways.
  • Fig. 5 illustrates a schematic representation of acoustic scene classification according to an embodiment.
  • ASC can classify audio samples to categories corresponding to the environment in which the audio sample was recorded. For instance, an audio sample could be recorded in an office, in an airport, or in a factory.
  • One of the objectives of ASC can be to provide context-awareness to automated audio systems.
  • the input of an ASC pipeline is an audio signal 501.
  • a feature extraction step 502 can be applied to the audio signal 501.
  • This step can transform the audio signal 501 into a format that contains relevant information for the actual classification operation and could involve, for example, signal processing algorithms such as the Fast Fourier Transform (FFT) algorithm.
  • the extracted features are used to perform the actual ASC operation 503.
  • the ASC module can be a statistical model that assigns the most likely category (environment/scene) based on the input features.
  • the selected acoustic scene 504 is "Office".
  • the statistical model is typically produced in a training step using audio-context pairs that can be collected, for instance, via human annotators.
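  • a compact sketch of the pipeline of Fig. 5, with FFT band energies as features (502) and a nearest-centroid classifier standing in for the trained statistical model (503); the centroids are fabricated, where a real model would be trained on audio-context pairs:

```python
import numpy as np

def extract_features(audio: np.ndarray, n_bands: int = 8) -> np.ndarray:
    """Feature extraction (502): coarse FFT band energies."""
    spectrum = np.abs(np.fft.rfft(audio))
    return np.array([band.mean() for band in np.array_split(spectrum, n_bands)])

# Fabricated class centroids; a real model is trained on audio-context pairs.
CENTROIDS = {
    "office":  np.array([0.2, 0.4, 0.6, 0.5, 0.3, 0.2, 0.1, 0.1]),
    "street":  np.array([0.9, 0.8, 0.7, 0.6, 0.6, 0.5, 0.4, 0.4]),
    "airport": np.array([0.7, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5, 0.4]),
}

def classify_scene(audio: np.ndarray) -> str:
    """ASC operation (503): assign the most likely category (504)."""
    f = extract_features(audio)
    return min(CENTROIDS, key=lambda c: np.linalg.norm(f - CENTROIDS[c]))
```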
  • the method 100 may be implemented using various modules/components such as those disclosed herein. Alternatively, the method 100 may also be implemented using various other systems.
  • Fig. 6 illustrates a schematic representation of an automatic speech recognition system according to an embodiment.
  • Fig. 6 illustrates a schematic representation of a so-called end-to-end (E2E) ASR architecture.
  • An ASR architecture can comprise an end-to-end model 610, which may comprise a neural network architecture, i.e. a trainable model that can be trained to output text 611 based on a speech input 410 using audio-transcript pairs.
  • in such architectures, the selection of an acoustic model may not be relevant, as they do not comprise a standalone acoustic model, contrary to, for example, the architecture illustrated in the embodiment of Fig. 4.
  • the roles of the different traditional components can be learned by the single neural network architecture.
  • the training procedure of such systems can be simplified.
  • an ASR, an ASR system, and/or an ASR instance can comprise, for example, an end-to-end ASR architecture, an ASR architecture similar to that disclosed in the embodiment of Fig. 4, or any other type of ASR architecture.
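  • a toy end-to-end model in the spirit of Fig. 6: a single trainable network maps audio features directly to per-frame character logits, with no standalone acoustic model, lexicon, or language model; the architecture and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinyE2EASR(nn.Module):
    """Single network from audio features (cf. 410) to text logits (cf. 611)."""
    def __init__(self, n_features: int = 40, n_chars: int = 29):
        super().__init__()
        self.encoder = nn.LSTM(n_features, 128, batch_first=True)
        self.head = nn.Linear(128, n_chars)  # e.g. letters + space + blank

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.encoder(features)   # (batch, time, 128)
        return self.head(hidden)             # per-frame character logits

model = TinyE2EASR()                      # would be trained on audio-transcript pairs
logits = model(torch.randn(1, 100, 40))   # 100 frames of 40-dim features
print(logits.shape)                       # torch.Size([1, 100, 29])
```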
  • Fig. 7 illustrates a schematic representation of a computing device according to an embodiment.
  • a computing device 600 comprises at least one processor 601 and at least one memory 602 including computer program code, the at least one memory 602 and the computer program code configured to, with the at least one processor 601, cause the computing device to perform the method 100.
  • the computing device 600 may comprise at least one processor 601.
  • the at least one processor 601 may comprise, for example, one or more of various processing devices, such as a co-processor, a microprocessor, a digital signal processor (DSP), processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the computing device 600 may further comprise a memory 602.
  • the memory 602 may be configured to store, for example, computer programs and the like.
  • the memory 602 may comprise one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices.
  • the memory 602 may be embodied as magnetic storage devices (such as hard disk drives, magnetic tapes, etc.), optical magnetic storage devices, and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory), etc.).
  • the computing device 600 may further comprise other components not illustrated in the embodiment of Fig. 7.
  • the computing device 600 may comprise, for example, an input/output bus for connecting the computing device 600 to other devices. Further, a user may control the computing device 600 via the input/output bus.
  • one or more components of the computing device 600, such as the at least one processor 601 and/or the memory 602, may be configured to implement this functionality.
  • this functionality may be implemented using program code comprised, for example, in the memory.
  • the computing device 600 may be implemented at least partially using, for example, a computer, some other computing device, or similar.
  • the method 100 and/or the computing device 600 may be utilised in, for example, a so-called voicebot.
  • a voicebot may be configured to obtain information from users by, for example, phone, and convert the voice information into text information using ASR.
  • the method 100 may be used to improve functionality of the ASR.
  • the voicebot may further be configured to further process, such as classify, the text information.
  • the voicebot can, for example, ask questions about, for example, basic information from a customer in a customer service situation over the phone, obtain the answers using ASR and the method 100, and save the information in a system.
  • the customer service situation can be made more efficient and user experience can be improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

According to an embodiment, a computer-implemented method for automated call processing comprises: receiving a call from a user; identifying an environment of the user during the call using acoustic scene classification; configuring at least one property of an automated call processing system according to the identified environment; and processing the call at least partially using the automated call processing system.
PCT/FI2023/050248 2022-06-01 2023-05-08 Computer-implemented method for automated call processing WO2023233068A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20225480 2022-06-01
FI20225480A FI20225480A1 (en) 2022-06-01 2022-06-01 COMPUTER IMPLEMENTED AUTOMATED CALL PROCESSING METHOD

Publications (1)

Publication Number Publication Date
WO2023233068A1 (fr)

Family

ID=86386698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2023/050248 WO2023233068A1 (fr) 2022-06-01 2023-05-08 Computer-implemented method for automated call processing

Country Status (2)

Country Link
FI (1) FI20225480A1 (fr)
WO (1) WO2023233068A1 (fr)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060229873A1 (en) * 2005-03-29 2006-10-12 International Business Machines Corporation Methods and apparatus for adapting output speech in accordance with context of communication
US20080310398A1 (en) * 2007-06-14 2008-12-18 Mukul Jain Call priority based on audio stream analysis
US20120140680A1 (en) * 2010-12-03 2012-06-07 International Business Machines Corporation Ring-tone Detection in a VoIP Call
US20130272511A1 (en) * 2010-04-21 2013-10-17 Angel.Com Dynamic speech resource allocation
US20160284349A1 (en) * 2015-03-26 2016-09-29 Binuraj Ravindran Method and system of environment sensitive automatic speech recognition
US9495960B1 (en) * 2014-03-26 2016-11-15 West Corporation IVR engagements and upfront background noise
US20170331949A1 (en) * 2016-05-11 2017-11-16 International Business Machines Corporation Automated call handling based on context of call
JP2020120170A (ja) * 2019-01-18 2020-08-06 株式会社東芝 自動応答装置、及びプログラム

Also Published As

Publication number Publication date
FI20225480A1 (en) 2023-12-02

Similar Documents

Publication Publication Date Title
CN112804400B (zh) Customer service call speech quality inspection method and apparatus, electronic device, and storage medium
US10810997B2 (en) Automated recognition system for natural language understanding
CN110389996B (zh) Implementing a full-sentence recurrent neural network language model for natural language processing
US20190259388A1 (en) Speech-to-text generation using video-speech matching from a primary speaker
CN111460111B (zh) Evaluating retraining recommendations for an automated conversation service
CN108682420B (zh) Dialect recognition method for audio and video calls, and terminal device
CN109960723B (zh) Interaction system and method for a psychological robot
US8560321B1 (en) Automated speech recognition system for natural language understanding
WO2011088049A2 (fr) Intelligent and parsimonious message engine
CN111081230A (zh) Speech recognition method and device
CN111785275A (zh) Speech recognition method and apparatus
CN112530408A (zh) Method, apparatus, electronic device, and medium for speech recognition
CN113239147A (zh) Intelligent conversation method, system, and medium based on graph neural networks
CN106875936A (zh) Speech recognition method and apparatus
JP2024502946A (ja) Punctuation and capitalization of speech recognition transcripts
CN112825248A (zh) Speech processing method, model training method, interface display method, and device
CN111178081B (zh) Semantic recognition method, server, electronic device, and computer storage medium
Gupta et al. Speech feature extraction and recognition using genetic algorithm
CN112992147A (zh) Speech processing method and apparatus, computer device, and storage medium
CN112087726B (zh) Method and system for color ring-back tone recognition, electronic device, and storage medium
WO2023233068A1 (fr) Computer-implemented method for automated call processing
CN115273862A (zh) Speech processing method and apparatus, electronic device, and medium
CN114328867A (zh) Method and apparatus for intelligent interruption in human-machine dialogue
WO2024114303A1 (fr) Phoneme recognition method and apparatus, electronic device, and storage medium
US20230298609A1 (en) Generalized Automatic Speech Recognition for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23724338

Country of ref document: EP

Kind code of ref document: A1