WO2023081962A1 - Methods for user authentication and login - Google Patents

Methods for user authentication and login

Info

Publication number
WO2023081962A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
audio signal
representation
voice
authentication
Prior art date
Application number
PCT/AU2022/051333
Other languages
English (en)
Inventor
Dmitry NIKITIN
Original Assignee
Omni Intelligence Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2021903574A0
Application filed by Omni Intelligence Pty Ltd
Publication of WO2023081962A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/66 Substation equipment, e.g. for use by subscribers with means for preventing unauthorised or fraudulent calling
    • H04M 1/667 Preventing unauthorised calls from a telephone set
    • H04M 1/67 Preventing unauthorised calls from a telephone set by electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31 User authentication
    • G06F 21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/22 Interactive procedures; Man-machine interfaces
    • G10L 17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L 63/0861 Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/18 Network architectures or network communication protocols for network security using different networks or channels, e.g. using out of band channels
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/64 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M 1/642 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations storing speech in digital form
    • H04M 1/645 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations storing speech in digital form with speech synthesis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/60 Substation equipment, e.g. for use by subscribers including speech amplifiers
    • H04M 1/6008 Substation equipment, e.g. for use by subscribers including speech amplifiers in the transmitter circuit
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/66 Substation equipment, e.g. for use by subscribers with means for preventing unauthorised or fraudulent calling
    • H04M 1/663 Preventing unauthorised calls to a telephone set
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M 2203/60 Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
    • H04M 2203/6045 Identity confirmation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M 2203/60 Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
    • H04M 2203/6054 Biometric subscriber identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/38 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/38 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections
    • H04M 3/382 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using authorisation codes or passwords
    • H04M 3/385 Graded-service arrangements, i.e. some subscribers prevented from establishing certain connections using authorisation codes or passwords using speech signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W 12/06 Authentication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W 12/60 Context-dependent security
    • H04W 12/65 Environment-dependent, e.g. using captured environmental data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W 12/60 Context-dependent security
    • H04W 12/69 Identity-dependent
    • H04W 12/72 Subscriber identity

Definitions

  • the present invention relates generally to the field of authentication of a user of a digital service provider, online or via telephone. More particularly, the invention relates to methods for authenticating a user by signal analysis of the user's voice using digital audio signal processing means.
  • In accessing various services online or by telephone, a user is often required to first authenticate their identity. Such authentication is required to minimise any opportunity for unauthorised access to a service by a third party having dishonest intent. Authentication is particularly important where a service holds a user's private information, or for services dealing in financial matters, such as a bank or an online stockbroker.
  • When accessing an online bank account, for example, a user is typically required to enter a username into a first data input box and then a password into a second data input box, followed by clicking a "login" button or similar. If the password matches that stored against the username, the login will be successful and the user permitted to transact on the account.
  • These entries are generally made manually, and are difficult if not impossible to perform when the hands are otherwise occupied, such as when driving. In any event, the process of entering details is time consuming.
  • the user may not recall the username and/or password for the account in question.
  • the user must make direct contact with the service provider (often by telephone) and authenticate their identity.
  • the service provider will then provide the username, or ask the user to enrol again with a fresh username.
  • the username may be issued by the service provider (such as an account number) and not be easily recalled by the user.
  • the user will be required to manually reset the password to a new password via a previously elected email address associated with the account.
  • Users are encouraged to use passwords that are not easily derived by a third party, and that contain long strings of characters of different types, including upper and lower case letters, numbers, and special characters.
  • many providers require that a user regularly changes their password. Thus, it is not uncommon for a user to forget a password when making a login attempt.
  • User authentication is also often required when a user calls a telephone contact center.
  • the contact center operator will usually ask the user for a unique identifier and then a series of questions in order to authenticate their identity before proceeding with the discussion.
  • the disadvantage of authentication questions providing a higher level of security is that they are generally more obscure to the user and therefore more easily forgotten.
  • a further problem is that the process of obtaining the unique identifier and presenting and answering a series of authentication questions can be time consuming. Again, the user may not remember an important detail such as an account number, or have the answer for an authentication question.
  • An overarching problem that applies to both login methods and voice call authentication is security.
  • a user may write down a password or the answer to a user authentication query, thereby instantly compromising security.
  • a further overarching problem that applies to both login methods and voice call authentication is the time and complexity involved in enrolling for a digital service.
  • In the course of enrolment via a webpage, a user will be required to manually enter a password.
  • the password must be chosen according to a number of parameters dictated by the service provider (length, character types, no consecutive numbers, no birth dates, etc.), and it can take the user some time and thought to devise a password that complies and can also be remembered.
  • a further aspect of the present invention is to provide an improvement in prior art methods for authentication of a user in the course of a voice call. It is a further aspect to provide a useful alternative to prior art authentication methods in that environment.
  • a further aspect of the present invention is to provide an improvement in prior art methods for enrolment of a user in a digital service. It is a further aspect to provide a useful alternative to prior art enrolment methods.
  • the present invention provides a computer- implemented method for authenticating or partially authenticating the identity of a user, the method comprising the steps of: receiving an audio signal encoding a user's voice speaking a unique identifier, recognising the unique identifier by speech recognition and using the recognised identifier to determine the user amongst a plurality of users, and comparing the user audio signal or a representation thereof to a reference audio signal for the user or a representation of a reference audio signal for the user, wherein the identity of the user is authenticated or partially authenticated where the user audio signal or a representation thereof and the reference audio signal or representation of a reference audio signal for the user are comparable to at least a minimum level.
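The claimed flow (receive audio, recognise the spoken identifier, determine the user, compare voice representations against a minimum level) can be sketched as follows. This is an illustrative sketch only: the helper functions `speech_to_text` and `voice_similarity`, the record layout, and the 0.85 threshold are assumptions, not part of the specification.

```python
# Hypothetical user store: identifier -> enrolled reference voice representation.
USERS = {
    "0412345678": {"reference_repr": [0.12, 0.80, 0.35]},
}

MIN_SIMILARITY = 0.85  # "comparable to at least a minimum level" (assumed value)

def speech_to_text(audio_signal):
    # Placeholder: a real system would invoke a speech-recognition engine here.
    return audio_signal["transcript"]

def voice_similarity(repr_a, repr_b):
    # Placeholder score in [0, 1]; a real system would compare feature
    # vectors extracted from the two audio signals.
    diff = sum(abs(a - b) for a, b in zip(repr_a, repr_b)) / len(repr_a)
    return 1.0 - diff

def authenticate(audio_signal):
    identifier = speech_to_text(audio_signal)   # recognise the unique identifier
    record = USERS.get(identifier)              # determine the user amongst many
    if record is None:
        return False
    score = voice_similarity(audio_signal["repr"], record["reference_repr"])
    return score >= MIN_SIMILARITY              # authenticated if comparable

attempt = {"transcript": "0412345678", "repr": [0.11, 0.79, 0.36]}
print(authenticate(attempt))
```

The same skeleton covers partial authentication: a score below the threshold could trigger the auxiliary step (e.g. a verification code) rather than an outright rejection.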
  • the unique identifier is a unique string of letters, a unique string of numbers, a unique string of letters or numbers of combination thereof, a telephone number, an account identifier, a customer identifier, a business identifier, an email address, or a username.
  • the unique identifier is unique to the user amongst the plurality of users.
  • the authentication or partial authentication is for the purpose of a user logging into a computer-implemented service, or for identifying a user participating in a voice call.
  • the method comprises an auxiliary authentication step.
  • the auxiliary authentication step comprises the step of transmitting a verification code to the user via a validated communication channel, and requesting input of the transmitted verification code from the user.
  • the validated communication channel is a cell phone or an email address of the user.
  • the auxiliary authentication step is software-enabled.
  • the software is embodied in the form of an authenticator application software.
  • the authenticator application software is configured to present a time-limited verification code to the user.
  • the authenticator application software is the Google™ Authenticator app or a functional equivalent thereof.
  • the reference audio signal or representation thereof was provided by the user, in the course of enrolment, before execution of the method.
  • the method is implemented on an authentication server or a login server.
  • the audio signal is generated by the user speaking into: the microphone of a cell phone, or the microphone of a processor enabled device having Internet connectivity.
  • the audio signal is transmitted by a cell phone, or a processor enabled device having Internet connectivity to the authentication server or the login server.
  • the step of receiving a user audio signal encoding a user's voice speaking a unique identifier is performed by an audio input software module, and/or the step of recognising the unique identifier by speech recognition is performed by a speech recognition software module, and/or the step of using the recognised identifier to determine the user amongst a plurality of users is performed by a user determination software module, and/or the step of comparing the user audio signal or a representation thereof to a reference audio signal or representation thereof for the user is performed by a comparison software module.
  • the audio input software module, and/or the speech recognition software module, and/or the user determination software module, and/or the comparison software module is/are executed on an authentication server or a login server.
  • the present invention provides a computer-implemented method of logging into a digital service or participating in a voice call, the method comprising the steps of: a user speaking a unique identifier into a user processor-enabled device so as to provide an audio signal encoding the user's voice speaking the unique identifier, receiving the audio signal encoding a user's voice speaking a unique identifier into a login server or an authentication server, recognising the unique identifier by speech recognition, and using the recognised identifier to determine the user amongst a plurality of users, and comparing the user audio signal or a representation thereof to a reference audio signal or a representation of the reference audio signal for the user, wherein the identity of the user is authenticated or partially authenticated where the user audio signal or a representation thereof and the reference audio signal or representation thereof for the user are comparable to at least a minimum level.
  • the user processor enabled device is a personal computer or a mobile device.
  • the mobile device is a smart phone or a tablet device or a smart watch.
  • the method has a feature of any embodiment of the first aspect.
  • the present invention provides a processor-enabled device configured to authenticate or partially authenticate the identity of a user, the processor enabled device configured to: receive an audio signal encoding a user's voice speaking a unique identifier, recognise the unique identifier by speech recognition, and using the recognised identifier to determine the user amongst a plurality of users, compare the user audio signal or a representation thereof to a reference audio signal or representation thereof for the user, wherein the identity of the user is authenticated or partially authenticated where the user audio signal or a representation thereof and the reference audio signal or representation of the reference audio signal for the user are comparable to at least a minimum level.
  • the processor enabled device has a feature of any embodiment of the first aspect.
  • the processor-enabled device is a personal computer or a mobile device of the user.
  • the mobile device is a smart phone or a tablet device.
  • the present invention provides a computer-readable medium comprising software configured to authenticate or partially authenticate the identity of a user, the medium having stored thereon program instructions configured to execute the method of any embodiment of the first aspect.
  • the medium is configured as an application programming interface.
  • the present invention provides a system configured to authenticate or partially authenticate the identity of a user, the system comprising: a user processor-enabled device, and a login server or an authentication server configured to: receive an audio signal encoding a user's voice speaking a unique identifier from the user processor-enabled device, recognise the unique identifier by speech recognition, and using the recognised identifier determine the user amongst a plurality of users, and compare the user audio signal or a representation thereof to a reference audio signal for the user or a derivative of a reference audio signal for the user, wherein the identity of the user is authenticated or partially authenticated where the user audio signal or a representation thereof and the reference audio signal or representation of a reference audio signal for the user are comparable to at least a minimum level.
  • the login server or authentication server is configured to execute the method of any embodiment of the first aspect.
  • the user processor enabled device is a personal computer or a mobile device.
  • the mobile device is a smart phone or a tablet device or a smart watch.
  • the present invention provides a method for enrolling a user in a digital service, the method comprising the step of obtaining a digital audio signal of a user's voice speaking so as to allow for later recognition of the user's voice when the user speaks a unique identifier.
  • the recorded user's voice speaking a unique identifier is converted into a representation thereof.
  • the method comprises a feature of any embodiment of the first aspect.
  • the present invention provides a computer-implemented method of enrolling a user for a digital service, the method comprising the steps of: receiving an audio signal encoding the user's voice speaking a unique identifier, and storing the audio signal or a representation thereof in a database in linked association with the unique identifier.
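A minimal sketch of this enrolment method: receive the audio (here already as samples), derive a representation of it, and store that representation in linked association with the unique identifier. The in-memory dict standing in for the database, the `extract_representation` placeholder, and the auxiliary-information field are illustrative assumptions.

```python
# In-memory stand-in for the enrolment database described in the method.
enrolment_db = {}

def extract_representation(audio_samples):
    # Placeholder feature extraction: a real system would compute a
    # speaker-discriminative feature vector from the audio signal.
    mean = sum(audio_samples) / len(audio_samples)
    peak = max(abs(s) for s in audio_samples)
    return (mean, peak)

def enrol(unique_identifier, audio_samples, auxiliary_info=None):
    # Store the representation (and any auxiliary authentication detail,
    # e.g. a validated email address) keyed by the unique identifier.
    enrolment_db[unique_identifier] = {
        "representation": extract_representation(audio_samples),
        "auxiliary": auxiliary_info,
    }

enrol("0412345678", [0.1, -0.2, 0.3, -0.1], auxiliary_info="user@example.com")
```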
  • the method comprises the step of receiving a unique identifier from the user.
  • the method comprises the step of receiving from the user information relating to an auxiliary authentication method and storing that information in a database in linked association with the unique identifier.
  • the method comprises a feature of the method of any embodiment of the first aspect.
  • the present invention provides an enrolment server configured to execute the method of any embodiment of the first aspect.
  • the enrolment server is administered by a digital service and configured to allow a user to enroll for the digital service. In one embodiment of the eighth aspect, the enrolment server is configured to make network connection with a plurality of digital services and a plurality of users of each of the plurality of digital services.
  • FIG. 1 is a flow chart illustrating a user enrolment process of the present invention.
  • FIG. 2 shows a series of database records obtained from the enrolment process shown in FIG. 1.
  • FIG. 3 is a flow chart of a login process of the present invention.
  • the login process relies on (i) extraction of a cell phone number obtained in the course of a login attempt, (ii) use of that number to locate a reference representation of a user voice in a database (as obtained in the enrolment process of FIG. 1 and stored as a database record as shown in FIG. 2), and (iii) comparison of that reference representation with a user voice representation obtained in the course of the login attempt.
  • FIG. 4 is a flow chart of a voice call authentication process of the present invention.
  • the process relies on (i) extraction of a cell phone number obtained in the course of a voice call, (ii) use of that number to locate a reference representation of a user voice in a database (as obtained in the enrolment process of FIG. 1 and stored as a database record as shown in FIG. 2), and (iii) comparison of that reference representation with a user voice representation obtained in the course of the call.
  • FIG. 5 is a diagram of a system of the present invention whereby a login or voice call authentication server of a digital service is in network connection with a plurality of users, and also other servers of the digital service.
  • FIG. 6 is a diagram of a system of the present invention whereby a login or voice call authentication server provides login or authentication services to a plurality of digital services in network connection therewith.
  • a plurality of users are in network connection with the login or voice call authentication server, such that a user is able to log in to, or have voice call authentication with, at least one of the plurality of digital services.
  • FIG. 7 is a flow diagram showing a voice flow for enrolment, used when registering a new user for voice authentication.
  • FIG. 8 is a flow diagram showing a voice flow for voice verification, used to check if a speaker is registered for voice authentication.
  • FIG. 9 is a flow diagram showing a voice flow for voice Log-in Basic, used to identify and authenticate a person with just one audio file (the audio contains a spoken mobile phone number).
  • FIG. 10 is a flow diagram showing a voice flow for voice Log-in Strong, used to implement a voice-enabled log-in requiring a secure identification and authentication mechanism.
  • FIG. 11 is a block diagram showing security architecture for the platform.
  • FIG. 12 is a block diagram showing security architecture for the biometrics.
  • the present invention is predicated at least in part on the inventor's discovery that a user's voice is able to provide two functions.
  • a first function is to provide a unique identifier, such as a customer account identifier.
  • a second function is to provide a means for authenticating the identity of a user by virtue of the unique characteristics of an individual's speech.
  • a single spoken string of text and/or numbers can be used to log a user into a digital service, avoiding the need to manually enter a unique identifier (such as a username) and an authentication factor (such as a password) into a digital service login page.
  • a user's voice is able to provide the same first and second functions in the course of a voice call to a contact center.
  • Upon receiving a call from a user, the operator (robot or human) requests the user to speak their account number (or other identifier). No time is wasted by the operator listening to the user's unique identifier, accessing an authentication question associated with the unique identifier, asking the user the question and assessing the user's answer against the expected answer.
  • a further finding is that a user cell phone number is a preferred unique identifier in the context of the invention. Firstly, the vast majority of users will readily be able to recall their cell phone number. Secondly, numbers are more readily recognisable by speech-to-text processing means. A cell phone number does not comprise letters or words and is therefore more readily converted to text without error. In some prior art methods, speech recognition is used to identify a user; however, more complex speech forms the basis for analysis, such as words and sentences. Such prior art methods are more prone to error in identification, or simply fail to identify any user, because of the more complex comparisons that are required by the identification algorithms used.
  • the present invention is directed to the authentication or partial authentication of a user of a digital service.
  • the authentication is a partial authentication with supplementary authentication means (as discussed in more detail infra) being further implemented to result in a full authentication or substantially full authentication of a user. It will be understood that the present invention does not necessarily provide fail-safe authentication no matter how many authentication means are implemented.
  • the invention is used in the context of a user's interaction with a digital service.
  • the digital service may be accessed by way of web browser, an app, or a voice call.
  • Exemplary digital services include a financial institution, a government department, an insurance company, a retailer, a telephone company, an airline, a booking agency, a gambling service, a social media company, an online business retailing a good or a service.
  • Even a "bricks and mortar" business may be considered a digital service where a customer can interact with the business by digital means (including a digitally-enabled voice call). Given the benefit of the present specification, other digital services having a use of the present invention will be apparent to the skilled person.
  • the user's interaction with the digital service is one requiring some means for identifying a person as an enrolled user of the service.
  • a common scenario is where a user logs into an online or offline service.
  • the login step typically requires a user to have a unique identifier (such as a username) to identify the user, and a password to authenticate the identity of the user.
  • the security of such login arrangements is reliant on the password being kept secret, or at least not being easily guessed or inferred.
  • the user's voice is exploited both to provide a unique identifier (such as a username) and to identify the user, as more fully described elsewhere herein.
  • an enrolled user of a digital service makes a voice call to the service (say, for an account enquiry) and their identity must be confirmed to protect private information that may be disclosed by the digital service in the course of the call or to prevent unauthorised instructions being given to the digital service for example.
  • the user's voice provides both a unique identifier (such as an account number) and a means to identify the user, as more fully described elsewhere herein.
  • the invention requires the receipt of a digital audio signal (such as in the form of an audio file or an audio stream) for analysis in a login or user authentication procedure, the signal being of a user's voice speaking a unique identifier.
  • the unique identifier is obtained by a speech recognition means such as provided by known speech-to-text algorithms.
  • Exemplary means include Project DeepSpeech (Mozilla), an open source speech-to-text library that operates within the TensorFlow framework; Kaldi (released under the Apache public license); Julius; Wav2Letter++ (Facebook), a trainable tool; DeepSpeech2 (Baidu, released under BSD license); Vosk, which provides a streaming API allowing for online speech recognition; Athena (released under the Apache public license); and ESPnet (released under the Apache public license).
  • the identifier is used to locate an electronic database record held by the digital service for the user.
  • the unique identifier may be the number string "685262", and the record associated with that unique identifier is located.
  • a "fuzzy search" may be performed to locate the record (or at least a number of candidate records) in an attempt to locate a user by the provided identifier.
  • a "fuzzy search" may be preferred since there is a degree of variability in the way the identifier can be submitted (i.e. pronounced by the user).
  • For example, a phone number can be cited with a country code (+614xxxxx) or with an area code (04xxxxx).
  • the search algorithm can be anything from a simple "string matching" to a neural network based elastic search system.
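To illustrate the range from simple string matching to fuzzier lookup, the sketch below first normalises the country-code and leading-zero variants mentioned above, then falls back to closest-match scoring using Python's standard `difflib`. The normalisation rule and the 0.8 cutoff are assumptions for illustration, not part of the specification.

```python
import difflib

def normalise_phone(number, country_code="61"):
    # Reduce "+61 412 ..." and "0412 ..." variants to one canonical digit string.
    digits = "".join(ch for ch in number if ch.isdigit())
    if digits.startswith(country_code):
        digits = "0" + digits[len(country_code):]
    return digits

def fuzzy_lookup(spoken_number, known_numbers, cutoff=0.8):
    target = normalise_phone(spoken_number)
    candidates = {normalise_phone(n): n for n in known_numbers}
    if target in candidates:                 # simple "string matching" case
        return candidates[target]
    # Fuzzy fallback: closest known number above the similarity cutoff.
    close = difflib.get_close_matches(target, candidates, n=1, cutoff=cutoff)
    return candidates[close[0]] if close else None

known = ["0412345678", "0498765432"]
print(fuzzy_lookup("+61412345678", known))
```

A neural-network-based elastic search system, as the text notes, could replace `fuzzy_lookup` without changing the surrounding flow.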
  • the record will generally exist as a database record amongst the records of a plurality of users.
  • the correct record is determined by some secondary means. For example, each of the voice references for the candidate users may be analysed for similarity to the voice used for login or voice call identification. The record associated with the closest voice representation match, and also having at least a minimum required similarity as is otherwise required, is likely to be the correct record.
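The secondary disambiguation just described can be sketched as: score each candidate record's reference voice against the login voice, take the closest, and accept it only if it also meets the minimum required similarity. The similarity placeholder and the 0.85 threshold are illustrative assumptions.

```python
def similarity(a, b):
    # Placeholder score in [0, 1]; a real system would use a speaker model.
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def pick_record(login_repr, candidates, minimum=0.85):
    # Closest voice-representation match among candidate records ...
    best = max(candidates, key=lambda rec: similarity(login_repr, rec["voice"]))
    # ... accepted only if it also clears the minimum required similarity.
    return best if similarity(login_repr, best["voice"]) >= minimum else None

candidates = [
    {"user": "alice", "voice": [0.9, 0.1, 0.4]},
    {"user": "bob",   "voice": [0.2, 0.8, 0.6]},
]
match = pick_record([0.88, 0.12, 0.41], candidates)
print(match["user"] if match else None)
```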
  • a user interface may prompt the user for further information such as a post code (zip code) or other item of information recorded for users so as to locate the correct record.
  • the located record will comprise a reference audio signal for the user or a representation of a reference audio signal for the user.
  • the reference audio signal has been previously obtained from the user in the course of an enrolment process, and comprises the user's voice speaking sufficient text so as to allow for identification of the user when speaking the unique identifier.
  • the reference audio signal per se is used as a comparator against the received audio signal to determine the likelihood that the received audio signal is the voice of the user, thereby authenticating the identity of the user.
  • the reference audio signal is processed to form a representation of the reference audio signal, which is then used as a comparator against a representation of the received audio signal.
  • the same or similar processing is used to generate the representations of the reference audio signal and the received audio signal so as to allow for a useful comparison to be made.
  • the audio signal may be electronically processed to form a representation thereof.
  • the representation may be a mathematical representation, a tensor, a vector (including a feature vector), or a scalar.
  • Speaker recognition may be achieved in three main steps: acoustic processing, feature extraction (to provide a representation) and classification/recognition.
  • the speech audio signal is firstly processed to improve the signal.
  • One possible step is to remove noise so as to avoid confounding the extraction of the important speech attributes and therefore negatively affecting speaker identification.
  • Another possible step is level normalisation or compression of the audio signal. In many circumstances, a silence removal step will be required.
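By way of non-limiting illustration only, the normalisation and silence removal steps above might be sketched as follows; the frame length and energy threshold are arbitrary assumed values, not part of the claimed method:

```python
import math

def normalize(signal):
    """Peak-normalize the signal to the range [-1, 1]."""
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal] if peak > 0 else list(signal)

def remove_silence(signal, frame_len=400, threshold=0.02):
    """Drop frames whose RMS energy falls below the threshold,
    keeping only the voiced portions of the signal."""
    voiced = []
    for i in range(0, len(signal), frame_len):
        frame = signal[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            voiced.extend(frame)
    return voiced
```

In practice a dedicated voice activity detector would replace the simple energy gate, but the sketch shows where silence removal sits in the pre-processing pipeline.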
  • the purpose of the feature extraction step is to characterise a speech audio signal by reference to a predetermined number of signal elements. This avoids consideration of the entirety of the data in the speech audio signal, thereby limiting processor burden. Ideally, the extraction process removes data that are not important, or at least not of prime importance, in the identification task.
  • Feature extraction may be accomplished by processing the speech waveform by digital means to a form of parametric representation at a relatively lower data rate or total amount of data for the comparison analysis.
  • the audio signal is processed into a representation that is actually more discriminative and/or reliable than the original signal.
  • the feature extraction is configured to identify a representation that is relatively reliable for at least several conditions for the same speech signal, to take account of variations in the environmental conditions or the speaker themselves, while still retaining the portion of the data that characterizes the information in the speech signal.
  • Feature extraction processes typically yield a multidimensional feature vector for every speech signal.
  • the prior art provides means for vectorizing audio signal data, and given the benefit of the present invention the skilled person is enabled to vectorize the digital audio data comprised in the received and reference audio signals.
  • a multidimensional linear vector is used as provided by a neural network configured to distinguish one user's speech from all others.
  • the vector representations may have any number of dimensions, and in some embodiments only two dimensions.
  • an audio frequency spectrum has only two dimensions, and therefore may be represented as a simple matrix with each element of the matrix representing an amplitude.
  • Such a matrix may be used as a means for comparison between a received audio signal and a reference audio signal.
  • More complex representations of the audio signal may be used, including representations having 10, 100, 1000 or more dimensions.
  • a useful representation is provided by 256 dimensions implementing a cosine (COS) distance as a measure.
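By way of a hypothetical illustration, the cosine measure referred to above might be applied to two fixed-length embedding vectors as follows; the 0.7 acceptance threshold is an assumed value in line with the score threshold discussed elsewhere herein:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two speaker embeddings
    (1.0 = identical direction, 0.0 = orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_speaker(reference, received, threshold=0.7):
    """Accept the speaker when similarity meets the minimum level."""
    return cosine_similarity(reference, received) >= threshold
```

The cosine measure compares only vector direction, making it insensitive to overall embedding magnitude, which is one reason it is commonly used for speaker embeddings.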
  • non-spectrum means may be used to process the audio stream, including audio embedding or speech recognition means.
  • a speech audio signal is a slow time varying signal and quasi-stationary when observed over a short time period (such as between 5 and 100 msec). Accordingly, short time spectral analysis (which includes MFCC, LPCC and PLP) may be preferred for the extraction of important discriminatory information from a speech audio signal.
  • a neural network-based feature extraction method is used, whereby the extractor is specifically trained to distinguish speakers using Mel-spectrograms.
  • the pre-processing method may comprise a pre-emphasis step involving passing the signal through a first-order finite impulse response (FIR) filter.
  • a second step of frame blocking may be performed, whereby the speech audio signal is partitioned into a series of frames generally for the purpose of removing the acoustic interface present at the beginning and end of the signal.
  • the framed speech audio signal may then be windowed by passing through a frequency filter (such as a bandpass filter) to minimize disjointedness at the start and finish of each frame.
  • Useful types of window include Hamming and Rectangular windows. The aim of this step is generally to improve the sharpness of harmonics, eliminate discontinuity in the signal by tapering the beginning and end of the frame to zero, and reduce spectral distortion.
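The pre-emphasis, frame blocking and Hamming windowing steps described above may be sketched as follows; the filter coefficient (0.97), frame length and hop size are common but assumed values:

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """First-order FIR pre-emphasis: y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(n_points):
    """Hamming window coefficients for a frame of n_points samples."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def frame_and_window(signal, frame_len=400, hop=160):
    """Partition the signal into overlapping frames and taper each
    frame's edges toward zero with a Hamming window."""
    window = hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [[s * w for s, w in zip(signal[i * hop:i * hop + frame_len],
                                   window)]
            for i in range(n_frames)]
```

With a 16 kHz sample rate the defaults correspond to 25 ms frames with a 10 ms hop, consistent with the quasi-stationarity window mentioned earlier.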
  • the identity of the user may be authenticated or partially authenticated where the received user audio signal or a representation thereof and the reference audio signal or representation of a reference audio signal for the user are comparable to at least a minimum level.
  • Such comparison is typically undertaken by speaker recognition means as incorporated into the tools discussed above.
  • regarding the minimum level, the skilled person having the benefit of the present specification is enabled to determine a suitable minimum level of comparability.
  • the level will be set sufficiently high such that the rate of false positive identifications is below an acceptable level and/or the rate of false negative identifications is below an acceptable level.
  • the false positive rate will typically be the more critical given the possibility that a dishonest third party will gain access to the digital service involved, and therefore also access to funds, information and other resources not intended for third party access.
  • the acceptable false positive rate may be expressed in terms of a percentage, and in some embodiments is equal to or less than about 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1.0%, 1.5%, 2.0%, 2.5%, 3.0%, 3.5%, 4.0%, 4.5% or 5.0%.
  • the unique identifier should be unique amongst all of the identifiers implemented for the plurality of users enrolled for the digital service. For ease of recall, it is preferable that the unique identifier is chosen by the user, as distinct from one chosen by the digital service or some other party.
  • the identifier may be a unique string of letters and numbers, a telephone number, an account identifier, a customer identifier, a business identifier, an email address, or a username. Applicant has found that a telephone number (such as a cell phone number) as a unique identifier provides a number of advantages. Firstly, a telephone number (and particularly a personal cell phone number) is retained by the user for many years, and possibly even a lifetime. The number becomes very well known to the user, and is therefore easily recalled for use when required in a login or voice call authentication process.
  • a second advantage compounding the first advantage is a user's telephone number will be absolutely unique (at least in the user's country of residence) because telephone numbers must be unique by their very nature. Accordingly, there is no requirement during a user enrolment process to check whether a proposed unique identifier is in fact unique amongst all existing users.
  • a third advantage compounding the previously mentioned advantages is that a telephone number is normally spoken in an almost staccato manner, with each number being recited in a discrete manner, with slight pauses in between each number.
  • the discretely spoken numbers are more easily discerned as numbers and therefore better recognised by a speech to text converter.
  • a fourth advantage compounding the previously mentioned advantages is that a telephone number comprises only ten sounds (zero, one, two ... nine). This dramatically decreases the difficulty of the analysis by the speech to text converter, thereby increasing the rate of success in transforming a received audio speech signal into a string of numbers (i.e. the unique identifier).
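The small vocabulary noted above lends itself to simple post-processing of a speech-to-text transcript. A non-limiting sketch follows; the word list (including an "oh" alias for zero) is an assumption for illustration:

```python
# Hypothetical mapping from spoken digit words to characters.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}

def transcript_to_identifier(transcript):
    """Convert a transcript of discretely spoken digits into a
    numeric identifier string, ignoring any non-digit words."""
    return "".join(DIGIT_WORDS[w] for w in transcript.lower().split()
                   if w in DIGIT_WORDS)
```

For example, a transcript of a staccato-spoken phone number maps directly to the digit string used for the database lookup.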
  • a fifth advantage compounding the previously mentioned advantages is that a telephone number may be used as a validated communication channel through which an auxiliary authentication step is effected. Whilst identification of a user's voice may provide a certain level of certainty that a user is in fact the real user, it is generally preferred that an auxiliary authentication method is used in addition to speaker recognition via the voice. As is presently common, a two-factor authentication (2FA) method is preferred to better ensure that a putative user is the real user.
  • the auxiliary authentication method may be one that exploits the user's cell phone number and cell phone.
  • a verification code is sent to the user's cell phone by the digital service, and the received code is read by the user and input into a web browser or an app forming part of the authentication system of the digital service.
  • advantageously, because the unique identifier is itself the user's telephone number, the text is sent to that number. There is no requirement to separately locate (say, via a database record) the user's cell phone number to which the verification code is sent.
  • the verification application is set up specifically for the digital service concerned so as to provide a shared secret key to the user over a secure channel, to be stored in the authenticator application. This secret key will be used for all future logins to the site.
  • the user provides a voice sample of the unique identifier to the authentication system of the digital service, which computes (but does not display) a required verification code and requests the user to enter it.
  • the user runs the authenticator application, which independently computes and displays the same verification code, which the user types into a web page or app of the digital service, thereby authenticating their identity.
  • An exemplary authenticator app useful in the context of the present invention is Google™ Authenticator.
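For illustration, a time-based one-time password of the kind computed by such authenticator applications can be generated per RFC 6238 using only standard library primitives; this sketch assumes an HMAC-SHA1 digest, 6-digit codes and a 30-second time step (the defaults used by common authenticator apps):

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, timestamp=None, digits=6, period=30):
    """Compute a time-based one-time password per RFC 6238.
    The shared secret is the base32 string provisioned at enrolment."""
    key = base64.b32decode(secret_b32)
    now = time.time() if timestamp is None else timestamp
    counter = struct.pack(">Q", int(now // period))
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation per RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The digital service and the authenticator app each hold the shared secret and compute the same code independently, so only the short-lived code ever crosses the network.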
  • the present methods may be executed on a computer server.
  • the server may be administered by the digital service requiring authentication of a user.
  • the authentication process is performed by a third party entity authorized to do so by the digital service and in which case the method is executed on a third party entity server.
  • the present invention may be implemented in one of the many forms as discussed herein, and also further forms such as a user interface configured to prompt or allow the various inputs and display the various outputs.
  • the interface may be configured to prompt the user for voice input, and optionally the prompt is by audio output such that the user is not required to view or otherwise interact with the interface.
  • FIG. 1 shows a non-limiting process flow for an enrolment process.
  • the user has no existing relationship with a digital service, and seeks via the enrolment process to become a user of that digital service.
  • the user starts the enrolment process by opening a relevant webpage of the digital service.
  • the webpage requests entry of the usual details (name, address, email address, cell phone number etc), at which time a database entry is made for the new user.
  • the database entry is identified within the database by reference to the user cell phone number.
  • the user is prompted to clearly speak a passage of text (which may be provided by the user interface for the user to read aloud).
  • the user's speech is received as a digital audio signal.
  • a simplified version of the audio signal is generated to form a representation of the audio signal, and the representation is saved as a data file in the new database record for the user.
  • the representation is used after enrolment as a reference for the user.
  • An extract from the database referred to above is shown in FIG. 2. Partial records for several users are shown. Each record is uniquely identifiable by the user's cell phone number.
  • the login process is commenced by a user speaking their cell phone number into a user interface provided by the digital service.
  • the interface may be presented by way of an app, a webpage or any other means.
  • the user's voice is digitally processed to provide a representation of the voice audio signal (the representation being generated in the same manner as for the enrolment process).
  • the representation obtained in the login process is compared to the representation obtained during enrolment. Where the two representations display a minimum level of similarity, the login is successful. If not, the login is aborted. Preferably, the login will not be successful until an auxiliary authentication step (not shown) is successfully completed.
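The login comparison described above may be sketched as follows; the record layout, field names and 0.7 threshold are hypothetical choices for illustration only:

```python
import math

def attempt_login(spoken_number, login_embedding, database, threshold=0.7):
    """Hypothetical login flow: locate the enrolment record keyed by
    the spoken cell phone number, then compare voice representations
    by cosine similarity against the enrolment reference."""
    record = database.get(spoken_number)
    if record is None:
        return False  # no enrolment exists for this identifier
    ref = record["voice_reference"]
    dot = sum(x * y for x, y in zip(ref, login_embedding))
    score = dot / (math.sqrt(sum(x * x for x in ref)) *
                   math.sqrt(sum(y * y for y in login_embedding)))
    return score >= threshold
```

A production flow would additionally gate success on the auxiliary (2FA) step before granting access.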
  • FIG. 4 shows a similar scheme to that of FIG. 3, although in the context of caller verification in the course of a voice call to a digital service.
  • the caller may be prompted to speak their cell phone number by a recording, a robot operator or a human operator.
  • the caller's (user's) voice representation is compared to a reference representation.
  • An optional auxiliary authentication step (not shown) may also be implemented.
  • the digital service allows access to any of the information and resources offered by the digital service by telephone. For example, a human operator may be permitted to discuss banking details or an automated transaction service may be accessed.
  • FIG. 5 showing a non-limiting system of the invention comprising a server (100) being configured as an enrolment server, a login server, or a voice call authentication server of a digital service as described elsewhere herein.
  • a plurality of user devices (110) are in network connection with the server (100) via the Internet (120).
  • Upon successful login or authentication, the user is enabled to access various services of the digital service via further connected servers (130).
  • FIG. 6 showing a non-limiting system of the present invention whereby enrolment, login or authentication is performed by a third party server (200).
  • the plurality of user devices (110) are in network connection with the server (200) via the Internet (120).
  • the third party server (200) is in network connection with a plurality of digital services (100).
  • the user devices (110) are used to authenticate a user for one of the digital services (100), by logging into the third party server (200). Upon successful login the user is able to access the relevant digital service server (100) via third party server (200).
  • identity (e.g. call center enrolments):
    o If the identity is new, a new speaker is created against the identity, then enrolment proceeds against that speaker.
    o If the identity is not new, the speaker is looked up, then enrolment proceeds against that speaker.
  • lookup of a speaker by identifier may additionally include customer lookup by voiceprint for a more robust search (planned for a next version).
  • a voiceprint is generated based on audio containing a speech sample.
  • the voiceprint is then compared against each of the reference voiceprints stored in a database.
  • the best match that passes a confidence threshold is then returned.
  • this match contains a comparison score (a value between [0..1]) and the Speaker's Identifier (a phone number) that corresponds to the best match.
  • the Enrolment scenario is required to populate a reference database at first instance. This is performed by calling the Enrolment API.
  • the user records a wav file of their voice, speaking at least their cell phone number, preferably in E.164 format (e.g. +1234567890). Enrolment requires a valid cell phone number, and will not complete until the user clicks on the link sent to their phone.
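A minimal check that a submitted number conforms to E.164 format might be written as follows; the regular expression is an illustrative assumption reflecting the E.164 limit of 15 digits with no leading zero:

```python
import re

# "+" followed by a non-zero first digit and up to 14 further digits.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")

def is_valid_e164(number):
    """Return True when the string is a plausible E.164 number."""
    return E164_PATTERN.fullmatch(number) is not None
```

Validating the identifier at enrolment avoids storing records that could never be matched by a later spoken-number lookup.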
  • the flow sends an audio file containing the speaker's voice recording to create a voiceprint and associate it with the speaker's identifier.
  • the user records a wav file of their voice, speaking at least their cell phone number.
  • the flow checks if the system has an enrolled voiceprint associated with a speaker identifier (e.g. account number) and, if so, sends an audio file containing the speaker's voice and compares it with the voiceprint stored in the system.
  • the user records a wav file of their voice, speaking at least their cell phone number.
  • the flow checks if the system has an enrolled voiceprint associated with a speaker identifier (the identifier is obtained using speech recognition) and, if so, compares the speaker's voice with the voiceprint stored in the system.
  • This flow is similar to Voice Log-in Basic (where the speaker records themselves saying their phone number) but it also requires that the speaker submits a verification code sent via a cellular network.
  • the verification code can be submitted as a voice recording, in which case speech recognition and voice verification are performed on both the initial and verification audio submissions.
  • Voice Biometrics compares voice samples and provides comparison results. Where the results show a minimum match (usually expressed as a numeric score), the voice samples are considered to be spoken by the same person. In reality, various factors may impact the performance of the technology, including background noise, the presence of multiple speakers during voice sample collection, channel (e.g. phone line vs laptop mic), and microphone gain distortions. In addition, the human voice exhibits numerous properties depending on the vocal task being performed (e.g. singing vs speaking). Indeed, depending on the context and emotional state, even the same phrase may have different emphases that communicate complex semantics.
  • the present invention comprises an artificial neural network configured to capture a combination of properties of a user's voice as well as the way the user speaks. Performance will vary depending on the contents and duration of speech samples. Generally, a longer speech sample exposes more voice properties and subsequently improves performance.
  • Enrolment audio should be longer than Authentication audio.
  • Net Speech duration is less than audio file duration in most cases.
  • Cross-channel authentications will show lower authentication scores; increase speech sample length or use a same-phrase strategy to mitigate.
  • Encourage users to make multiple Enrolments for each of the channels they normally use.
  • The minimum Net Speech requirement can be lowered when similar phrases are used for both Enrolment and Authentication.
  • Voice samples containing multiple speakers will affect system performance; avoid Enrolling multiple speakers against the same account / phone number.
  • Speech Recognition is not used for Voice Biometrics; it is used to improve Customer Experience. Provide clear instructions to users on what they should say during the speech sample collection process.
  • Customer identifiers should be numeric, and speech samples containing spoken identifiers must not have any other numbers in them. Numeric identifiers work best with Automatic Speech Recognition and yield reliable performance; do not use non-numeric speaker identifiers.
  • Min. Enrolment Audio (Random): 30 sec; Net Speech requirement for Enrolment that is agnostic to language / speech content.
  • Min. Enrolment Audio (Phrase-dependent): 10 sec; Net Speech requirement for Enrolment with a specific phrase.
  • Min. Auth. Audio (Random): 10 sec; Net Speech requirement for authentication, content agnostic.
  • Min. Auth. Audio (Phrase-dependent): 6 sec; Net Speech requirement for authentication with a specific phrase.
  • Score Threshold: 0.7; possibly lowered for cross-channel scenarios.
  • Verification Code Complexity: 6; number of digits in the verification code.
  • Audio / Net Speech Ratio: 1.4:1; typical ratio between an audio file and the net speech extracted from it, e.g. a 14 sec audio file will typically contain 10 sec of net speech.
  • Enrolment is done rarely (maybe once or twice) but it requires that the person being Enrolled is authentic. As much of the user's voice as practically possible should be captured to improve the quality of their Enrolment voiceprint.
  • Authentication is performed each time a previously Enrolled person wishes to gain access to a resource (e.g. their website account). During Authentication, a fresh voice sample is collected and compared against the Enrolment voiceprint. If the Authentication fails, another attempt can be made provided that voice sample collection is quick and easy (i.e. short audio samples). Enrolment audio should be longer than Authentication audio.
  • Net Speech duration is less than audio file duration. While audio files capture a wide variety of environmental sounds (including silence), the present invention requires only human speech as its input (i.e. Net Speech). Non-speech intervals may be filtered from the input audio files. For short phrases (range 10-20 seconds), the average ratio between audio duration and Net Speech is 1.4:1. This, however, will depend on whether a person reads some unfamiliar text or says something that they are comfortable with. This ratio may have to be adjusted depending on the phrase strategy used.
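The typical 1.4:1 audio-to-Net-Speech ratio above supports a simple sizing estimate. The sketch below is illustrative only; the ratio is the document's typical figure for short phrases and may need adjustment per phrase strategy:

```python
# Typical ratio between raw audio duration and net speech for
# short phrases in the 10-20 second range.
AUDIO_TO_NET_SPEECH = 1.4

def net_speech_estimate(audio_sec, ratio=AUDIO_TO_NET_SPEECH):
    """Estimate the net speech contained in a raw audio file."""
    return audio_sec / ratio

def required_audio_duration(net_speech_sec, ratio=AUDIO_TO_NET_SPEECH):
    """Estimate the raw audio length needed to yield a target
    amount of net speech (e.g. for Enrolment minimums)."""
    return net_speech_sec * ratio
```

For example, meeting the 30-second Enrolment Net Speech minimum would typically require about 42 seconds of raw audio.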
  • Cross-channel authentications will show lower authentication scores; increase speech sample length or use a same-phrase strategy to mitigate. Users should make multiple Enrolments for each of the channels normally used.
  • Cross-channel authentication happens when Enrolments and Authentications arrive on different channels. For example, a person could be registered (Enrolled) on a telephone call; after that they could try to authenticate themselves by voice using a web browser (e.g. logging-in to a website).
  • Strategies to mitigate performance degradation for cross-channel authentications include: increasing the minimum speech sample duration for authentications (longer speech samples expose more voice properties); and using a Same-Phrase strategy (i.e. Enrolment and authentication both contain the same spoken phrase). It is recommended to Enrol more than one voice sample for each channel a user might come in through. Users may submit a verification code using audio (Voice Log-in Strong scenarios): a verification code sent for 2nd-Factor Authentication can be recorded and sent as an audio recording, in which case the recording will be appended to the initial speech sample, thus increasing the amount of Net Speech provided.
  • it is preferred for a user to record a little more of their voice when trying to log in. For example, the user can be instructed to say their full name, town and suburb followed by their phone number; the phone number part will then be used in speech recognition to identify the user, followed by the voiceprint comparison.
  • Min. Net Speech requirement can be lowered when similar phrases are used for both Enrolment and Authentication.
  • the minimum amount of Net Speech for Enrolment and Authentication should be 30 and 10 seconds respectively. It has been found that the Min. Net Speech requirement can be lowered when a same-phrase approach is used. In this approach, both Enrolment and Authentication would be done with voice samples capturing the same phrase; in this case, the Voice Biometrics engine faces an easier task because it compares like with like.
  • Voice samples containing multiple speakers will affect system performance. It is preferable to avoid Enrolling multiple speakers against the same account / phone number. It has been found that when audio samples sent for voice matching contain multiple speakers talking, system performance will degrade. For this reason, if multiple people need to be Enrolled against the same account, it is best to Enrol them individually. A recommended approach in this situation is to provide a unique identifier (e.g. phone number) for each person being Enrolled in the system and implement authorized person policy in the internal system of the organization.
  • ASR: Automatic Speech Recognition.
  • Speech Recognition is not used for Voice Biometrics; it is used to improve Customer Experience.
  • Several Authentication Flows implemented in the system rely on ASR for two reasons: Identifier extraction (e.g. phone number) and Verification code extraction.
  • the main reason for using ASR is to reduce the number of steps required to complete a voice-enabled log-in process. Indeed, when a customer is asked to say their identifier as well as a verification code, with just two steps the invention can: (i) use this recording for voice authentication, (ii) identify the customer (their account / phone number), and (iii) complete 2-Factor-Authentication for enhanced security. This is achieved mostly hands-free and provides a superior Customer Experience (CX).
  • the speech sample should contain only a single numeric identifier (e.g. phone number) and no other numbers. This is required only for Authentication speech samples, not for Enrolment. Numeric identifiers work best with Automatic Speech Recognition and yield reliable performance.
  • for Login / Authentication, a voiceprint is generated based on the given audio that contains a speech sample. The voiceprint is then compared against each of the reference voiceprints stored in a database. The best match that passes a confidence threshold is then returned. Among other parameters documented below, this match contains: a Comparison Score (a value between [0..1]); and the Speaker's Identifier (a phone number) that corresponds to the best match.
  • FIG. 7 showing a voice flow for enrolment, used when registering a new user for voice authentication.
  • FIG. 8 showing a voice flow for voice verification, used to check if a speaker is registered for voice authentication.
  • FIG. 9 showing a voice flow for Voice Log-in Basic, used to identify and authenticate a person with just one audio file (the audio contains a spoken mobile phone number).
  • FIG. 10 showing a voice flow for Voice Log-in Strong, used to implement a voice-enabled log-in requiring a secure identification and authentication mechanism.
  • the platform does not expose any Public APIs. It implements a permission-based security model and issues expirable JSON Web Tokens (JWT) to logged-in users that are handled by the Front-End running in browsers.
  • the Frontend APIs that deal with Sensitive Data are additionally protected using a traditional 2-Factor-Authentication (2FA) scheme.
  • Each API has an orthogonal permission associated with it, and access is fully determined by whether the currently logged-in user has the respective permission in their role.
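The orthogonal permission model described above can be sketched as a simple role-to-permission mapping; the role and permission names below are hypothetical examples, not the platform's actual identifiers:

```python
# Hypothetical role-to-permission mapping; each API endpoint is
# guarded by exactly one permission string.
ROLE_PERMISSIONS = {
    "admin": {"enrolment:create", "auth:verify", "reports:read"},
    "agent": {"auth:verify"},
}

def has_permission(role, api_permission):
    """Grant access only if the logged-in user's role carries the
    single permission associated with the requested API."""
    return api_permission in ROLE_PERMISSIONS.get(role, set())
```

Because each API checks exactly one permission, adding a new endpoint never changes the access rules of existing ones.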
  • the platform uses a dedicated data store that is physically and logically isolated from the publicly available network.
  • Biometrics: Reference is made to FIG. 12. Unlike the Platform, access to any Biometrics API is determined by the type of API Key. This is true for both Frontend and Backend APIs. Additionally, Frontend APIs require an OAuth token issued by the Biometrics OAuth Service when a redirect from the Platform is made. The OAuth tokens are always bound to an API Key; thus, when a Business User makes a Frontend API call, the permission is determined by the type of API Key the OAuth token is bound to.
  • the data stores utilized by Biometrics are physically and logically isolated from the publicly available network. Furthermore, Customer Data is implemented as a separate data store and can be hosted by an enterprise customer.
  • the computer-implemented training identification, reporting methods and systems described herein may be deployed in part or in whole through one or more processors that execute computer software, program codes, and/or instructions on a processor.
  • the processor may be part of a server, client, network infrastructure, mobile computing platform, stationary computing platform, or other computing platform.
  • a processor may be any kind of computational or processing device capable of executing program instructions, codes, binary instructions and the like.
  • the processor may be or may include a signal processor, digital processor, embedded processor, microprocessor or any variant such as a coprocessor (math co-processor, graphic co-processor, communication co-processor and the like) and the like that may directly or indirectly facilitate execution of program code or program instructions stored thereon.
  • a GPU configured to be operable with a parallel computing platform such as CUDA™ (Compute Unified Device Architecture; NVIDIA, CA, United States) is used.
  • CUDA™ is a parallel computing platform and programming model developed for general computing on GPUs.
  • the sequential part of the workload runs on the CPU (which is optimized for single-threaded performance) while the compute intensive portion of the application runs on a plurality (and even thousands) of GPU cores in parallel.
  • CUDATM may be implemented in well used languages such as C, C++, Fortran, Python and MATLAB and express parallelism through extensions using only basic keywords and libraries.
  • the processor may enable execution of multiple programs, threads, and codes.
  • the threads may be executed simultaneously to enhance the performance of the processor and to facilitate simultaneous operations of the application.
  • methods, program codes, program instructions and the like described herein may be implemented in one or more threads.
  • the thread may spawn other threads that may have assigned priorities associated with them; the processor may execute these threads based on priority or any other order based on instructions provided in the program code.
  • the processor may include memory that stores methods, codes, instructions and programs as described herein and elsewhere.
  • Any processor or a mobile communication device or server may access a storage medium through an interface that may store methods, codes, and instructions as described herein and elsewhere.
  • the storage medium associated with the processor for storing methods, programs, codes, program instructions or other type of instructions capable of being executed by the computing or processing device may include but may not be limited to one or more of a CD-ROM, DVD, memory, hard disk, flash drive, RAM, ROM, cache and the like.
  • a processor may include one or more cores that may enhance speed and performance of a multiprocessor.
  • the processor may be a dual core processor, quad core processor, or other chip-level multiprocessor and the like that combines two or more independent cores on a single die.
  • the methods and systems described herein may be deployed in part or in whole through one or more hardware components that execute software on a server, client, firewall, gateway, hub, router, or other such computer and/or networking hardware.
  • the software program may be associated with a server that may include a file server, print server, domain server, internet server, intranet server and other variants such as secondary server, host server, distributed server and the like.
  • the server may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other servers, clients, computers, and devices through a wired or a wireless medium, and the like.
  • the methods, programs or codes as described herein and elsewhere may be executed by the server.
  • other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the server.
  • the server may provide an interface to other devices including, without limitation, clients, other servers, printers, database servers, print servers, file servers, communication servers, distributed servers and the like. Additionally, this coupling and/or connection may facilitate remote execution of programs across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more locations without deviating from the scope of the invention.
  • any of the devices attached to the server through an interface may include at least one storage medium capable of storing methods, programs, code and/or instructions.
  • a central repository may provide program instructions to be executed on different devices.
  • the remote repository may act as a storage medium for program code, instructions, and programs.
  • the software program may be associated with a client that may include a file client, print client, domain client, internet client, intranet client and other variants such as secondary client, host client, distributed client and the like.
  • the client may include one or more of memories, processors, computer readable media, storage media, ports (physical and virtual), communication devices, and interfaces capable of accessing other clients, servers, computers, and devices through a wired or a wireless medium, and the like.
  • the methods, programs or codes as described herein and elsewhere may be executed by the client.
  • other devices required for execution of methods as described in this application may be considered as a part of the infrastructure associated with the client.
  • the client may provide an interface to other devices including, without limitation, servers, other clients, printers, database servers, print servers, file servers, communication servers, distributed servers and the like. Additionally, this coupling and/or connection may facilitate remote execution of programs across the network. The networking of some or all of these devices may facilitate parallel processing of a program or method at one or more locations without deviating from the scope of the invention.
  • any of the devices attached to the client through an interface may include at least one storage medium capable of storing methods, programs, applications, code and/or instructions.
  • a central repository may provide program instructions to be executed on different devices.
  • the remote repository may act as a storage medium for program code, instructions, and programs.
  • the methods and systems described herein may be deployed in part or in whole through network infrastructures.
  • the network infrastructure may include elements such as computing devices, servers, routers, hubs, firewalls, clients, personal computers, communication devices, routing devices and other active and passive devices, modules and/or components as known in the art.
  • the computing and/or non-computing device(s) associated with the network infrastructure may include, apart from other components, a storage medium such as flash memory, buffer, stack, RAM, ROM and the like.
  • the processes, methods, program codes, instructions described herein and elsewhere may be executed by one or more of the network infrastructural elements.
  • the methods, program codes, calculations, algorithms, and instructions described herein may be implemented on a cellular network having multiple cells.
  • the cellular network may be either a frequency division multiple access (FDMA) network or a code division multiple access (CDMA) network.
  • the cellular network may include mobile devices, cell sites, base stations, repeaters, antennas, towers, and the like.
  • the cellular network may be a GSM, GPRS, 3G, 4G, EVDO, mesh, or other network type.
  • the methods, program codes, calculations, algorithms and instructions described herein may be implemented on or through mobile devices.
  • the mobile devices may include navigation devices, cell phones, mobile phones, mobile personal digital assistants, laptops, palmtops, netbooks, pagers, electronic book readers, music players and the like. These devices may include, apart from other components, a storage medium such as a flash memory, buffer, RAM, ROM and one or more computing devices.
  • the computing devices associated with mobile devices may be enabled to execute program codes, methods, and instructions stored thereon.
  • the mobile devices may be configured to execute instructions in collaboration with other devices.
  • the mobile devices may communicate with base stations interfaced with servers and configured to execute program codes.
  • the mobile devices may communicate on a peer to peer network, mesh network, or other communications network.
  • the program code may be stored on the storage medium associated with the server and executed by a computing device embedded within the server.
  • the base station may include a computing device and a storage medium.
  • the storage device may store program codes and instructions executed by the computing devices associated with the base station.
  • the computer software, program codes, and/or instructions may be stored and/or accessed on computer readable media that may include: computer components, devices, and recording media that retain digital data used for computing for some interval of time; semiconductor storage known as random access memory (RAM); mass storage typically for more permanent storage, such as optical discs and forms of magnetic storage like hard disks, tapes, drums and cards; processor registers, cache memory, volatile memory, non-volatile memory; optical storage such as CD and DVD; removable media such as flash memory (e.g. USB sticks or keys), floppy disks, magnetic tape, paper tape, punch cards and standalone RAM disks; removable mass storage and off-line storage; and other computer memory such as dynamic memory, static memory, read/write storage, mutable storage, read-only storage, random access storage, sequential access storage, location addressable storage, file addressable storage, content addressable storage, network attached storage, storage area networks, bar codes, magnetic ink, and the like.
  • the methods and systems described herein may transform physical and/or intangible items from one state to another.
  • the methods and systems described herein may also transform data representing physical and/or intangible items from one state to another.
  • the methods and/or processes described above, and steps thereof, may be realized in hardware, software or any combination of hardware and software suitable for a particular application.
  • the hardware may include a general purpose computer and/or dedicated computing device or specific computing device or particular aspect or component of a specific computing device.
  • the processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable device, along with internal and/or external memory.
  • the processes may also, or instead, be embodied in an application specific integrated circuit, a programmable gate array, programmable array logic, or any other device or combination of devices that may be configured to process electronic signals. It will further be appreciated that one or more of the processes may be realized as a computer executable code capable of being executed on a computer readable medium.
  • the application software may be created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software, or any other machine capable of executing program instructions.
  • each method described above and combinations thereof may be embodied in computer executable code that, when executing on one or more computing devices, performs the steps thereof.
  • the methods may be embodied in systems that perform the steps thereof, and may be distributed across devices in a number of ways, or all of the functionality may be integrated into a dedicated, standalone device or other hardware.
  • the means for performing the steps associated with the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of the present disclosure.
  • the invention may be embodied in a program instruction set executable on one or more computers.
  • Such instruction sets may include any one or more of the following instruction types:
  • Data handling and memory operations, which may include instructions to set a register to a fixed constant value; to copy data from a memory location to a register, or vice versa (such a machine instruction is often called a move, though the term is misleading, since the data is copied rather than moved); to store the contents of a register or the result of a computation, or to retrieve stored data to perform a computation on it later; or to read and write data from hardware devices.
  • Arithmetic and logic operations, which may include instructions to add, subtract, multiply, or divide the values of two registers, placing the result in a register and possibly setting one or more condition codes in a status register; to perform bitwise operations, e.g. taking the conjunction and disjunction of corresponding bits in a pair of registers, or taking the negation of each bit in a register; or to compare two values in registers (for example, to see if one is less than the other, or if they are equal).
  • Control flow operations which may include an instruction to branch to another location in the program and execute instructions there, conditionally branch to another location if a certain condition holds, indirectly branch to another location, or call another block of code, while saving the location of the next instruction as a point to return to.
  • Coprocessor instructions, which may include instructions to load/store data to and from a coprocessor, to exchange data with CPU registers, or to perform coprocessor operations.
  • a processor of a computer of the present system may include "complex" instructions in its instruction set.
  • a single “complex” instruction does something that may take many instructions on other computers. Such instructions are typified by instructions that take multiple steps, control multiple functional units, or otherwise appear on a larger scale than the bulk of simple instructions implemented by the given processor.
  • Some examples of "complex" instructions include: saving many registers on the stack at once, moving large blocks of memory, complicated integer and floating-point arithmetic (sine, cosine, square root, etc.), SIMD instructions, a single instruction performing an operation on many values in parallel, performing an atomic test-and-set instruction or other read-modify-write atomic instruction, and instructions that perform ALU operations with an operand from memory rather than a register.
  • An instruction may be defined according to its parts. According to more traditional architectures, an instruction includes an opcode that specifies the operation to perform, such as add contents of memory to register — and zero or more operand specifiers, which may specify registers, memory locations, or literal data. The operand specifiers may have addressing modes determining their meaning or may be in fixed fields. In very long instruction word (VLIW) architectures, which include many microcode architectures, multiple simultaneous opcodes and operands are specified in a single instruction.
  • Some types of instruction sets do not have an opcode field (such as Transport Triggered Architectures (TTA) or the Forth virtual machine), only operand(s).
  • Other unusual "0-operand" instruction sets lack any operand specifier fields, such as some stack machines including NOSC.
  • Conditional instructions often have a predicate field — several bits that encode the specific condition to cause the operation to be performed rather than not performed. For example, a conditional branch instruction will be executed, and the branch taken, if the condition is true, so that execution proceeds to a different part of the program, and not executed, and the branch not taken, if the condition is false, so that execution continues sequentially. Some instruction sets also have conditional moves, so that the move will be executed, and the data stored in the target location, if the condition is true, and not executed, and the target location not modified, if the condition is false. Similarly, IBM z/Architecture has a conditional store. A few instruction sets include a predicate field in every instruction; this is called branch predication.
  • the instructions constituting a program are rarely specified using their internal, numeric form (machine code); they may be specified using an assembly language or, more typically, may be generated from programming languages by compilers.
  • any of the methods disclosed herein may be performed by application software executable on any past, present or future operating system of a processor-enabled device such as WindowsTM, LinuxTM, AndroidTM, iOSTM, and the like. It will be appreciated that any software may be distributed across a number of devices or in a "software as a service" format, or "platform as a service" format whereby participants require only some computer-based means of engaging with the software.
  • the present invention may be configured to interface with the infrastructure (in terms of both hardware and software) of a contact centre.
  • the audio stream may be input into the present methods via a telephony line (physical or VOIP) of the contact centre.
  • reporting outputs relating to back office activity time may output into existing databases of the contact centre and therefore be amenable to inclusion in existing reporting systems.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to the authentication of a user of a digital service provider online or via telephone. More particularly, the invention relates to methods of authenticating a user using signal analysis of the user's voice by digital audio signal processing means. The signal processing may be performed on an audio signal of the user speaking an identifier such as their cellular telephone number.
PCT/AU2022/051333 2021-11-09 2022-11-08 User authentication and login methods WO2023081962A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2021903574A AU2021903574A0 (en) 2021-11-09 User authentication and login methods
AU2021903574 2021-11-09

Publications (1)

Publication Number Publication Date
WO2023081962A1 true WO2023081962A1 (fr) 2023-05-19

Family

ID=86334780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2022/051333 WO2023081962A1 (fr) 2021-11-09 2022-11-08 User authentication and login methods

Country Status (1)

Country Link
WO (1) WO2023081962A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7386448B1 (en) * 2004-06-24 2008-06-10 T-Netix, Inc. Biometric voice authentication
US20110276323A1 (en) * 2010-05-06 2011-11-10 Senam Consulting, Inc. Speech-based speaker recognition systems and methods
US20150095028A1 (en) * 2013-09-30 2015-04-02 Bank Of America Corporation Customer Identification Through Voice Biometrics
US20180233152A1 (en) * 2017-02-13 2018-08-16 Google Llc Voice Signature for User Authentication to Electronic Device
US20210090561A1 * 2019-09-24 2021-03-25 Amazon Technologies, Inc. Alexa roaming authentication techniques

Similar Documents

Publication Publication Date Title
US20180047397A1 (en) Voice print identification portal
US10650379B2 (en) Method and system for validating personalized account identifiers using biometric authentication and self-learning algorithms
US8255223B2 (en) User authentication by combining speaker verification and reverse turing test
US8898063B1 (en) Method for converting speech to text, performing natural language processing on the text output, extracting data values and matching to an electronic ticket form
WO2017215558A1 (fr) Voiceprint recognition method and device
US11625467B2 (en) Authentication via a dynamic passphrase
JP2010079235A (ja) Method for storing a media stream containing no personal (audio) information
US9646613B2 (en) Methods and systems for splitting a digital signal
US10909991B2 (en) System for text-dependent speaker recognition and method thereof
Chakrabarty et al. Development and evaluation of online text-independent speaker verification system for remote person authentication
US11868453B2 (en) Systems and methods for customer authentication based on audio-of-interest
US20130339245A1 (en) Method for Performing Transaction Authorization to an Online System from an Untrusted Computer System
JP4143541B2 (ja) Method and system for non-intrusively verifying a speaker using a behavior model
WO2023081962A1 (fr) User authentication and login methods
Das et al. Multi-style speaker recognition database in practical conditions
US20220321350A1 (en) System for voice authentication through voice recognition and voiceprint recognition
Dovydaitis et al. Speaker authentication system based on voice biometrics and speech recognition
Kuznetsov et al. Methods of countering speech synthesis attacks on voice biometric systems in banking
US11436309B2 (en) Dynamic knowledge-based voice authentication
US11289080B2 (en) Security tool
JP4245948B2 (ja) Voice authentication device, voice authentication method and voice authentication program
Vaughan-Nichols Voice authentication speaks to the marketplace
Anarkat et al. Detection of Mimicry Attacks on Speaker Verification System for Cartoon Characters’ Dataset
JP5436951B2 (ja) Personal authentication device and personal authentication method
Duraibi et al. Suitability of Voice Recognition Within the IoT Environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891202

Country of ref document: EP

Kind code of ref document: A1