BR102013024095A2

BR102013024095A2 - biometric recognition transaction authentication method

Info

Publication number: BR102013024095A2
Application number: BR102013024095A
Authority: BR
Inventors: Flávio Olmos Simões; José Eduardo De Carvalho Silva; Mário Uliani Neto; Ricardo Paranhos Velloso Violato
Original assignee: Banco Bradesco S A; Fundação Cpqd Ct De Pesquisa E Desenvolvimento Em Telecomunicações
Priority date: 2013-09-19
Filing date: 2013-09-19
Publication date: 2016-01-26

Abstract

Method for transaction authentication with biometric recognition. The invention relates to a method for transaction authentication in which both the user's identity and the transaction itself are authenticated by voice, using speaker verification and speech recognition technologies, respectively, and in which the authentication constitutes a signature for the transaction, without requiring passwords, cards, tokens or equivalents. The method comprises the steps of entering the transaction data, generating a sentence with the transaction data, reading the generated sentence aloud, recognizing the speaker by comparison with said biometric reference, recognizing the speech by comparison with the generated sentence, extracting the dynamic, transaction-specific data components of the audio, and biometrically verifying the extracted data segments.

Description

METHOD FOR TRANSACTION AUTHENTICATION WITH BIOMETRIC RECOGNITION

Field of Application

The present invention applies to the field of Information Technology as it relates to transactions, and particularly to their security. More specifically, it refers to a method for transaction authentication in which both the user's identity and the transaction itself are authenticated by voice, using speaker recognition and speech recognition technologies.

This method provides a "signature" for the transaction, guaranteeing its authenticity, and can dispense with passwords, cards, tokens and other devices of that kind.

For a better understanding of this specification, some expressions and terms used in it are presented below.

Applet - application software that runs in the context of another program, usually performing very specific functions.

Bluetooth - an industry specification for wireless personal area networks that provides a way to connect and exchange information between devices such as mobile phones, notebooks, computers, printers, digital cameras and digital video game consoles over a globally licensed and secure short-range radio frequency.

Liveness detection - a set of techniques whose purpose is to detect that a live person is present.

Non-keyboard - data entered by a means other than the keyboard.

Near Field Communication (NFC) - a standard that allows wireless communication between two nearby devices.

Password - password (senha in the original Portuguese text).

Speech recognition - a set of pattern recognition techniques used to recognize speech, that is, given an audio file, to recognize the text being spoken in that audio.

Speaker recognition - a set of pattern recognition techniques used to recognize the speaker, that is, given an audio file, to recognize who is speaking in that audio.

Voiceprint - a trademark registered by the inventor Lawrence Kersta in the 1960s, based on the sound spectrograph (cited in Bell Telephone Labs patent US2403985, dated 16 July 1946, by W. Koenig Jr.), referring to a system for identifying individuals by voice by means of a graph that displays the relative amplitude of the frequency components of an utterance as a function of time, that is, a spectrogram. The term acquired a broader meaning and was later used to designate other representations of the characteristics of each individual's speech. Today it is an obsolete designation which, although widely used in patents, should be replaced by the more accurate expression biometric reference, described below.

Biometric Reference - According to ISO/IEC 2382-37 Harmonized Biometric Vocabulary, still in a preliminary stage, a biometric reference may consist of one or more of: (i) biometric samples, for example, an image representing the biometric characteristic; (ii) biometric templates, that is, a set of biometric features directly comparable to probe biometric features; or (iii) biometric models, that is, a function generated from biometric data. The term applies to any biometric trait (voice, face, fingerprint, etc.). In the context of speaker biometrics, the biometric reference is often called a model or template.

ZigBee - this term designates a set of specifications for wireless communication between electronic devices, with an emphasis on low operating power, low data rate and low deployment cost. It is intended to connect small data collection and remote control units using unlicensed radio frequency signals. The technology is comparable to WiFi and Bluetooth networks and differs from them by lower power consumption, reduced range (about 10 meters), and the fact that communications between two units can be relayed successively by the units in the network until they reach the final destination.

State of the Art

The use of automatic biometric recognition techniques is already fairly widespread and has been applied to services such as physical access control, surveillance and authentication, with the fingerprint and the face being the most commonly used biometric traits.

As for voice, although it is a very prolific field of study, applications that use voice (or speaker) biometrics are still rare, even though using voice instead of another trait is not new.

Several approaches have already been proposed for speaker recognition applications, such as artificial neural networks, vector quantization techniques and techniques based on Hidden Markov Models (HMMs).

More recently, the literature in the area has been dominated by techniques based on Gaussian Mixture Models (GMMs) to model the data, usually starting from a Universal Background Model (UBM), followed by other classification techniques such as SVM (Support Vector Machines), JFA (Joint Factor Analysis) and iVector. This approach is used in many machine learning applications; in a speaker recognition system, a GMM is used as a generic probabilistic model capable of representing arbitrary multivariate densities. A GMM seeks to model such an arbitrary distribution of multidimensional data as a linear combination of normal, or Gaussian, distributions.
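The idea of a GMM as a linear combination of Gaussian densities can be illustrated with a minimal sketch. The diagonal covariance, the two-component model and all numeric values below are assumptions made for illustration; they are not taken from the specification, and the method described here does not depend on any particular modeling technique.

```python
import math

def gaussian_pdf_diag(x, mean, var):
    """Density of a multivariate normal with diagonal covariance at point x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def gmm_pdf(x, weights, means, variances):
    """GMM density: a weighted sum (linear combination) of Gaussian densities."""
    return sum(w * gaussian_pdf_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component model over 2-dimensional feature vectors.
weights = [0.6, 0.4]                  # mixture weights, summing to 1
means = [[0.0, 0.0], [3.0, 3.0]]      # one mean vector per component
variances = [[1.0, 1.0], [0.5, 0.5]]  # diagonal variances per component

density_near_first = gmm_pdf([0.1, -0.2], weights, means, variances)
density_far_away = gmm_pdf([10.0, 10.0], weights, means, variances)
```

A point close to one of the component means receives a much higher density than a point far from both, which is the property a speaker model relies on when scoring feature vectors.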

In the case of speech, the data are generally obtained by dividing the speech signal into frames, usually with overlap between adjacent frames, then windowing each frame, which attenuates its edges and thus avoids introducing distortions, mainly at high frequencies, and finally extracting the parameters themselves. The frame duration is usually on the order of tens of milliseconds, while its frequency is on the order of a few kilohertz.
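The framing and windowing just described can be sketched as follows. The Hamming window, the 25 ms frame length, the 10 ms hop and the 8 kHz sampling rate are common choices assumed here for illustration; the specification does not prescribe any of them.

```python
import math

def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames

def hamming_window(n):
    """Hamming window of length n, which tapers the edges of a frame."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Frame the signal with overlap (hop shorter than frame) and window each frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = hamming_window(frame_len)
    return [[s * w for s, w in zip(frame, window)]
            for frame in frame_signal(samples, frame_len, hop_len)]

# One second of a hypothetical 8 kHz signal (a 440 Hz tone).
sample_rate = 8000
signal = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(sample_rate)]
frames = windowed_frames(signal, sample_rate)
```

With these assumed values each frame holds 200 samples and consecutive frames share 120 of them, so every part of the signal away from the ends is seen by more than one frame.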

The most widely used parameters in speaker recognition applications are undoubtedly the mel-cepstral coefficients, or MFCCs (Mel Frequency Cepstral Coefficients), which can be obtained, for example, by a technique known as filter-bank analysis. The mel-cepstral coefficients are the discrete cosine transform (DCT) of the logarithm of the energy of the signal that results from filtering the original signal with a bank of bandpass filters, where each filter defines a critical band (which is why these bandpass filters are often also called critical-band filters), spaced uniformly on the mel scale and covering the spectrum of interest of the signal. The method proposed here is independent of the techniques used for speaker recognition, whether in the parameter extraction step, in data modeling, in classification or in any other step of the recognition process itself.
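A minimal sketch of the two ingredients named above, uniform spacing on the mel scale and the DCT of log filter-bank energies, is given below. The mel conversion formula with constants 2595 and 700 is one widely used convention, the DCT-II is left unnormalized, and the filter-bank energies are hypothetical numbers; computing real energies would require a spectral analysis step that is omitted here.

```python
import math

def hz_to_mel(hz):
    """Convert frequency in Hz to mels (a common convention, assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(num_filters, low_hz, high_hz):
    """Center frequencies (Hz) of filters spaced uniformly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]

def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, returning the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

def cepstral_coefficients(filterbank_energies, num_coeffs):
    """DCT of the logarithm of the per-filter energies of one frame."""
    log_energies = [math.log(e) for e in filterbank_energies]
    return dct_ii(log_energies, num_coeffs)

centers = mel_center_frequencies(num_filters=8, low_hz=0.0, high_hz=4000.0)
energies = [4.0, 3.5, 2.0, 1.5, 1.0, 0.8, 0.5, 0.3]  # hypothetical values
mfcc = cepstral_coefficients(energies, num_coeffs=4)
```

Because the spacing is uniform in mels rather than in hertz, the filter centers are packed closely at low frequencies and spread out at high frequencies.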

In general, authentication systems exist to guarantee, or to make sure, that a given resource is being accessed by people authorized to do so. Authentication of people can be based on something the person knows, for example a password; something the person has, for example a card; or something the person is, which is the case of biometrics. These factors can also be combined, which generally increases the security of the authentication but, in most cases, compromises the usability of the system.

However, such systems are subject to the action of fraudsters, who try to access the resource without proper authorization. Biometrics is no different.

One form of attack on a biometric recognition system uses a sample of the biometric trait in question taken from a genuine user in order to impersonate that user (for example, in the case of the face, a photo). In the case of speaker biometrics, a recording could be used.

This type of attack is known as spoofing, and the countermeasures against it as anti-spoofing. One category of techniques with this purpose is the so-called liveness detection techniques, as they are known in the field of biometrics. Such techniques are employed to recognize that a real person is trying to access the resource, and not a photo or a recording, for example.

In the case of voice, one of the liveness detection techniques used is to ask the person to speak a specific text known to the system, and then to pair the speaker recognition software with a speech recognition system whose purpose is to extract from the audio the content of what was spoken. That result can then be compared with the text supplied, to check whether they match.
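The comparison between the prompted text and the recognized text can be sketched as below. The normalization rules, the word-level edit distance and the `max_word_errors` tolerance are assumptions chosen for illustration; the transcript is taken as an input string, since the speech recognizer itself is outside the scope of this sketch.

```python
import unicodedata

def normalize_text(text):
    """Lowercase, strip accents and punctuation, and split into words."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    kept = [c for c in decomposed if not unicodedata.combining(c)]
    cleaned = "".join(c if c.isalnum() else " " for c in kept)
    return cleaned.split()

def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hyp
                            curr[j - 1] + 1,      # extra word in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def transcript_matches(prompt, transcript, max_word_errors=0):
    """True if the recognized transcript matches the prompted text closely enough."""
    ref = normalize_text(prompt)
    hyp = normalize_text(transcript)
    return word_edit_distance(ref, hyp) <= max_word_errors
```

A tolerance of zero demands an exact word-for-word match after normalization; a deployment might allow a small number of recognition errors, at the cost of weaker protection against replayed audio.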

Among the techniques most used for this task, hidden Markov models (HMMs) currently represent the state of the art. Nevertheless, some also use neural networks, and even a combination of neural networks and HMMs.

This whole framework can be used to authenticate a transaction, so that, after a prior step of enrolling his or her biometric reference, a user is authenticated by reading, at the time of the transaction, a text supplied by the system.

Although there are patent documents and other publications related to this subject, in none of them is speech used to authenticate the user and the transaction simultaneously. The fundamental difference of the method proposed by the present invention lies in the nature of the text generated by the system to be read by the user. Here, the text must contain information about the transaction itself, thereby guaranteeing, through speech, the signature of the transaction. In other words, the user authorizes the transaction by speaking it.

The state of the art comprises several patent documents disclosing methods pertinent to the subject.

Patent document EP1669836 A1 (Wang), "USER AUTHENTICATION BY COMBINING SPEAKER VERIFICATION AND REVERSE TURING TEST", discloses a method for authenticating a user by speaker verification, in which authentication is based on a phrase generated by the system. In that proposal, however, the generated phrase is a question based on the user's personal information; the user must speak the answer, which is then recognized by the system, which checks whether it is correct. In Wang's proposal, therefore, only the user is authenticated (the transaction is not). Furthermore, a fraudster could gain access to this personal information (through social engineering, for example), which is static, and, with a few recordings of the user, manage to defeat the system.

Publication WO9900719 A1 (Van Compernolle et al.), "ACCESS-CONTROLLED COMPUTER SYSTEM WITH AUTOMATIC SPEECH RECOGNITION", discloses a system that controls a user's access using "non-keyboard" inputs. The idea is that the user can access a system through an interface other than the keyboard and then continue to interact without using it. In that proposal, however, access to the system is not obtained with a phrase generated by the system. The user speaks some piece of information (for example, a password), which is checked for the sole purpose of granting access. The utterance spoken by the user therefore does not constitute a signature of the process.

Document US2003037004A1 (Buffum et al.), "DIALOG-BASED VOICEPRINT SECURITY FOR BUSINESS TRANSACTIONS", describes a system that authenticates users of a given transaction by combining speech recognition and speaker recognition. The architecture described is of the client-server type: an authentication server runs the application in which the user intends to authenticate, and a client runs an applet that collects the voice samples. The document does not define the type of device on which the capture is made, which may be a desktop, a telephone, a mobile device, etc. The enrollment and user verification steps are described and, for each step, the user interaction screens. Some characteristics of the system disclosed in that document are: (i) in the first step, enrollment, the user reads his or her "account ID" and then reads the digits from 1 to 9; (ii) in the verification step, the user does not read data associated with the transaction he or she intends to perform, but rather a random sequence of digits. That is, the voice authentication does not work as a signature of the transaction, and speech recognition is limited to the reading of digits.

Publication WO2006128171 (Di Mambro et al.), entitled "METHOD AND SYSTEM FOR BIO-METRIC VOICE PRINT AUTHENTICATION", comprises receiving one or more utterances from a user, recognizing a phrase corresponding to one of the utterances, identifying the biometric characteristics designated as the user's voiceprint from the phrase received, determining an associated device identifier, and authenticating the user based on the phrase, the biometric characteristics and the device identifier. The location of the communication terminal or of the user may also be used as a criterion to allow access to one or more services. Said biometric characteristics, or voiceprint, depend on the anatomical configuration of the vocal tract and are specific to the user. As can be seen, the object of that document concerns only the identification of the user, without considering the content of the transaction.

Publication WO2012050780 (Moganti et al.), entitled "METHOD AND APPARATUS FOR VOICE SIGNATURE AUTHENTICATION", refers to a scalable method for authenticating a voice signature used in various services, such as user identification (used in banking services), a voice signature used as a password allowing access to remote services and document retrieval, as well as other services over the internet, such as online shopping. The system comprises a Voice Signature Server (VSS) that receives and stores various kinds of information, such as biometric characteristics, analog or digitized voice samples, properties of the voice channel, information about the call, authentication data, context information (for example: (i) location, (ii) whether or not the user is roaming), personal data or any other information that can serve as a basis for creating a voice meta-signature. The ultimate goal is to confirm the identity of the user without, however, taking into account the specific content of the transaction. In addition, the method does not use speech recognition and therefore does not check the content that is spoken.

As can be seen, none of the documents analyzed requires the "meaning" of the input speech signal to be tied to the operation to be performed by the user.

Moreover, where speech recognition is used, it serves only as a liveness detection mechanism and, although in some cases the phrase is generated dynamically, its content is quite limited.

In summary, in the prior art documents there is no relationship between the sentence to be read and the transaction to be carried out.

Objectives of the Invention

In view of the foregoing, the main objective of the present invention is to provide a method that makes it possible to increase the security of transactions.

Another objective is to provide a method that dynamically relates the transaction data to the generated sentence to be read by the user.

Yet another objective of the present invention is to provide voice authentication that acts as a signature of the transaction.

A further objective of the present invention is to provide greater security for the transaction without requiring the user to memorize additional passwords or carry objects such as cards and tokens.

Summary of the Invention

As will be seen from the description that follows, biometrics has inherent advantages over traditional alphanumeric passwords for authentication: a biometric trait cannot be stolen or lent, does not need to be memorized, and cannot be inferred, discovered or guessed.

In addition, according to the invention presented below, a single procedure simultaneously authenticates both the user and the transaction itself, which amounts to a true "signature" for the transaction and guarantees its authenticity.

The above objectives are achieved by the invention described below, which essentially comprises a method in which, in a first, enrollment stage, one or more samples of the user's voice are captured and the user's biometric reference is created and stored by the system.

In a second stage, when the user wishes to perform a transaction, the method comprises a step in which the user reads aloud a sentence built and supplied by the system from the data of the transaction itself; the text is dynamic, generated while the transaction is being executed, and the system records the reading. The new voice sample obtained from this reading is then compared with the previously obtained biometric reference to check whether or not it belongs to the user, which constitutes the speaker verification step.

In parallel, by means of speech recognition, the system checks that exactly the sentence it generated was spoken. This serves not only (i) to prevent fraudsters from using a recording to defeat the system (a form of liveness detection, in the jargon of the biometrics community), but also (ii) to "sign" the transaction in question: the user will read the verification sentence only if the data are correct, so that it cannot later be alleged that the desired transaction was not the one performed.

In addition, the dynamic, transaction-specific data are "cut out" of the recording and undergo a further biometric verification, reducing the chance that partial recordings could be used by fraudsters.

Description of the Figure

The invention will be better understood from the detailed description that follows and from the accompanying figure, in which:

Figure 1 is a flowchart illustrating the stage of performing a transaction and its steps.

Detailed Description of a Preferred Embodiment

As already noted in the general description, the present invention provides a method in which, in a first, enrollment stage, one or more samples of the user's voice are captured and the user's biometric reference is created and stored by the system.

In a later stage, when the user wishes to perform a transaction, the method has the user read aloud a sentence built and supplied by the system from the data of the transaction itself; the text is dynamic, generated while the transaction is being executed, and the system records the reading.

The method of the invention is characterized by comprising the following essential steps:

- Entering the transaction data. Through a human-machine interface (a web page or an application, among others), the user enters the data of the desired transaction (for example, by typing the transaction details on a keyboard, navigating with a mouse, using the screens of an online store, speech recognition, NFC, ZigBee, WiFi, Bluetooth or any other means of human-machine communication). This step is represented by step 1 of Figure 1.

- Generating a sentence from the transaction data. Based on the data entered by the user, the system automatically creates a sentence. This step is represented by step 2 of Figure 1.

- Reading the generated sentence aloud. The user records a reading of the sentence generated by the system and displayed to the user. This step is represented by step 3 of Figure 1.

- Recognizing the speaker by comparison with said biometric reference. By means of computational algorithms (for example, Gaussian mixture models, artificial neural networks or hidden Markov models, among others), the recorded speech signal is processed and compared with the stored biometric reference corresponding to the user in question. The response may take the form of either a binary decision (yes/no) or a score expressing a measure of confidence that the speech signal was or was not produced by the user. This step is represented by step 4 of Figure 1.

- Recognizing the speech by comparison with the generated sentence. By means of computational algorithms (such as artificial neural networks, Markov chains and deep belief networks, among others), the recorded speech signal is processed, and the spoken content is extracted and compared with the sentence generated by the system. The response may take the form of either a binary decision (yes/no) or a score expressing a measure of confidence in the text recognized from the speech signal. This confidence can be measured individually for each acoustic event in the signal (speech/word, pause/silence, etc.). In addition, the speech recognition system returns the duration of each such event, which allows the signal to be split up so that the parts of interest can be extracted. This step is represented by step 5 of Figure 1.

- Extracting from the audio the dynamic, transaction-specific data components. Based on the responses of the two preceding items, the system "cuts" the original speech signal, separating out the parts of interest, that is, those that refer specifically to the transaction. This step is represented by step 6 of Figure 1.

- Biometrically verifying the extracted data segments. Each of these parts of interest is also individually verified by the speaker verification system, which compares these portions of the signal with the stored biometric reference corresponding to the user in question. The response may take the form of either a binary decision (yes/no) or a score expressing a measure of confidence that the portion of the speech signal was or was not produced by the user. This step is represented by step 7 of Figure 1.
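As a rough illustration of how the outputs of steps 4, 5 and 7 might be combined, the following sketch accepts each result either as a yes/no decision or as a confidence score, as the description allows. The names `accepted` and `authorize`, the 0 to 1 score scale and the threshold value are assumptions made for this example only; the source does not specify them.

```python
from typing import Sequence, Union

# A check may return a yes/no decision or a confidence score.
Result = Union[bool, float]

# Hypothetical threshold on an assumed 0..1 score scale (not from the source).
DEFAULT_THRESHOLD = 0.8


def accepted(result: Result, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Reduce a binary decision or a confidence score to accept/reject."""
    if isinstance(result, bool):
        return result
    return result >= threshold


def authorize(speaker: Result, speech: Result, segments: Sequence[Result]) -> bool:
    """Accept only if the speaker check (step 4), the speech check (step 5)
    and every per-segment speaker check (step 7) pass."""
    if not (accepted(speaker) and accepted(speech)):
        return False
    return all(accepted(segment) for segment in segments)
```

Requiring every check to pass is one possible policy; a deployment could just as well fuse the scores before thresholding.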

In the scenario of Figure 1, if speaker recognition or speech recognition fails, the transaction is not carried out. This behavior can of course be changed so that, for example, the whole process is repeated until a given maximum number of attempts is reached before the transaction is denied. Other alternatives could be devised without altering the scope of the invention.
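The retry variant mentioned above could be sketched as follows. The function name `authenticate_with_retries`, the `attempt` callback (standing in for one full pass through steps 2 to 7) and the default of three attempts are hypothetical choices for illustration.

```python
from typing import Callable


def authenticate_with_retries(attempt: Callable[[], bool], max_attempts: int = 3) -> bool:
    """Run the whole authentication process up to max_attempts times.

    `attempt` performs one complete pass (prompt, recording, speaker and
    speech checks) and returns True on success. The transaction is denied
    only after every allowed attempt has failed.
    """
    for _ in range(max_attempts):
        if attempt():
            return True
    return False
```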

As an example, consider a banking transaction in which the user wishes to make a transfer through Internet Banking or an equivalent service. The user first accesses the bank's website with a user ID and password.

The user then types the details of the desired transfer, such as the amount and the destination branch and account. The system then, according to the method of the invention, assembles from these data a sentence to be read by the user, for example: "Transfer X reais to account A, branch B." The system then records the user's reading of this sentence, made to authenticate the operation. The voice sample is analyzed biometrically to verify that it is the voice of the user whose account is being accessed. The same sample also passes through a speech recognition system, which checks whether what was spoken matches the sentence supplied.
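A minimal sketch of the sentence-generation and text-comparison parts of this example is shown below. The template wording follows the example sentence; the function names, the normalization rules and the assumption that the recognizer returns digits rather than number words are all illustrative and not taken from the source.

```python
import re
import unicodedata

# Template modeled on the example sentence in the description.
TEMPLATE = "Transfer {amount} reais to account {account}, branch {branch}."


def build_sentence(amount: str, account: str, branch: str) -> str:
    """Build the dynamic sentence from the transaction data."""
    return TEMPLATE.format(amount=amount, account=account, branch=branch)


def normalize(text: str) -> list:
    """Lowercase, strip accents and punctuation, and split into tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", stripped)


def matches(generated: str, recognized: str) -> bool:
    """Check that the recognized text is exactly the generated sentence,
    ignoring case, accents and punctuation."""
    return normalize(generated) == normalize(recognized)
```

A real recognizer would likely return spoken-form numbers, so a production comparison would need to map both texts to a common form before matching.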

Next, and optionally only if the biometric and speech checks succeed, the system uses the speech recognition result to segment the speech sample and obtain only the dynamic, operation-specific portions, in this example "X", "A" and "B". Each of these three portions is also individually verified biometrically.
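The segmentation described here can be pictured with the sketch below, which assumes the recognizer returns a start and end time for each recognized word. The `Word` structure, the `cut_segments` function and the use of token text to select the dynamic portions are hypothetical; the source states only that event durations are returned and used to cut the signal.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class Word:
    text: str
    start: float  # start time in seconds (assumed recognizer output)
    end: float    # end time in seconds


def cut_segments(samples: Sequence[int], rate: int, words: List[Word],
                 dynamic: Set[str]) -> List[Sequence[int]]:
    """Return the audio samples for each word that carries transaction data.

    `rate` is the sampling rate in samples per second; `dynamic` holds the
    transaction-specific tokens (for example the amount and the account).
    """
    segments = []
    for word in words:
        if word.text in dynamic:
            first = round(word.start * rate)
            last = round(word.end * rate)
            segments.append(samples[first:last])
    return segments
```

Each returned segment would then be passed to the speaker verification system on its own, as in step 7.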

Consider as another example an online store in which the user wishes to buy a product over the internet.

The user first accesses the online store's website with a user ID and password.

The user then selects an item available for purchase on the site and chooses an option to confirm the purchase, specifying, for example, the form of payment (debit or credit). This selection can be made, for example, with the keyboard, by speech, or by navigating with a mouse through one or more screens of the online store. The system then, according to the method of the invention, assembles from these data a sentence to be read by the user, for example: "Buy item X, for Y reais, with payment option Z." As in the previous example, the system records the user's reading of the sentence to authenticate the operation. The voice sample passes through a speech recognition system, which checks whether what was spoken matches the sentence supplied, and is also analyzed biometrically to verify that it really is the owner of the accessed account who is performing the transaction.

Next, again as in the previous example, the portions "X", "Y" and "Z" can also be verified biometrically.

Another example is a transaction via NFC (Near Field Communication) carried out by an application installed on a smartphone, in which the user wishes to buy a product.

The user first registers personal data, including, for example, credit card details. During this registration, the user also performs the biometric enrollment of his or her voice. All these data are stored in a secure area of the application on the smartphone.

After registering, the user is able to make a purchase via NFC. To do so, the user simply brings an item available for purchase, bearing a tag with an NFC code, close to the smartphone. The application then, according to the method of the invention, assembles from the data of the item being purchased a sentence to be read by the user, for example: "Purchase of item A, for B reais, with payment option C." The smartphone application records the user's reading of this sentence, made to authenticate the purchase. As in the previous examples, this voice sample passes through a speaker recognition and speech recognition system and, if these checks succeed, the speech sample can be segmented and the portions "A", "B" and "C" verified separately. The smartphone application then contacts the store to send the transaction data.

The scenarios presented are merely illustrative and do not limit the scope of the invention. Thus, the website of a bank, online store or equivalent institution may be accessed through any communication device, such as a computer, a smartphone, a tablet, a bank kiosk or equivalent.

Likewise, the operation data may be entered by typing, by speech, by navigating the screens of the institution's website with a mouse or touch screen, by communication under the ZigBee, WiFi or Bluetooth standards, or by bringing close an item bearing an NFC tag, among others.

Accordingly, for a person skilled in the art, many variations of the invention will be possible without departing from its scope, which is defined by the claims that follow.

Claims

1. Biometric recognition transaction authentication method, comprising a prior enrollment stage in which one or more samples of the user's voice are captured and the user's biometric reference is created and stored, the method comprising authentication of both the user and the transaction itself by performing the following steps:
- Entering the transaction data into a communication device;
- Automatically generating a sentence based on the transaction data;
- Reading the generated sentence aloud;
- Recognizing the speaker by comparison with said biometric reference;
- Recognizing the speech by comparison with the generated sentence.

2. Method according to claim 1, characterized in that it further comprises the steps of:
- Extracting from the audio the dynamic, transaction-specific data components;
- Biometrically verifying the extracted data segments.

3. Method according to claim 1 or 2, characterized in that the generated sentence is displayed on the screen of the communication device.

4. Method according to claim 1, 2 or 3, characterized in that the speaker recognition comprises capturing the reading aloud of the sentence displayed on the screen containing the transaction data and, by means of computational algorithms, comparing the biometric characteristics of the captured voice with the previously registered biometric reference.

5. Method according to claim 1, 3 or 4, characterized in that the speech recognition comprises transcribing the sentence read, by means of computational algorithms, and comparing it with the sentence containing the transaction data.

6. Method according to any one of claims 1 to 5, characterized in that the response given by the method is a binary decision.

7. Method according to any one of claims 1 to 5, characterized in that the response given by the method is expressed as a measure of confidence.