CN109983491B

CN109983491B - Method and apparatus for applying artificial intelligence to money collection by using voice input

Info

Publication number: CN109983491B
Application number: CN201780071950.5A
Authority: CN
Inventors: 俞承学; 金旼序; 黄寅喆
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2016-11-21
Filing date: 2017-11-21
Publication date: 2023-12-29
Anticipated expiration: 2037-11-21
Also published as: CN109983491A; KR102457811B1; EP3533015A4; EP3533015A1; KR20180057507A

Abstract

An example apparatus includes a memory configured to store at least one program; a microphone configured to receive speech; and at least one processor configured to execute at least one program to control the apparatus to perform operations for collecting money to a payee. The operations include determining a payment intent of the user based on analyzing the received voice input; retrieving contact information from a stored contact list based on the name of the payee; transmitting the name of the payee and the contact information to the bank server together with the amount specified in the voice input; receiving money transfer details from a bank server; and approves the money transfer details. The device may analyze the received speech input by using an Artificial Intelligence (AI) algorithm.

Description

Method and apparatus for applying artificial intelligence to money collection by using voice input

Technical Field

The present disclosure relates generally to methods and apparatus for pooling money using voice input.

The present disclosure also relates to an Artificial Intelligence (AI) system and applications thereof that use machine learning algorithms to simulate functions such as identification and determination of a human brain.

Background

With the development of multimedia technology and network technology, users can receive various services using devices. In particular, with the development of speech recognition technology, a user may input his or her speech to a device, and the device may operate according to the user's speech (e.g., according to a command uttered by the user).

The user may access the financial service using a device that executes an application provided by the bank. For example, the user may collect money to the payee's account by using the device. The user may execute an application, enter an account number, a password, etc., and collect money to the payee's account.

Also, in recent years, an Artificial Intelligence (AI) system that realizes human-level intelligence has been used in various fields. The AI system is a machine learning system that, unlike existing rule-based systems, can learn itself, make decisions, and become "smarter". AI systems can provide improved recognition rates and more accurate understanding of user preferences when in use, and thus existing rule-based systems are increasingly being replaced by deep-learning based AI systems.

AI techniques include machine learning (e.g., deep learning) and element techniques that utilize machine learning (element technology).

Machine learning is an algorithmic technique in which the machine itself classifies/learns the characteristics of input data. Elemental technology is a technology that uses machine learning algorithms (such as deep learning) to simulate functions such as recognition and determination of the human brain, and includes technical fields such as language understanding, visual understanding, reasoning/prediction, knowledge expression, motion control, and the like.

AI technology has been applied in various fields. Language understanding is a technique for recognizing and applying/processing human language/characters and includes natural language processing, machine translation, dialog systems, query/response, speech recognition/synthesis, and the like. Visual understanding is a technique for identifying and processing objects as human vision and includes object identification, object tracking, image searching, human identification, scene understanding, spatial understanding, image enhancement, and the like. Inference/prediction is a technique for determining and logically reasoning and predicting information, and includes knowledge/probability-based reasoning, optimization prediction, preference-based planning, recommendation, and the like. Knowledge representation is a technique for automating human experience information into knowledge data, and includes knowledge construction (generation/classification of data), knowledge management (utilization of data), and the like. Motion control is a technique for controlling autonomous travel of a vehicle and movement of a robot, and includes motion control (navigation, collision, and travel), operation control (behavior control), and the like.

Disclosure of Invention

A method and apparatus for collecting money to a payee's account by using voice are provided.

Drawings

These and/or other aspects, features, and attendant advantages of the present disclosure will become apparent and more readily appreciated from the following detailed description when taken in conjunction with the accompanying drawings, wherein like reference numerals designate like elements, and wherein:

fig. 1 is a diagram illustrating a method in which a user collects money by using a user's voice according to an example embodiment;

FIG. 2 is a block diagram illustrating an apparatus according to an example embodiment;

FIG. 3 is a diagram illustrating an apparatus for learning modes according to an example embodiment;

FIG. 4 is a diagram illustrating a method of approving money transfer details according to an example embodiment;

FIG. 5 is a diagram illustrating a method of selecting one of a plurality of payees according to an example embodiment;

FIG. 6 is a diagram illustrating a method of selecting any one of a plurality of banks according to an example embodiment;

FIG. 7 is a flowchart illustrating a method of collecting money by using voice according to an example embodiment;

fig. 8 is a diagram illustrating a method of payment by using voice according to another example embodiment;

fig. 9 is a diagram illustrating an apparatus for learning a payment mode according to an example embodiment;

FIG. 10 is a flowchart illustrating a method of paying using voice according to an example embodiment;

FIG. 11 is a block diagram of a processor according to some example embodiments;

FIG. 12 is a block diagram of a data learner according to some example embodiments;

FIG. 13 is a block diagram of a data identifier according to some example embodiments;

FIG. 14 is a diagram illustrating an example of learning and identifying data through interactions between a device and a server according to some example embodiments; and

fig. 15 and 16 are flowcharts of network systems using data recognition models according to some example embodiments.

Detailed Description

Best mode for carrying out the invention

Additional aspects will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosed embodiments.

According to an aspect of the example embodiment, an apparatus comprises: a memory configured to store at least one program; a microphone configured to receive a voice input; and at least one processor configured to execute at least one program to perform operations for collecting money to a payee, wherein the operations include determining a user's intent to pay based on analyzing the received voice input; retrieving contact information from a stored contact list based on the name of the payee; transmitting the name of the payee and the contact information to the bank server together with the amount specified in the voice input; receiving money transfer details from a bank server; and approves the money transfer details.

According to an aspect of another example embodiment, a payment method includes receiving a voice input of a user; determining a user's intent to pay based on analysis of the received voice input; retrieving contact information from a stored contact list based on the name of the payee specified in the voice input; transmitting the payee's name and contact information to a bank server along with the amount specified in the voice input; receiving money transfer details from a bank server; and approves the money transfer details.

MODE OF THE INVENTION

Reference will now be made in detail to various non-limiting embodiments, examples of which are illustrated in the accompanying drawings. In the drawings, parts irrelevant to the description are omitted to clearly describe example embodiments, and like reference numerals denote like elements throughout the specification. In this regard, the example embodiments may take different forms and should not be construed as limited to the descriptions set forth herein. Accordingly, example embodiments are described below to explain aspects of the present disclosure by referring to the figures. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. An expression such as "at least one of …" modifies the entire list of elements without modifying individual elements of the list when it follows the list of elements.

Throughout this disclosure, when a certain component is described as being "connected" to another component, it should be understood that the certain component may be "directly connected" to another component or "electrically connected" to another component via another element in between. Furthermore, when an element is "comprising" an element, unless another description to the contrary is present, it should be understood that the element does not exclude another element, and may further comprise another element.

Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings.

Fig. 1 is a diagram illustrating a method in which a user collects money by using the user's voice according to an example embodiment. Referring to fig. 1, a user may input his or her voice to the device 10 by speaking (e.g., to a microphone) in order to transfer money to a payee. In particular, the user may transfer money to the payee by speaking only the name of the payee without speaking or inputting the account number of the payee.

The device 10 may receive voice input from a user. The apparatus 10 may include a microphone that receives user speech. The device 10 may receive voice input of a user via a microphone by executing, for example, a voice assistant application (such as "S-voice") and controlling the executed application.

As shown in item 1 of fig. 1, the apparatus 10 may recognize the voice of the user. The apparatus 10 may analyze the speech to determine the intent of the user. For example, if the device receives a voice input that the user says "sink 1 million korean to three stars," the device 10 may determine from the user's voice whether the user intends to sink money. In an example embodiment, the apparatus 10 may store in memory the entire user voice input at the time of the user money transfer and use the stored information to learn the pattern of the voice input at the time of the money transfer. The apparatus 10 may determine the user's intent more accurately through learning. At the beginning of learning, the device 10 may confirm whether to collect money when the user's voice is input. The apparatus 10 can more accurately determine the transmission intention of the user by repeating learning.

As an example, the apparatus 10 may compare the stored speech pattern with the pattern of the input speech to determine the user's intent. The stored voice patterns may include patterns of user voice input when the user intends to transfer money. If the stored voice pattern is similar or identical to the pattern of the input voice (e.g., the similarity equals or exceeds a threshold similarity), the device 10 may determine that the user intends to transfer money. The stored patterns of speech may be updated or added by learning.

The device 10 may confirm the payee's name or title and search for names or titles stored in the contact list. For example, if the user enters the payee as "samsung," the device 10 may search for "samsung" in the contact list. For example, the device 10 may confirm the telephone number of "samsung" in the contact list.

As shown in item 2 of fig. 1, the apparatus 10 may transmit the user information, the payee information, and the amount to the bank server 20. The user information includes, but is not limited to, the name of the user, an account number, etc. The payee information includes, but is not limited to, the payee's name, phone number, and the like. The payee information may not include the payee's account number. The amount indicates the amount specified in the user's voice input and is the amount the user will remit to the payee.

The device 10 may be, but is not limited to, a smart phone, tablet PC, smart television, mobile phone, personal Digital Assistant (PDA), laptop computer, media player, micro-server, global Positioning System (GPS) device, electronic book terminal, digital broadcast terminal, navigation system, self-service terminal (kiosk), MP3 player, digital camera, consumer electronics, and other mobile or non-mobile computing devices. The device 10 may also be a wearable device such as, but not limited to, a wristwatch, glasses, hair band, ring, etc. having communication and data processing functions. The device 10 may comprise any kind of device capable of receiving a user's voice input and providing a reply message to the user.

In addition, the device 10 may communicate with other devices (not shown) through a network in order to use various types of context information. The network may include a Local Area Network (LAN), a Wide Area Network (WAN), a Value Added Network (VAN), a mobile radio communication network, a satellite communication network, and/or a combination thereof, may be a data communication network for allowing respective network elements to smoothly communicate with each other in an integrated sense, and may include a wired internet, a wireless internet, and a mobile wireless communication network. The wireless communication may include, for example, wi-Fi, bluetooth low energy, zigBee, wi-Fi direct (WFD), ultra Wideband (UWB), infrared data association (IrDA), near Field Communication (NFC), and the like, but is not limited thereto.

As shown in item 3 of fig. 1, the bank server 20 may receive user information and payee information. The bank server 20 may search for an account number that matches the user information. The bank server 20 may search for the user's account number by using, for example, the user's name and telephone number. In addition, the bank server 20 may search for an account number assigned (or matched) to the unique identification information of the device 10. The device 10 may contain unique identification information and the bank server 20 may use the unique identification information of the device 10 to search an account database for the account number of the user of the device 10. The bank server 20 may also search for an account number that matches the payee information. For example, the bank server 20 may search for an account number that matches the payee's name and phone number.

As shown in item 4 of fig. 1, the bank server 20 may generate money transfer details. The bank server 20 may generate money transfer details including, but not limited to, the user's account number, the payee's name, the payee's account number, and the amount. For example, the bank server 20 may generate remittance details of "remittance 1 ten thousand korea from banks a,11-1111 (account number) and AAA (user name) to banks B,22-2222 (account number) and BBB (payee name)".

The bank server 20 may send money transfer details to the device 10.

The device 10 may display money transfer details. The apparatus 10 may display remittance details to allow the user to confirm whether the intent of the user's voice input and remittance details coincide with each other.

The user may approve the money transfer details. If the user wants to transfer money according to money transfer details, the user may input, for example, one or more of voice, fingerprint, iris scan, vein image, face image, and password. The device 10 may verify by determining whether the input voice, fingerprint, iris scan, vein image, facial image, and/or password match the user's personal information. Such verification of money transfer details is shown in item 5 of FIG. 1.

As shown in item 6 of fig. 1, the apparatus 10 may transmit the authentication result to the bank server 20.

As shown in item 7 of fig. 1, the bank server 20 may receive the verification result and collect money to the payee according to the received verification result of the verification of the money transfer details. If the user is authenticated as a legitimate user, the bank server 20 may collect money to the payee (and optionally send an acknowledgement to the device 10 that money was collected), and otherwise may not collect money and send an error message to the device 10.

Fig. 2 is a block diagram illustrating an apparatus 10 according to an example embodiment. Referring to fig. 2, the apparatus 10 may include a processor 11, a memory 12, a display 13, and a microphone 14.

The processor 11 (e.g., comprising processing circuitry such as a CPU and/or dedicated hardware circuitry) may control the overall operation of the device 10 including the memory 12, the display 13, and the microphone 14. The processor 11 may control the storage of data into the memory 12 and/or the reading of data from the memory 12. The processor 11 may determine an image to be displayed on the display 13 and may control the display 13 to display the image. The processor 11 may control the microphone 14 to be turned on/off and analyze (e.g., by executing a voice analysis application) the voice input through the microphone 14.

The memory 12 (e.g., ROM, RAM, memory card, nonvolatile, volatile, solid state, hard disk, etc.) may store personal information, biometric information, etc. of the user. For example, the memory 12 may store, but is not limited to, a user's voice, fingerprint, iris scan, vein image, facial image, and/or password. The memory 12 may store samples of the user's voice and/or prior voice inputs for analyzing patterns of the user's voice.

A display 13 (e.g., LCD, OLED, etc.) may display images and reproduce video content under the control of the processor 11.

Microphone 14 may receive voice input. Microphone 14 may contain circuitry to convert sound (e.g., voice input) generated in the periphery of device 10 into an electrical signal and output the electrical signal to processor 11.

Fig. 3 is a diagram of an apparatus 10 illustrating a learning mode according to an example embodiment. Referring to fig. 3, the apparatus 10 may, for example, execute a speech analysis application for analyzing various types of sentences and based on this learning mode.

The user may speak various types of sentences to collect money. For example, to sink 1 million korea from a user's bank account to three stars (payees), the user may say the following types of sentences:

1. from "A" bank account to Sanxinghui 1 Yi Hanyuan

2. 'Hui' three stars 1 Yi Hanyuan

3. Hui 1 Yi to three stars

The device 10 may analyze and learn patterns of the user's voice to identify sentences that contain the user's intent to transfer money.

When the user has multiple accounts, the device 10 may confirm with the user the account from which money was drawn from the multiple accounts. Once an account is designated, money transfer initiated using the device 10 may be from that point on to withdraw money from the designated account unless there is a different instruction from the user.

Fig. 4 is a diagram illustrating a method of approving money transfer details according to an example embodiment. The user may approve the money transfer details using, but not limited to, voice input, a fingerprint, a vein image, a facial image, or an iris scan.

The device 10 may receive money transfer details from the bank server 20 and display the money transfer details on its display 13. The money transfer details may include, but are not limited to, the user's account number, the payee's account number, and the amount.

The user may approve the money transfer details after visually confirming the displayed money transfer details. When the user approves the money transfer details, the user may use voice input, a fingerprint, a vein image, a face image, or an iris scan. If the input voice, fingerprint, vein image, facial image, or iris matches (e.g., has a similarity equal to or exceeding a predetermined similarity threshold), the user's voice input, fingerprint, vein image, facial image, or iris scan as reflected in the information stored in the device's memory 12, the device 10 may send a message to the bank server 20 indicating that the money transfer details are approved.

Fig. 5 is a diagram illustrating a method of selecting one of a plurality of payees according to an example embodiment. The user may select any one of the plurality of payees through, for example, voice input.

The device 10 may search the contact list stored in the memory 12 (or some other external memory) for a name identified as a payee. If multiple payees are found in the contact list that contain the identified names, the apparatus 10 may display the names of the multiple found payees on the display 13. The user can select any one of the displayed names through voice input.

The following two payee cases are found by way of example with the name of samsung.

1. Three-star electron

2. Sanxingjingsu Co Ltd

The device 10 may display two payees on the display 13. The user may select the first payee or the second payee through voice input. For example, the user may select the payee by inputting a voice such as "transfer money to the first" or "transfer money to samsung electronics". If the display 13 is configured as a touch screen, the user may select the payee using a touch input.

Fig. 6 is a diagram illustrating a method of selecting any one of a plurality of banks according to an example embodiment. The user may use voice input to select any bank (or account) from a plurality of banks (or accounts).

When sending money transfer details to the device 10, the bank server 20 may send a plurality of banks (or account numbers) registered in the name of the payee. For example, if there are multiple accounts registered in the payee's name, the device 10 may display the accounts to the user on the display 13 for the user to determine which account to send money to. As described above, the user can select any one of the displayed account numbers through voice or touch input.

For example, the following two accounts can be found under the name Samsung.

1. Bank A (33-3333)

2. Bank B (55-5555)

The device 10 may display two accounts on the display 13. The user may select the first account or the second account through voice input. For example, the user may select a bank or account number by speaking, for example, "collect money to bank A", "collect money to the first" or "collect money to the 55-5555 account" to provide voice input.

Fig. 7 is a flowchart illustrating a method of collecting money by using voice input according to an example embodiment. Referring to fig. 7, a user may input a name and an amount of a payee through voice input and collect money to the payee.

In operation 710, the apparatus 10 may receive a user voice input to the microphone 14.

In operation 720, the device 10 may analyze the received voice input to determine an intent of the user to transfer money. As a result of analyzing the received voice, if it is determined that there is no intention to collect money, the apparatus 10 does not perform a process for collecting money. The voice input may include the name and amount of the payee, etc. For example, the device 10 may analyze the voice input and determine that the user intends to transfer money if instructions, names, amounts, etc. are contained in the voice input.

In operation 730, the apparatus 10 may search the stored contact list for a contact corresponding to the name of the payee. If no contact corresponding to the payee name is found, the device 10 may display information on the display 13 indicating that the contact was lost/found. The user may input the payee's contact information through voice input. The apparatus 10 may store the name of the payee and the corresponding contact in a contact list based on the input voice.

In operation 740, the apparatus 10 may transmit the name of the payee and the contact information to the bank server 20 along with the amount included in the voice input. The contact information may be searched for by the name of the payee or entered via voice input by the user.

In operation 750, the apparatus 10 may receive money transfer details from the bank server 20. The bank server 20 may search the payee's account number by using the payee's name and contact information and send money transfer details to the device 10, including but not limited to the payee's name, account number, and amount.

In operation 760, the device 10 may approve (verify) the money transfer details. Device 10 may approve the money transfer details using, but not limited to, the user's voice input, fingerprint, iris scan, vein image, facial image, and/or password. The user may confirm the money transfer details and input to the device 10 using voice to approve the money transfer details or allow the device 10 to identify an iris, fingerprint, etc. Further, when a user wears a wearable device such as a smart watch, the user can verify by using the smart watch using veins on the back of the user's hand. For example, the user may manipulate the smart watch to identify veins in the back of the hand and verify to approve money transfer details.

Fig. 8 is a diagram illustrating a method of paying using voice input according to another example embodiment. Referring to fig. 8, a user may pay by using voice input.

The device 10 may display a screen on the display 13 for paying for goods or services purchased by the user over the internet. For example, when a user purchases Galaxy Note 7, device 10 may display a message "do you want to purchase Galaxy Note 7? "

After checking the payment details, the user may provide voice input to make the payment. For example, as shown in item 1 of FIG. 8, when a user inputs "pay with three star card," the device 10 may recognize the user's voice. The user may simply enter "pay" and the device 10 may proceed with the payment by using the card that the user previously used for the payment.

As shown in item 2 of fig. 8, the device 10 may transmit the card information and payment information of the user to the card issuer server 30. The user's card information may include a card number, a validity period of the card, a password, and the like. The payment information may include goods or services to be paid for, seller information, and the like.

As shown in item 3 of fig. 8, the issuer server 30 may confirm the card information and proceed with the payment. When payment is complete, issuer server 30 may send a payment complete message to device 10. The device 10 may display a payment complete message to inform the user that the payment has been completed normally.

As an example, if a user wears a smart watch and the user pays for goods or services, the smart watch may automatically perform biometric verification on the user. For example, if a user wears a smartwatch, the smartwatch may capture veins of the user's wrist and perform vein verification by the pattern of the captured veins. Thus, the user can automatically pay through the smart watch without separately inputting voice, password, etc. More specifically, when the user touches the pay button on the internet, the device 10 may determine whether the user is wearing a smart watch. If the user wears a smart watch, the device 10 may send a signal to the smart watch for vein verification. The smart watch may capture the vein of the user under the control of the device 10 and send the result of the vein verification to the device 10. In addition, the smart watch may send the captured image of the vein to the device 10, and the device 10 may perform vein verification. Vein verification may compare a registered vein image (or vein pattern) to a captured vein image (or vein pattern). While the user is wearing the wearable device, the device 10 may proceed with the payment without receiving separate input from the user.

Fig. 9 is a diagram illustrating an apparatus 10 for learning a payment mode according to an example embodiment. Referring to fig. 9, the apparatus 10 may analyze a payment pattern by analyzing various types of sentences. Learning a payment pattern may mean identifying and recording one type of voice input spoken by a user at the time of payment.

The user may speak various types of paid statements. For example, the user may speak the following types of sentences.

1. Payment using a samsung card

2. Please pay using a samsung card

3. Payment using my card

4. Continuing to pay

The device 10 may store in the memory 12 expressions that are mainly (most often) spoken by the user at the time of payment and determine whether the user speaks the same or similar statement as the stored statement and proceed with the payment.

The apparatus 10 may register or request card information of the user at the start of learning so as to obtain card information mainly used by the user. When registering the card information of the user, the apparatus 10 can continue payment by using the previously registered card information of the user even if the user simply "pays with my card".

Fig. 10 is a flowchart illustrating a method of paying through voice input according to an example embodiment. Referring to fig. 10, a user may pay for goods or services by using voice input.

In operation 1010, the apparatus 10 may display payment details on the display 13.

In operation 1020, the apparatus 10 may receive voice input of a user via the microphone 14. The user may check the payment details and may express whether to pay by providing a voice input. For example, the user may be said to be "pay" when paying and "not pay" when not paying.

In operation 1030, the apparatus 10 may analyze the received voice input to determine the intent of the user. The device 10 may analyze the voice input and determine whether the user wants to approve the displayed payment details.

In operation 1040, the apparatus 10 may perform user authentication through voice input. The device 10 may perform user authentication by determining whether the voice input matches the user's voice (e.g., by comparing with registered voice samples). The device 10 may determine whether the registered voice sample matches the input voice and if so proceed with the payment. The device 10 may perform user authentication not only by voice, but also by fingerprint, iris, vein, face, or password.

In operation 1050, the device 10 may send payment information to the card company. If the verification is successful, the device 10 may send payment information and card information to the card company. The payment information may include merchandise, seller information, money amount, and the like. The card information may contain a user's card number, password, expiration date, etc.

The device 10 may display a payment complete message upon completion of the payment.

As described above, when a user purchases goods or services via the internet, the user can purchase goods or services through voice input.

Fig. 11 is a block diagram of a processor 1300 according to some example embodiments.

Referring to fig. 11, a processor 1300 according to some example embodiments may include a data learner 1310 and a data identifier 1320.

The data learner 1310 may learn references for determining situations. The data learner 1310 may learn references to what data to use to determine a predetermined situation and how to determine the situation by using the data. The data learner 1310 may obtain data to be used for learning, and apply the obtained data to a data recognition model to be described below, thereby learning a reference for determining a situation.

The data learner 1310 may learn data recognition models by using voice inputs or sentences to generate a set of data recognition models that estimate the intent of the user. At this time, the voice input or sentence may include a voice uttered by the user of the apparatus 10 or a sentence for recognizing the voice of the user. Alternatively, the speech or sentence may comprise speech uttered by a third party or speech of a third party.

The data learner 1310 may learn the data recognition model by using a supervised learning method using voices or sentences and learning entities as learning data.

In an example embodiment, the data recognition model may be a set of models that estimate the intent of the user to transfer money. In this case, the learning entity may include, but is not limited to, at least one of user information, payee information, money transfer amount, and money transfer intent. The user information may include, but is not limited to, identification information of the user (e.g., name or nickname) or identification information of the user's account (e.g., account bank, account name, account nickname, or account number). The payee information may include, but is not limited to, identification information (e.g., name, nickname, or telephone number) or identification information of the payee's account (e.g., account bank, account name, account nickname, or account number). The money transfer intent may include whether the user is to transfer money. For example, money transfer intent may include, but is not limited to, money transfer progress, money transfer subscription, subscription cancellation, money transfer holding (holding), or money transfer confirmation.

In another aspect, the at least one learning entity value may have a value of "null". In this case, the value "null" may indicate that the voice input or sentence used as the learning data has no information about the entity value.

Specifically, if the voice input or sentence for learning is "1 million korean from a bank account to samsung", the learning entity is { user information: bank a, payee information: three stars, remittance amount: 1 billion Korean, remittance instruction: continue remittance }. As another example, if the voice input or sentence for learning is "sink 1 million korean to samsung", the learning entity may consist of { user information: empty, payee information: three stars, remittance amount: 1 billion Korean, remittance instruction: continue remittance }. As another example, if the voice or sentence used for learning is "is 1 million korean already remitted to samsung? ", the learning entity may consist of { user information: empty, payee information: three stars, remittance amount: 1 billion Korean, remittance instruction: confirm remittance }. As another example, if the voice or sentence for learning is "cancel subscription of sink 1 million korean to samsung", the learning entity may consist of { user information: empty, payee information: three stars, remittance amount: 1 billion Korean, remittance instruction: cancel subscription }.

In another example embodiment, the data recognition model may be a set of models that estimate the user's intent to pay. In this case, the learning entity may include, but is not limited to, at least one of a payment card, a payment item, a payment method, and a payment intention. The payment method may include, for example, a full payment or a monthly installment number. The intent of payment may include whether the user is to pay. For example, the intent-to-pay may include a payment process, a payment cancellation, a payment hold, a payment method change, or a payment confirmation.

Specifically, if the voice input or sentence for learning is "pay full using samsung card", the learning entity may consist of { payment means: three star cards, payment items: empty, payment method: full payment, payment instruction: continuing payment }. As another example, if the voice input or sentence for learning is "pay by 10 month period payment", the learning entity may consist of { payment means: empty, payment item: empty, settlement method: payment for 10 months, payment instruction: continuing payment }. As another example, if the voice input or sentence for learning is "cancel previous payment," the learning entity may consist of { payment means: empty, payment item: empty, payment method: empty, payment indication: cancel payment }.

The set of data recognition models that determine the user's money transfer intent and the set of data recognition models that determine the user's payment intent may be the same recognition model or different recognition models. Alternatively, each data recognition model may contain a plurality of data recognition models. For example, considering the use environment (e.g., use time or use place) of the user, the user's intention may be determined by using a plurality of data recognition models customized for each environment.

The data identifier 1320 may determine a situation based on the data. The data identifier 1320 may identify a situation from predetermined data by using the learned data identification model. The data identifier 1320 may determine the predetermined condition based on the predetermined data by using the obtained data as an input value, by learning and using a data identification model, by obtaining the predetermined data according to a predetermined reference. Further, by using the obtained data as an input value, a result value output by the data identification model can be used to update the data identification model.

The data recognizer 1320 may estimate the user's intention by applying the user's voice input or a sentence for recognizing the user's voice to the data recognition model. For example, the data recognizer 1320 may apply a user's voice input or a sentence for recognizing the user's voice to the data recognition model to obtain a recognition entity and provide the recognition entity to a processor of the device (e.g., the processor 11 of the device 10 of fig. 2). The processor 11 may determine the intention of the user by using the obtained recognition entity.

In an example embodiment, the data recognition model may be a set of models that estimate the intent of the user to transfer money. In this case, the data recognizer 1320 may estimate the intention of the user to transfer money by applying the user's voice input or a sentence for recognizing the user's voice to the data recognition model. For example, the data recognizer 1320 may obtain recognition entities from a user's voice input or sentences for recognizing the user's voice. The identification entity may contain, for example, at least one of user information, payee information, money transfer amount, and money transfer instructions. The data identifier 1320 may provide the obtained identification entity to the processor 11. Processor 11 (or a dialog management module of processor 11) may determine the user's intent based on identifying the entity.

If it is determined based on the recognition entity that the user's intention includes not intending to transfer money, the processor 11 may not perform a process for transferring money. On the other hand, if it is determined that the intention of the user is money transfer based on the recognition entity, the processor 11 may perform processing for money transfer.

At this time, if at least one of the values of the recognition entity is "null", the processor 11 may determine a value corresponding to the value "null" using the history information or the preset information of the user. For example, the processor 11 may determine a value corresponding to the value "null" by referring to the latest money transfer history. Alternatively, the processor 11 may determine a value corresponding to the value "null" by referring to information (e.g., account number, account bank, etc.) preset by the user in the preference setting.

If at least one of the values of the recognition entity is "null", the processor 11 may request a value corresponding to the value "null" from the user. For example, processor 11 may control display 13 to display a statement indicating that there is no information regarding at least one of user information, payee information, money transfer amount, or money transfer instructions. When the user inputs at least one piece of the above information through voice or other input (e.g., through a virtual keyboard displayed on the display 13), the processor 11 may perform a process for money transfer by using the recognition entity value obtained from the data recognizer 1320 and the user input information.

In another example embodiment, the data recognition model may be a set of models that estimate the user's intent to pay. In this case, the data identifier 1320 may estimate the user's intention of payment by applying the user's voice input or a sentence for identifying the user's voice to the data identification model. For example, the data recognizer 1320 may obtain recognition entities from a user's voice input or sentences for recognizing the user's voice. The identification entity may comprise, for example, at least one of a payment means, a payment item, a payment method, and a payment instruction. The data identifier 1320 may provide the obtained identification entity to the processor 11. Processor 11 (or a dialog management module of processor 11) may determine the user's intent based on identifying the entity.

If it is determined that the user's intention is not to pay based on the recognition entity, the processor 11 may not perform a process for payment. On the other hand, if it is determined that the user's intention is payment based on the recognition entity, the processor 11 may perform a process for payment.

On the other hand, if at least one of the values of the recognition entity is "null", the processor 11 may determine a value corresponding to the value "null" using the history information or preset information of the user. Alternatively, the processor 11 may request the user to input a value corresponding to the value "null".

At least one of the data learner 1310 and the data identifier 1320 may be manufactured as at least one hardware chip and mounted on an electronic device. For example, at least one of the data learner 1310 and the data identifier 1320 may be manufactured as a dedicated hardware chip for Artificial Intelligence (AI), or may be manufactured as a component of a conventional general-purpose processor (e.g., CPU or application processor) or a graphics processor (e.g., GPU), and may be mounted on various electronic devices as described above. In this case, the dedicated hardware chip for AI may be a dedicated processor dedicated to probability calculation and have higher parallel processing performance than a conventional general-purpose processor, thereby rapidly processing arithmetic operations (such as machine learning) in the AI field.

In this case, the data learner 1310 and the data identifier 1320 may be mounted on one electronic device or on separate electronic devices. For example, one of the data learner 1310 and the data identifier 1320 may be included in an electronic device, and the other may be included in a server. The data learner 1310 and the data identifier 1320 may provide model information constructed by the data learner 1310 to the data identifier 1320 via wired or wireless communication. The data input to the data identifier 1320 may be provided as additional learning data to the data learner 1310.

Meanwhile, at least one of the data learner 1310 and the data identifier 1320 may be implemented as a software module. When at least one of the data learner 1310 and the data identifier 1320 is implemented as a software module (or a program module containing instructions), the software module may be stored in a non-transitory computer readable medium. Further, in this case, at least one software module may be provided by an Operating System (OS) or a predetermined application. Alternatively, some of the at least one software modules may be provided by the OS, while others may be provided by the predetermined application.

Fig. 12 is a block diagram of a data learner 1310 according to some example embodiments.

Referring to fig. 12, a data learner 1310 according to some example embodiments may include a data acquirer 1310-1, a preprocessor 1310-2, a learning data selector 1310-3, a model learner 1310-4, and a model evaluator 1310-5. In some example embodiments, the data learner 1310 may essentially include the data acquirer 1310-1 and the model learner 1310-4, and may optionally include at least one of the pre-processor 1310-2, the learning data selector 1310-3, and the model evaluator 1310-5, or may not include all of the pre-processor 1310-2, the learning data selector 1310-3, and the model evaluator 1310-5.

The data acquirer 1310-1 may obtain the data needed for learning to determine the situation.

For example, the data acquirer 1310-1 may acquire voice data, image data, text data, biometric signal data, and the like. In particular, the data acquirer 1310-1 may acquire voice inputs or statements for money transfer or payment. Alternatively, the data acquirer 1310-1 may acquire voice data or text data containing a voice or sentence for money transfer or payment.

The data acquirer 1310-1 may receive data through an input device (e.g., microphone, camera, sensor, keyboard, etc.) of an electronic device. Alternatively, the data acquirer 1310-1 may acquire the data via an external device (e.g., a server) in communication with the device.

The preprocessor 1310-2 may preprocess the obtained data so that the data obtained by learning for determining the situation may be used. The preprocessor 1310-2 may process the obtained data in a predetermined format so that a model learner 1310-4, which will be described below, may use the obtained data for learning to determine a situation. For example, the pre-processor 1310-2 may extract the learning entity value from the voice data according to a predetermined format. For example, when the predetermined format is composed of { user information, payee information, money transfer amount, and money transfer instruction }, or when the predetermined format is composed of { payment means, payment item, payment method, payment instruction }, the preprocessor 1310-2 may extract the learning entity value from the voice data according to the format. At this time, if the learning entity value is not extracted, the preprocessor 1310-2 may cause the specific entity value to be displayed as "empty".

The learning data selector 1310-3 may select data required for learning from the preprocessed data. The selected data may be provided to model learner 1310-4. In this case, the data obtained by the data acquirer 1310-1 or the data processed by the preprocessor 1310-2 may be provided as learning data to the model learner 1310-4. The learning data selector 1310-3 may select data required for learning from the preprocessed data according to a predetermined reference for determining a situation. For example, the predetermined reference may be determined in consideration of at least one of an attribute of data, a generation time of data, a creator of data, reliability of data, a target of data, a generation area of data, and a size of data. Alternatively, the learning data selector 1310-3 may select data according to a predetermined reference by learning using a model learner 1310-4, which will be described below.

Model learner 1310-4 may learn references on how to determine situations based on the learning data. In addition, model learner 1310-4 may learn which learning data should be used to determine a reference for the situation. For example, model learner 1310-4 may learn a determination model according to a supervised learning method or an unsupervised learning method to generate a data identification model for prediction, determination, or estimation. The data recognition model may be, for example, a set of models for estimating a money transfer intent of the user or a set of models for estimating a payment intent of the user.

In addition, the model learner 1310-4 may learn a data identification model for determining a situation by using learning data. The data recognition model may be a pre-modeled model. For example, the data recognition model may be a pre-built model by receiving basic learning data (e.g., sample data, etc.).

The data recognition model may be constructed in consideration of the application field of the recognition model, learning purpose, or computer performance of the apparatus. The data recognition model may be, for example, a neural network based model. The data recognition model may be designed to simulate a human brain structure on a computer. The data recognition model may include a plurality of network nodes with weights to simulate neurons of a human neural network. Multiple network nodes may establish connection relationships to simulate synaptic activity of neurons transmitting and receiving signals via synapses. The data recognition model may comprise, for example, a neural network model or a deep learning model developed from the neural network model. In the deep learning model, multiple network nodes may be located at different depths (or layers) and may exchange data according to a convolution connection relationship. For example, a model such as a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), or a bi-directional recurrent deep neural network (BRDNN) may be used as the data recognition model, but the present disclosure is not limited thereto.

According to various example embodiments, when there are a plurality of data recognition models constructed in advance, the model learner 1310-4 may determine a data recognition model having a high correlation between input learning data and basic learning data as the data recognition model to be learned. In this case, the basic learning data may be pre-classified according to data type, and a data recognition model may be built in advance for each data type. For example, the basic learning data may be pre-classified by various references (such as an area in which the learning data is generated, a time in which the learning data is generated, a size of the learning data, a type of the learning data, a creator of the learning data, a type of an object in the learning data, and the like).

In addition, the model learner 1310-4 may learn the data identification model by using a learning algorithm including, for example, an error back propagation method or a gradient descent method.

Further, the model learner 1310-4 may learn the data identification model by supervised learning, for example, by using the learning data as an input value. Furthermore, the model learner 1310-4 may learn a data identification model through unsupervised learning to find references for determining conditions by, for example, learning one type of data required to determine self conditions. Further, the model learner 1310-4 may learn the data identification model through reinforcement learning (e.g., by using feedback regarding whether the results are correct based on the learning situation).

The learning data may include a voice input of the user or a voice input of a third party, a sentence via which the voice of the user or the voice of the third party is recognized, a sentence entered by the user or the third party, or the like. Further, the learning data may include learning entities associated with the speech input or the sentence. Various examples of the learning entity are described in detail with reference to fig. 11, and thus redundant descriptions thereof are omitted.

In addition, when learning the data recognition model, the model learner 1310-4 may store the learned data recognition model. In this case, the model learner 1310-4 may store the learned data identification model in a memory of an electronic device (e.g., the memory 12 of the device 10 described above) that includes the data identifier 1320. Alternatively, the model learner 1310-4 may store the learned data identification model in a memory of an electronic device including the data identifier 1320, which will be described below. Alternatively, the model learner 1310-4 may store the learned data identification model in a memory of a server connected to an electronic device (e.g., the device 10 described above) via a wired or wireless network.

In this case, the memory storing the learned data recognition model may also store instructions or data associated with at least one other component of the electronic device, for example. The memory may also store software and/or programs. A program may include, for example, a kernel, middleware (middleware), an Application Programming Interface (API), and/or an application program (or "application").

The model evaluator 1310-5 may input evaluation data to the data recognition model, and if a recognition result output from the evaluation data does not satisfy a predetermined reference, the model evaluator 1310-5 may allow the model learner 1310-4 to learn again. In this case, the evaluation data may be predetermined data for evaluating the data identification model.

For example, when the number or ratio of the evaluation data whose recognition result is incorrect exceeds a preset threshold value among the recognition results of the learned data recognition model for evaluating data, the model evaluator 1310-5 may evaluate that the learned data recognition model does not satisfy the predetermined reference. For example, when the predetermined reference is defined as a ratio of 2%, the model evaluator 1310-5 may evaluate that the learned data recognition model is unsuitable when the learned data recognition model outputs an erroneous recognition result of evaluation data of more than 20 pieces of evaluation data among 1000 pieces of evaluation data in total.

On the other hand, when there are a plurality of learning data recognition models, the model evaluator 1310-5 may evaluate whether each of the learned data recognition models satisfies a predetermined reference, and may determine a model satisfying the predetermined reference as a final data recognition model. In this case, when there are a plurality of models satisfying a predetermined reference, the model evaluator 1310-5 may determine any one model or a predetermined number of models previously set in descending order of evaluation score as the final data recognition model.

Meanwhile, at least one of the data acquirer 1310-1, the preprocessor 1310-2, the learning data selector 1310-3, the model learner 1310-4, and the model evaluator 1310-5 included in the data learner 1310 may be manufactured in at least one hardware chip and mounted on the electronic device. For example, at least one of the data acquirer 1310-1, the preprocessor 1310-2, the learning data selector 1310-3, the model learner 1310-4, and the model evaluator 1310-5 may be manufactured as a dedicated hardware chip for AI, or may be manufactured as a component of a conventional general-purpose processor (e.g., CPU or application processor) or a graphics processor (e.g., GPU), and may be mounted on various electronic devices as described above.

In addition, the data acquirer 1310-1, the preprocessor 1310-2, the learning data selector 1310-3, the model learner 1310-4, and the model evaluator 1310-5 may be mounted on one electronic device or may be mounted on a separate electronic device. For example, some of the data acquirer 1310-1, the preprocessor 1310-2, the learning data selector 1310-3, the model learner 1310-4, and the model evaluator 1310-5 may be included in the electronic device, and others may be included in the server.

In addition, at least one of the data acquirer 1310-1, the preprocessor 1310-2, the learning data selector 1310-3, the model learner 1310-4, and the model evaluator 1310-5 may be implemented as a software module. When at least one of the data acquirer 1310-1, the preprocessor 1310-2, the learning data selector 1310-3, the model learner 1310-4, and the model evaluator 1310-5 is implemented as a software module (or a program module containing instructions), the software module may be stored in a non-transitory computer readable medium. Further, in this case, at least one software module may be provided by an Operating System (OS) or a predetermined application. Alternatively, some of the at least one software modules may be provided by the OS, while others may be provided by the predetermined application.

Fig. 13 is a block diagram of a data identifier 1320, according to some example embodiments.

Referring to fig. 13, a data identifier 1320 according to some example embodiments may include a data acquirer 1320-1, a preprocessor 1320-2, an identification data selector 1320-3, an identification result provider 1320-4, and a model updater 1320-5. In some example embodiments, the data identifier 1320 may essentially comprise the data acquirer 1320-1 and the recognition result provider 1320-4, and may optionally comprise at least one of the preprocessor 1320-2, the recognition data selector 1320-2, and the model updater 1320-5.

The data acquirer 1320-1 may acquire data required for determining a situation. For example, the data acquirer 1320-1 may acquire a user's voice input or a sentence for recognizing the user's voice. Specifically, the data acquirer 1320-1 may acquire a voice input of the user or a sentence for making a money transfer or payment. Alternatively, the data acquirer 1320-1 may acquire voice data or text data that contains the voice or statement of the user for making the money transfer or payment.

The preprocessor 1320-2 may preprocess the obtained data so that the obtained data for determining the situation may be used. The preprocessor 1320-2 may process the obtained data into a predetermined format so that the recognition result provider 1320-4, which will be described below, may use the obtained data for determining a situation. For example, the pre-processor 1320-2 may extract the learning entity value from the voice data according to a predetermined format. For example, the pre-processor 1320-2 may extract the learning entity value according to the format of { user information, payee information, money transfer amount and money transfer instructions } or { payment means, payment items, payment methods, payment instructions }.

The identification data selector 1320-3 may select data required for determining a situation from the preprocessed data. The selected data may be provided to the recognition result provider 1320-4. The identification data selector 1320-3 may select a part or all of the pre-processed data according to a preset reference for determining a situation. For example, the predetermined reference may be determined in consideration of at least one of an attribute of the data, a generation time of the data, a creator of the data, reliability of the data, a target of the data, a generation area of the data, and a size of the data. Alternatively, the identification data selector 1320-3 may select the data according to a predetermined reference learned by the model learner 1310-4.

The recognition result provider 1320-4 may apply the selected data to the data recognition model to determine a situation. The recognition result provider 1320-4 may provide recognition results according to the data recognition purpose. The recognition result provider 1320-4 may apply the selected data to the data recognition model by using the data selected by the recognition data selector 1320-3 as an input value. Further, the recognition result may be determined by a data recognition model.

For example, when the data recognition model is a model set for estimating a money transfer intention of the user, the recognition result provider 1320-4 may apply a user's voice input to the data recognition model to estimate, infer or predict the money transfer intention of the user, the user's voice input being a sentence illustrating the voice input for recognizing the user. Alternatively, when the data recognition model is a model set for estimating the user's intention of payment, the recognition result provider 1320-4 may apply the user's voice input or a sentence for recognizing the user's voice input to the data recognition model to estimate (or infer or predict) the user's intention of payment.

The recognition result provider 1320-4 may obtain the recognition entity as a result of estimating the intention of the user. The recognition result provider 1320-4 may provide the obtained recognition entity to a processor (e.g., the processor 11 of the apparatus 10 of fig. 2). The processor may determine the user's intent based on the recognition entity and proceed with the process for money transfer or payment.

The model updater 1320-5 may update the data recognition model based on an evaluation of the recognition result provided by the recognition result provider 1320-4. For example, the model updater 1320-5 may provide the recognition results provided by the recognition result provider 1320-4 to the model learner 1310-4 so that the model learner 1310-4 may update the data recognition model.

Alternatively, the model updater 1320-5 may receive an evaluation (or feedback) regarding the recognition result from a processor (e.g., the processor 11 of the apparatus 10 of fig. 2). For example, device 10 may display money transfer details based on the money transfer intent of the user by applying the user's voice input to the data recognition model.

The user may approve the money transfer details or deny approval of the money transfer details. For example, if the user approves the money transfer details, the user may enter a voice, fingerprint, iris scan, vein image, facial image, or password. On the other hand, when the user refuses to approve the money transfer details, the user may select a cancel button, enter a voice requesting cancellation, or make no input for a predetermined period of time.

In this case, user feedback according to approval or rejection of the user may be provided to the model updater 1320-5 as an evaluation of the recognition result. In other words, the user feedback may contain information indicating that the determination result of the data identifier 1320 is false or information indicating that the determination result is true. The model updater 1320-5 may update the determination model by using the obtained user feedback.

Meanwhile, at least one of the data acquirer 1320-1, the preprocessor 1320-2, the recognition data selector 1320-3, the recognition result provider 1320-4, and the model updater 1320-5 in the data recognizer 1320 may be manufactured in at least one hardware chip and mounted on an electronic device. For example, at least one of the data acquirer 1320-1, the preprocessor 1320-2, the identification data selector 1320-3, the result provider 1320-4, and the model updater 1320-5 may be manufactured as a dedicated hardware chip for AI, or may be manufactured as a part of a conventional general-purpose processor (e.g., CPU or application processor) or a graphics processor (e.g., GPU), and may be mounted on various electronic devices as described above.

In addition, at least one of the data acquirer 1320-1, the preprocessor 1320-2, the recognition data selector 1320-3, the recognition result provider 1320-4, and the model updater 1320-5 may be mounted on one electronic device or may be mounted on a separate electronic device. For example, some of the data acquirer 1320-1, the preprocessor 1320-2, the recognition data selector 1320-3, the recognition result provider 1320-4, and the model updater 1320-5 may be included in an electronic device, and others may be included in a server.

In addition, at least one of the data acquirer 1320-1, the preprocessor 1320-2, the recognition data selector 1320-3, the recognition result provider 1320-4, and the model updater 1320-5 may be implemented as a software module. When at least one of the data acquirer 1320-1, the preprocessor 1320-2, the identification data selector 1320-3, the identification result provider 1320-4, and the model updater 1320-5 is implemented as a software module (or a program module containing instructions), the software module may be stored in a non-transitory computer readable medium. Further, in this case, at least one software module may be provided by the OS or a predetermined application. Alternatively, some of the at least one software modules may be provided by the OS, while others may be provided by the predetermined application.

Fig. 14 is a diagram illustrating an example of learning and identifying data through interaction between the apparatus 1000 and the server 2000, according to some non-limiting embodiments.

The apparatus 1000 may correspond to, for example, the apparatus 10 of fig. 2. The data acquirer 1320-1, the preprocessor 1320-2, the recognition data selector 1320-3, the recognition result provider 1320-4, and the model updater 1320-5 in the data identifier 1320 of the apparatus 1000 may correspond to the data acquirer 1320-1, the preprocessor 1320-2, the recognition data selector 1320-3, the recognition result provider 1320-4, and the model updater 1320-5 in the data identifier 1320 of fig. 13, respectively. Further, the data acquirer 2310, the preprocessor 2320, the learning data selector 2330, the model learner 2340, and the model evaluator 2350 in the data learner 2300 of the server 2000 correspond to the data acquirer 1310-1, the preprocessor 1310-2, the learning data selector 1310-3, the model learner 1310-4, and the model evaluator 1310-5, respectively.

The device 1000 may interact with the server 2000 by short-range or long-range communication. The device 1000 and the server 2000 being connected to each other means that the device 1000 and the server 2000 are connected to each other directly or through another component (e.g., at least one of an Access Point (AP), a hub (hub), a relay device, a base station, a router, and a gateway, as a third component).

Referring to fig. 14, the server 2000 may learn a reference for determining a situation, and the apparatus 1000 may determine the situation based on the learning result of the server 2000.

In this case, the model learner 2340 of the server 2000 may perform the functions of the data learner 1310 shown in fig. 12. Model learner 2340 of server 2000 may learn what data to use to determine predetermined conditions and how to determine conditions by using the data. The model learner 2340 may obtain data to be used for learning and apply the obtained data to a data recognition model to learn references for determining conditions. For example, the model learner 2340 may learn data recognition models by using voice inputs or sentences to generate a set of data recognition models that estimate the intent of the user. The generated data recognition model may be, for example, a set of models for estimating at least one of a money transfer intent and a payment intent of the user.

The recognition result provider 1320-4 of the apparatus 1000 may determine the situation by applying the data selected by the recognition data selector 1320-3 to the data recognition model generated by the server 2000. For example, the recognition result provider 1320-4 may transmit the data selected by the recognition data selector 1320-3 to the server 2000. The server 2000 may apply the data selected by the identification data selector 1320-3 to the data identification model to request a determination. Further, the recognition result provider 1320-4 may receive information about the situation determined by the server 2000 from the server 2000. For example, when the selected data contains a user's voice input or a sentence for recognizing the user's voice, the server 2000 may apply the selected data to a data recognition model set that estimates the user's intention to obtain a recognition entity containing the user's intention. The server 2000 may provide the obtained entity to the recognition result provider 1320-4 as information about the determined case.

As another example, the recognition result provider 1320-4 of the apparatus 1000 may receive the recognition model generated by the server 2000 from the server 2000 and determine the situation by using the received recognition model. In this case, the recognition result provider 1320-4 of the apparatus 1000 may apply the data selected by the recognition data selector 1320-3 to the data recognition model received from the server 2000 to determine the situation. For example, when the selected data contains a user's voice input or a sentence for recognizing the user's voice, the recognition result provider 1320-4 of the apparatus 1000 may apply the selected data to a data recognition model set received from a server estimating the user's intention to obtain a recognition entity containing the user's intention. The apparatus 1000 may then provide the obtained entity to a processor (e.g., the processor 11 of fig. 2) as information about the determined situation.

Processor 11 may determine the user's money transfer intent or payment intent based on the identified entity and may conduct a process for transferring money or payment.

The apparatus 10 according to example embodiments may collect money to a payee only through voice input.

The apparatus 10 according to example embodiments may collect money to the payee by transmitting the payee's name, contact, and amount to the bank server 20 without having to transmit the payee's account number.

The apparatus 10 according to example embodiments may pay through voice input only.

Fig. 15 and 16 are flowcharts of a network system using a data recognition model, according to some non-limiting example embodiments.

In fig. 15 and 16, a network system may include first components 1501 and 1601 and second components 1502 and 1602. Herein, the first components 1501 and 1601 may be the apparatus 1000, and the second components 1502 and 1602 may be the server 2000 storing the data analysis model. Alternatively, the first components 1501 and 1601 may be general-purpose processors, and the second components 1502 and 1602 may be AI-specific processors. Alternatively, the first components 1501 and 1601 may be at least one application, and the second components 1502 and 1602 may be an OS. In other words, the second components 1502 and 1602 may be more integrated and dedicated and less delayed components than the first components 1501 and 1601, and have better performance and more resources than the first components 1501 and 1601, and may process many of the operations required to create, update, or apply the data recognition model more quickly and efficiently than the first components 1501 and 1601.

In this case, an interface for transmitting/receiving data between the first components 1501 and 1601 and the second components 1502 and 1602 may be defined.

For example, an Application Program Interface (API) may be defined that has learning data to be applied to the data recognition model as a factor value (or median or transition value). An API may be defined as a set of subroutines or functions that may be invoked for any processing of any protocol (e.g., the protocol defined in apparatus 1000) to another protocol (e.g., the protocol defined in server 2000). In other words, an environment in which operations of another protocol can be performed in any one protocol through an API can be provided.

In fig. 15, a first component 1501 may analyze a money transfer intent of a user by using a data recognition model.

In operation 1511, the first component 1501 may receive an issued voice of a user having a money transfer intention.

In operation 1513, the first component 1501 may send the received voice input or a sentence for recognizing the received voice to the second component 1502. For example, the first component 1501 may apply a voice input or sentence as a factor value of an API function provided for using a data recognition model. In this case, the API function may send the speech input or statement to the second component 1502 as recognition data to be applied to the data recognition model. At this time, the voice input or sentence may be changed and transmitted according to the promised communication format.

In operation 1515, the second component 1502 may apply the received voice input or statement to a set of data recognition models that estimate the user's money transfer intent.

As a result of the application, in operation 1517, the second component 1502 may obtain an identification entity. For example, the identification entity may contain at least one of user information, payee information (e.g., a payee's name), a money transfer amount, and money transfer instructions.

In operation 1519, the second component 1502 may send the identifying entity to the first component 1501. At this time, the identification entity may be changed and transmitted according to the promised communication format.

In operation 1521, the first component 1501 may determine that the user's voice input has a money transfer intent based on the recognition entity. For example, if "continue money transfer, payee name, and money transfer amount" are included as money transfer instruction values for the recognition entity, the first component 1501 may determine that the user's voice has a money transfer intention.

Operations 1513 through 1521 may correspond, herein, to an embodiment of the process in operation 720 of fig. 2 in which device 10 analyzes the received voice to determine the user's money transfer intent.

If it is determined in operation 1521 that the user's voice has a money transfer intention, the first component 1501 may search the contact list for a contact corresponding to the name of the payee included in the recognition entity in operation 1523.

In operations 1525, 1527, and 1529, the first component 1501 may approve the details to collect money to the payee's account number based on the found payee's contact. The corresponding procedure corresponds to operations 740 to 760 of fig. 7, and redundant description thereof will be omitted.

In fig. 16, the first component 1601 may analyze the user's intent to pay by using a data recognition model.

In operation 1611, the first component 1601 may provide payment details. For example, the first component 1601 may display payment details on a screen or output payment details by voice.

The user can check the payment details displayed on the screen and can express whether to pay through voice input.

In operation 1613, the first component 1601 may receive a voice input of a user.

In operation 1615, the first component 1601 may send the received voice input or a sentence identifying the received voice to the second component 1602. For example, the first component 1601 may apply a speech input or statement as a factor value for an API function provided for using the data recognition model. In this case, the API function may send the speech input or statement to the second component 1602 as recognition data to be applied to the data recognition model. At this time, the voice input or sentence may be changed and transmitted according to the promised communication format.

In operation 1617, the second component 1602 may apply the received speech or sentence to a set of data recognition models that estimate the user's intent to pay.

As a result of the application, the second component 1602 may obtain the identified entity in operation 1619. For example, the identification entity may include, but is not limited to, at least one of a payment instrument, a payment item, a payment method, and a payment instruction.

In operation 1621, the second component 1602 may send the identification entity to the first component 1601. At this time, the identification entity may be changed and transmitted according to the promised communication format.

In operation 1623, the first component 1601 may determine that the user's voice has a payment intention based on the recognition entity. For example, if "cancel payment" is included as the payment instruction value of the recognition entity, the first component 1601 may determine that the user's voice has an intention to not proceed with the payment. On the other hand, if "continue payment" is included as the payment instruction value of the recognition entity, the first component 1601 may determine that the user's voice has an intention to continue payment.

Herein, operations 1615 through 1623 may correspond to embodiments of the process in operation 1030 of fig. 10 as described above, in which the device 10 analyzes the received voice to determine the user's intent to pay.

If it is determined that the voice input of the user has the intention of payment, the first component 1601 may transmit payment information to the card company if the user authentication by voice is successful in operations 1625 and 1627. The corresponding procedure corresponds to operations 1040 and 1050 of fig. 10, and redundant description thereof is omitted.

One or more example embodiments may be implemented using a recording medium containing computer-executable instructions, such as program modules, executed by a computer system. Non-transitory computer readable recording media can be any available media that can be accessed by the computer system and that contains all types of volatile and nonvolatile media, and separate and non-separate media. Furthermore, the non-transitory computer-readable recording medium may contain all types of computer storage media and communication media. Computer storage media includes all types of volatile and nonvolatile, and separation and non-separation media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Communication media typically embodies computer readable instructions, data structures, program modules, other data in a modulated signal, other transport mechanism, and any information delivery media.

Furthermore, the method according to embodiments may be provided as a computer program product.

The computer program product may comprise a software program, a computer readable storage medium storing the software program, or a product for a transaction between a seller and a buyer.

For example, the computer program product may comprise a product (e.g., downloadable application) in the form of a software program that is distributed electronically via the device 10 or a manufacturer of the device 10 or an electronic marketplace (e.g., google Play Store, app Store). For electronic distribution, at least a portion of the software program may be stored in a storage medium or may be created temporarily. In this case, the storage medium may be a storage medium of a manufacturer or a server of an electronic market, or a relay server.

Furthermore, in the description, a "unit" may be a hardware component such as a processor or a circuit and/or a software component executed by a hardware component such as a processor.

The above-described exemplary embodiments are merely illustrative, and it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without changing the technical spirit of the present disclosure. Accordingly, the exemplary embodiments should be considered in an illustrative sense only and not for the purpose of limitation. For example, each component described as a single type may be implemented by distribution, and likewise, components described as distributed types may also be implemented by coupling.

It should be understood that the example embodiments described herein should be considered in descriptive sense only and not for purposes of limitation. It is generally understood that the descriptions of features or aspects in each of the example embodiments may be used for other similar features or aspects in other example embodiments.

Although one or more example embodiments have been described with reference to the accompanying drawings, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope as defined by the following claims.

Claims

1. An apparatus, comprising:

a memory configured to store at least one program;

a microphone configured to receive a voice input; and

at least one processor configured to execute the at least one program to control the apparatus to perform operations of collecting money to a payee, the operations comprising:

analyzing the received voice input using a pre-built data recognition model customized to the environment of use of the device to estimate the user's intent to pay using an artificial intelligence AI algorithm;

obtaining a user's intent to pay based on the analyzed voice input, including at least one of a payee's name, money transfer amount, or money transfer instructions, using the data recognition model;

Obtaining contact information from a stored contact list based on the names of the recipients identified from the analyzed voice input;

transmitting the name of the payee and the contact information to the server together with the money transfer amount;

receiving money transfer information generated based on the transmitted money transfer amount from the server; and

receiving feedback input for approving the received money transfer information for transfer to the recipient, or rejecting the money transfer information to cancel transfer to the recipient;

wherein the feedback input for approving the money transfer information includes user information including at least one of a fingerprint, an iris scan, a facial image, a vein pattern image, or a voice of the user,

wherein money transfer information is identified as approved based on the user information having the same or greater similarity to corresponding user information stored in the memory, and

wherein the usage environment of the device includes at least one of a usage time or a usage place of the user.

2. The apparatus of claim 1, wherein obtaining a user's intent to pay comprises learning a pattern through received voice input when the user is pooling money.

3. The apparatus of claim 1, wherein the at least one processor is further configured to control the apparatus to:

Verify that the received voice input is voice of a user of the device,

wherein the user's intent to pay is obtained based on the received voice input being verified as voice of the user of the device.

4. The apparatus of claim 1, the at least one processor further configured to control the apparatus to:

displaying the remittance information

Wherein the money transfer information includes an account number of the payee.

5. The apparatus of claim 1, wherein the data recognition model is a model based on an artificial intelligence AI algorithm using learning entity values extracted from learning data including speech input or text, and

wherein the learning entity value includes a value of at least one of user information, payee information, money transfer amount, and money transfer instructions.

6. The device of claim 1, wherein obtaining the user's intent to pay is based on a recognition entity value obtained as a result of applying the received voice input to the data recognition model,

wherein the identification entity value includes a value of at least one of user information, payee information, money transfer amount, and money transfer instructions.

7. A payment method, comprising:

receiving voice input of a user;

analyzing the received voice input using a pre-built data recognition model customized to the environment of use of the device to estimate the intent of payment using an artificial intelligence AI algorithm;

receiving feedback input for approving the received money transfer information for transfer to the recipient, or rejecting the money transfer information to cancel the transfer to the recipient;

8. The payment method of claim 7, wherein obtaining the user's intent to pay comprises learning a pattern through received voice input when the user is pooling money.

9. The payment method of claim 7, further comprising:

verifying that the received voice input is voice of a user of the device, and

10. The payment method of claim 7, further comprising:

displaying the remittance information

Wherein the money transfer information includes an account number of the payee.

11. A computer storage medium comprising instructions stored thereon, wherein the instructions are configured to, when executed, cause an apparatus to: