CN113811895A - Method and apparatus for artificial intelligence model personalization - Google Patents

Method and apparatus for artificial intelligence model personalization

Info

Publication number
CN113811895A
CN113811895A (application CN202080034927.0A)
Authority
CN
China
Prior art keywords: training data, data, model, artificial intelligence, user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080034927.0A
Other languages
Chinese (zh)
Inventor
朴致衍
金在德
孙泳哲
崔寅权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020190155985A external-priority patent/KR20210010284A/en
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of CN113811895A publication Critical patent/CN113811895A/en
Pending legal-status Critical Current

Classifications

    • G06N3/088 Non-supervised learning, e.g. competitive learning (G06N3/02 Neural networks; G06N3/08 Learning methods)
    • G06N5/04 Inference or reasoning models (G06N5/00 Computing arrangements using knowledge-based models)
    • G06N3/04 Architecture, e.g. interconnection topology (G06N3/02 Neural networks)
    • G06N20/00 Machine learning
    • G06N3/08 Learning methods (G06N3/02 Neural networks)
    • G10L15/07 Adaptation to the speaker (G10L15/06 Creation of reference templates; training of speech recognition systems; G10L15/065 Adaptation)
    • G10L17/04 Training, enrolment or model building (G10L17/00 Speaker identification or verification)

Abstract

An electronic device is disclosed. The electronic device may include: a memory configured to store one or more training data generation models and an artificial intelligence model; and a processor configured to generate, using the one or more training data generation models, personal training data reflecting characteristics of a user, train the artificial intelligence model using the personal training data as training data, and store the trained artificial intelligence model in the memory.

Description

Method and apparatus for artificial intelligence model personalization
Technical Field
The present disclosure relates to an electronic device updated based on training data and a control method thereof, and more particularly, to a method and apparatus for artificial intelligence model personalization through incremental learning.
Background
Artificial neural networks can be designed and trained to perform a wide range of functions, with applications in speech recognition, object recognition, and the like. An artificial neural network may exhibit improved performance when it is trained using large amounts of training data from large databases. In particular, in a case where an artificial intelligence model recognizes elements that differ for each user (such as speech), a large amount of data is required because the model may need to be trained using both personal training data reflecting characteristics of the user of the electronic device and general training data reflecting characteristics of general users.
Accordingly, a method of continuously accumulating training data for updating an artificial intelligence model and training the model based on the accumulated training data, as shown in FIG. 1, may be considered. However, such an approach is problematic: the storage capacity must grow continuously as the training data increases, the computational resources required for training increase, and the time required to update the model increases.
Further, a method of training an artificial intelligence model by sampling only some of the training data may be considered, but it is difficult to perform such sampling efficiently.
Furthermore, a method of discarding all or part of the existing training data and training the artificial intelligence model based only on new training data suffers from catastrophic forgetting, in which the artificial intelligence model forgets previously learned knowledge.
Disclosure of Invention
Technical problem
Embodiments of the present disclosure overcome the above disadvantages and other disadvantages not described above. Furthermore, the present disclosure is not required to overcome the disadvantages described above, and embodiments of the present disclosure may not overcome any of the problems described above.
Technical scheme for solving problems
According to an aspect of the present disclosure, an electronic device may include: a memory configured to store one or more training data generation models and an artificial intelligence model; and a processor configured to generate, using the one or more training data generation models, personal training data reflecting characteristics of a user, train the artificial intelligence model using the personal training data as training data, and store the trained artificial intelligence model in the memory.
The one or more training data generating models may include: a personal training data generation model trained to generate personal training data reflecting characteristics of a user; and a generic training data generation model trained to generate generic training data corresponding to usage data of a plurality of users.
The artificial intelligence model may be a model that is updated based on at least one of personal training data, generic training data, or actual user data obtained from the user.
The personal training data generation model may be updated based on at least one of the user data or the personal training data.
The artificial intelligence model may be a speech recognition model, a handwriting recognition model, an object recognition model, a speaker recognition model, a word recommendation model, or a translation model.
The generic training data may include first input data, the personal training data may include second input data, and the artificial intelligence model may perform unsupervised learning based on the user data, the first input data, and the second input data.
The generic training data may include first input data, the personal training data may include second input data, and the artificial intelligence model may generate first output data corresponding to the first input data based on the first input data entered, generate second output data corresponding to the second input data based on the second input data entered, and train based on the user data, the first input data, the first output data, the second input data, and the second output data.
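The training flow described in the paragraphs above — generate first and second input data, let the current model produce corresponding outputs, and then train on those pairs together with actual user data — can be sketched as follows. This is a minimal illustration; the function names, the toy model, and the numbers are assumptions for explanation, not part of the disclosure.

```python
import random

def pseudo_labeled_batch(model, generator, n):
    """Generate n synthetic inputs and label each with the current model's output."""
    xs = [generator() for _ in range(n)]
    return [(x, model(x)) for x in xs]

def build_replay_dataset(model, generic_gen, personal_gen, user_pairs, n=8):
    # First input data (generic) and second input data (personal), each paired
    # with the output the current model generates for it.
    generic = pseudo_labeled_batch(model, generic_gen, n)
    personal = pseudo_labeled_batch(model, personal_gen, n)
    # Actual user data keeps its ground-truth labels.
    return generic + personal + list(user_pairs)

# Toy stand-ins (assumptions): the "model" is a callable, generators sample scalars.
random.seed(0)
model = lambda x: 2.0 * x                        # previously learned behavior
generic_gen = lambda: random.uniform(-1.0, 1.0)  # generic training data source
personal_gen = lambda: random.uniform(5.0, 6.0)  # user-like training data source
user_pairs = [(5.5, 11.3)]                       # one real (input, label) example

dataset = build_replay_dataset(model, generic_gen, personal_gen, user_pairs)
print(len(dataset))  # 8 generic + 8 personal + 1 user pair = 17
```

Training on the model's own outputs for generated inputs is what would let the updated model retain its earlier behavior without storing the original training set.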
The generic training data generation model may be downloaded from a server and stored in memory.
The processor may upload the artificial intelligence model to a server.
The processor may train the artificial intelligence model based on the electronic device being in a charging state, a predetermined time occurring, or no user manipulation of the electronic device being detected for a predetermined time.
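The trigger conditions above might be checked with logic along these lines; this is a sketch under assumed names and an assumed idle threshold, since the disclosure does not specify an implementation:

```python
from datetime import datetime, timedelta

def should_train(is_charging, now, scheduled_time, last_user_input,
                 idle_threshold=timedelta(minutes=30)):
    """True if any of the example trigger conditions holds."""
    if is_charging:                                # device is in a charging state
        return True
    if now >= scheduled_time:                      # predetermined time occurred
        return True
    if now - last_user_input >= idle_threshold:    # no user manipulation detected
        return True
    return False

now = datetime(2020, 1, 1, 3, 0)
print(should_train(False, now,
                   scheduled_time=datetime(2020, 1, 1, 4, 0),
                   last_user_input=now - timedelta(hours=1)))  # True: device idle
```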
According to an aspect of the present disclosure, a method of controlling an electronic device including one or more training data generation models and an artificial intelligence model may include: generating personal training data reflecting characteristics of the user using one or more training data generation models; training an artificial intelligence model using the personal training data as training data; and storing the trained artificial intelligence model.
The one or more training data generating models may include: a personal training data generation model trained to generate personal training data reflecting characteristics of a user; and a generic training data generation model trained to generate generic training data corresponding to usage data of a plurality of users.
The artificial intelligence model may be a model that is updated based on at least one of personal training data, generic training data, or actual user data obtained from the user.
The personal training data generation model may be updated based on at least one of the user data or the personal training data.
The artificial intelligence model may be a speech recognition model, a handwriting recognition model, an object recognition model, a speaker recognition model, a word recommendation model, or a translation model.
The generic training data may include first input data, the personal training data may include second input data, and the artificial intelligence model may perform unsupervised learning based on the user data, the first input data, and the second input data.
The generic training data may include first input data, the personal training data may include second input data, and the artificial intelligence model may generate first output data corresponding to the first input data based on the first input data entered, may generate second output data corresponding to the second input data based on the second input data entered, and may be trained based on the user data, the first input data, the first output data, the second input data, and the second output data.
The generic training data generating model may be downloaded from a server and stored in a memory of the electronic device.
The artificial intelligence model may be uploaded to a server.
The method may further include: training the artificial intelligence model based on the electronic device being in a charging state, a predetermined time occurring, or no user manipulation of the electronic device being detected for a predetermined time.
According to an aspect of the present disclosure, a non-transitory computer-readable medium may store a program for executing a method of personalization of an artificial intelligence model in an electronic device, the method including: generating personal training data reflecting characteristics of a user; and training the artificial intelligence model using the personal training data as training data.
The method may include generating generic training data reflecting characteristics of a plurality of users.
The method may include collecting and storing actual usage data of the user obtained by an apparatus in the electronic device.
The method may include training an artificial intelligence model using at least one of personal training data, generic training data, or obtained actual usage data of the user.
Advantageous effects of the invention
As described above, according to various embodiments of the present disclosure, one or more of generic training data and personal training data are generated on the electronic device by a training data generation model, and thus it is not necessary to store a large amount of training data (e.g., 1 TB to 2 TB) for training an artificial intelligence model. Therefore, even on an electronic device with a small memory capacity, the artificial intelligence model can be trained efficiently by using a training data generation model (e.g., 10 MB to 20 MB in size).
Further, since it is not necessary to download training data from an external server, the artificial intelligence model can be freely trained even in a state where a network connection is not established.
Furthermore, the artificial intelligence model is trained based on personal training data that reflects characteristics of a user of the electronic device, and thus the artificial intelligence model may have improved accuracy in recognition of user data (such as user speech).
Additional and/or other aspects and advantages of the disclosure will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The above and other aspects, features and advantages of certain embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a diagram for describing a related art;
FIG. 2 is a diagram for schematically describing a configuration of an electronic system according to an embodiment;
FIG. 3 is a block diagram for describing the operation of an electronic device according to an embodiment;
FIG. 4 is a flow diagram showing an exemplary process of training an artificial intelligence model and performing recognition, according to an embodiment;
FIG. 5 is a diagram for describing a process of training a personal training data generation model according to an embodiment;
FIG. 6 is a diagram for describing a process of training an artificial intelligence model according to an embodiment;
FIG. 7 is a diagram used to describe certain operations performed between a processor and memory, in accordance with an embodiment;
FIG. 8 is a diagram for describing a case where an artificial intelligence model is implemented by an object recognition model according to another embodiment; and
FIG. 9 is a diagram for describing a process of training an artificial intelligence model on an electronic device, according to an embodiment.
Detailed Description
Hereinafter, the present disclosure will be described in detail with reference to the accompanying drawings.
After briefly describing the terms used in the specification, the present disclosure will be described in detail.
General terms, which are currently widely used, are selected as terms used in the embodiments of the present disclosure in consideration of functions in the present disclosure, but may be changed according to the intention of those skilled in the art or judicial precedent, the emergence of new technology, and the like. Further, in certain instances, there may be terms that the applicant chooses arbitrarily. In this case, the meanings of these terms will be mentioned in detail in the corresponding description part of the present disclosure. Accordingly, terms used in the embodiments of the present disclosure should be defined based on the meanings of the terms and throughout the present disclosure, not based on the simple names of the terms.
Because the present disclosure may be diversely modified and practiced in various different embodiments, specific embodiments thereof are shown in the drawings and will be described herein in detail. It should be understood, however, that the present disclosure is not limited to the particular embodiments, but includes all modifications, equivalents, and alternatives without departing from the scope and spirit of the present disclosure. In a case where it is determined that a detailed description of a known technology related to the present disclosure may obscure the gist of the present disclosure, the detailed description of the known technology will be omitted.
The singular forms of terms are intended to include the plural forms of terms unless the context clearly indicates otherwise. It will be understood that terms such as "comprises" or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, or components, or groups thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof.
The expression "at least one of a and/or B" should be understood as indicating "a only", "B only", or "a and B".
Expressions such as "first", "second", etc. used in the specification may indicate various components, regardless of the order and/or importance of the components, and are only used to distinguish one component from other components, and do not limit the corresponding components.
When any component (e.g., a first component) is referred to as being (operatively or communicatively) coupled with/to or connected to another component (e.g., a second component), it should be understood that the component may be directly coupled to the other component or may be coupled to the other component through yet another component (e.g., a third component).
In the present disclosure, a "module" or "unit" may perform at least one function or operation, and may be implemented by hardware or software, or by a combination of hardware and software. Further, a plurality of "modules" or a plurality of "units" may be integrated into at least one module and implemented by at least one processor (not shown), except for a "module" or "unit" that needs to be implemented by specific hardware. In the present disclosure, the term "user" may refer to a person using the electronic device or a device using the electronic device (e.g., an artificial intelligence (AI) electronic device).
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art to which the present disclosure pertains may easily practice the present disclosure. However, the present disclosure may be modified in various different forms and is not limited to the embodiments described herein. Furthermore, in the drawings, portions irrelevant to the description will be omitted so as not to obscure the present disclosure, and like reference numerals will be used to describe like portions throughout the specification.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
FIG. 2 is a diagram for schematically describing a configuration of an electronic system according to an embodiment.
Referring to FIG. 2, the electronic system 10 according to the embodiment includes an electronic device 100 and a server 200.
The electronic device 100 according to an embodiment may be configured to recognize user data by performing a specific operation using an artificial intelligence model (alternatively, a neural network model or a learning network model). Here, the user data is data reflecting unique characteristics of the user, such as the user's voice, the user's handwriting, an image captured by the user, character data input by the user, translation data, and the like. Further, the user data is actual usage data obtained from the user, and may also be referred to as actual user data.
Artificial intelligence related functions according to the present disclosure are performed by a processor and a memory. A processor may be implemented by one or more processors. Here, the one or more processors may be general-purpose processors such as a Central Processing Unit (CPU), an Application Processor (AP), or a Digital Signal Processor (DSP), graphics-specific processors such as a Graphics Processing Unit (GPU) or a Visual Processing Unit (VPU), or artificial intelligence-specific processors such as a Neural Processing Unit (NPU). The one or more processors may be configured to perform control to process the input data according to predefined operational rules or artificial intelligence models stored in the memory. Alternatively, where one or more processors are artificial intelligence specific processors, the artificial intelligence specific processors can be designed with a hardware architecture that is specific to the processing of a particular artificial intelligence model.
The predefined operation rules or artificial intelligence models are obtained by training. Here, obtaining a predefined operation rule or artificial intelligence model through training means that a basic artificial intelligence model is trained using training data and a training algorithm so that a predefined operation rule or artificial intelligence model achieving a desired characteristic (or goal) is obtained. The training may be performed by the device in which the artificial intelligence is executed according to the present disclosure, or may be performed by a separate server and/or system. Examples of training algorithms may include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
The artificial intelligence model may include a plurality of neural network layers. Each of the plurality of neural network layers has a plurality of weight values, and each layer performs a neural network calculation using the calculation result of the previous layer and its plurality of weight values. The plurality of weight values of the plurality of neural network layers may be optimized (or improved) based on the training results of the artificial intelligence model. For example, the plurality of weight values may be updated during training so as to reduce or minimize a loss value or cost value obtained by the artificial intelligence model. The artificial neural network may include a deep neural network (DNN), and may be, for example, a convolutional neural network (CNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), or a deep Q-network, but is not limited thereto.
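The weight update described above can be made concrete with a deliberately tiny example: a single-weight "layer" whose weight is nudged down the gradient of a squared-error loss. The one-parameter setup is an assumption for illustration only, not the disclosure's model.

```python
def train_step(w, x, y, lr=0.1):
    pred = w * x                  # calculation using the layer's weight value
    grad = 2 * (pred - y) * x     # gradient of the loss (pred - y)**2 w.r.t. w
    return w - lr * grad          # updated weight reduces the loss value

w = 0.0
for _ in range(50):
    w = train_step(w, x=1.0, y=3.0)
print(round(w, 3))  # converges toward 3.0, the weight that minimizes the loss
```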
In case the artificial intelligence model is a speech recognition model according to an embodiment, the electronic device 100 may comprise a speech recognition model recognizing the speech of the user and may thus provide a virtual assistant functionality. For example, the electronic apparatus 100 may be implemented in various forms, such as a smart phone, a tablet Personal Computer (PC), a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a workstation, a server, a Personal Digital Assistant (PDA), a Portable Multimedia Player (PMP), an MP3 player, a medical device, a camera, and a wearable device.
The artificial intelligence model may include a plurality of neural network layers and may be trained to improve recognition performance.
The electronic device 100 according to an embodiment may include a training data generation model that generates training data by itself, without retaining a large amount of training data for training the artificial intelligence model, and thus may update the artificial intelligence model in real time as needed.
The training data generation model may be installed in the electronic device in advance, or may be downloaded from the server 200 and installed in the electronic device.
As described above, the artificial intelligence model may be trained based on training data generated by a training data generation model installed in the electronic device.
Here, the training data generation model may include at least one individual training data generation model and a general training data generation model. The personal training data generation model may be a model that generates personal training data reflecting characteristics of a user of the electronic apparatus 100, and the general training data generation model may be a model that generates general training data reflecting characteristics of a general user.
The artificial intelligence model may be trained using personal training data generated from a personal training data generating model, generic training data generated from a generic training data generating model, or both personal training data and generic training data.
The artificial intelligence model may be trained using both the personal training data and the generic training data: if the model is trained using only the generic training data, the characteristics of the user of the electronic device 100 may not be reflected, whereas if the model is trained using only the personal training data, training may be excessively biased toward the characteristics of the specific user.
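The balance described above can be sketched as a sampling mixture; the 50/50 ratio and the function names are assumptions for illustration, not values specified in the disclosure:

```python
import random

def mixed_batch(generic_gen, personal_gen, size, personal_ratio=0.5):
    """Draw each training example from either the generic or the personal generator."""
    batch = []
    for _ in range(size):
        gen = personal_gen if random.random() < personal_ratio else generic_gen
        batch.append(gen())
    return batch

random.seed(1)
batch = mixed_batch(lambda: ("generic", 0), lambda: ("personal", 1), size=100)
n_personal = sum(label == "personal" for label, _ in batch)
print(0 < n_personal < 100)  # True: both data sources contribute to training
```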
The server 200 is an apparatus configured to manage at least one of a personal training data generation model, a general training data generation model, or an artificial intelligence model, and may be implemented by a central server, a cloud server, or the like. The server 200 may transmit at least one of the personal training data generation model, the general training data generation model, or the artificial intelligence model to the electronic device 100 based on a request from the electronic device 100 or the like. Specifically, the server 200 may transmit the generic training data generation model pre-trained based on the generic user data to the electronic device 100. Further, the server 200 may transmit update information (for example, weight value information or bias information of each layer) for updating the already trained general training data generation model or the updated general training data generation model itself to the electronic apparatus 100 as needed.
The personal training data generating model and the artificial intelligence model transmitted to the electronic device 100 may be untrained models. The personal training data generation model and the artificial intelligence model may be installed in the electronic device in advance. In consideration of user privacy and the like, the personal training data generation model and the artificial intelligence model of the electronic device may be trained on the electronic device 100 without a process of transmitting user data related to training to a server. In some cases, user data may be sent to server 200, and personal training data generation models and artificial intelligence models may be trained on server 200.
The training data generation model may directly generate training data in the electronic device 100, and thus, the electronic device 100 may not need to store a large amount of actual training data or receive training data separately from the server 200. Further, in addition to the generic training data, the artificial intelligence model may be trained by using personal training data. Thus, the artificial intelligence model can be trained as a personalized artificial intelligence model that reflects the characteristics of the user, and can be incrementally updated.
In addition to the speech recognition model, the artificial intelligence model may be implemented by various models such as a handwriting recognition model, a visual object recognition model, a speaker recognition model, a word recommendation model, and a translation model. However, hereinafter, for convenience of explanation, the voice recognition model will be mainly described.
Fig. 3 is a block diagram for describing the operation of an electronic device according to an embodiment of the present disclosure.
Referring to fig. 3, the electronic device 100 includes a memory 110 and a processor 120.
The memory 110 is electrically connected to the processor 120 and may store data for implementing various embodiments of the present disclosure.
Depending on the data storage purpose, the memory 110 may be implemented in the form of a memory embedded in the electronic device 100 or in the form of a memory attachable to and detachable from the electronic device 100. For example, data for driving the electronic device 100 may be stored in a memory embedded in the electronic device 100, and data for extended functions of the electronic device 100 may be stored in a memory attachable to and detachable from the electronic device 100. The memory embedded in the electronic device 100 may be implemented by at least one of volatile memory (e.g., dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM)) or non-volatile memory (e.g., one-time programmable ROM (OTPROM), programmable ROM (PROM), erasable and programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), mask ROM, flash memory (e.g., NAND flash memory or NOR flash memory), a hard disk drive, or a solid state drive (SSD)), and the memory attachable to and detachable from the electronic device 100 may be implemented by a memory card (e.g., CompactFlash (CF), Secure Digital (SD), Micro Secure Digital (Micro-SD), Mini Secure Digital (Mini-SD), eXtreme Digital (xD), or MultiMediaCard (MMC)), an external memory connectable to a USB port (e.g., a USB memory), or the like.
According to an embodiment, the memory 110 may store one or more training data generation models and artificial intelligence models. Here, the one or more training data generating models may include a general training data generating model and a personal training data generating model.
Here, the general training data generation model may be a model trained to generate general training data corresponding to usage data of general users. In other words, the general training data may be data reflecting characteristics of general users. According to an embodiment, the general training data generation model includes a plurality of neural network layers, each including a plurality of parameters, and each layer may perform a neural network calculation using the calculation result of the previous layer and its plurality of parameters. For example, based on input random values, calculations are sequentially performed in the plurality of pre-trained neural network layers included in the general training data generation model, so that general training data may be generated.
The personal training data generation model is a model trained to generate personal training data reflecting characteristics of the user of the electronic apparatus 100. For example, based on the input random values, calculations are sequentially performed in a plurality of pre-trained neural network layers included in the personal training data generation model so that personal training data can be generated.
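Both generation models described above share the same shape: random input values pass sequentially through pre-trained layers to yield a synthetic sample. The sketch below illustrates that flow with stand-in affine "layers" (an assumption for illustration; the disclosure does not specify the generator architecture):

```python
import random

def make_layer(scale, bias):
    """A stand-in pre-trained layer: an elementwise affine transform."""
    return lambda v: [scale * x + bias for x in v]

def generate_sample(layers, dim=4):
    v = [random.gauss(0.0, 1.0) for _ in range(dim)]  # input random values
    for layer in layers:      # calculations performed sequentially, layer by layer
        v = layer(v)
    return v

random.seed(42)
layers = [make_layer(0.5, 0.1), make_layer(2.0, 0.0)]
sample = generate_sample(layers)
print(len(sample))  # one synthetic training example of dimension 4
```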
According to an embodiment, the artificial intelligence model may be a model that recognizes data resulting from user operations, such as speech utterances or handwriting. The artificial intelligence model may be a speech recognition model, a handwriting recognition model, an object recognition model, a speaker recognition model, a word recommendation model, a translation model, and so forth. As an example, in the case where the artificial intelligence model is implemented by a speech recognition model, the artificial intelligence model may be implemented by an Automatic Speech Recognition (ASR) model that is a model that recognizes a user's speech and outputs text corresponding to the recognized speech. However, according to another embodiment, the artificial intelligence model may be a model that generates various outputs based on the user speech (e.g., outputs transformed speech corresponding to the user speech). The artificial intelligence model may be implemented by various types of models other than the above-described models as long as the artificial intelligence model can be trained to reflect the characteristics of the person by using the personal training data.
The personal training data generation model and the artificial intelligence model may be received from the server 200 and stored in the memory 110. However, the personal training data generating model and the artificial intelligence model may also be stored in the memory 110 when the electronic device 100 is manufactured, or received from other external devices other than a server and stored in the memory 110.
In addition, the memory 110 may store user data. Here, the user data is actual usage data of the user. For example, the user data may be voice data spoken directly by the user, handwriting actually written by the user, images captured directly by the user, and so forth. The user data is conceptually distinct from the personal training data and the general training data. For example, the user data may be speech data spoken directly by the user, whereas the personal training data may be data of speech similar to the speech spoken directly by the user, the similar speech being artificially generated by the personal training data generation model and used to train the artificial intelligence model.
As an example, in the case where the user data is user voice data, the user voice data may be user voice data received through a microphone (not shown) provided in the electronic device 100 or user voice data received from an external device. For example, the user voice data may be data such as a WAV file or an MP3 file, but is not limited thereto.
As another example, where the user data is user handwriting data, the user handwriting data may be handwriting data that is touched by a user or input by a stylus through a display (not shown) provided in the electronic device 100.
The user data may be stored in a memory different from the memory storing the general training data generation model, the personal training data generation model, and the artificial intelligence model.
The processor 120 is electrically connected to the memory 110 and controls the overall operation of the electronic device 100. The processor 120 controls the overall operation of the electronic device 100 by using various instructions or programs stored in the memory 110. Specifically, according to an embodiment, a main Central Processing Unit (CPU) may copy a program to a RAM according to instructions stored in a ROM and access the RAM to execute the program. Here, the program may include an artificial intelligence model or the like.
According to an embodiment, processor 120 may train the artificial intelligence model based on personal training data that reflects the characteristics of the user and is generated by the personal training data generation model, and on user data corresponding to actual usage data of the user. However, in a case where the artificial intelligence model is trained only by using the personal training data, training of the artificial intelligence model may become excessively biased toward the characteristics of a specific user.
Accordingly, the processor 120 may train the artificial intelligence model based on the general training data generated by the general training data generation model, which reflects the characteristics of general users, together with the personal training data and the user data. In a case where both the general training data and the personal training data are used, the artificial intelligence model may be trained to be a model personalized for the user, while the general training data keeps the training from becoming excessively biased toward the specific user. In the following, embodiments will be described in which the artificial intelligence model is trained based on the general training data and the personal training data.
FIG. 4 is a flow diagram illustrating an exemplary process for training an artificial intelligence model and performing recognition in accordance with an embodiment of the present disclosure.
The electronic device 100 may store the generic training data generation model, the personal training data generation model, and the artificial intelligence model in the memory 110.
The personal training data generation model may be trained based on the user data (operation S410). According to an embodiment, the user data may be data obtained in a training data obtaining mode of the electronic device 100, in which actual usage data of the user is obtained as training data. For example, once the training data obtaining mode is activated, predetermined text may be displayed on the display and a message requesting an utterance of the displayed text may be provided. For example, the processor 120 may control a speaker (not shown) to output audio such as "please read the displayed text aloud", or may control the display to show a user interface (UI) window with the message "please read the displayed text aloud". Then, a user voice corresponding to the displayed text (such as "how is today's weather?") may be input and used as the user data. Alternatively, in the training data obtaining mode, the user may input handwriting corresponding to particular text in response to a predetermined message. The handwriting input as described above may be used as user data.
However, the present disclosure is not limited thereto, and even in a case where the training data obtaining mode is not activated, the processor 120 may obtain the user voice through the microphone and use the obtained user voice as the training data. For example, a user voice query, a user voice command, or the like, input during regular use of the electronic device may be obtained in a predetermined period and used as training data.
As described in more detail below, after generating the personal training data, the personal training data generation model may be trained based on at least one of user speech data or personal training data. Operation S410 will be described in detail with reference to fig. 5.
The generic training data generation model is pre-trained on the server 200 and then transmitted to the electronic device 100, so it may not be necessary to separately train the generic training data generation model on the electronic device 100. However, the generic training data generation model may be updated on the server 200, and the updated generic training data generation model may be periodically transmitted to the electronic device 100 to replace or update the existing generic training data generation model.
The generic training data generation model may generate generic training data that reflects characteristics of a generic user. The personal training data generation model may generate personal training data reflecting characteristics of the user of the electronic device 100 (operation S420). The personal training data is training data including characteristics of the user, and may be data for training an artificial intelligence model into a personalized model.
According to an embodiment, an artificial intelligence model may be trained and updated based on user data obtained in a predetermined period of time, generated general training data, and generated personal training data (operation S430). However, according to another embodiment, the artificial intelligence model may also be trained based on the user data and the generated personal training data. The artificial intelligence model may be trained and updated based on at least one of user data, generated generic training data, or generated personal training data, or any combination thereof.
Unlike the generated general training data and the generated individual training data, the user data is actual usage data as raw data without processing. Accordingly, the artificial intelligence model can be trained by using the user data together with the general training data and the individual training data to improve the recognition accuracy of the artificial intelligence model. The user data may be used to train the personal training data generation model and may also be used to train the artificial intelligence model. Operation S430 will be described in detail with reference to fig. 6.
Then, recognition of user data may be performed based on the trained (personalized and updated) artificial intelligence model (operation S440). Here, the user data is data to be recognized by the artificial intelligence model, and may be, for example, voice data of a voice query. User data to be recognized by the artificial intelligence model may also be used to update the personal training data generation model and the artificial intelligence model.
Specifically, in the case where the user's voice is input to the artificial intelligence model, the artificial intelligence model may recognize the user's voice, convert the user's voice into text, and output the text.
FIG. 5 is a diagram for describing a process of training a personal training data generation model according to an embodiment of the present disclosure.
The processor 120 may load the user data stored in the memory 110 (operation S510). For example, the processor 120 may load user data stored in the memory 110 external to the processor 120 into an internal memory (not shown) of the processor 120.
Here, the user data may be data that is input according to a request from the processor 120. For example, in the case where the user data is user voice data, the processor 120 may request the user to speak a text including a predetermined word or sentence to obtain the user voice data as training data. As an example, the processor 120 may display a predetermined word or sentence on a display (not shown) to guide the user to speak the predetermined word or sentence, or may output a voice through a speaker (not shown) to request the user to speak the predetermined word or sentence. The processor 120 may store the user speech data input as described above in the memory 110 and then load the stored user speech data to train the personal training data generating model. Accordingly, the user voice data stored in the memory 110 may be loaded to the processor 120. However, the present disclosure is not limited thereto, and in the case where the user data is user handwriting data, characters or numbers handwritten by the user according to a request from the processor 120 may be input to the electronic device 100.
Further, the processor 120 may load the personal training data generation model stored in the memory 110 (operation S520). The order of operations S510 and S520 may be changed.
Then, the personal training data generation model may be trained based on the user data (operation S530). In particular, the personal training data generation model may be trained using a user data distribution. For example, the personal training data generation model may be updated using a frequency distribution in the user speech data. The frequency distribution in speech is different for each person. Therefore, in the case of training the personal training data generation model based on the frequency distribution characteristics, the voice characteristics of the user of the electronic apparatus 100 can be accurately reflected.
Thus, the personal training data generating model may be trained as a model that is personalized for the user.
Then, the trained personal training data generation model may generate personal training data (operation S540). The personal training data generating model is trained to reflect characteristics of a user of the electronic device 100, and thus, the personal training data generated by the personal training data generating model may be similar to the user data. As an example, the personal training data may be generated as speech that closely resembles the speech spoken directly by the user. As another example, the personal training data may be generated as text having a form that is very similar to the form of text that the user directly writes. The similarity between the user data and the personal training data may be determined based on results of the updating of the personal training data generation model.
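A hedged illustration of the frequency-distribution idea above: a speech clip can be summarized as normalized energy per coarse frequency band, and such summaries differ between, say, a low-pitched and a higher-pitched speaker. The bin count and the DFT-based summary below are illustrative assumptions, not the patent's method:

```python
import cmath
import math

def frequency_distribution(samples, n_bins=4):
    # Hypothetical sketch: summarize a clip by the energy in a few coarse
    # frequency bins of its discrete Fourier transform, then normalize.
    n = len(samples)
    mags = [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]          # magnitudes up to Nyquist
    bin_size = max(1, len(mags) // n_bins)
    energy = [sum(mags[i:i + bin_size])
              for i in range(0, len(mags), bin_size)][:n_bins]
    total = sum(energy) or 1.0
    return [e / total for e in energy]        # normalized distribution

# two toy "speakers": a low-pitched and a higher-pitched pure tone
low = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]
high = [math.sin(2 * math.pi * 12 * t / 64) for t in range(64)]
print(frequency_distribution(low)[0] > frequency_distribution(high)[0])  # → True
```

The personal training data generation model could be updated so that its generated speech matches such a per-user distribution, though the patent does not specify the exact mechanism.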
The generated personal training data may be used to train the artificial intelligence model together with the generic training data and the user data (operation S550).
Further, the personal training data generation model may be trained and updated based on the generated personal training data in addition to the user data.
The processor 120 may then store the trained personal training data generating model in the memory 110 (operation S560).
The personal training data generation model may be updated by continuously repeating the above-described operations S510 to S560.
FIG. 6 is a diagram for describing a process of training an artificial intelligence model according to an embodiment.
The processor 120 may load the training data stored in the memory 110 (operation S610). In particular, processor 120 may load user data obtained from a user, as well as general training data and personal training data generated by a training data generation model, respectively.
The processor 120 may identify whether the artificial intelligence model is in a trainable state (operation S620). The processor 120 may identify whether the computing resources required to train the artificial intelligence model are sufficiently available. For example, because training uses a large amount of computing resources, the artificial intelligence model may be identified as being in a trainable state when relatively few operations other than the training operation are being performed on the electronic device 100.
In particular, the trainable state of the artificial intelligence model may include a case where the electronic device 100 is in a charging state, a case where a predetermined time arrives, or a case where there is no user manipulation for a predetermined time. Because power is supplied to the electronic device 100 while it is in the charging state, the electronic device 100 may not be turned off during training of the artificial intelligence model. Further, the predetermined time may be, for example, a time when the user typically starts sleeping. As an example, in a case where the predetermined time is 1 a.m., the artificial intelligence model may be trained based on the training data at 1 a.m. The predetermined time may be obtained by monitoring the usage pattern of the electronic device 100, or may be a time set by the user. Further, in a case where there has been no user manipulation for a predetermined time, it may be expected that there will be little user manipulation afterward, and thus the artificial intelligence model may be trained. As an example, the artificial intelligence model may be trained in a case where there is no user manipulation of the electronic device 100 for one hour.
Further, according to an embodiment, the training of the artificial intelligence model may be performed in a case where a learning start condition other than the trainable state of the artificial intelligence model is additionally satisfied. The learning start condition may include at least one of a case where a command for training the artificial intelligence model is input, a case where a recognition error of the artificial intelligence model occurs a predetermined number of times or more, or a case where a predetermined amount or more of training data is accumulated. For example, the learning start condition may be satisfied in a case where the recognition error occurs five or more times within a predetermined time, or in a case where 10MB or more of user voice data is accumulated.
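The trainable-state check and the learning start condition described above can be sketched as simple predicates. All thresholds and parameter names below are assumptions drawn only from the examples in the text (1 a.m., one hour idle, five recognition errors, 10MB of accumulated data):

```python
def is_trainable_state(is_charging, now_hour, idle_minutes,
                       predetermined_hour=1, idle_threshold=60):
    # Trainable state: charging, the predetermined time has arrived,
    # or no user manipulation for the idle threshold (names are assumptions).
    return (is_charging
            or now_hour == predetermined_hour
            or idle_minutes >= idle_threshold)

def learning_start_condition(error_count, accumulated_mb, train_command=False,
                             error_threshold=5, data_threshold_mb=10):
    # Learning start condition: an explicit training command, enough
    # recognition errors, or enough accumulated training data.
    return (train_command
            or error_count >= error_threshold
            or accumulated_mb >= data_threshold_mb)

print(is_trainable_state(False, 1, 0))    # → True (predetermined time arrived)
print(learning_start_condition(5, 3))     # → True (five recognition errors)
print(learning_start_condition(0, 2))     # → False
```

In the described embodiment both predicates would need to hold before operation S640 proceeds.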
In the event that it is recognized that the artificial intelligence model is not in a trainable state (operation S620 — no), the processor 120 may store the loaded training data (operation S625). For example, the processor 120 may store the loaded user voice data in the memory 110 again.
In the case where it is recognized that the artificial intelligence model is in a trainable state (operation S620 — yes), the processor 120 may load the artificial intelligence model stored in the memory 110 (operation S630).
The loaded artificial intelligence model may be trained based on the user speech data, the generic training data, and the personal training data (operation S640). In particular, the artificial intelligence model may be trained sequentially using user speech data, generic training data, and personal training data.
The processor 120 may store the trained artificial intelligence model in the memory 110 (operation S650).
Then, according to an example, the processor 120 may delete the training data from the memory 110 (operation S660). For example, processor 120 may delete user speech data stored in memory 110. In other words, user speech data used to train the personal training data generation model and the artificial intelligence model may be deleted.
According to another example, in the event that the amount of data stored in memory 110 exceeds a predetermined amount, the training data may be sequentially deleted starting with the oldest training data stored in memory 110.
The order of operations S650 and S660 may be changed.
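The oldest-first deletion variant described above (operation S660) can be sketched as follows; the `stored_at` timestamp field and the item structure are assumptions for illustration:

```python
def prune_training_data(store, max_items):
    # Sketch: when stored training data exceeds a predetermined amount,
    # delete entries sequentially starting from the oldest.
    store.sort(key=lambda entry: entry["stored_at"])
    while len(store) > max_items:
        store.pop(0)  # oldest entry first
    return store

store = [
    {"stored_at": 3, "clip": "query_c.wav"},
    {"stored_at": 1, "clip": "query_a.wav"},
    {"stored_at": 2, "clip": "query_b.wav"},
]
remaining = prune_training_data(store, max_items=2)
print([e["clip"] for e in remaining])  # → ['query_b.wav', 'query_c.wav']
```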
In the above-described embodiment, the case of using personal (general) training data generated and stored in advance in the case where the learning start condition for the artificial intelligence model is satisfied has been described. However, it is also possible to generate and use personal (generic) training data by operating the training data generation module based on the satisfied conditions.
Fig. 7 is a diagram for describing a specific operation performed between a processor and a memory according to an embodiment.
Detailed description of a portion overlapping with the portion described with reference to fig. 4 to 6 will be omitted.
The general training data generation model and the personal training data generation model stored in the memory 110 may be loaded to the processor 120 according to the control of the processor 120 (①).
Then, the general training data generation model and the personal training data generation model loaded to the processor 120 may generate general training data and personal training data, respectively (②).
Further, the artificial intelligence model stored in the memory 110 may be loaded to the processor 120 according to the control of the processor 120 (③). For example, the processor 120 may load the artificial intelligence model stored in the memory 110 external to the processor 120 into an internal memory (not shown) of the processor 120. The processor 120 may then access the artificial intelligence model loaded into the internal memory.
The artificial intelligence model loaded to the processor 120 may be trained based on the user data, the general training data, and the personal training data (④). Here, the user data may be data loaded from the memory 110.
The processor 120 may store the trained artificial intelligence model in the memory 110 (⑤). Further, the processor 120 may upload the trained artificial intelligence model to the server 200.
According to an embodiment, the generic training data may include first input data, the personal training data may include second input data, and the artificial intelligence model may perform unsupervised learning based on the user data, the first input data, and the second input data.
For example, assume that the artificial intelligence model is implemented by a speech recognition model. The generic training data may comprise first speech data (first input data) and the personal training data may comprise second speech data (second input data). In this case, the artificial intelligence model can perform unsupervised learning based on the user speech data, the first speech data, and the second speech data.
According to another embodiment, the generic training data may comprise first input data and the personal training data may comprise second input data. In this case, the artificial intelligence model may generate first output data corresponding to the first input data based on the input first input data, and the artificial intelligence model may generate second output data corresponding to the second input data based on the input second input data. The artificial intelligence model can then be trained based on the user data, the first input data, the first output data, the second input data, and the second output data.
For example, assume that the artificial intelligence model is implemented by a speech recognition model. The artificial intelligence model may generate first text data (first output data) corresponding to the first voice data (first input data) based on the input first voice data, and may generate second text data (second output data) corresponding to the second voice data (second input data) based on the input second voice data.
Here, the first text data and the second text data may be generated in the form of a probability distribution. For example, in a case where the first voice data includes the voice "conference room" or a voice similar thereto, the first text data may be generated as a probability vector (y1, y2, y3). For example, y1 may be the probability that the text corresponding to the first voice data is "conference room", y2 may be the probability that the text is "lounge room", and y3 may be the probability that the text is neither "conference room" nor "lounge room".
The artificial intelligence model may also be trained based on the user speech data, the first speech data and the second speech data, and the first text data and the second text data generated in the form of a probability distribution. Further, the personal training data generation model may also be trained based on second text data generated by the artificial intelligence model.
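The probability-vector form of the text data described above can be illustrated with a softmax over hypothetical recognition scores; the score values themselves are invented for illustration:

```python
import math

def softmax(logits):
    # Convert raw recognition scores into a probability vector (y1, y2, y3).
    m = max(logits)                      # subtract max for numeric stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# hypothetical scores for "conference room", "lounge room", and neither
y1, y2, y3 = softmax([2.0, 0.5, -1.0])
print(round(y1 + y2 + y3, 6))  # → 1.0
print(y1 > y2 > y3)            # → True
```

Training on such soft outputs (rather than a single hard label) lets the generated text data carry the model's uncertainty.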
According to another embodiment, an artificial intelligence model may be trained based on first input data, first label data corresponding to the first input data, second input data, and second label data corresponding to the second input data. Here, the tag data may refer to explicit correct answer data of the input data. For example, in the case where the artificial intelligence model is a speech recognition model, the first tag data may refer to correct answer text data corresponding to the first speech data. As an example, in the case where the user is requested to speak the displayed text, the spoken speech may be user speech data, and the displayed text may be text data corresponding to the user speech data. Alternatively, text corresponding to the user's voice may be displayed as output on a display (not shown), and the electronic device 100 may receive feedback from the user for the text. For example, in the case where the output text corresponding to the user voice is "nail this to me" and the text received as feedback from the user is "mail this to me", the "mail this to me" may be text data corresponding to the user voice data as a tag.
Similarly, the generic training data may include first speech data and first label data corresponding to the first speech data.
In other words, the generic training data may include first speech data and first text data corresponding to the first speech data, and the personal training data may include second speech data and second text data corresponding to the second speech data. Hereinafter, for convenience of explanation, in the case where training data is configured as pairs, general training data is described as a general training data pair, and personal training data is described as a personal training data pair.
The artificial intelligence model may be trained based on a generic training data pair and a personal training data pair and a user speech data pair, wherein the generic training data pair and the personal training data pair each include speech data and text data corresponding to the speech data.
In particular, the artificial intelligence model can perform supervised learning, wherein first speech data is input to the artificial intelligence model and the output text data is compared with the first text data. Similarly, the artificial intelligence model can perform supervised learning, wherein second speech data is input to the artificial intelligence model and the output text data is compared to the second text data.
Although the case where the user data is user voice data has been described, the present disclosure is not limited thereto, as long as the training data can be configured as pairs. For example, the personal training data may include handwriting data generated to be similar to the user's handwriting, and text data corresponding to the handwriting data. Here, the text data may be data input as a label of the handwriting data.
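A toy sketch of the supervised learning over training data pairs described above; the lookup-table "model" and direct correction stand in for a real recognition model and gradient update, and the pair contents are invented:

```python
def train_on_pairs(table, pairs):
    # Supervised loop over (speech, label text) pairs: run recognition,
    # compare the output with the label, and update on a mismatch.
    for speech, text in pairs:
        output = table.get(speech)   # forward pass (toy lookup model)
        if output != text:           # compare with label data
            table[speech] = text     # stand-in for a parameter update
    return table

general_pairs = [("audio_meeting", "conference room")]   # general training data pair
personal_pairs = [("audio_mail", "mail this to me")]     # personal training data pair
model = train_on_pairs({}, general_pairs + personal_pairs)
print(model["audio_mail"])  # → mail this to me
```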
Fig. 8 is a diagram for describing a case where an artificial intelligence model is implemented by an object recognition model according to another embodiment.
FIG. 8 is a diagram illustrating digital handwriting. For example, the handwriting shown in fig. 8 may be handwriting directly written by the user or handwriting generated by the personal training data generation model. Similar to speech, handwriting may differ for each user, and thus, in a case where the artificial intelligence model is trained to be personalized according to the characteristics of the user, the recognition accuracy of the artificial intelligence model may be improved. The process of training the object recognition model is the same as the process described above with reference to fig. 4.
In particular, the personal training data generation model may be trained based on user handwriting data. Here, the user handwriting data may be data including handwriting input according to a request from the processor 120. For example, the processor 120 may request that the user write the numbers 0 to 9. In this case, the digital handwriting may be input to the touch display with a stylus, or an image including the digital handwriting written with a pen may be input to the electronic device 100.
The personal training data generation model may then be trained based on the user handwriting data. Thus, the personal training data generating model may be trained as a model that is personalized for the user. The trained generic training data generation model and the trained personal training data generation model may generate generic training data and personal training data, respectively. Here, the general training data may be training data reflecting handwriting characteristics of a general user, and the personal training data may be training data reflecting handwriting characteristics of a user of the electronic apparatus 100. The artificial intelligence model may be trained based on the general training data, the personal training data, and the user handwriting data. Therefore, the trained artificial intelligence model reflects the handwriting characteristics of the user, so that the handwriting recognition accuracy of the user can be improved.
However, the present disclosure is not so limited and the artificial intelligence model may be trained based on handwriting data and personal training data.
Although the case of digital handwriting has been described in fig. 8, the embodiments may also be applied to a recognition model that recognizes handwriting such as alphabetic characters, and to various other object recognition models.
FIG. 9 is a diagram for describing a process of training an artificial intelligence model on an electronic device, according to an embodiment.
Detailed description of a portion overlapping with the portion described with reference to fig. 4 to 6 will be omitted.
The personal training data generation model may be trained based on user data stored in a user data store. The general training data generation model downloaded from the server and the trained personal training data generation model may be stored in a training data generation model store. Here, the training data generation model store may be physically the same memory as the user data store, or may be a memory different from the user data store, but is not limited thereto. The general training data generation model and the personal training data generation model may generate general training data and personal training data, respectively. Here, each of the general training data generation model and the personal training data generation model may be implemented by a variational auto-encoder (VAE), a generative adversarial network (GAN), or the like, but is not limited thereto.
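As a hedged one-dimensional illustration of training a generative model toward user data (the patent names VAE and GAN as possible implementations; this moment-matching toy is neither, only a sketch of pulling generated samples toward real ones):

```python
import random

random.seed(0)

def train_generator_1d(real_samples, steps=2000, lr=0.05):
    # Toy "generator" with a single parameter: its output mean. Each step
    # draws a real user sample and a generated sample, then moves the
    # parameter so generated samples resemble the real ones.
    mu = 0.0
    for _ in range(steps):
        real = random.choice(real_samples)
        fake = mu + random.gauss(0.0, 0.1)   # generated sample
        mu += lr * (real - fake)             # pull generated toward real
    return mu

# hypothetical per-user speech feature values (invented numbers)
user_speech_stats = [4.8, 5.1, 5.0, 4.9, 5.2]
mu = train_generator_1d(user_speech_stats)
print(abs(mu - 5.0) < 0.5)  # → True
```

After training, the generator can produce personal training data resembling the user data without the raw user data being retained, which is the property relied on in fig. 9.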
The artificial intelligence model may then be updated based on the user data, the generic training data, and the personal training data stored in the user data store.
The updated artificial intelligence model may be periodically uploaded to the server 200. For example, the artificial intelligence model may be uploaded to the server 200 based on at least one of the elapse of a predetermined period of time or the amount of data learned by the artificial intelligence model being a predetermined amount.
A new (or updated) version of the artificial intelligence model may be sent from the server 200 to the electronic device 100. In this case, the existing trained artificial intelligence model may be replaced (or updated) by a new (or updated) version of the artificial intelligence model sent from the server 200. The new (or updated) version of the artificial intelligence model may be trained to reflect the personal characteristics of the user, and may be trained relatively quickly based on personal training data generated by the pre-trained personal training data generation model, even if user data (e.g., user speech data stored as a WAV file or MP3 file) is not stored in memory 110. In other words, since the individual training data generation model is trained based on the user data, even in the case where the artificial intelligence model is trained based on the individual training data generated by the individual training data generation model, it is possible to obtain an effect similar to that obtained in the case where the artificial intelligence model is trained based on the user data. Therefore, even in the case where the artificial intelligence model is replaced (or updated) by a new version of the artificial intelligence model, the electronic device 100 can be trained to reflect personal characteristics based on the generated personal training data without storing a large amount of user data.
In addition to the memory 110 and the processor 120, the electronic device 100 may also include a communication interface (not shown), a display (not shown), a microphone (not shown), and a speaker (not shown).
The communication interface includes a circuit system, and can perform communication with the server 200.
The communication interface may include a wireless fidelity (Wi-Fi) module (not shown), a bluetooth module (not shown), an Infrared (IR) module, a Local Area Network (LAN) module, an ethernet module, and the like. Here, each communication module may be implemented in the form of at least one hardware chip. In addition to the above-described communication schemes, the communication interface may include at least one communication chip that performs communication according to various wireless communication protocols, such as Zigbee, Universal Serial Bus (USB), mobile industry processor interface camera serial interface (MIPI CSI), third generation (3G), third generation partnership project (3GPP), Long Term Evolution (LTE), LTE-advanced (LTE-a), fourth generation (4G), and fifth generation (5G). However, this is merely an example, and the communication interface may use at least one of various communication modules. In addition, the communication interface may communicate with the server in a wired manner.
The communication interface may receive the general training data generation model, the personal training data generation model, and the artificial intelligence model from the server 200 through wired or wireless communication. Further, the communication interface may send the trained artificial intelligence model to the server 200.
The display may be implemented as a touch screen in which a display panel and a touch panel form a layered structure. The touch screen may be configured to detect the position, area, and pressure of a touch. For example, the display may detect handwriting input made with a stylus.
The microphone is a component configured to receive the user's voice. The received user speech may be used as training data, or text corresponding to the user speech may be output by a speech recognition model.
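One way the received speech could serve as training data without being retained is a streaming refit of the personal training data generation model's parameters. This is a hypothetical sketch (Welford's online algorithm over invented per-utterance features; the disclosure does not prescribe this method): only running summary statistics survive, never the raw audio.

```python
import math

# Hypothetical streaming refit of the personal training data generation
# model from microphone features (Welford's online algorithm), so raw
# user speech never needs to be stored - only summary statistics.
class StreamingGaussian:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def observe(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stdev(self):
        return math.sqrt(self.m2 / self.n) if self.n else 0.0

gen_params = StreamingGaussian()
for feature in [1.8, 2.1, 2.0, 2.2, 1.9]:  # toy per-utterance features
    gen_params.observe(feature)
print(round(gen_params.mean, 1), round(gen_params.stdev(), 2))  # 2.0 0.14
```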
The method according to the various embodiments of the present disclosure described above may be implemented in the form of an application that is installable in an existing electronic device.
Furthermore, the methods according to the various embodiments of the present disclosure described above may be implemented by software upgrade or hardware upgrade of an existing electronic device.
Furthermore, the various embodiments of the present disclosure described above may also be implemented by an embedded server provided in the electronic device or at least one external server of the electronic device.
According to embodiments of the present disclosure, the various embodiments described above may be implemented by software including instructions stored in a machine-readable storage medium (e.g., a non-transitory computer-readable storage medium). A machine is a device that can call a stored instruction from the storage medium and operate according to the called instruction, and may include an electronic device according to the disclosed embodiments. When an instruction is executed by a processor, the processor may directly execute a function corresponding to the instruction, or other components may execute the function corresponding to the instruction under the control of the processor. An instruction may include code created or executed by a compiler or an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, the term "non-transitory" means that the storage medium is tangible, without distinguishing whether data is stored semi-permanently or temporarily on the storage medium. For example, a "non-transitory storage medium" may include a buffer in which data is temporarily stored.
Furthermore, according to embodiments of the present disclosure, the methods according to the various embodiments described above may be included and provided in a computer program product. The computer program product may be traded as a product between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read only memory (CD-ROM)) or distributed online through an application store (e.g., PlayStore™). In the case of online distribution, at least a part of the computer program product may be at least temporarily stored in a storage medium, such as a memory of a manufacturer's server, a server of an application store, or a relay server, or may be temporarily created.
Further, according to the embodiments of the present disclosure, the various embodiments described above can be implemented in a computer or a computer-readable recording medium using software, hardware, or a combination of software and hardware. In some cases, the embodiments described in this disclosure may be implemented by the processor itself. According to a software implementation, embodiments such as procedures and functions described in the present disclosure may be implemented by separate software modules. Each of the software modules may perform one or more of the functions and operations described in this disclosure.
Computer instructions for performing the processing operations of a machine according to the various embodiments of the present disclosure described above may be stored in a non-transitory computer-readable medium. Such computer instructions, when executed by a processor of a particular machine, cause the particular machine to perform the processing operations according to the various embodiments described above.
A non-transitory computer-readable medium is a medium in which data is stored semi-permanently and which is readable by a machine. Specific examples of the non-transitory computer-readable medium include a compact disc (CD), a digital versatile disc (DVD), a hard disk, a Blu-ray disc, a universal serial bus (USB) memory, a memory card, a read only memory (ROM), and the like.
Further, each of the components (e.g., modules or programs) according to the various embodiments described above may include a single entity or multiple entities, and some of the respective sub-components described above may be omitted, or other sub-components may be further included in various embodiments. Alternatively or additionally, some of the components (e.g., modules or programs) may be integrated into one entity, and the functions performed by the respective components before being integrated may be performed in the same or similar manner. Operations performed by modules, programs, or other components according to various embodiments may be performed in a sequential manner, in a parallel manner, in an iterative manner, or in a heuristic manner, at least some of the operations may be performed in a different order, or at least some of the operations may be omitted, or other operations may be added.
Although embodiments of the present disclosure have been shown and described hereinabove, the present disclosure is not limited to the specific embodiments described above, and may be variously modified by those skilled in the art to which the present disclosure pertains without departing from the gist of the present disclosure disclosed in the appended claims. Such modifications should also be understood to fall within the scope and spirit of the present disclosure.

Claims (15)

1. An electronic device, comprising:
a memory configured to store one or more training data generation models and an artificial intelligence model; and
a processor configured to:
generate personal training data reflecting characteristics of the user using the one or more training data generation models;
train the artificial intelligence model using the personal training data as training data; and
store the trained artificial intelligence model in the memory.
2. The electronic device of claim 1, wherein the one or more training data generating models comprise:
a personal training data generation model trained to generate personal training data reflecting characteristics of the user, and
a general training data generation model trained to generate general training data corresponding to usage data of a plurality of users.
3. The electronic device of claim 2, wherein the artificial intelligence model is a model updated based on at least one of personal training data, generic training data, or actual user data obtained from a user.
4. The electronic device of claim 3, wherein the personal training data generation model is updated based on at least one of user data or personal training data.
5. The electronic device of claim 1, wherein the artificial intelligence model is a speech recognition model, a handwriting recognition model, an object recognition model, a speaker recognition model, a word recommendation model, or a translation model.
6. The electronic device of claim 3, wherein the general training data comprises first input data,
the personal training data comprises second input data, and
the artificial intelligence model performs unsupervised learning based on the user data, the first input data, and the second input data.
7. The electronic device of claim 3, wherein the general training data comprises first input data,
the personal training data comprises second input data, and
the artificial intelligence model generates first output data corresponding to the first input data in response to the first input data being input, generates second output data corresponding to the second input data in response to the second input data being input, and is trained based on the user data, the first input data, the first output data, the second input data, and the second output data.
8. The electronic device of claim 2, wherein the generic training data generating model is downloaded from a server and stored in memory.
9. The electronic device of claim 8, wherein the processor is configured to upload the artificial intelligence model to a server.
10. The electronic device of claim 1, wherein the processor is configured to train the artificial intelligence model based on the electronic device being in a charging state, a predetermined time arriving, or user manipulation of the electronic device not being detected for a predetermined time.
11. A control method of an electronic device including one or more training data generation models and an artificial intelligence model, the control method comprising:
generating personal training data reflecting characteristics of the user using one or more training data generation models;
training an artificial intelligence model using the personal training data as training data; and
storing the trained artificial intelligence model.
12. The control method of claim 11, wherein the one or more training data generating models comprise:
a personal training data generation model trained to generate personal training data reflecting characteristics of the user, and
a general training data generation model trained to generate general training data corresponding to usage data of a plurality of users.
13. The control method of claim 12, wherein the artificial intelligence model is a model updated based on at least one of personal training data, general training data, or actual user data obtained from a user.
14. The control method of claim 13, wherein the personal training data generation model is updated based on at least one of user data or personal training data.
15. The control method of claim 11, wherein the artificial intelligence model is a speech recognition model, a handwriting recognition model, an object recognition model, a speaker recognition model, a word recommendation model, or a translation model.
CN202080034927.0A 2019-07-18 2020-07-08 Method and apparatus for artificial intelligence model personalization Pending CN113811895A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201962875758P 2019-07-18 2019-07-18
US62/875,758 2019-07-18
KR10-2019-0155985 2019-11-28
KR1020190155985A KR20210010284A (en) 2019-07-18 2019-11-28 Personalization method and apparatus of Artificial Intelligence model
PCT/KR2020/008937 WO2021010651A1 (en) 2019-07-18 2020-07-08 Method and apparatus for artificial intelligence model personalization

Publications (1)

Publication Number Publication Date
CN113811895A true CN113811895A (en) 2021-12-17

Family

ID=74211104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080034927.0A Pending CN113811895A (en) 2019-07-18 2020-07-08 Method and apparatus for artificial intelligence model personalization

Country Status (3)

Country Link
EP (1) EP3915061A4 (en)
CN (1) CN113811895A (en)
WO (1) WO2021010651A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9230550B2 (en) * 2013-01-10 2016-01-05 Sensory, Incorporated Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
US9275638B2 (en) * 2013-03-12 2016-03-01 Google Technology Holdings LLC Method and apparatus for training a voice recognition model database
US20150371023A1 (en) * 2013-03-13 2015-12-24 University Of Pittsburgh - Of The Commonwealth System Of Higher Education Usage modeling
WO2017134519A1 (en) * 2016-02-01 2017-08-10 See-Out Pty Ltd. Image classification and labeling
KR102242516B1 (en) * 2016-07-18 2021-04-20 딥마인드 테크놀로지스 리미티드 Train machine learning models on multiple machine learning tasks

Also Published As

Publication number Publication date
EP3915061A4 (en) 2022-04-06
WO2021010651A1 (en) 2021-01-21
EP3915061A1 (en) 2021-12-01

Similar Documents

Publication Publication Date Title
US11580964B2 (en) Electronic apparatus and control method thereof
EP3770905A1 (en) Speech recognition method, apparatus and device, and storage medium
CN109410924B (en) Identification method and identification device
US11842735B2 (en) Electronic apparatus and control method thereof
CN111523640B (en) Training method and device for neural network model
KR20200046188A (en) An electronic device for reconstructing an artificial intelligence model and its control method
US11776269B2 (en) Action classification in video clips using attention-based neural networks
US20200035217A1 (en) Method and device for speech processing
US11961013B2 (en) Method and apparatus for artificial intelligence model personalization
US20230083230A1 (en) Electronic apparatus and control method thereof
KR20200126675A (en) Electronic device and Method for controlling the electronic device thereof
US20230051625A1 (en) Method and apparatus with speech processing
CN111126084B (en) Data processing method, device, electronic equipment and storage medium
US11705110B2 (en) Electronic device and controlling the electronic device
JP2021081713A (en) Method, device, apparatus, and media for processing voice signal
US20210110816A1 (en) Electronic apparatus and method of providing sentence thereof
KR20200132673A (en) Electronic device and method for controlling the electronic device thereof
CN113811895A (en) Method and apparatus for artificial intelligence model personalization
US20220108550A1 (en) Electronic device and control method of same
US20230027309A1 (en) System and method for image de-identification to humans while remaining recognizable by machines
KR20210001905A (en) Electronic apparatus and control method thereof
CN114902243A (en) Electronic device and control method thereof
US20220261554A1 (en) Electronic device and controlling method of electronic device
KR20210001864A (en) Electronic apparatus and control method thereof
US20220262364A1 (en) Electronic apparatus and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination