WO2024038941A1

WO2024038941A1 - Data providing method and device

Info

Publication number: WO2024038941A1
Application number: PCT/KR2022/012640
Authority: WO
Inventors: 리앙인하오; 주롱차오; 장웨이; 장춘바오; 푸팡규; 쳉웬동
Original assignee: Coupang Corp. (쿠팡 주식회사)
Priority date: 2022-08-17
Filing date: 2022-08-24
Publication date: 2024-02-22
Also published as: KR20240024585A

Abstract

Provided are an electronic device and an operation method thereof, the electronic device: identifying a payment request for at least one item; generating information on an order including order identification information in response to the identified payment request; identifying the state of a circuit breaker and using data acquisition configuration information to collect at least one piece of data from a business system; identifying a feature on the basis of the at least one piece of data by using feature configuration information; storing the feature in a feature database; identifying a feature group including at least one feature relating to a first machine learning model from the feature database; training the first machine learning model on the basis of the feature group; and providing first data by using the trained first machine learning model.

Description

Data provision method and device

This disclosure relates to a data provision method and device.

With the advancement of information and communication technology, the e-commerce market has developed rapidly and has become an established field of shopping. Customers can purchase goods online using electronic devices and have them delivered to a desired location. Accordingly, sales brokerage services that broker transactions between sellers and buyers and provide delivery services are becoming active.

In such trading brokerage services, attempts to provide more efficient services to service users by using machine learning are increasing. Machine learning may require the process of acquiring features necessary for learning from a vast amount of data related to services.

In order to calculate the features needed for machine learning, it is necessary to select, from the service data, the data used to calculate the features and then to calculate the features based on the selected data. However, as services become more complex, the amount of data that can be obtained from a service increases, and the amount of information provided to service users increases, more machine learning is needed to provide the services, and as the amount of machine learning increases, the type and number of features required for machine learning also increase. In these cases, a method is needed so that data scientists or data engineers can manage features more efficiently and utilize them for machine learning.

In relation to the present invention, prior documents such as KR 10-2018-0039013 A and KR 10-2021-0124377 A may be referred to.

The object of the present invention is to provide a method and device that manage features used in machine learning based on settings and group one or more features necessary for machine learning so that they can be used for machine learning.

The technical problem to be achieved by the present invention is not limited to the problems described above, and other technical problems can be inferred from the following examples.

According to one embodiment, a method of providing data by an electronic device may include: collecting at least one piece of data from a business system using data acquisition setting information; identifying a feature based on the at least one piece of data using feature setting information and storing the feature in a feature database; identifying a feature group including at least one feature related to a first machine learning model from the feature database; and training the first machine learning model based on the feature group and providing first data using the trained first machine learning model.

Additionally, the data acquisition setting information may include information about a repository and information about at least one target field of the repository, and the data providing method may include identifying the at least one piece of data from the at least one target field of the repository.

Additionally, the data acquisition setting information includes information about at least one source field and mapping information; each of the at least one source field corresponds to a path for loading each of the at least one piece of data from the business system, and the mapping information includes information mapping each of the at least one source field to each of the at least one target field. The data providing method may include: loading each of the at least one piece of data from the business system based on the path corresponding to each of the at least one source field; and storing each of the at least one piece of data in the at least one target field of the repository based on the mapping information.

Additionally, the path may include a JSON (JavaScript Object Notation) path.

Additionally, the repository includes a first detailed repository and a second detailed repository, and the data providing method may include identifying the at least one piece of data from the first detailed repository if the feature is of a first type, and identifying the at least one piece of data from the second detailed repository if the feature is of a second type.

Additionally, the feature setting information includes a script, and the data providing method may include calculating a feature based on at least one data using a script.

Additionally, the feature setting information includes filtering condition information and a calculation function, and the data providing method includes filtering at least one data using the filtering condition information and confirming the filtered data; and calculating features based on the filtered data using a calculation function.

Additionally, the feature setting information includes information about a specific time section, and the data providing method may include calculating a feature based on data of a specific time section among at least one piece of data.

Additionally, if the feature is of a first type, the storing step may be performed in a first cycle, and if the feature is of a second type, the storing step may be performed in a second cycle.

Additionally, the electronic device may include: a memory storing at least one program; and a processor that, by executing the at least one program, collects at least one piece of data from the business system using the data acquisition setting information, identifies a feature based on the at least one piece of data using the feature setting information, stores the feature in a feature database, identifies a feature group including at least one feature related to the first machine learning model from the feature database, trains the first machine learning model based on the feature group, and provides first data using the trained first machine learning model.

Additionally, a non-transitory computer-readable recording medium can record a program for executing the above-described operation method on a computer.

Details of other embodiments are included in the detailed description and drawings.

According to the present invention, users can create and manage features without separate coding and use them for machine learning, which has the effect of saving programmer or developer resources.

Additionally, according to the present invention, features can be created and managed centrally, which has the effect of eliminating the need for users to directly manage various types of databases used to store various types of features.

In addition, according to the present invention, the user can directly write a script and check or edit the written script, which has the effect of easily managing the calculation logic of features.

Additionally, according to the present invention, tens to thousands of features required for machine learning for one business scenario can be grouped, which has the effect of allowing the features required for machine learning to be managed more conveniently.

Additionally, according to the present invention, it is possible to recycle features required for machine learning for other business scenarios, which has the effect of saving hardware resources or computing resources.

In addition, according to the present invention, features are divided into three types (real-time features, near-real-time features, and non-real-time features), and features are calculated at different intervals or stored in different databases depending on the type of feature, which has the effect of optimizing the use of hardware resources.

The effect of the invention is not limited to the effects mentioned above, and other effects not mentioned can be clearly understood by those skilled in the art from the claims.

Figure 1 shows an embodiment of an electronic device according to the present disclosure.

Figure 2 shows an embodiment of a data provision system according to the present disclosure.

Figure 3 shows an embodiment of a data providing system according to the present disclosure.

Figure 4 shows an embodiment of a data provision system according to the present disclosure.

Figure 5 shows an embodiment of a data provision system according to the present disclosure.

Figure 6 shows an embodiment of a data providing system according to the present disclosure.

Figure 7 shows an example of a data provision method according to the present disclosure.

The embodiments described in this disclosure are illustrative rather than limiting, and those skilled in the art may design many alternative embodiments without departing from the scope of the disclosure as defined by the appended claims. The terms used in the embodiments are, as far as possible, general terms that are currently widely used, selected in consideration of their functions in the present disclosure, but they may vary depending on the intention of a person working in the art, precedent, the emergence of new technology, etc. In addition, in certain cases, there are terms arbitrarily selected by the applicant, and in this case, the meaning will be described in detail in the relevant description. Therefore, the terms used in this disclosure should be defined based on the meaning of the term and the overall content of this disclosure, rather than simply the name of the term.

As used herein, singular expressions include the plural, unless the context clearly states otherwise.

Throughout the specification, when a part is said to “include” certain elements or certain steps, this does not necessarily mean that the part must include all of those elements or steps, unless specifically stated to the contrary. Nor does it exclude elements or steps other than those listed in the claims or throughout the specification; it only means that such elements or steps may be further included.

Additionally, terms containing ordinal numbers, such as first, second, etc., used in this specification may be used to describe various components, but the components should not be limited by terms containing the ordinal numbers. The above terms are used in context only to distinguish one element from another element in one part of the specification. For example, without departing from the scope of the present invention, a first element may be referred to as a second element in other parts of the specification, and conversely, the second element may also be referred to as a first element in other parts of the specification.

In this specification, terms such as “mechanism,” “element,” “means,” and “configuration” may be used broadly and are not limited to mechanical and physical configurations. The term may include the meaning of a series of software routines in connection with a processor, etc.

In this specification (particularly in the claims), the term “the” and similar referential terms may refer to both the singular and the plural. In addition, when a range is described, it includes the individual values within the range (unless there is a statement to the contrary), which is the same as describing each individual value constituting the range in the detailed description. Lastly, unless the order of the steps constituting a method is clearly stated or there is a statement to the contrary, the steps may be performed in any appropriate order and are not necessarily limited to the order in which they are described. The use of any examples or illustrative terms (e.g., “etc.”) is merely for describing the technical idea in detail, and the scope is not limited by the examples or illustrative terms unless limited by the claims. A person skilled in the art can add various modifications, combinations, and changes to the embodiments disclosed in this specification according to design conditions and factors to construct new embodiments that fall within the scope of the patent claims or their equivalents.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.

Figure 1 illustrates an example simplified block diagram of an electronic device 100 that may be used to practice at least one embodiment of the present disclosure. In various embodiments, electronic device 100 may be used to implement any system or method described in this disclosure. For example, electronic device 100 may be configured to be used as any electronic device, including a data server, web server, portable computing device, personal computer, tablet computer, workstation, mobile phone, smart phone, or any other device described below.

Electronic device 100 may include memory 120 and one or more processors 110 having one or more cache memories and a memory controller that may be configured to communicate with memory 120. Additionally, the electronic device 100 may include devices that can be connected to the electronic device 100 through one or more ports (e.g., Universal Serial Bus (USB), headphone jack, Lightning connector, Thunderbolt connector, etc.). A device that can be connected to electronic device 100 can include a plurality of ports configured to receive fiber optic connectors. The configuration of electronic device 100 shown is intended as a specific example only for the purpose of illustrating preferred embodiments of the device. In the illustrated electronic device 100, only components related to the present embodiments are shown. Accordingly, it is obvious to those skilled in the art that the electronic device 100 may further include other general-purpose components in addition to the components shown.

Processor 110 may be used to cause electronic device 100 to provide the steps or functions of any embodiment described in this disclosure. For example, the processor 110 generally controls the electronic device 100 by executing programs stored in the memory 120 within the electronic device 100. The processor 110 may be implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), etc. provided in the electronic device 100, but is not limited thereto.

The memory 120 is hardware that stores various data processed within the electronic device 100. The memory 120 can store data that has been processed by the processor 110 and data to be processed in the electronic device 100. In addition, the memory 120 can store basic programming and data structures that can provide the functions of at least one embodiment of the present disclosure, as well as applications (programs, code modules, commands), drivers, etc. that can provide the functions of the embodiments of the present disclosure. The memory 120 may include random access memory (RAM) such as dynamic random access memory (DRAM) or static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM, Blu-ray or other optical disk storage, a hard disk drive (HDD), a solid state drive (SSD), or flash memory.

The method and each step according to the present disclosure can be performed by the electronic device 100 or the processor 110. The electronic device 100 operates under the control of the processor 110. To simplify the explanation, the following description will be made on the assumption that the electronic device 100 is the entity that performs the method and each step according to the present disclosure.

Figure 2 shows an embodiment of a data providing system 2000 according to the present disclosure. The data provision system 2000 may be included in the electronic device 100.

In one embodiment, the electronic device 100 collects at least one piece of data about a feature from a business system using data acquisition setting information, identifies a feature based on the at least one piece of data using feature setting information, stores the feature in a feature database, identifies a feature group including at least one feature related to a first machine learning model from the feature database, trains the first machine learning model based on the feature group, and provides first data using the trained first machine learning model.

In one embodiment, the data provision system 2000 may include a business system 2100, a feature system 2200, and a machine learning platform 2300.

Business system 2100 may include data about e-commerce services or businesses. For example, the business system 2100 may include data such as expected delivery time, delivery route, or item price. This may be data about features.

The feature system 2200 can identify the features used in model learning 2312 or the online model service 2311 of the machine learning platform 2300, based on the data about the business or the data about features (in other words, feature data) of the business system 2100.

The machine learning platform 2300 can use features as input to a machine learning model, and can use the features to learn a model (2312) or service the model online (2311).

In one embodiment, the user 2600 may set the data provision system 2000 based on the requirements 2500. The requirements 2500 may be defined to apply a machine learning algorithm to various data of the business system 2100.

In one embodiment, data engineer 2610 may manage collection 2231 of feature data. That is, the data engineer 2610 can manage settings necessary for collecting feature data, such as creating, deleting, and changing them. Settings necessary for collection of feature data may be included in feature setting information. Settings required for collection of feature data will be described later.

In one embodiment, a data engineer or scientist 2620 may manage the configuration of features. In other words, the data engineer or scientist 2620 can manage settings such as creating, deleting, and changing settings necessary for calculating features using feature data. Settings required to calculate features using feature data may be included in feature setting information. The settings required to calculate features will be explained later.

In one embodiment, scientist 2630 may set up 2321 a machine learning model. In other words, the scientist 2630 can use features to manage settings necessary for training a machine learning model, such as creating, deleting, and changing them.

The feature system 2200 may include feature data collection 2210, feature runtime 2220, and feature management 2230. The feature data collection 2210, feature runtime 2220, and feature management 2230 may be modules physically included in the electronic device 100 or logical modules stored in the memory 120 of the electronic device 100.

Feature data of the business system 2100 is collected by feature data collection 2210. At this time, feature data of the business system 2100 may include non-real-time data, near-real-time data, and real-time data. Although not limited thereto, non-real-time data may refer to data updated on a daily basis, near-real-time data may refer to data updated every few minutes to tens of minutes, and real-time data may refer to data updated every several milliseconds to several seconds.

Non-real-time data may be stored in storage 2214 via non-real-time channel 2211. At this time, an event streaming platform such as Apache Kafka or a data warehouse solution such as Hive can be used.

Near-real-time data may be stored in storage 2214 via near-real-time channel 2212. At this time, an event streaming platform such as Apache Kafka or a column-based database such as ClickHouse can be used.

Real-time data may be stored in storage 2214 through real-time channel 2213. At this time, an event streaming platform such as Apache Kafka or a data engine such as Apache Spark can be used.

Feature data may be stored in storage 2214. For example, the storage 2214 may include elastic, graphDB, Apache HBase, etc. In one embodiment, the storage 2214 includes a first detailed repository and a second detailed repository, and the electronic device 100 identifies at least one piece of data from the first detailed repository if the feature is of a first type, and identifies at least one piece of data from the second detailed repository if the feature is of a second type. For example, the storage 2214 includes a non-real-time data store, a near-real-time data store, and a real-time data store; the electronic device 100 may identify at least one piece of non-real-time data from the non-real-time data store if the feature is a non-real-time feature, at least one piece of near-real-time data from the near-real-time data store if the feature is a near-real-time feature, and at least one piece of real-time data from the real-time data store if the feature is a real-time feature. Using different storage depending on the type of data in this way has the effect of increasing the efficiency of data input and output.

The feature runtime 2220 may retrieve the feature data stored in the storage 2214 and calculate features using the calculation engine 2221. For example, the calculation engine 2221 may include Apache Spark or Hive. The calculation engine 2221 may store calculated non-real-time features, near-real-time features, and real-time features in the feature database 2224 and provide them to the feature service 2223. For example, the feature database 2224 may include databases such as Redis and Cassandra. The feature service 2223 may provide features to the machine learning platform 2300.

In one embodiment, the settings required for collection of feature data, that is, the data acquisition setting information, include information about the storage 2214 and information about at least one target field of the storage 2214, and the electronic device 100 can identify at least one piece of data or feature data from the at least one target field of the storage 2214. The target field may refer to a field of feature data stored in the storage 2214.
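The type-dependent choice of detailed repository described above can be sketched as follows. This is a minimal illustration only: the `FeatureType` enum, the in-memory dictionaries standing in for the detailed repositories, and the field names are all hypothetical and do not come from the disclosure.

```python
from enum import Enum


class FeatureType(Enum):
    NON_REALTIME = "non_realtime"
    NEAR_REALTIME = "near_realtime"
    REALTIME = "realtime"


# Hypothetical stand-ins for the detailed repositories inside the storage:
# one in-memory dictionary per feature type, keyed by target field.
DETAILED_REPOSITORIES = {
    FeatureType.NON_REALTIME: {"daily_order_count": [120, 98]},
    FeatureType.NEAR_REALTIME: {"orders_last_10_min": [7, 9]},
    FeatureType.REALTIME: {"courier_position": [(37.51, 127.10)]},
}


def read_feature_data(feature_type: FeatureType, target_field: str) -> list:
    """Read feature data from the detailed repository matching the feature type."""
    repository = DETAILED_REPOSITORIES[feature_type]
    return repository[target_field]
```

In a real deployment each dictionary would presumably be replaced by a client for a different storage technology; the point of the sketch is only that the feature type selects the repository.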

In one embodiment, the settings required for collection of feature data, that is, the data acquisition setting information, include information about at least one source field and mapping information. Each of the at least one source field corresponds to a path for loading each of the at least one piece of data from the business system 2100, and the mapping information includes information mapping each of the at least one source field to each of the at least one target field. The electronic device 100 can load each of the at least one piece of data from the business system 2100 based on the path corresponding to each of the at least one source field, and can store each of the at least one piece of data in the at least one target field of the storage 2214 based on the mapping information. The source field may refer to a field of feature data stored in the business system 2100. Data acquisition setting information may be included in feature data collection management 2231. By creating or modifying the data acquisition setting information, the data engineer 2610 can decide in which target field of the storage 2214 to store the at least one piece of data or feature data held in a given source field of the business system 2100. The electronic device 100 may transfer the at least one piece of data or feature data from the source field of the business system 2100 to the target field of the storage 2214 and store it there, using the mapping information of the data acquisition setting information.

In one embodiment, the path may include a JavaScript Object Notation (JSON) path, but is not limited to this, and the path may include any representation of the storage location of data stored in a database or computer system.
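As a rough sketch of how source-field paths and mapping information might work together, the example below resolves a simplified path against a nested record and writes each value to its target field. The configuration layout, the field names, and the path resolver are assumptions made for illustration; the resolver handles only dot-separated keys with an optional `[n]` index and is not a full JSONPath implementation.

```python
import re

# Hypothetical data acquisition setting: repository name plus a mapping from
# source field paths (in the business system record) to target fields.
ACQUISITION_CONFIG = {
    "repository": "order_feature_store",
    "mapping": {
        "$.order.id": "order_id",
        "$.order.delivery.eta_minutes": "eta_minutes",
        "$.order.items[0].price": "first_item_price",
    },
}

_SEGMENT = re.compile(r"(\w+)(?:\[(\d+)\])?")


def resolve_path(record: dict, path: str):
    """Resolve a simplified path such as '$.a.b[0].c' against a nested record."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    current = record
    for segment in path[2:].split("."):
        match = _SEGMENT.fullmatch(segment)
        if match is None:
            raise ValueError(f"unsupported path segment: {segment!r}")
        key, index = match.groups()
        current = current[key]
        if index is not None:
            current = current[int(index)]
    return current


def collect(record: dict, config: dict) -> dict:
    """Load each source field by its path and store it under its target field."""
    return {
        target: resolve_path(record, source)
        for source, target in config["mapping"].items()
    }
```

Under these assumptions, changing which data is collected only requires editing the mapping, not the collection code.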

In one embodiment, the electronic device 100 may use feature setting information to identify a feature based on at least one piece of data. Specifically, the calculation engine 2221 of the electronic device 100 may use feature setting information that may be included in the feature setting management 2232 to calculate a feature based on at least one data or feature data. The data engineer or scientist 2620 can create and edit feature setting information to preset which feature data will be used to calculate which feature and the calculation method.

In one embodiment, feature setting information includes a script, and the electronic device 100 may calculate a feature based on at least one piece of data using the script. A data engineer or scientist 2620 may write a script related to feature calculation and include it in feature setting management 2232. The electronic device 100 may calculate features using a script entered into feature setting management 2232. A script can be anything that contains statements related to calculations, such as program commands or calculation formulas. The data engineer or scientist 2620 inputs a script related to feature calculation through a separate interface, etc., and the electronic device 100 can include the input script in the feature setting management 2232 and calculate the feature using the script. This has the advantage of saving programmer resources, because a programmer does not have to code the script separately.

In one embodiment, the feature setting information includes filtering condition information and a calculation function, and the electronic device 100 filters the at least one piece of data using the filtering condition information to identify the filtered data, and calculates features based on the filtered data using the calculation function.
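One possible reading of script-based feature calculation is sketched below, assuming the script is a Python expression over a variable named `rows`. The setting layout, the feature name, and the whitelist of helper functions are hypothetical. Note that `eval` with a restricted namespace is not a real sandbox, so this only illustrates the idea of keeping calculation logic in configuration.

```python
# Hypothetical feature setting whose "script" is a Python expression
# evaluated over `rows`, the collected feature data.
SCRIPT_FEATURE_CONFIG = {
    "name": "avg_item_price",
    "script": "sum(r['price'] for r in rows) / len(rows)",
}

# Only these helpers are exposed to the script (an assumption of this sketch).
ALLOWED_FUNCTIONS = {"sum": sum, "len": len, "min": min, "max": max}


def compute_script_feature(config: dict, rows: list):
    """Evaluate the configured script against the rows and return the feature value."""
    # Built-ins are removed and only whitelisted helpers plus `rows` are visible.
    # This limits accidents but is NOT a security boundary.
    namespace = {"__builtins__": {}, **ALLOWED_FUNCTIONS, "rows": rows}
    return eval(config["script"], namespace)
```

With this layout, editing the `script` string changes the calculation logic without changing the surrounding code.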

Filtering condition information may include conditions for filtering feature data based on specific criteria. For example, filtering condition information may be set to filter only feature data with an age of 20 years or older. The data engineer or scientist 2620 can set filtering condition information through a separate interface, etc. to set the feature setting information to extract only feature data that matches specific conditions.

Calculation functions can contain simple calculation logic such as sum or average. A data engineer or scientist 2620 may select a calculation function for calculating features based on feature data through a separate interface. The selected calculation function is included in the feature setting information.
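The combination of filtering condition information and a calculation function can be sketched as below. The configuration keys, the operator table, and the sample field names (`age`, `amount`) are assumptions chosen to mirror the "20 years or older" example; the disclosure does not specify a format.

```python
import operator
from statistics import mean

# Hypothetical feature setting: a filter condition plus a named calculation function.
FILTER_FEATURE_CONFIG = {
    "name": "adult_order_total",
    "filter": {"field": "age", "op": ">=", "value": 20},
    "value_field": "amount",
    "function": "sum",
}

COMPARISONS = {">=": operator.ge, ">": operator.gt, "<=": operator.le,
               "<": operator.lt, "==": operator.eq}
CALCULATION_FUNCTIONS = {"sum": sum, "avg": mean, "count": len}


def compute_filtered_feature(config: dict, rows: list):
    """Filter rows by the configured condition, then apply the calculation function."""
    condition = config["filter"]
    compare = COMPARISONS[condition["op"]]
    filtered = [r for r in rows if compare(r[condition["field"]], condition["value"])]
    values = [r[config["value_field"]] for r in filtered]
    return CALCULATION_FUNCTIONS[config["function"]](values)
```

Selecting a different entry in `function` (for example `avg`) would change the aggregation without touching the filtering step.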

In one embodiment, feature setting information includes information about a specific time period, and the electronic device 100 may calculate a feature based on data of the specific time period among at least one piece of data. For example, the specific time period may be the period from 20 minutes before the current time to 10 minutes before the current time. The electronic device 100 checks the time period information in the feature setting information and can calculate the feature using only the data or feature data that falls within the period from 20 minutes before to 10 minutes before the current time. A data engineer or scientist 2620 can set the time period information of the feature setting information through a separate interface.

In one embodiment, if the feature is of a first type, the electronic device 100 identifies the feature in a first cycle and stores the feature in the feature database, and if the feature is of a second type, it identifies the feature in a second cycle and stores the feature in the feature database. Specifically, if the feature is a non-real-time feature, the electronic device 100 may calculate the non-real-time feature in a first cycle (preferably once a day) and store the non-real-time feature in the feature database. Likewise, when the feature is a near-real-time feature, the electronic device 100 may calculate the near-real-time feature in a second cycle (preferably once every several hours, several tens of minutes, or several minutes) and store the near-real-time feature in the feature database. Likewise, when the feature is a real-time feature, the electronic device 100 may calculate the real-time feature in a third cycle (preferably once every several seconds or every tens to hundreds of milliseconds) and store the real-time feature in the feature database.

In one embodiment, the electronic device 100 may identify a feature group including at least one feature related to the first machine learning model from the feature database. At least one feature may be managed by being included in a feature group. A feature group may correspond to a machine learning model. For example, the first machine learning model corresponds to the first feature group, and at least one feature included in the first feature group may be provided for training 2312 and the online model service 2311 of the first machine learning model.
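The time-period setting in the example above (from 20 minutes before to 10 minutes before the current time) might be applied as in the following sketch. The configuration keys and the `timestamp` field are hypothetical, and treating both ends of the period as inclusive is an assumption of this sketch.

```python
from datetime import datetime, timedelta

# Hypothetical time period setting, expressed as offsets back from "now".
TIME_WINDOW = {"start_minutes_ago": 20, "end_minutes_ago": 10}


def select_window(rows: list, now: datetime, window: dict = TIME_WINDOW) -> list:
    """Keep only rows whose timestamp lies inside the configured time period."""
    start = now - timedelta(minutes=window["start_minutes_ago"])
    end = now - timedelta(minutes=window["end_minutes_ago"])
    # Both bounds are treated as inclusive here (an assumption).
    return [row for row in rows if start <= row["timestamp"] <= end]
```

The feature would then be calculated from the returned rows only, for example by passing them to a calculation function.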
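The type-dependent calculation cycles can be illustrated with a small due-check that a scheduler might call. The concrete periods below (one day, ten minutes, one second) are assumed values picked from within the ranges mentioned in the text, and the type names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed calculation cycles per feature type (illustrative values only).
CALCULATION_PERIODS = {
    "non_realtime": timedelta(days=1),
    "near_realtime": timedelta(minutes=10),
    "realtime": timedelta(seconds=1),
}


def is_due(feature_type: str, last_run: datetime, now: datetime) -> bool:
    """Return True when at least one full cycle has elapsed since the last run."""
    return now - last_run >= CALCULATION_PERIODS[feature_type]
```

A scheduler could evaluate `is_due` for each feature and recalculate and store only the features whose cycle has elapsed.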

A data engineer or scientist 2620 can manage a feature group through feature setting management 2232. Specifically, the data engineer or scientist 2620 can create, edit, or delete feature groups. A data engineer or scientist 2620 can define which features will be included in the feature group through a separate interface, etc. At this time, information about the source field or target field corresponding to the feature included in the feature group may be included in the feature group information.

In one embodiment, the electronic device 100 may train a first machine learning model based on a feature group and provide first data using the trained first machine learning model. Model settings 2321 may include setting information about the first machine learning model and information about the features or feature groups to be used for training and serving the first machine learning model. For example, the model settings 2321 may include identification information of the first feature group, and the electronic device 100 may check the identification information of the first feature group in the model settings 2321 and use at least one feature included in the first feature group for training and serving the first machine learning model.
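The relationship between model settings, feature groups, and the feature database can be sketched as a pair of lookups. All identifiers and values below (`model_1`, `group_1`, the feature names) are hypothetical, and plain dictionaries stand in for the feature database and the settings.

```python
# Hypothetical stand-ins: a feature database, feature group definitions,
# and model settings that reference a feature group by its identifier.
FEATURE_DATABASE = {"eta_minutes": 35.0, "distance_km": 4.2, "item_price": 1200.0}
FEATURE_GROUPS = {"group_1": ["eta_minutes", "distance_km"]}
MODEL_SETTINGS = {"model_1": {"feature_group_id": "group_1"}}


def load_model_inputs(model_id: str) -> dict:
    """Look up the model's feature group and fetch its features from the database."""
    group_id = MODEL_SETTINGS[model_id]["feature_group_id"]
    return {name: FEATURE_DATABASE[name] for name in FEATURE_GROUPS[group_id]}
```

Because the model refers only to a group identifier, the same feature (here `eta_minutes`) could be listed in several groups and reused by other models.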

Figure 3 shows an embodiment of a data providing system 3000 according to the present disclosure. The data provision system 3000 may be included in the electronic device 100.

The data provision system 3000 may include a management function 3100. The management function 3100 may include a feature meta management function 3110 and a data collection meta management function 3120. The feature meta management function 3110 may include a feature management function 3111, a group management function 3112, a view management function 3113, a version management function 3114, or a permission control management function 3115. The data collection meta management function 3120 may include data source management 3121, business model management 3122, collection rule management 3123, or DSL 3124.

The data providing system 3000 may include a data source 3200. The data source 3200 may include an order data source 3210, a delivery data source 3220, an automatically allocated data source 3230, or a delivery driver data source 3240.

The data providing system 3000 may include a channel 3300. Channel 3300 may include Kafka/Canal (3310) or Amazon Se (3320). The electronic device 100 may retrieve feature data from the data source 3200 through the channel 3300.

The data providing system 3000 may include a storage 3400. Storage 3400 may include Redis (3410), Clickhouse (3420), HBase (3430), or Hive (3440). Feature data retrieved through the channel 3300 may be stored in the storage 3400.

The data providing system 3000 may include a calculation engine 3500. The calculation engine 3500 may include a real-time engine pipeline 3510, a near-real-time engine pipeline 3520, or a non-real-time engine pipeline 3530, which may be used to calculate real-time features, near-real-time features, and non-real-time features, respectively. The calculation engine 3500 may calculate features based on feature data stored in the storage 3400.
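Routing a feature to one of the three pipelines according to its type might look like the sketch below. The `FeatureType` enum, the `PIPELINES` table, and the arithmetic inside each pipeline are assumptions made for illustration; the disclosure does not specify what each pipeline computes.

```python
from enum import Enum


class FeatureType(Enum):
    REAL_TIME = "real_time"
    NEAR_REAL_TIME = "near_real_time"
    NON_REAL_TIME = "non_real_time"


def real_time_pipeline(values):
    # Placeholder logic: use only the most recent observation.
    return values[-1]


def near_real_time_pipeline(values):
    # Placeholder logic: average over a short recent window (last three values).
    window = values[-3:]
    return sum(window) / len(window)


def non_real_time_pipeline(values):
    # Placeholder logic: average over the full history.
    return sum(values) / len(values)


# Dispatch table from feature type to the pipeline that calculates it.
PIPELINES = {
    FeatureType.REAL_TIME: real_time_pipeline,
    FeatureType.NEAR_REAL_TIME: near_real_time_pipeline,
    FeatureType.NON_REAL_TIME: non_real_time_pipeline,
}


def calculate(feature_type, values):
    """Select the pipeline for the given feature type and run it."""
    if not values:
        raise ValueError("no feature data available")
    return PIPELINES[feature_type](values)


print(calculate(FeatureType.NEAR_REAL_TIME, [1, 2, 3, 4]))
```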

The data provision system 3000 may include a serving layer 3600. Serving layer 3600 may include feature service 3610. The serving layer 3600 may provide features calculated by the calculation engine 3500. Features may be used to train or provide machine learning models utilized in arrival time scenarios 3810, automatic allocation scenarios 3820, dynamic pricing scenarios 3830, or other scenarios 3840.

The data provision system 3000 may include a governance function 3700. The governance function 3700 may include a data assertion function 3710, a lineage function 3720, a data quality function 3730, a life cycle function 3740, an access control function 3750, an alert/alarm function 3760, a metric function 3770, or a monitoring function 3780.

Figure 4 shows an embodiment of a data providing system 4000 according to the present disclosure. The data provision system 4000 may be included in the electronic device 100.

Among the feature collection and calculation processes 4410, the near-real-time process 4410 is as follows. Near-real-time feature data of the business system 4100 may be received through Kafka 4210 and then undergo format conversion 4411 and dimensional mapping 4412. Format conversion 4411 may mean a conversion such as normalization of the data. Dimensional mapping 4412 may refer to mapping a source field to a target field. The dimensionally mapped near-real-time feature data 4413 may be stored in Clickhouse 4414. The job executor 4415 may cause near-real-time features to be calculated based on the near-real-time feature data stored in Clickhouse 4414. The scheduler 4416 may trigger the job executor 4415 at regular time intervals.
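The near-real-time steps described above (format conversion, dimensional mapping, storage, and a scheduler-triggered job) can be sketched as follows. Everything here is a simplified stand-in under stated assumptions: an in-memory list replaces Clickhouse, a plain loop replaces a timed scheduler, and the field names in `MAPPING` are hypothetical.

```python
def format_convert(raw):
    """Normalize a raw record: trim and lowercase keys, trim string values."""
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in raw.items()
    }


def dimension_map(record, mapping):
    """Map source fields to target fields; unmapped source fields are dropped."""
    return {target: record[source] for source, target in mapping.items() if source in record}


# Hypothetical source-field -> target-field mapping.
MAPPING = {"orderid": "order_id", "deliverymin": "delivery_minutes"}


class NearRealTimeJob:
    """Plays the role of the job executor; self.table stands in for Clickhouse."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.table = []

    def ingest(self, raw):
        self.table.append(dimension_map(format_convert(raw), self.mapping))

    def run(self):
        # Example feature: mean delivery time over all stored rows.
        values = [float(row["delivery_minutes"]) for row in self.table]
        return sum(values) / len(values) if values else None


def run_scheduler(job, ticks):
    """Trigger the job once per tick; a real scheduler would wait between ticks."""
    return [job.run() for _ in range(ticks)]


job = NearRealTimeJob(MAPPING)
job.ingest({" OrderId ": "A1", "DeliveryMin": " 30 "})
job.ingest({"orderid": "A2", "deliverymin": 40})
print(run_scheduler(job, 2))
```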

In the real-time process 4420 of the feature collection and calculation process 4410, the Spark Streaming 4421 module may acquire real-time feature data from the business system 4100 through Kafka 4210 and then calculate and output the features.

Near-real-time and real-time features may be stored through Kafka 4510 in a feature database 4600 such as Redis 4610 or Cassandra 4620, and may be used in business scenarios 4800 through the feature API server 4700. The features may also be used for training machine learning models.

In the non-real-time process 4430 of the feature collection and calculation process 4410, non-real-time data of the business system 4100 may be temporarily stored in the database 4220. The non-real-time data may then be stored in the data warehouse 4431, and non-real-time features may be calculated through learning 4432.

Non-real-time features may be stored in a feature database 4600 such as Redis 4610 or Cassandra 4620 through the SDK 4520, or stored in Amazon S3 4630.

Figure 5 shows an embodiment of a data providing system 5000 according to the present disclosure. The data provision system 5000 may be included in the electronic device 100.

The data provision system 5000 may include a business system 5100 and other systems 5200. The business system 5100 may include delivery information 5110, consumer information 5120, seller information 5130, management data warehouse 5140, and other information 5150.

The data providing system 5000 may include a feature setting function 5300. The feature setting function 5300 may include a collection meta information management function 5310, a near-real-time feature management function 5320, a real-time/non-real-time feature management function 5330, a feature life cycle management function 5340, a feature lineage management function 5350, and the like.

In data collection/storage 5400, non-real-time flows 5410, near-real-time flows 5420, and real-time flows 5430 can be distinguished.

In the non-real-time flow 5410, non-real-time feature data may be stored in Hive 5413 via Canal 5411 and Kafka 5412. Spark/Spark Graph 5414 may create a graph from the data stored in Hive 5413 and save it in GraphDB 5416. Job Scheduler/Airflow 5415 may pass the non-real-time feature data from Hive 5413 to the offline feature engine 5510 of the calculation engine 5500.

In the near-real-time flow 5420, near-real-time feature data may be delivered to ClickHouse 5422 through Kafka 5421 or may be delivered to the streaming framework 5432. ClickHouse 5422 may pass the near-real-time feature data to the near-real-time feature engine 5520 of the calculation engine 5500.

In the real-time flow 5430, real-time feature data may be delivered to the real-time consumer 5433 through the feature service API 5431 and then to ES 5434, Hbase 5435, or GraphDB 5436. The real-time feature engine 5530 of the calculation engine 5500 may identify real-time features using the real-time feature data stored in ES 5434, Hbase 5435, or GraphDB 5436.

The calculation engine 5500 may calculate features using feature data identified in the data collection/storage 5400 and store them in the feature database 5600. The feature database 5600 may include Redis (5610), Cassandra (5620), or Hbase (5630). Features stored in the feature database 5600 may be provided to the machine learning platform 5800 through the feature service 5700.
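The path from the calculation engine through the feature database to the feature service can be illustrated with the sketch below. It assumes an in-memory key-value store in place of Redis, Cassandra, or Hbase; the class names `FeatureDatabase` and `FeatureService` and their methods are hypothetical and do not correspond to any specific product API.

```python
class FeatureDatabase:
    """In-memory stand-in for the feature database, keyed by (entity id, feature name)."""

    def __init__(self):
        self._rows = {}

    def put(self, entity_id, name, value):
        self._rows[(entity_id, name)] = value

    def get(self, entity_id, name):
        # Returns None when the feature has not been calculated for this entity.
        return self._rows.get((entity_id, name))


class FeatureService:
    """Read-only access layer that a machine learning platform could call."""

    def __init__(self, database):
        self._database = database

    def get_features(self, entity_id, names):
        return {name: self._database.get(entity_id, name) for name in names}


# The calculation engine would write calculated features into the database.
db = FeatureDatabase()
db.put("driver-7", "deliveries_last_hour", 4)
db.put("driver-7", "avg_speed_kmh", 23.5)

service = FeatureService(db)
features = service.get_features("driver-7", ["deliveries_last_hour", "avg_speed_kmh", "rating"])
print(features)
```

Keeping the service separate from the store means consumers depend only on feature names, so the backing store could be exchanged without changing callers.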

The machine learning platform 5800 may train a model offline based on features provided through the feature service 5700 (5810) and serve the trained model online (5820). The online model can be utilized in business scenarios 5900. The business scenarios 5900 may include automatic allocation 5910, delivery route planning 5920, intelligent pricing 5930, intelligent promotion 5940, or intelligent recommendation 5950.

Figure 6 shows an embodiment of a data providing system 6000 according to the present disclosure. The data provision system 6000 may be included in the electronic device 100.

The data provision system 6000 may include an event platform 6100. The event platform 6100 may include an event collection function 6110 and a metadata management function 6120.

Data provision system 6000 may include a fundamental calculation-storage system 6200. The fundamental calculation-storage system 6200 can dimensionally transform or store feature data, event data, or metadata.

The data providing system 6000 may include a feature system 6300. The feature system 6300 may calculate features 6360 based on feature data stored in the fundamental calculation-storage system 6200. The feature system 6300 may include a data view management function 6310, a system management function 6320, a feature management function 6330, a feature service function 6340, and a feature analysis function 6350. The feature system 6300 may include an engine 6370.

The data provision system 6000 may include a model service platform 6400 and a machine learning platform 6500. Machine learning models can be used in business scenarios 6600.

FIG. 7 shows a method of operating the electronic device 100 according to an embodiment. Description of each step of the operation method of FIG. 7 that overlaps with the operation of the electronic device 100 described in FIGS. 1 to 6 will be omitted.

In step S710, the electronic device 100 may acquire at least one data from the business system using data acquisition setting information.

In step S720, the electronic device 100 may identify a feature based on at least one piece of data using feature setting information and store the feature in a feature database.

In step S730, the electronic device 100 may check a feature group including at least one feature related to the first machine learning model from the feature database.

In step S740, the electronic device 100 may train a first machine learning model based on the feature group and provide first data using the trained first machine learning model.
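Steps S710 to S740 can be put together in one short sketch. This is an illustration under several assumptions: the business system is a list of nested dictionaries, a dotted path stands in for the JSON path mentioned in the claims, the feature setting consists of a filter condition and a calculation function, and the "model" is a trivial baseline rather than a real machine learning model. All names and sample values are hypothetical.

```python
# Hypothetical business-system records.
BUSINESS_SYSTEM = [
    {"order": {"id": "o1", "status": "delivered"}, "delivery": {"minutes": 30}},
    {"order": {"id": "o2", "status": "cancelled"}, "delivery": {"minutes": 0}},
    {"order": {"id": "o3", "status": "delivered"}, "delivery": {"minutes": 50}},
]

# Data acquisition setting: target field -> source path (dotted path as a JSON-path stand-in).
ACQUISITION_SETTING = {
    "order_id": "order.id",
    "status": "order.status",
    "minutes": "delivery.minutes",
}

# Feature setting: a filtering condition and a calculation function.
FEATURE_SETTING = {
    "name": "avg_delivered_minutes",
    "filter": lambda row: row["status"] == "delivered",
    "calc": lambda rows: sum(row["minutes"] for row in rows) / len(rows),
}

FEATURE_GROUPS = {"fg-1": ["avg_delivered_minutes"]}


def load_path(record, path):
    """Follow a dotted path through nested dictionaries."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value


def collect(records, setting):
    """S710: load each source path and store it under its target field."""
    return [{target: load_path(r, path) for target, path in setting.items()} for r in records]


def identify_and_store(rows, setting, feature_db):
    """S720: filter the data, calculate the feature, and store it."""
    filtered = [row for row in rows if setting["filter"](row)]
    if not filtered:
        raise ValueError("no data left after filtering")
    feature_db[setting["name"]] = setting["calc"](filtered)


def get_feature_group(feature_db, group_id):
    """S730: look up the features belonging to the group used by the model."""
    return {name: feature_db[name] for name in FEATURE_GROUPS[group_id]}


def train(group):
    """S740: placeholder 'training' that returns a baseline-plus-offset predictor."""
    baseline = group["avg_delivered_minutes"]
    return lambda extra_minutes=0.0: baseline + extra_minutes


feature_db = {}
rows = collect(BUSINESS_SYSTEM, ACQUISITION_SETTING)
identify_and_store(rows, FEATURE_SETTING, feature_db)
model = train(get_feature_group(feature_db, "fg-1"))
first_data = model(5.0)
print(first_data)
```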

Embodiments according to the present disclosure described above may be implemented in the form of program instructions that can be executed through various computer components and recorded on a computer-readable recording medium or non-transitory recording medium. The computer-readable recording medium or non-transitory recording medium may include program instructions, data files, data structures, etc., singly or in combination. Program instructions recorded on the computer-readable recording medium or non-transitory recording medium may be specially designed and constructed for the present invention or may be known and usable by those skilled in the computer software field. Examples of computer-readable recording media or non-transitory recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that created by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device or electronic device may be configured to operate as one or more software modules to perform processing according to the present disclosure, and vice versa.

This embodiment can be represented by functional block configurations and various processing steps. These functional blocks may be implemented as any number of hardware or software components, or combinations thereof, that execute specific functions. For example, embodiments may employ integrated circuit configurations such as memory, processing, logic, and look-up tables that can execute various functions under the control of one or more microprocessors or other control devices. Similar to how the components can be implemented as software programming or software elements, the present embodiments may include various algorithms implemented as combinations of data structures, processes, routines, or other programming constructs, and may be implemented in a programming or scripting language such as C, C++, Java, or assembler. Functional aspects may be implemented as algorithms running on one or more processors. Additionally, this embodiment may employ conventional technologies for electronic environment setting, signal processing, data processing, or a combination thereof.

Claims

In a method of providing data from an electronic device,

collecting at least one data about a feature from a business system using data acquisition setting information;

Confirming the feature based on the at least one data using feature setting information and storing the feature in a feature database;

identifying a feature group including at least one feature related to a first machine learning model from the feature database; and

A data providing method comprising training the first machine learning model based on the feature group and providing first data using the learned first machine learning model.
According to claim 1,

The data acquisition setting information includes information about a storage and information about at least one target field of the storage,

The collecting step is,

A method of providing data, comprising: identifying the at least one data from the at least one target field in the storage.
According to claim 2,

The data acquisition setting information includes information about at least one source field and mapping information,

Each of the at least one source field corresponds to a path to load each of the at least one data in the business system,

The mapping information includes information mapping each of the at least one source field to each of the at least one target field,

The collecting step is,

Loading each of the at least one data from the business system based on a path corresponding to each of the at least one source field; and

A method for providing data, comprising storing each of the at least one data in the at least one target field of the storage based on the mapping information.
According to claim 3,

A method of providing data, wherein the path includes a JSON (JavaScript Object Notation) path.
According to claim 2,

The storage includes a first detailed storage and a second detailed storage,

The step of identifying the at least one data includes,

A data providing method comprising: identifying the at least one data from the first detailed storage if the feature is of a first type, and identifying the at least one data from the second detailed storage if the feature is of a second type.
According to claim 1,

The feature setting information includes a script,

The saving step is,

A method of providing data, comprising calculating the feature based on the at least one data using the script.
According to claim 1,

The feature setting information includes filtering condition information and calculation function,

The saving step is,

filtering the at least one data using the filtering condition information and confirming the filtered data; and

A method of providing data, comprising calculating the feature based on the filtered data using the calculation function.
According to claim 1,

The feature setting information includes information about a specific time period,

The saving step is,

A data providing method comprising calculating the feature based on data of the specific time interval among the at least one data.
According to claim 1,

If the feature is of a first type, the storing step is performed in a first cycle,

If the feature is of a second type, the storing step is performed in a second cycle.
As an electronic device,

a memory in which at least one program is stored; and

By executing the at least one program,

Collect at least one data from the business system using data acquisition setting information,

Confirming a feature based on the at least one data using feature setting information and storing the feature in a feature database,

Identifying a feature group including at least one feature related to a first machine learning model from the feature database,

An electronic device comprising a processor that trains the first machine learning model based on the feature group and provides first data using the learned first machine learning model.
A non-transitory computer-readable recording medium that records a program for executing a data provision method of an electronic device on a computer,

The method of providing the above data is,

collecting at least one data from a business system using data acquisition setting information;

Confirming a feature based on the at least one data using feature setting information and storing the feature in a feature database;

identifying a feature group including at least one feature related to a first machine learning model from the feature database; and

A non-transitory recording medium comprising training the first machine learning model based on the feature group and providing first data using the learned first machine learning model.