CN113850271A - Online feature library construction method and device and electronic equipment - Google Patents

Online feature library construction method and device and electronic equipment

Info

Publication number
CN113850271A
Authority
CN
China
Prior art keywords
feature library
feature
stored
features
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110952910.XA
Other languages
Chinese (zh)
Inventor
谢奕
张阳
周炜
杨双全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110952910.XA
Publication of CN113850271A

Abstract

The invention discloses a method and a device for constructing an online feature library and electronic equipment, and relates to the technical field of artificial intelligence such as data production, big data and cloud service. The specific scheme is as follows: when an online feature library is constructed, firstly determining the updating frequency of features to be stored; the updating frequency comprises a first updating frequency and a second updating frequency, and the first updating frequency is higher than the second updating frequency; respectively determining a first feature library for storing the features updated based on the first updating frequency and a second feature library for storing the features updated based on the second updating frequency according to the updating frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library; therefore, the requirement of high updating rate can be met through the first feature library, the storage efficiency of the online feature library is improved, the requirement of large storage space can be met through the second feature library, and the problem of insufficient storage space of the online feature library is solved.

Description

Online feature library construction method and device and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, in particular to artificial intelligence fields such as data production, big data, and cloud services, and specifically to a method and an apparatus for constructing an online feature library, and an electronic device.
Background
With the ever-increasing volume of network data, obtaining important information from massive data has become an essential capability.
Since the existing artificial intelligence algorithm depends on features extracted from data, and the extracted features have very important significance for data mining, after important information is acquired from mass data, the features can be further extracted from the important information and the extracted features can be stored for subsequent use.
Therefore, how to build an online feature library for storing features is crucial.
Disclosure of Invention
The disclosure provides a method and a device for constructing an online feature library and electronic equipment.
According to a first aspect of the present disclosure, there is provided a method for constructing an online feature library, which may include:
determining an update frequency of a feature to be stored; wherein the update frequency comprises a first update frequency and a second update frequency, and the first update frequency is higher than the second update frequency.
Respectively determining a first feature library and a second feature library according to the updating frequency; wherein the first feature library is configured to store features that are updated based on the first update frequency, and the second feature library is configured to store features that are updated based on the second update frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library.
And constructing an online feature library for storing the features based on the first feature library and the second feature library.
According to a second aspect of the present disclosure, there is provided an online storage method of a feature, which may include:
the update frequency of the features to be stored is determined.
And determining a target feature library corresponding to the updating frequency from an online feature library according to the updating frequency.
And storing the features to be stored in the target feature library.
According to a third aspect of the present disclosure, there is provided an online feature library construction apparatus, which may include:
a determination unit configured to determine an update frequency of a feature to be stored; wherein the update frequency comprises a first update frequency and a second update frequency, and the first update frequency is higher than the second update frequency.
The processing unit is used for respectively determining a first feature library and a second feature library according to the updating frequency; wherein the first feature library is configured to store features that are updated based on the first update frequency, and the second feature library is configured to store features that are updated based on the second update frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library.
A first constructing unit, configured to construct an online feature library for storing the features based on the first feature library and the second feature library.
According to a fourth aspect of the present disclosure, there is provided an online storage device of a feature that may include:
a determining unit, configured to determine the update frequency of the features to be stored.
A processing unit, configured to determine, from an online feature library and according to the update frequency, a target feature library corresponding to the update frequency.
A storage unit, configured to store the features to be stored in the target feature library.
According to a fifth aspect of the present disclosure, there is provided an electronic apparatus, which may include:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method for constructing an online feature library according to the first aspect, or to perform the online feature storage method according to the second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the method for constructing an online feature library according to the first aspect, or the online feature storage method according to the second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first aspect.
According to the technical scheme, the online feature library can be constructed based on the first feature library and the second feature library, the first feature library can meet the requirement of high updating rate, the storage efficiency of the online feature library is effectively improved, the second feature library can meet the requirement of large storage space, and the problem of the storage space of the online feature library is effectively solved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow chart of a method for constructing an online feature library according to a first embodiment of the present disclosure;
FIG. 2 is a schematic design diagram of an online feature library provided by an embodiment of the present disclosure;
FIG. 3 is a schematic design diagram of an online feature library and an offline feature library provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart of an online feature storage method provided according to a second embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an online feature library construction device provided according to a third embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an online feature storage device provided according to a fourth embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of an electronic device provided in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In embodiments of the present disclosure, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and means that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can be singular or plural. In the text of the present disclosure, the character "/" generally indicates that the associated objects before and after it are in an "or" relationship. In addition, in the embodiments of the present disclosure, "first", "second", "third", "fourth", "fifth", and "sixth" are only used to distinguish different objects, and have no other special meaning.
The technical scheme provided by the embodiment of the disclosure can be applied to scenes such as big data. With the increasing amount of network data, the massive network data usually includes some redundant data, so that important information can be extracted from the massive data. Since the existing artificial intelligence algorithm depends on features extracted from data, and the extracted features have very important significance for data mining, the features can be further extracted from the extracted important information, and the extracted features can be stored for later use.
When building an online feature library, the update frequency typically includes daily updates and weekly updates, given that existing features are all updated in the form of aggregation tables. Updating features every day places a high demand on the write rate of the online feature library, while updating features every week places a high demand on its storage space. In view of these two existing update requirements, namely a high write rate and a large storage space, the online feature library can be constructed from two different feature libraries. One is a database with fast update and query speed, used to meet the write-rate requirement when storing features; the other is a database with low storage cost, used to meet the large-storage-space requirement when storing features. In this way, an online feature library can be built in a targeted manner based on the update requirements of existing features, so that it meets both the write-rate requirement and the large-storage-space requirement.
Based on the technical concept, the embodiment of the present disclosure provides a method for constructing an online feature library, and the method for constructing an online feature library provided by the present disclosure will be described in detail through specific embodiments. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.
Example one
Fig. 1 is a flowchart illustrating a method for constructing an online feature library according to a first embodiment of the present disclosure, where the method for constructing an online feature library can be executed by software and/or a hardware device, for example, the hardware device can be a terminal or a server. For example, referring to fig. 1, the method for constructing the online feature library may include:
S101, determining the updating frequency of the features to be stored; the updating frequency comprises a first updating frequency and a second updating frequency, and the first updating frequency is higher than the second updating frequency.
The update frequency of the features to be stored can be understood as the frequency at which features will be updated when they are subsequently stored in the online feature library that is being constructed. To a certain extent, it indicates the construction requirements of the online feature library.
For example, the first update frequency may be an existing update frequency for updating the feature every day, the second update frequency may be an existing update frequency for updating the feature every week, and the first update frequency may also be an update frequency for updating the feature every week, and the second update frequency may also be an update frequency for updating the feature every half month, which may be specifically set according to actual service requirements. Here, the embodiment of the present disclosure is described by taking as an example that the first update frequency may be an update frequency of an existing daily update feature, and the second update frequency may be an update frequency of an existing weekly update feature, but the embodiment of the present disclosure is not limited thereto.
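The constraint that the first update frequency must be higher than the second, whatever concrete values are chosen, can be illustrated with a minimal sketch. The class name `UpdateFrequencies` and the choice to express each frequency as an interval in days are assumptions made for illustration only; the disclosure does not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateFrequencies:
    """Hypothetical pair of update frequencies, expressed as intervals in days."""
    first_interval_days: int   # e.g. 1 for daily updates
    second_interval_days: int  # e.g. 7 for weekly updates

    def __post_init__(self):
        if self.first_interval_days <= 0 or self.second_interval_days <= 0:
            raise ValueError("intervals must be positive")
        # A higher frequency corresponds to a shorter interval.
        if self.first_interval_days >= self.second_interval_days:
            raise ValueError("first update frequency must be higher than the second")


# The two examples named in the text: daily/weekly and weekly/half-monthly.
daily_weekly = UpdateFrequencies(first_interval_days=1, second_interval_days=7)
weekly_half_month = UpdateFrequencies(first_interval_days=7, second_interval_days=15)
```

Either pairing is accepted, while a pair in which the first interval is not shorter than the second is rejected.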
Considering that the features to be stored include the features updated based on the first update frequency and the features updated based on the second update frequency when the feature storage is subsequently performed on the online feature library to be constructed, when the online feature library is constructed, the online feature library to which two different update frequencies are applied may be determined according to the two different update frequencies, that is, the following S102 is performed:
S102, respectively determining a first feature library and a second feature library according to the updating frequency; the first feature library is used for storing the features updated based on the first updating frequency, the second feature library is used for storing the features updated based on the second updating frequency, the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library.
When the first feature library and the second feature library are respectively determined according to the update frequency, considering that the first update frequency is higher than the second update frequency, when the features updated based on the first update frequency are stored, the feature update needs to be completed in a shorter time compared with the features updated based on the second update frequency, and the requirement on the update rate of the online feature library is higher, so that when a first feature library for storing the features updated based on the first update frequency is designed, the first feature library needs to meet the requirement on the high update rate; in addition, when storing the feature updated based on the second update frequency, since the second update frequency is lower than the first update frequency, in general, the feature amount updated based on the second update frequency is larger than the feature amount updated based on the first update frequency in the feature amount to be updated, and the demand for the storage space of the online feature library is high, and therefore, when designing the second feature library for storing the feature updated based on the second update frequency, the second feature library needs to satisfy the demand for the storage space.
Taking the first update frequency as daily update and the second update frequency as weekly update as an example, since the redis database offers fast update and query speed, it can be used as the first feature library for storing features that are updated daily, so that the requirement of a high update rate is met through the redis database and the storage efficiency of the online feature library is effectively improved. In addition, since the hbase database has a low storage cost and can run on several modestly configured storage devices, it can be used as the second feature library for storing features that are updated weekly, so that the requirement of a large storage space is met through the hbase database and the storage-space problem of the online feature library is effectively solved.
It is to be understood that the first feature library may be, in addition to the redis database, other storage media, for example, an ssdb database, or a storage medium that is self-developed by an enterprise and has a characteristic of fast update and query speed, and may be specifically set according to actual needs. Similarly, the second feature library may be other storage media besides the hbase database, for example, storage media such as cassandra or mongodb that have a characteristic of storing data cheaply, and may be specifically set according to actual needs.
After the first feature library and the second feature library for satisfying different update frequency requirements are respectively determined according to the update frequency, an online feature library for storing features may be constructed based on the first feature library and the second feature library, that is, the following S103 is performed:
S103, constructing an online feature library for storing features based on the first feature library and the second feature library.
When an online feature library for storing features is constructed based on a first feature library and a second feature library, the first feature library and the second feature library can be directly determined as the online feature library, namely the online feature library adopts a design scheme of the first feature library and the second feature library, so that the first feature library in the constructed online feature library can meet the requirement of high updating rate, the storage efficiency of the online feature library is effectively improved, the second feature library in the online feature library can meet the requirement of large storage space, and the problem of storage space of the online feature library is effectively solved.
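As a rough illustration of this construction step, the sketch below combines two library descriptions into one online feature library and checks the two stated relationships (higher update rate for the first library, larger storage space for the second). The names `FeatureLibrarySpec` and `build_online_feature_library`, the metrics `writes_per_second` and `capacity_gb`, and all numeric values are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureLibrarySpec:
    """Hypothetical description of one member library."""
    name: str
    writes_per_second: int  # assumed stand-in for "update rate"
    capacity_gb: int        # assumed stand-in for "storage space"


def build_online_feature_library(first, second):
    """Combine the two libraries after checking the stated relationships."""
    if first.writes_per_second <= second.writes_per_second:
        raise ValueError("first library must have the higher update rate")
    if second.capacity_gb <= first.capacity_gb:
        raise ValueError("second library must have the larger storage space")
    return {"first": first, "second": second}


# Illustrative figures only.
online_library = build_online_feature_library(
    FeatureLibrarySpec("redis-like", writes_per_second=100_000, capacity_gb=300),
    FeatureLibrarySpec("hbase-like", writes_per_second=10_000, capacity_gb=1_000),
)
```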
Taking the first feature library as a redis database and the second feature library as an hbase database as an example, the constructed online feature library adopts a design scheme of the redis database and the hbase database, for example, please refer to fig. 2, fig. 2 is a design schematic diagram of the online feature library provided by the embodiment of the present disclosure, wherein the redis database is used for storing features based on daily update, so as to meet the requirement of higher update rate through the redis database, thereby effectively improving the storage efficiency of the online feature library; the hbase database is used for storing the characteristics updated on a weekly basis so as to meet the requirement of larger storage space through the hbase database, thus effectively solving the problem of storage space of the online characteristic database; therefore, when a prediction task needs to acquire the features, the features can be acquired from the redis database and the hbase database so as to execute the prediction task.
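Reading features for a prediction task from both libraries can be sketched as follows. Plain dictionaries stand in for the two databases, and the function name, key strings, and the rule that the daily-updated value wins when both libraries hold the same field are assumptions for illustration; the disclosure only states that features are acquired from both databases.

```python
def get_features_for_prediction(object_key, first_library, second_library):
    """Collect one object's features from both libraries into a single dict."""
    features = {}
    # Weekly-updated features first, then daily-updated ones; on a field
    # collision the daily value overrides (an assumption, not from the source).
    features.update(second_library.get(object_key, {}))
    features.update(first_library.get(object_key, {}))
    return features


# Dict stand-ins: object key -> {feature field: feature value}.
first_library = {"feat:user_42": {"101": 0.7}}
second_library = {"feat:user_42": {"301": 12.0}, "feat:user_7": {"301": 3.0}}

features = get_features_for_prediction("feat:user_42", first_library, second_library)
```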
Testing shows that, compared with the existing design of the online feature library, the design combining a redis database and an hbase database reduces the storage time for daily-updated features from 3 hours to 30 minutes, and changes the storage requirement for weekly-updated features from 1 TB of memory to 300 GB of memory plus 1 TB of hard disk.
It can be seen that, in the embodiment of the present disclosure, when constructing an online feature library for storing features, an update frequency of the features to be stored may be determined first; the updating frequency comprises a first updating frequency and a second updating frequency, and the first updating frequency is higher than the second updating frequency; respectively determining a first feature library for storing the features updated based on the first updating frequency and a second feature library for storing the features updated based on the second updating frequency according to the updating frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library; in the online feature library constructed based on the first feature library and the second feature library, the first feature library can meet the requirement of high updating rate, the storage efficiency of the online feature library is effectively improved, the second feature library can meet the requirement of large storage space, and the problem of storage space of the online feature library is effectively solved.
Based on the embodiment shown in fig. 1, after an online feature library for storing features is constructed based on the first feature library and the second feature library, the features may be stored into the corresponding first feature library or the second feature library according to the update frequency of the features when feature storage is subsequently performed. It can be understood that after an online feature library for storing features is constructed, for a first feature library and a second feature library in the online feature library, feature storage logics corresponding to the first feature library and the second feature library respectively need to be determined; in this way, when feature storage is subsequently performed, feature storage can be performed based on the storage logics corresponding to the first feature library and the second feature library respectively.
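The idea of storing each feature into the library that matches its update frequency can be sketched as below. The enum `UpdateFrequency`, the function `store_feature`, the dictionaries standing in for the two databases, and the example feature names are all hypothetical.

```python
from enum import Enum


class UpdateFrequency(Enum):
    DAILY = "daily"    # first (higher) update frequency
    WEEKLY = "weekly"  # second (lower) update frequency


# Dict stand-ins for the two member libraries of the online feature library.
online_feature_library = {
    UpdateFrequency.DAILY: {},   # first feature library (fast updates)
    UpdateFrequency.WEEKLY: {},  # second feature library (large capacity)
}


def store_feature(object_id, name, value, frequency):
    """Pick the target library from the update frequency, then store the feature."""
    target = online_feature_library[frequency]
    target.setdefault(object_id, {})[name] = value
    return target


store_feature("user_42", "clicks_1d", 17, UpdateFrequency.DAILY)
store_feature("user_42", "clicks_7d", 96, UpdateFrequency.WEEKLY)
```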
For example, when determining the feature storage logic of the first feature library, it should be considered that the features to be stored belong to different objects, such as people or things. Therefore, when storing features in the first feature library, a keyword needs to be set in the first feature library for the target object to which the features belong. The keyword may include an identifier of the target object, which uniquely identifies the target object, and may further include a preset feature prefix, which indicates whether the features of the target object are stored in the first feature library. That is, the keyword of the target object in the first feature library includes the preset feature prefix and the identifier of the target object.
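A minimal sketch of such a keyword follows. The prefix value `"feat"`, the `":"` separator, and the function name `build_object_key` are assumptions for illustration; the disclosure specifies only that the keyword contains a preset feature prefix and the object identifier.

```python
FEATURE_PREFIX = "feat"  # hypothetical preset feature prefix


def build_object_key(object_id, prefix=FEATURE_PREFIX):
    """Join the preset prefix and the object identifier into one keyword."""
    return f"{prefix}:{object_id}"


key = build_object_key("user_42")
```

Here `key` is the string `feat:user_42`, so the same object identifier always maps to the same keyword.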
In general, the features to be stored include attributes of the features and feature values of the features, and when the features to be stored are stored in the first feature library, the features may be stored in a first preset data format, where a key field included in the first preset data format is used for storing the attributes of the features, and a value field included in the first preset data format is used for storing the feature values of the features, so that after a feature storage logic corresponding to the first feature library is determined, the features updated based on the first update frequency may be accurately stored in the first feature library according to the feature storage logic, so as to meet a requirement of a higher update rate through the first feature library, thereby effectively improving the storage efficiency of the online feature library.
It can be understood that, when the features are stored in the first preset data format of the first feature library, the features to be stored are single-dimensional features or multi-dimensional features, for example, if the features are single-dimensional features, the key fields in the first preset data format are slot identifiers corresponding to the features, such as slot ids; if the feature is a multi-dimensional feature, the key field in the first preset data format is a slot identifier corresponding to the feature and a dimension of the feature, such as a slot id and a dimension.
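The two cases for the key field can be sketched with a small helper. The function name `build_field` and the `"_"` separator between slot id and dimension are assumptions; the disclosure states only which components the key field contains.

```python
from typing import Optional


def build_field(slot_id: int, dimension: Optional[int] = None) -> str:
    """Return the key field for a single- or multi-dimensional feature."""
    if dimension is None:
        # Single-dimensional feature: the slot id alone.
        return str(slot_id)
    # Multi-dimensional feature: slot id plus the dimension index.
    return f"{slot_id}_{dimension}"


single = build_field(101)
multi = build_field(205, 3)
```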
Taking the first feature library as a redis database as an example, when storing the features to be stored through the redis database, a preset feature prefix and a unique identifier of a target object may be used as a key in the redis database for uniquely identifying the target object, and for the features to be stored including the attributes of the features and the feature values of the features, a data format of a redis hash may be used as the first preset data format for storing the features, wherein a hash key field included in the data format of the redis hash is used for storing the attributes of the features, and a hash value field included in the data format of the redis hash is used for storing the feature values of the features. In addition, when the feature is a single-dimensional feature, slot id can be used as a hash key field in a data format of the redis hash; when the feature is a multi-dimensional feature, the slot id and the dimension may be used as a hash key field in the data format of the redis hash.
It should be noted that, in the embodiment of the present disclosure, when features are saved in the redis hash data format, the redis database supports incremental feature updates when features are written, and supports returning only some of the feature values on demand when feature values are read.
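Incremental update and partial read on a hash layout can be illustrated without a running database. The sketch below uses a dictionary as a stand-in and two helper functions that mimic the semantics of the redis `HSET` and `HMGET` commands; the key and field strings are hypothetical.

```python
store = {}  # stand-in for the database: key -> {hash field: value}


def hset(key, mapping):
    """Write only the given fields; other fields of the hash are kept (as with HSET)."""
    store.setdefault(key, {}).update(mapping)


def hmget(key, fields):
    """Return the values of the requested fields, None where missing (as with HMGET)."""
    row = store.get(key, {})
    return [row.get(field) for field in fields]


# Initial write of three feature values for one object.
hset("feat:user_42", {"101": "0.5", "205_0": "1.0", "205_1": "0.0"})
# Incremental update: only the feature in slot 101 is rewritten.
hset("feat:user_42", {"101": "0.7"})
# Partial read: only two of the three stored values are returned.
partial = hmget("feat:user_42", ["101", "205_1"])
```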
It is understood that, when the updated feature is updated in the form of a day-level aggregation table, that is, updated every day, the feature value generated by each day-level aggregation table is always updated together in the object dimension, and therefore, for each day-level aggregation table, an update time is saved, which indicates the update date of the feature corresponding to the day-level aggregation table.
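One way to keep an update time per day-level aggregation table is sketched below. Where and how the update time is saved is not specified in the disclosure, so the separate `update_times` mapping, the function name, the table name, and the ISO date format are all assumptions.

```python
from datetime import date

update_times = {}  # aggregation table name -> date of its last update


def apply_daily_table(table_name, rows, store, updated_on):
    """Write all feature values of one day-level table, then record its update date."""
    for object_key, fields in rows.items():
        store.setdefault(object_key, {}).update(fields)
    # One update time per table, since its feature values are updated together.
    update_times[table_name] = updated_on.isoformat()


store = {}
apply_daily_table(
    "user_daily_agg",
    {"feat:user_42": {"101": "0.7"}, "feat:user_7": {"101": "0.1"}},
    store,
    date(2021, 8, 1),
)
```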
For example, when determining the feature storage logic of the second feature library, it should be considered that the features to be stored belong to different objects, such as people or things. Therefore, when storing features in the second feature library, a keyword needs to be set in the second feature library for the target object to which the features belong. The keyword may include an identifier of the target object, which uniquely identifies the target object, and may further include a preset feature prefix, which indicates whether the features of the target object are stored in the second feature library. That is, the keyword of the target object in the second feature library includes the preset feature prefix and the identifier of the target object.
In general, the features to be stored include attributes of the features and feature values of the features, and when the features to be stored are stored in the second feature library, the features may be stored in a second preset data format, where a column field included in the second preset data format is used for storing the attributes of the features, and a value field included in the second preset data format is used for storing the feature values of the features. In this way, after the feature storage logic corresponding to the second feature library is determined, the features updated based on the second update frequency may be accurately stored in the second feature library according to the feature storage logic, so as to meet the requirement of a large storage space through the second feature library, thereby effectively solving the storage-space problem of the online feature library.
It can be understood that, when the features are stored in the second preset data format of the second feature library, the features to be stored are single-dimensional features or multi-dimensional features, for example, if the features are single-dimensional features, the column field in the second preset data format is a slot identifier corresponding to the features, for example, slot id; if the feature is a multi-dimensional feature, the column field in the second preset data format is a slot identifier corresponding to the feature and a dimension of the feature, such as a slot id and a dimension.
Taking the case where the second feature library is an hbase database as an example, when the features to be stored are stored through the hbase database, which supports column storage, the preset feature prefix and the unique identifier of the target object may be used as the rowkey in the hbase database to uniquely identify the target object. For a feature to be stored that includes a feature attribute and a feature value, the column and value data format may be used as the second preset data format, where the column field is used for storing the feature attribute and the value field is used for storing the feature value. Furthermore, when the feature is a single-dimensional feature, slot id may be used as the column field; when the feature is a multi-dimensional feature, slot id and dimension may be used as the column field.
It should be noted that, in the embodiment of the present disclosure, when features are stored in the hbase database in the column and value data format, the hbase database supports incremental feature updates; and when feature values are read from the hbase database, the hbase database supports returning only part of the feature values as required.
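The rowkey and column layout described above can be illustrated with a minimal sketch. It models an hbase-style table as a nested dictionary (rowkey -> column -> value) rather than calling a real hbase client. The prefix `fea_`, the `slot:dimension` separator, the reading of "dimension" as a dimension index, and all function names are assumptions made for illustration and are not taken from the disclosure.

```python
from typing import Dict, Iterable, Optional

FEATURE_PREFIX = "fea_"  # hypothetical preset feature prefix

# In-memory stand-in for an hbase table: rowkey -> {column: value}
Table = Dict[str, Dict[str, str]]


def make_rowkey(object_id: str, prefix: str = FEATURE_PREFIX) -> str:
    """Rowkey = preset feature prefix + identifier of the target object."""
    return f"{prefix}{object_id}"


def make_column(slot_id: int, dimension: Optional[int] = None) -> str:
    """Single-dimensional: slot id only; multi-dimensional: slot id plus dimension."""
    return str(slot_id) if dimension is None else f"{slot_id}:{dimension}"


def put_features(table: Table, object_id: str, features: Dict[str, str]) -> None:
    """Incremental update: only the supplied columns are written or overwritten."""
    table.setdefault(make_rowkey(object_id), {}).update(features)


def get_features(table: Table, object_id: str, columns: Iterable[str]) -> Dict[str, str]:
    """Partial read: return only the requested columns that exist in the row."""
    row = table.get(make_rowkey(object_id), {})
    return {c: row[c] for c in columns if c in row}


table: Table = {}
put_features(table, "user42", {make_column(101): "0.7", make_column(202, 0): "1.5"})
put_features(table, "user42", {make_column(202, 1): "2.5"})  # adds one column only
print(get_features(table, "user42", ["101", "202:1"]))
```

Because each feature occupies its own column, a later write touches only the columns it supplies and a read can request a subset of columns, which mirrors the incremental update and partial return behaviour noted above.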
In combination with the above description, after the feature storage logic corresponding to the first feature library is determined, the features updated based on the first update frequency can be accurately stored in the first feature library according to that logic, so that the requirement of a high update rate is met through the first feature library and the storage efficiency of the online feature library is effectively improved. After the feature storage logic corresponding to the second feature library is determined, the features updated based on the second update frequency can be accurately stored in the second feature library according to that logic, so that the requirement of large storage space is met through the second feature library and the problem of insufficient storage space of the online feature library is effectively alleviated.
Based on the above embodiment, when the online feature library constructed from the first feature library and the second feature library stores the features corresponding to data, the data of each day requires a corresponding time slice of features to be stored, while storage space in the online feature library is scarce. It is therefore possible, in addition to constructing the online feature library, to construct an offline feature library and use it as a backup database of the online feature library, that is, to use the online feature library and the offline feature library together as the feature storage scheme. For example, please refer to fig. 3, which is a schematic design diagram of an online feature library and an offline feature library provided in an embodiment of the present disclosure; features may be stored in the online feature library and the offline feature library respectively in a double-write manner. In addition, given the scarcity of storage space in the online feature library, the online feature library may store only the most recently updated features, while the offline feature library may store, according to the service, the features updated within a preset time period. The features updated within the preset time period may include, in addition to the most recently updated features, the features updated in earlier time periods. The offline feature library can thus retain the earlier features without occupying the storage space of the online feature library, providing a feature basis for subsequent use.
The duration of the preset time period may be set according to actual needs, and the specific duration of the preset time period is not specifically limited in the embodiments of the present disclosure.
For example, the offline feature library may be a hive database, or another database, such as a presto database, which may be specifically set according to actual needs.
Taking the case where the offline feature library is a hive database as an example, when the constructed hive database stores features of data, the feature storage logic corresponding to the hive database also needs to be determined, as shown in Table 1 below:
TABLE 1
(Table 1 is provided as an image in the original publication: Figure BDA0003219217760000111.)
As can be seen from Table 1, id represents the identifier of the person or object to which the feature belongs, source represents the table name of the aggregation table in which the data used to compute the feature is located, fea represents the stored feature, success indicates whether the data was successfully stored in the online feature library, ext represents a reserved field, event_day represents the partition time, and event_action represents the partition task. The fields id, source, fea, success and event_day must be set when features are stored in the offline feature library; ext and event_action are extension fields and need not be set.
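As a rough illustration of the field list above, the following sketch models one offline record as a Python dataclass, with the required fields as mandatory arguments and the extension fields defaulting to `None`. The field types, the `YYYYMMDD` date format, and the sample values are assumptions; the disclosure names the fields but does not specify their types.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional


@dataclass
class OfflineFeatureRecord:
    # Fields that must be set when storing features in the offline feature library
    id: str                  # identifier of the person or object the feature belongs to
    source: str              # name of the aggregation table holding the source data
    fea: Dict[str, str]      # stored features (attribute -> value; type is assumed)
    success: bool            # whether the data was stored in the online library
    event_day: str           # partition time; "YYYYMMDD" is an assumed format
    # Extension fields that need not be set
    ext: Optional[str] = None
    event_action: Optional[str] = None


record = OfflineFeatureRecord(
    id="user42",
    source="agg_user_daily",  # hypothetical aggregation table name
    fea={"101": "0.7"},
    success=True,
    event_day="20210819",
)
print(asdict(record))
```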
After the online feature library comprising the first feature library and the second feature library is constructed through the above embodiments, it can be put into use. During use, when there is a feature to be stored, the feature may be stored in the first feature library or the second feature library of the online feature library according to its update frequency; for details, refer to the second embodiment shown in fig. 4 and described below.
Example two
Fig. 4 is a flowchart of an online storage method for features provided according to a second embodiment of the present disclosure, where the online storage method for features may be performed by software and/or a hardware device, for example, the hardware device may be a terminal or a server. For example, referring to fig. 4, the online storage method of the feature may include:
S401, determining the update frequency of the features to be stored.
For example, the update frequency may be daily, weekly, or half-monthly, and may be set according to actual needs; the embodiment of the present disclosure does not specifically limit the update frequency.
For example, to obtain the feature to be stored, the data to be stored may be acquired first and a pluggable feature operator corresponding to the data determined; the data to be stored is then computed with the pluggable feature operator to obtain its feature, which is recorded as the feature to be stored.
It can be understood that, in the embodiment of the present disclosure, when the data to be stored is computed with a pluggable feature operator, if the data newly added each day or each week does not involve previous window data, only the features of the newly added data need to be computed and stored in the online feature library. If the newly added data does involve previous window data, the features of that window data need to be recalculated, and the recalculated features are stored in the online feature library to replace the previously stored features.
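The two cases above can be sketched as follows, assuming a pluggable feature operator is simply a function from a list of numeric rows to one feature value. The operator registry, the operator names, and the `uses_window` flag are hypothetical and serve only to illustrate the distinction between computing on new data alone and recomputing over the window.

```python
from typing import Callable, Dict, List

FeatureOperator = Callable[[List[float]], float]

# Hypothetical registry of pluggable feature operators
OPERATORS: Dict[str, FeatureOperator] = {
    "sum": lambda rows: float(sum(rows)),
    "max": lambda rows: float(max(rows)),
}


def compute_feature(op_name: str, new_rows: List[float],
                    window_rows: List[float], uses_window: bool) -> float:
    """Return the feature value to write to the online feature library.

    uses_window=False: the feature depends on the newly added data only.
    uses_window=True: the feature is recomputed over window data plus new data,
    and the result replaces the previously stored value.
    """
    op = OPERATORS[op_name]
    rows = window_rows + new_rows if uses_window else new_rows
    return op(rows)


print(compute_feature("sum", [3.0, 4.0], [1.0, 2.0], uses_window=False))  # 7.0
print(compute_feature("sum", [3.0, 4.0], [1.0, 2.0], uses_window=True))   # 10.0
```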
For example, when determining the update frequency of the feature to be stored, an input update frequency identifier of the feature to be stored may be received, and the update frequency of the feature to be stored is determined from that identifier. The update frequency may also be obtained in other manners, which may be set according to actual service needs.
After the update frequency of the features to be stored is determined, a target feature library corresponding to the update frequency may be determined from the online feature library according to the update frequency, that is, the following S402 is executed:
S402, determining, according to the update frequency, a target feature library corresponding to the update frequency from the online feature library.
For example, the online feature library may support two update frequencies, each corresponding to one feature library, and the feature library corresponding to each update frequency can satisfy the storage requirements of the features updated at that frequency.
For example, when the update frequency is the higher one, such as daily update, the corresponding target feature library may be a redis database, an ssdb database, or an enterprise's self-developed storage medium with fast update and query speeds, so that the requirement of a high update rate is met through these databases and the storage efficiency of the online feature library is effectively improved. When the update frequency is lower than daily update, for example weekly update, the corresponding target feature library may be an hbase database, or a low-cost storage medium such as cassandra or mongodb, so that the requirement of large storage space is met through these databases and the problem of insufficient storage space of the online feature library is effectively alleviated.
After the target feature library corresponding to the update frequency is determined from the online feature library, the features to be stored may be stored in the target feature library in a targeted manner, that is, the following S403 is executed:
S403, storing the features to be stored in the target feature library.
For example, when the update frequency is the higher one, such as daily update, and the corresponding target feature library is a redis database, the features to be stored may be stored in the redis database, so that the requirement of a high update rate is met through the redis database and the storage efficiency of the online feature library is effectively improved. When the update frequency is lower than daily update, for example weekly update, and the corresponding target feature library is an hbase database, the features to be stored may be stored in the hbase database, so that the requirement of large storage space is met through the hbase database and the problem of insufficient storage space of the online feature library is effectively alleviated.
It can be seen that in the embodiment of the present disclosure, when storing the feature, the update frequency of the feature to be stored may be determined first; and according to the updating frequency, determining a target feature library corresponding to the updating frequency from the online feature library, and then storing the features to be stored in the target feature library. Therefore, the features to be stored can be stored into the target feature library corresponding to the updating frequency in a targeted manner through the updating frequency of the features to be stored, so that the storage requirement of the features to be stored can be met through the target feature library.
Based on the embodiment shown in fig. 4, the update frequency of the feature to be stored may include a first update frequency or a second update frequency, and the first update frequency is higher than the second update frequency. For example, when determining the target feature library corresponding to the update frequency from the online feature library according to the update frequency, the method may include:
If the update frequency is the first update frequency, the first feature library in the online feature library is determined as the target feature library. If the update frequency is the second update frequency, the second feature library in the online feature library is determined as the target feature library. The update rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library.
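The selection rule above amounts to a simple dispatch on the update frequency. The following minimal sketch uses plain dictionaries as stand-ins for the first and second feature libraries; the class and method names are hypothetical, and no real database client is involved.

```python
from enum import Enum
from typing import Dict


class UpdateFrequency(Enum):
    FIRST = "first"    # the higher update frequency
    SECOND = "second"  # the lower update frequency


Library = Dict[str, Dict[str, str]]  # keyword -> {feature attribute: feature value}


class OnlineFeatureLibrary:
    """Online feature library built from a first and a second feature library."""

    def __init__(self, first_library: Library, second_library: Library) -> None:
        self.first_library = first_library
        self.second_library = second_library

    def target_library(self, frequency: UpdateFrequency) -> Library:
        """Pick the target feature library for the given update frequency."""
        if frequency is UpdateFrequency.FIRST:
            return self.first_library
        if frequency is UpdateFrequency.SECOND:
            return self.second_library
        raise ValueError(f"unsupported update frequency: {frequency!r}")

    def store(self, frequency: UpdateFrequency, key: str,
              features: Dict[str, str]) -> None:
        """Store the features in the library matching their update frequency."""
        self.target_library(frequency).setdefault(key, {}).update(features)


online = OnlineFeatureLibrary(first_library={}, second_library={})
online.store(UpdateFrequency.FIRST, "fea_user42", {"101": "0.7"})
online.store(UpdateFrequency.SECOND, "fea_user42", {"303": "12"})
print(online.first_library, online.second_library)
```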
It should be noted that the first update frequency, the second update frequency, the first feature library and the second feature library are similar to those described in the embodiment shown in fig. 1; reference may be made to the related description of that embodiment, which is not repeated here.
Correspondingly, if the target feature library is a first feature library, when the features to be stored are stored in the target feature library, the preset feature prefix and the identification of the target object to which the features to be stored belong can be determined as the keywords of the target object in the first feature library; the preset feature prefix is used for indicating whether the first feature library stores features to be stored of the target object or not; storing the to-be-stored features of the target object into a first feature library by adopting a first preset data format, wherein key fields included in the first preset data format are used for storing attributes of the to-be-stored features, and value fields included in the first preset data format are used for storing feature values of the to-be-stored features.
Therefore, when the features to be stored are stored in the first feature library, the keywords of the target object in the first feature library and the first preset data format can be adopted, the features updated based on the first updating frequency can be accurately stored in the first feature library, and the requirement of high updating rate can be met through the first feature library, so that the storage efficiency of the online feature library can be effectively improved.
Taking the first feature library as a redis database as an example, when storing the features to be stored through the redis database, a preset feature prefix and a unique identifier of a target object may be used as a key in the redis database for uniquely identifying the target object, and for the features to be stored including the attributes of the features and the feature values of the features, a data format of a redis hash may be used as the first preset data format for storing the features, wherein a hash key field included in the data format of the redis hash is used for storing the attributes of the features, and a hash value field included in the data format of the redis hash is used for storing the feature values of the features. In addition, when the feature is a single-dimensional feature, slot id can be used as a hash key field in a data format of the redis hash; when the feature is a multi-dimensional feature, the slot id and the dimension may be used as a hash key field in the data format of the redis hash.
It should be noted that, in the embodiment of the present disclosure, when features are stored in the redis database in the data format of the redis hash, the redis database supports incremental feature updates; and when feature values are read from the redis database, the redis database supports returning only part of the feature values as required.
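To make the redis hash layout concrete, the following sketch builds the redis key and the hash field mapping for a set of features, expanding a multi-dimensional feature into one hash field per dimension. The `fea:` prefix, the `slot_dimension` field naming, and the reading of "dimension" as a dimension index are assumptions. With the redis-py client, such a mapping could be written with `hset(name, mapping=...)` and selected fields read back with `hmget(name, keys)`; the sketch itself does not call Redis, so it runs without a server.

```python
from typing import Dict, List, Union

FeatureValue = Union[float, List[float]]


def redis_key(object_id: str, prefix: str = "fea:") -> str:
    """Redis key = preset feature prefix (hypothetical) + target object identifier."""
    return f"{prefix}{object_id}"


def to_hash_fields(slot_id: int, value: FeatureValue) -> Dict[str, str]:
    """Single-dimensional: field is the slot id; multi-dimensional: slot id + dimension."""
    if isinstance(value, list):
        return {f"{slot_id}_{dim}": str(v) for dim, v in enumerate(value)}
    return {str(slot_id): str(value)}


def build_mapping(features: Dict[int, FeatureValue]) -> Dict[str, str]:
    """Flatten all features of one target object into a single hash mapping."""
    mapping: Dict[str, str] = {}
    for slot_id, value in features.items():
        mapping.update(to_hash_fields(slot_id, value))
    return mapping


mapping = build_mapping({101: 0.7, 202: [1.5, 2.5]})
print(redis_key("user42"), mapping)
```

Writing only the fields that changed leaves the other hash fields untouched, which corresponds to the incremental update described above, and reading a chosen subset of fields corresponds to the partial return of feature values.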
If the target feature library is a second feature library, when the features to be stored are stored in the target feature library, the preset feature prefix and the identification of the target object to which the features to be stored belong can be determined as keywords of the target object in the second feature library; the preset feature prefix is used for indicating whether the second feature library stores the features to be stored of the target object or not; and storing the features to be stored of the target object into a second feature library by adopting a second preset data format, wherein column fields included in the second preset data format are used for storing the attributes of the features to be stored, and value fields included in the second preset data format are used for storing the feature values of the features to be stored.
Therefore, when the features to be stored are stored in the second feature library, the keyword of the target object in the second feature library and the second preset data format can be used to accurately store the features updated based on the second update frequency in the second feature library, so that the requirement of large storage space is met through the second feature library and the problem of insufficient storage space of the online feature library is effectively alleviated.
Taking the case where the second feature library is an hbase database as an example, when the features to be stored are stored through the hbase database, which supports column storage, the preset feature prefix and the unique identifier of the target object may be used as the rowkey in the hbase database to uniquely identify the target object. For a feature to be stored that includes a feature attribute and a feature value, the column and value data format may be used as the second preset data format, where the column field is used for storing the feature attribute and the value field is used for storing the feature value. Furthermore, when the feature is a single-dimensional feature, slot id may be used as the column field; when the feature is a multi-dimensional feature, slot id and dimension may be used as the column field.
It should be noted that, in the embodiment of the present disclosure, when features are stored in the hbase database in the column and value data format, the hbase database supports incremental feature updates; and when feature values are read from the hbase database, the hbase database supports returning only part of the feature values as required.
In addition, it should be noted that, in the embodiment of the present disclosure, if storage into the online feature library fails partway, so that only some of the features and feature values are successfully stored, the features that failed to be stored may be recorded and stored again. In case a downstream task urgently needs to use features in the online feature library, the update date of each successfully stored feature is also stored in the online feature library, so that the downstream task can judge from the timeliness of the features whether they meet its requirements.
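One way to picture this failure handling is sketched below: each feature is written individually, failures are recorded and retried, and the update date of every successfully written feature is kept so that a downstream task can check freshness. The function name, the retry count, the use of `OSError` to signal a failed write, and the simulated flaky writer are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def store_with_retry(features: Dict[str, str],
                     write_one: Callable[[str, str], None],
                     today: str,
                     max_retries: int = 2) -> Tuple[Dict[str, str], List[str]]:
    """Write features one by one; record failures and store them again.

    Returns (update_dates, still_failed): the update date recorded for each
    feature that was stored, and the names of features that never succeeded.
    """
    update_dates: Dict[str, str] = {}
    pending = dict(features)
    for _ in range(max_retries + 1):
        failed: Dict[str, str] = {}
        for name, value in pending.items():
            try:
                write_one(name, value)
                update_dates[name] = today  # lets downstream tasks judge timeliness
            except OSError:
                failed[name] = value        # record the feature that failed
        pending = failed
        if not pending:
            break
    return update_dates, sorted(pending)


store: Dict[str, str] = {}
attempts: Dict[str, int] = {}


def flaky_write(name: str, value: str) -> None:
    """Simulated writer: the first attempt for feature "b" fails."""
    attempts[name] = attempts.get(name, 0) + 1
    if name == "b" and attempts[name] == 1:
        raise OSError("simulated write failure")
    store[name] = value


dates, failed = store_with_retry({"a": "1", "b": "2"}, flaky_write, "20210819")
print(dates, failed)
```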
In addition, in view of the fact that data of each day needs to be correspondingly stored with a time-sliced feature, and in view of the shortage of storage space in the online feature library, it may be considered that, in the case of constructing the online feature library, the offline feature library is constructed again, and the constructed offline feature library is used as a backup database of the online feature library, that is, the online feature library and the offline feature library are used as feature storage schemes. When the features are stored, the features to be stored can be written into the online feature library and the offline feature library respectively in a double-writing mode, and compared with the mode that the features to be stored are written into the offline feature library firstly and then synchronized into the online feature library, the feature storage efficiency can be further improved.
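The double-write scheme can be sketched with two dictionaries: the online library keeps only the latest feature values per keyword, while the offline library keeps one time slice per day and is pruned to a preset retention period. The data shapes, the `YYYYMMDD` day strings, and the function names are assumptions for illustration only.

```python
from typing import Dict

Online = Dict[str, Dict[str, str]]              # keyword -> latest features
Offline = Dict[str, Dict[str, Dict[str, str]]]  # keyword -> event_day -> features


def double_write(online: Online, offline: Offline, key: str,
                 features: Dict[str, str], event_day: str) -> None:
    """Write the same features to both libraries in one step."""
    online.setdefault(key, {}).update(features)              # latest values only
    offline.setdefault(key, {})[event_day] = dict(features)  # one slice per day


def prune_offline(offline: Offline, oldest_day_to_keep: str) -> None:
    """Drop offline time slices older than the preset period.

    Days are "YYYYMMDD" strings (assumed), so string comparison orders them by date.
    """
    for slices in offline.values():
        for day in [d for d in slices if d < oldest_day_to_keep]:
            del slices[day]


online: Online = {}
offline: Offline = {}
double_write(online, offline, "fea_user42", {"101": "0.5"}, "20210818")
double_write(online, offline, "fea_user42", {"101": "0.7"}, "20210819")
print(online)                         # only the latest value remains online
print(sorted(offline["fea_user42"]))  # both daily slices remain offline
```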
It can be understood that when some features stored in the online feature library are subsequently no longer needed by downstream tasks, a deletion mark may be set on those features in both the online feature library and the offline feature library, so that at the next feature storage those features and their corresponding feature values are deleted from both libraries. This updates the online feature library and the offline feature library, avoids storing invalid features in them, and saves their storage space.
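A minimal sketch of this mark-then-delete behaviour follows. Features that are no longer needed are added to a set of deletion marks, and the next store operation removes them from every library it writes to. Both libraries are modelled as flat dictionaries for brevity, which simplifies the time-sliced offline layout; all names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Library = Dict[str, Dict[str, str]]  # keyword -> {feature attribute: feature value}


def mark_deleted(marks: Set[str], attributes: Iterable[str]) -> None:
    """Set a deletion mark on features no longer needed by downstream tasks."""
    marks.update(attributes)


def store_and_purge(libraries: List[Library], key: str,
                    features: Dict[str, str], marks: Set[str]) -> None:
    """Store new features, then delete every marked feature from each library."""
    for library in libraries:
        row = library.setdefault(key, {})
        row.update(features)
        for attribute in marks:
            row.pop(attribute, None)  # removes the feature and its value if present


online: Library = {"fea_user42": {"101": "0.7", "303": "9"}}
offline: Library = {k: dict(v) for k, v in online.items()}
marks: Set[str] = set()

mark_deleted(marks, ["303"])
store_and_purge([online, offline], "fea_user42", {"101": "0.8"}, marks)
print(online, offline)
```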
Example three
Fig. 5 is a schematic structural diagram of an online feature library construction apparatus 50 provided according to a third embodiment of the present disclosure, and for example, referring to fig. 5, the online feature library construction apparatus 50 may include:
a determining unit 501, configured to determine an update frequency of a feature to be stored; the updating frequency comprises a first updating frequency and a second updating frequency, and the first updating frequency is higher than the second updating frequency.
A processing unit 502, configured to determine a first feature library and a second feature library according to the update frequency, respectively; the first feature library is used for storing features which are updated based on a first updating frequency, and the second feature library is used for storing features which are updated based on a second updating frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library.
A first constructing unit 503, configured to construct an online feature library for storing features based on the first feature library and the second feature library.
Optionally, the keyword of the target object to which the feature belongs in the first feature library includes a preset feature prefix and an identifier of the target object; the preset feature prefix is used for indicating whether the first feature library stores the features of the target object.
The features of the target object are stored in a first preset data format, where a key field included in the first preset data format is used for storing the attribute of the feature, and a value field included in the first preset data format is used for storing the feature value of the feature.
Optionally, if the feature is a single-dimensional feature, the key field in the first preset data format is the slot identifier corresponding to the feature; if the feature is a multi-dimensional feature, the key field in the first preset data format is the slot identifier corresponding to the feature and the dimension of the feature.
Optionally, the keyword of the target object to which the feature belongs in the second feature library includes a preset feature prefix and an identifier of the target object; the preset feature prefix is used for indicating whether the second feature library stores the features of the target object.
The features of the target object are stored in a second preset data format, where a column field included in the second preset data format is used for storing the attribute of the feature, and a value field included in the second preset data format is used for storing the feature value of the feature.
Optionally, if the feature is a single-dimensional feature, the column field in the second preset data format is the slot identifier corresponding to the feature; if the feature is a multi-dimensional feature, the column field in the second preset data format is the slot identifier corresponding to the feature and the dimension of the feature.
Optionally, the online feature library construction apparatus 50 further includes a second construction unit.
The second construction unit is configured to construct an offline feature library, where the offline feature library is a backup database of the online feature library and is used for storing the features updated within a preset time period.
The online feature library construction apparatus 50 provided in the embodiment of the present disclosure may execute the technical solution of the online feature library construction method shown in any one of the above embodiments. Its implementation principle and beneficial effects are similar to those of the online feature library construction method; reference may be made to the implementation principle and beneficial effects of that method, which are not described herein again.
Example four
Fig. 6 is a schematic structural diagram of an online storage device 60 for features provided according to a fourth embodiment of the present disclosure. For example, referring to fig. 6, the online storage device 60 for features may include:
a determining unit 601, configured to determine an update frequency of the feature to be stored.
A processing unit 602, configured to determine, according to the update frequency, a target feature library corresponding to the update frequency from the online feature library.
The storage unit 603 is configured to store the feature to be stored in the target feature library.
Optionally, the update frequency includes a first update frequency or a second update frequency, and the first update frequency is higher than the second update frequency; the processing unit 602 includes a first processing module and a second processing module.
The first processing module is configured to determine the first feature library in the online feature library as the target feature library if the update frequency is the first update frequency.
The second processing module is used for determining a second feature library in the online feature library as a target feature library if the updating frequency is the second updating frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library.
Optionally, if the target feature library is a first feature library, the storage unit 603 includes a first storage module and a second storage module.
The first storage module is used for determining a preset feature prefix and an identifier of a target object to which a feature to be stored belongs as keywords of the target object in a first feature library; the preset feature prefix is used for indicating whether the first feature library stores the features to be stored of the target object.
The second storage module is used for storing the features to be stored of the target object into the first feature library by adopting a first preset data format, wherein key fields included in the first preset data format are used for storing attributes of the features to be stored, and value fields included in the first preset data format are used for storing feature values of the features to be stored.
Optionally, if the target feature library is the second feature library, the storage unit 603 includes a third storage module and a fourth storage module.
The third storage module is used for determining the preset feature prefix and the identification of the target object to which the feature to be stored belongs as the keyword of the target object in the second feature library; the preset feature prefix is used for indicating whether the second feature library stores the features to be stored of the target object.
The fourth storage module is configured to store the features to be stored of the target object into the second feature library in a second preset data format, where a column field included in the second preset data format is used for storing the attributes of the features to be stored, and a value field included in the second preset data format is used for storing the feature values of the features to be stored.
Optionally, the storage unit 603 further includes a fifth storage module.
The fifth storage module is configured to store the features to be stored into the target feature library and the offline feature library respectively.
The online storage device 60 for features provided in the embodiment of the present disclosure may implement the technical solution of the online storage method for features shown in any one of the above embodiments, and its implementation principle and beneficial effects are similar to those of the online storage method for features, and reference may be made to the implementation principle and beneficial effects of the online storage method for features, which are not described herein again.
According to an embodiment of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising a computer program, where the computer program is stored in a readable storage medium, at least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program to cause the electronic device to perform the online feature library construction method or the online storage method for features provided by any one of the above embodiments.
Fig. 7 is a schematic block diagram of an electronic device provided in accordance with an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 70 includes a computing unit 701, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 70 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 70 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 70 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 701 may be any of a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 executes the respective methods and processes described above, such as the method for constructing the online feature library or the online storage method for features. For example, in some embodiments, the method for constructing the online feature library, or the online storage method for features, may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 70 via the ROM 702 and/or the communication unit 709. When loaded into the RAM 703 and executed by the computing unit 701, the computer program may perform one or more steps of the method for constructing an online feature library, or of the online storage method for features, as described above. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the method for constructing an online feature library or the online storage method for features.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SoCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. A method for constructing an online feature library comprises the following steps:
determining an update frequency of a feature to be stored; wherein the update frequency comprises a first update frequency and a second update frequency, and the first update frequency is higher than the second update frequency;
respectively determining a first feature library and a second feature library according to the updating frequency; wherein the first feature library is configured to store features that are updated based on the first update frequency, and the second feature library is configured to store features that are updated based on the second update frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library;
and constructing an online feature library for storing the features based on the first feature library and the second feature library.
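As an illustrative sketch only, the two constraints recited in claim 1 (the first feature library has the higher update rate, the second has the larger storage space) can be checked when the online feature library is assembled. The class names, the `max_updates_per_sec` and `capacity_bytes` fields, and the example figures are assumptions chosen for illustration; the claim does not specify how update rate or storage space is measured.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureLibrary:
    """One tier of the online feature library (hypothetical model)."""
    name: str
    max_updates_per_sec: int  # assumed measure of "update rate"
    capacity_bytes: int       # assumed measure of "storage space"
    rows: dict = field(default_factory=dict)


@dataclass
class OnlineFeatureLibrary:
    first: FeatureLibrary   # stores features updated at the first (higher) frequency
    second: FeatureLibrary  # stores features updated at the second (lower) frequency


def build_online_feature_library(first: FeatureLibrary,
                                 second: FeatureLibrary) -> OnlineFeatureLibrary:
    # The first library must update faster than the second one.
    if first.max_updates_per_sec <= second.max_updates_per_sec:
        raise ValueError("first library must have the higher update rate")
    # The second library must offer more storage space than the first one.
    if second.capacity_bytes <= first.capacity_bytes:
        raise ValueError("second library must have the larger storage space")
    return OnlineFeatureLibrary(first=first, second=second)


# Example figures are made up for the sketch.
fast = FeatureLibrary("fast-tier", max_updates_per_sec=100_000, capacity_bytes=64 * 2**30)
large = FeatureLibrary("large-tier", max_updates_per_sec=5_000, capacity_bytes=4 * 2**40)
online = build_online_feature_library(fast, large)
```

Passing the two tiers in the wrong order raises `ValueError`, which makes the asymmetry between the libraries explicit at construction time.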
2. The method of claim 1, wherein,
a keyword of a target object to which the features belong in the first feature library comprises a preset feature prefix and an identifier of the target object; the preset feature prefix is used for indicating whether the first feature library stores the features of the target object;
the features of the target object are stored in a first preset data format, wherein a key field included in the first preset data format is used for storing an attribute of the feature, and a value field included in the first preset data format is used for storing a feature value of the feature.
3. The method of claim 2, wherein,
if the feature is a single-dimensional feature, the key field in the first preset data format is a slot identifier corresponding to the feature;
and if the feature is a multi-dimensional feature, the key field in the first preset data format is the slot identifier corresponding to the feature and the dimension of the feature.
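The key layout of claims 2 and 3 can be sketched with an in-memory dictionary standing in for the first feature library. This is a minimal sketch under stated assumptions: the prefix string `"feat"`, the `:` separator, and the reading of "dimension" as the index of each vector component are all hypothetical choices, not details given in the claims.

```python
from typing import Optional, Sequence, Union

FEATURE_PREFIX = "feat"  # hypothetical preset feature prefix


def object_key(object_id: str, prefix: str = FEATURE_PREFIX) -> str:
    # Keyword of the target object: preset prefix plus object identifier.
    return f"{prefix}:{object_id}"


def field_name(slot_id: int, dimension: Optional[int] = None) -> str:
    # Single-dimensional feature: the key field is the slot identifier.
    # Multi-dimensional feature: the key field is slot identifier plus dimension.
    return str(slot_id) if dimension is None else f"{slot_id}:{dimension}"


def put_feature(store: dict, object_id: str, slot_id: int,
                value: Union[float, Sequence[float]]) -> None:
    row = store.setdefault(object_key(object_id), {})
    if isinstance(value, (list, tuple)):
        # One key field per component of the multi-dimensional feature.
        for dim, component in enumerate(value):
            row[field_name(slot_id, dim)] = component
    else:
        row[field_name(slot_id)] = value


first_library = {}
put_feature(first_library, "user42", 7, 0.5)          # single-dimensional
put_feature(first_library, "user42", 9, [0.1, 0.2])   # multi-dimensional
```

With this layout, all features of one target object sit under a single keyword, and an individual slot (or one component of a vector) can be overwritten without touching the others.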
4. The method according to any one of claims 1 to 3,
a keyword of the target object to which the features belong in the second feature library comprises a preset feature prefix and the identifier of the target object; the preset feature prefix is used for indicating whether the second feature library stores the features of the target object;
and the features of the target object are stored in a second preset data format, wherein a column field included in the second preset data format is used for storing an attribute of the feature, and a value field included in the second preset data format is used for storing a feature value of the feature.
5. The method of claim 4, wherein,
if the feature is a single-dimensional feature, the column field in the second preset data format is a slot identifier corresponding to the feature;
and if the feature is a multi-dimensional feature, the column field in the second preset data format is the slot identifier corresponding to the feature and the dimension of the feature.
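For the column-oriented format of claims 4 and 5, one possible reading is a table addressed by (row keyword, column) pairs. The sketch below models that table as a plain dictionary and shows how a multi-dimensional feature could be written as one column per dimension and read back in order. The function names, the `:` separator, and the interpretation of "dimension" as a component index are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple, Union

Cell = Tuple[str, str]  # (row keyword, column field)


def to_cells(prefix: str, object_id: str, slot_id: int,
             value: Union[float, Sequence[float]]) -> Dict[Cell, float]:
    row_key = f"{prefix}:{object_id}"
    if isinstance(value, (list, tuple)):
        # Multi-dimensional: column field is slot identifier plus dimension.
        return {(row_key, f"{slot_id}:{dim}"): v for dim, v in enumerate(value)}
    # Single-dimensional: column field is the slot identifier alone.
    return {(row_key, str(slot_id)): value}


def read_vector(table: Dict[Cell, float], prefix: str,
                object_id: str, slot_id: int) -> List[float]:
    row_key = f"{prefix}:{object_id}"
    marker = f"{slot_id}:"
    # Collect the dimensions stored for this slot and sort them numerically.
    dims = sorted(int(col[len(marker):]) for (rk, col) in table
                  if rk == row_key and col.startswith(marker))
    return [table[(row_key, f"{slot_id}:{d}")] for d in dims]


second_library: Dict[Cell, float] = {}
second_library.update(to_cells("feat", "item9", 3, [1.0, 2.0, 3.0]))
second_library.update(to_cells("feat", "item9", 31, 0.7))
vector = read_vector(second_library, "feat", "item9", 3)
```

Matching on the full `"<slot>:"` marker keeps slot 3 from picking up columns that belong to slot 31.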
6. The method according to any one of claims 1-5, further comprising:
and constructing an offline feature library, wherein the offline feature library is a backup database of the online feature library and is used for storing the updated features within a preset time period.
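The offline feature library of claim 6 keeps only features updated within a preset time period. A minimal sketch of such a retention window follows; the class name, the `prune` method, and the seven-day window used in the example are hypothetical, since the claim does not state how the preset period is chosen or enforced.

```python
from datetime import datetime, timedelta


class OfflineFeatureLibrary:
    """Backup store that retains features updated within a preset period."""

    def __init__(self, retention: timedelta):
        self.retention = retention  # the preset time period (assumed configurable)
        self._rows = {}             # key -> (value, time of last update)

    def backup(self, key: str, value, updated_at: datetime) -> None:
        self._rows[key] = (value, updated_at)

    def prune(self, now: datetime) -> None:
        # Drop every feature whose last update is older than the window.
        cutoff = now - self.retention
        self._rows = {k: v for k, v in self._rows.items() if v[1] >= cutoff}

    def get(self, key: str):
        entry = self._rows.get(key)
        return None if entry is None else entry[0]


now = datetime(2021, 8, 19)
offline = OfflineFeatureLibrary(retention=timedelta(days=7))
offline.backup("feat:a", 1.0, updated_at=now - timedelta(days=1))
offline.backup("feat:b", 2.0, updated_at=now - timedelta(days=30))
offline.prune(now)
```

After pruning, only `feat:a` remains, because `feat:b` was last updated outside the seven-day window.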
7. An online storage method of features, comprising:
determining the updating frequency of the features to be stored;
determining a target feature library corresponding to the updating frequency from an online feature library according to the updating frequency;
and storing the features to be stored in the target feature library.
8. The method of claim 7, wherein the update frequency comprises a first update frequency or a second update frequency, and the first update frequency is higher than the second update frequency;
the determining, according to the update frequency, a target feature library corresponding to the update frequency from the online feature library comprises:
if the updating frequency is the first updating frequency, determining a first feature library in the online feature library as the target feature library;
if the updating frequency is a second updating frequency, determining a second feature library in the online feature library as the target feature library; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library.
9. The method of claim 8, wherein if the target feature library is the first feature library, the storing the feature to be stored in the target feature library comprises:
determining a preset feature prefix and an identifier of a target object to which the feature to be stored belongs as keywords of the target object in the first feature library; the preset feature prefix is used for indicating whether the feature to be stored of the target object is stored in the first feature library or not;
and storing the features to be stored of the target object into the first feature library by adopting a first preset data format, wherein a key field included in the first preset data format is used for storing the attributes of the features to be stored, and a value field included in the first preset data format is used for storing the feature values of the features to be stored.
10. The method according to claim 8 or 9, wherein if the target feature library is the second feature library, the storing the feature to be stored in the target feature library comprises:
determining a preset feature prefix and an identifier of a target object to which the feature to be stored belongs as keywords of the target object in the second feature library; the preset feature prefix is used for indicating whether the feature to be stored of the target object is stored in the second feature library or not;
and storing the features to be stored of the target object into the second feature library by adopting a second preset data format, wherein column fields included in the second preset data format are used for storing the attributes of the features to be stored, and value fields included in the second preset data format are used for storing the feature values of the features to be stored.
11. The method according to any one of claims 7-10, wherein storing the features to be stored to the target feature library comprises:
and respectively storing the features to be stored into the target feature library and the off-line feature library.
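Claims 7, 8 and 11 together describe a write path: determine the update frequency, pick the matching library, and store the feature in both the target library and the offline library. The sketch below illustrates that flow with dictionaries as stand-in stores. The `classify` function and its `threshold` of 24 updates per day are assumptions made for the example; the claims do not state how a feature is assigned to the first or second update frequency.

```python
from enum import Enum


class UpdateFrequency(Enum):
    FIRST = "first"    # the higher update frequency
    SECOND = "second"  # the lower update frequency


def classify(updates_per_day: float, threshold: float = 24.0) -> UpdateFrequency:
    # Hypothetical rule: features updated at least `threshold` times per day
    # are treated as having the first (higher) update frequency.
    if updates_per_day >= threshold:
        return UpdateFrequency.FIRST
    return UpdateFrequency.SECOND


def store_feature(key: str, value, updates_per_day: float,
                  first_lib: dict, second_lib: dict,
                  offline_lib: dict) -> UpdateFrequency:
    freq = classify(updates_per_day)
    # Select the target feature library according to the update frequency.
    target = first_lib if freq is UpdateFrequency.FIRST else second_lib
    target[key] = value
    # Also write the feature to the offline (backup) feature library.
    offline_lib[key] = value
    return freq


first_lib, second_lib, offline_lib = {}, {}, {}
r1 = store_feature("feat:u1", 0.3, 1440, first_lib, second_lib, offline_lib)
r2 = store_feature("feat:u2", 0.9, 1, first_lib, second_lib, offline_lib)
```

In a real deployment the two writes would likely need failure handling so that the online and offline copies do not diverge; that concern is outside what the claims describe and is omitted here.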
12. An apparatus for constructing an online feature library, comprising:
a determination unit configured to determine an update frequency of a feature to be stored; wherein the update frequency comprises a first update frequency and a second update frequency, and the first update frequency is higher than the second update frequency;
the processing unit is used for respectively determining a first feature library and a second feature library according to the updating frequency; wherein the first feature library is configured to store features that are updated based on the first update frequency, and the second feature library is configured to store features that are updated based on the second update frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library;
a first constructing unit, configured to construct an online feature library for storing the features based on the first feature library and the second feature library.
13. The apparatus of claim 12, wherein,
a keyword of a target object to which the features belong in the first feature library comprises a preset feature prefix and an identifier of the target object; the preset feature prefix is used for indicating whether the first feature library stores the features of the target object;
the features of the target object are stored in a first preset data format, wherein a key field included in the first preset data format is used for storing an attribute of the feature, and a value field included in the first preset data format is used for storing a feature value of the feature.
14. The apparatus of claim 13, wherein,
if the feature is a single-dimensional feature, the key field in the first preset data format is a slot identifier corresponding to the feature;
and if the feature is a multi-dimensional feature, the key field in the first preset data format is the slot identifier corresponding to the feature and the dimension of the feature.
15. The apparatus of any one of claims 12-14,
a keyword of the target object to which the features belong in the second feature library comprises a preset feature prefix and the identifier of the target object; the preset feature prefix is used for indicating whether the second feature library stores the features of the target object;
and the features of the target object are stored in a second preset data format, wherein a column field included in the second preset data format is used for storing an attribute of the feature, and a value field included in the second preset data format is used for storing a feature value of the feature.
16. The apparatus as set forth in claim 15, wherein,
if the feature is a single-dimensional feature, the column field in the second preset data format is a slot identifier corresponding to the feature;
and if the feature is a multi-dimensional feature, the column field in the second preset data format is the slot identifier corresponding to the feature and the dimension of the feature.
17. The apparatus according to any one of claims 12-16, further comprising a second building unit;
the second construction unit is configured to construct an offline feature library, where the offline feature library is a backup database of the online feature library, and the offline feature library is used to store features updated within a preset time period.
18. An online storage device of features, comprising:
the determining unit is used for determining the updating frequency of the features to be stored;
the processing unit is used for determining a target feature library corresponding to the updating frequency from an online feature library according to the updating frequency;
and the storage unit is used for storing the features to be stored in the target feature library.
19. The device of claim 18, wherein the update frequency comprises a first update frequency or a second update frequency, and the first update frequency is higher than the second update frequency;
the processing unit comprises a first processing module and a second processing module;
the first processing module is configured to determine a first feature library in the online feature library as the target feature library if the update frequency is a first update frequency;
the second processing module is configured to determine a second feature library in the online feature library as the target feature library if the update frequency is a second update frequency; the updating rate of the first feature library is higher than that of the second feature library, and the storage space of the second feature library is larger than that of the first feature library.
20. The apparatus of claim 19, wherein the storage unit comprises a first storage module and a second storage module if the target feature library is the first feature library;
the first storage module is used for determining a preset feature prefix and an identifier of a target object to which the feature to be stored belongs as keywords of the target object in the first feature library; the preset feature prefix is used for indicating whether the feature to be stored of the target object is stored in the first feature library or not;
the second storage module is configured to store the feature to be stored of the target object into the first feature library by using a first preset data format, where a key field included in the first preset data format is used to store an attribute of the feature to be stored, and a value field included in the first preset data format is used to store a feature value of the feature to be stored.
21. The apparatus according to claim 19 or 20, wherein the storage unit comprises a third storage module and a fourth storage module if the target feature library is the second feature library;
the third storage module is configured to determine a preset feature prefix and an identifier of a target object to which the feature to be stored belongs as a keyword of the target object in the second feature library; the preset feature prefix is used for indicating whether the feature to be stored of the target object is stored in the second feature library or not;
the fourth storage module is configured to store the feature to be stored of the target object into the second feature library by using a second preset data format, where a column field included in the second preset data format is used to store an attribute of the feature to be stored, and a value field included in the second preset data format is used to store a feature value of the feature to be stored.
22. The apparatus of any of claims 18-21, wherein the storage unit further comprises a fifth storage module;
and the fifth storage module is used for respectively storing the features to be stored into the target feature library and the off-line feature library.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method for constructing an online feature library of any one of claims 1-6, or the online storage method of features of any one of claims 7-11.
24. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for constructing an online feature library of any one of claims 1-6, or the online storage method of features of any one of claims 7-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method for constructing an online feature library of any one of claims 1-6, or the steps of the online storage method of features of any one of claims 7-11.
CN202110952910.XA 2021-08-19 2021-08-19 Online feature library construction method and device and electronic equipment Pending CN113850271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110952910.XA CN113850271A (en) 2021-08-19 2021-08-19 Online feature library construction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110952910.XA CN113850271A (en) 2021-08-19 2021-08-19 Online feature library construction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113850271A (en) 2021-12-28

Family

ID=78976004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110952910.XA Pending CN113850271A (en) 2021-08-19 2021-08-19 Online feature library construction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113850271A (en)

Similar Documents

Publication Publication Date Title
CN113656501A (en) Data reading method, device, equipment and storage medium
CN113364877A (en) Data processing method, device, electronic equipment and medium
CN111259090A (en) Graph generation method and device of relational data, electronic equipment and storage medium
CN114064925A (en) Knowledge graph construction method, data query method, device, equipment and medium
CN113868273A (en) Metadata snapshot method and device
CN116028517A (en) Fusion database system and electronic equipment
CN115329150A (en) Method and device for generating search condition tree, electronic equipment and storage medium
CN113850271A (en) Online feature library construction method and device and electronic equipment
CN115544010A (en) Mapping relation determining method and device, electronic equipment and storage medium
CN112887426B (en) Information stream pushing method and device, electronic equipment and storage medium
CN113868254B (en) Method, device and storage medium for removing duplication of entity node in graph database
CN115328917A (en) Query method, device, equipment and storage medium
CN114661736A (en) Electronic map updating method and device, electronic equipment, storage medium and product
CN113190718A (en) Data processing method and device for graph database, electronic equipment and storage medium
CN111737593A (en) Method, device, equipment and storage medium for acquiring cross-group communication relationship diagram
CN113495891A (en) Data processing method and device
CN114327293B (en) Data reading method, device, equipment and storage medium
CN114820079B (en) Crowd determination method, device, equipment and medium
CN113569144B (en) Method, device, equipment, storage medium and program product for searching promotion content
CN115391052B (en) Robot task processing method and device, electronic equipment and storage medium
CN115525659A (en) Data query method and device, electronic equipment and storage medium
CN113569027A (en) Document title processing method and device and electronic equipment
CN115421665A (en) Data storage method, device, equipment and storage medium
CN115687529A (en) Data synchronization method and device, electronic equipment and storage medium
CN115454977A (en) Data migration method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination