CN116955821A - Data processing method, device, equipment and storage medium for recommending scene - Google Patents

Publication number
CN116955821A
Authority
CN
China
Prior art keywords
data
field
sample
data record
fields
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310934615.0A
Other languages
Chinese (zh)
Inventor
杜春鹏
张晓亮
林能
杨舜尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Volcano Engine Technology Co Ltd
Original Assignee
Beijing Volcano Engine Technology Co Ltd
Application filed by Beijing Volcano Engine Technology Co Ltd
Priority to CN202310934615.0A
Publication of CN116955821A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2282: Tablespace storage structures; Management thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to embodiments of the present disclosure, a method, apparatus, device, and storage medium for data processing are provided, which may be used in recommendation scenarios. The method includes: presenting an input control for specifying data fields for at least one data record, each of the at least one data record including a set of data fields; receiving, through the input control, user input specifying a target data field in the set of data fields; and generating, based on the target data field, at least one sample respectively corresponding to the at least one data record, each sample having an encoded representation of the corresponding data record and a sample field corresponding to the target data field of the corresponding data record. In this way, the configuration efficiency of the auxiliary information can be improved, and the sample production cycle and the model training cycle can be shortened.

Description

Data processing method, apparatus, device and storage medium for recommendation scenarios
Technical Field
Example embodiments of the present disclosure relate generally to the field of computers and, more particularly, relate to data processing methods, apparatuses, devices, and computer-readable storage media.
Background
Data processing often involves sample production. That is, data needs to be processed into samples for later use, for example, for training a machine learning model. In a recommendation scenario (such as merchandise recommendation or content recommendation), for instance, raw data needs to be processed into samples for training a recommendation model. For various reasons, such as data storage requirements or model training requirements, raw data is typically processed during sample production into an implicit encoded representation, which makes the original values in the data and their meanings invisible in the sample. For example, after processing, the sample contains no explicit indication of the production date of a piece of merchandise. It is therefore desirable to be able to retain one or more original values in the produced samples as sample auxiliary information.
Disclosure of Invention
In a first aspect of the present disclosure, a data processing method is provided. The method includes: presenting an input control for specifying data fields for at least one data record, each of the at least one data record including a set of data fields; receiving, through the input control, user input specifying a target data field in the set of data fields; and generating, based on the target data field, at least one sample respectively corresponding to the at least one data record, each sample having an encoded representation of the corresponding data record and a sample field corresponding to the target data field of the corresponding data record.
In a second aspect of the present disclosure, an apparatus for data processing is provided. The apparatus comprises: an interface presentation module configured to present an input control for specifying data fields for at least one data record, each of the at least one data record comprising a set of data fields; an input module configured to receive, through the input control, user input specifying a target data field in the set of data fields; and a sample generation module configured to generate, based on the target data field, at least one sample respectively corresponding to the at least one data record, each sample having an encoded representation of the corresponding data record and a sample field corresponding to the target data field of the corresponding data record.
In a third aspect of the present disclosure, an electronic device is provided. The device comprises at least one processing unit, and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the device to perform the method of the first aspect.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. The computer readable storage medium has stored thereon a computer program executable by a processor to implement the method of the first aspect.
It should be understood that what is described in this section is not intended to identify key features or essential features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numerals denote like or similar elements, in which:
FIG. 1 illustrates a schematic diagram of an example environment in which embodiments of the present disclosure may be implemented;
FIG. 2 illustrates a schematic diagram of one example of adding a field as auxiliary information in one scheme;
FIG. 3 illustrates a flow chart of a process of data processing according to some embodiments of the present disclosure;
FIG. 4 illustrates a schematic diagram of one example of a configuration interface for sample production, according to some embodiments of the present disclosure;
FIG. 5 illustrates a schematic diagram of one example of an interface for specifying a target field, according to some embodiments of the present disclosure;
FIG. 6 illustrates a block diagram of an apparatus for data processing according to some embodiments of the present disclosure; and
FIG. 7 illustrates a block diagram of an apparatus capable of implementing various embodiments of the present disclosure.
Detailed Description
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with the relevant laws and regulations, of the type, scope of use, usage scenarios, etc. of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly indicate that the operation the user is requesting will require the user's personal information to be obtained and used. Based on the prompt information, the user can thus decide autonomously whether to provide personal information to the software or hardware, such as an electronic device, an application program, a server or a storage medium, that performs the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from a user, the prompt information may be sent to the user, for example, in a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may carry a selection control with which the user can choose to "agree" or "disagree" to provide personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
It will be appreciated that the data involved in the present technical solution (including but not limited to the data itself and the acquisition or use of the data) should comply with the applicable laws and regulations.
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been illustrated in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather, these embodiments are provided so that this disclosure will be more thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that any section/subsection headings provided herein are not limiting. Various embodiments are described throughout this document, and any type of embodiment may be included under any section/subsection. Furthermore, the embodiments described in any section/subsection may be combined in any manner with any other embodiment described in the same section/subsection and/or in a different section/subsection.
In describing embodiments of the present disclosure, the term "comprising" and similar terms should be understood as open-ended, i.e., "including, but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
As used herein, a "model" can learn the association between respective inputs and outputs from training data, so that after training is completed a corresponding output can be generated for a given input. The generation of the model may be based on machine learning techniques. Deep learning is a class of machine learning algorithms that process inputs and provide corresponding outputs using multiple layers of processing units. The "model" may also be referred to herein as a "machine learning model," "machine learning network," or "network," and these terms are used interchangeably herein. A model may in turn comprise different types of processing units or networks.
Example Environment
FIG. 1 illustrates a schematic diagram of an example environment 100 in which embodiments of the present disclosure may be implemented. In environment 100, an application 120 is running in an electronic device 110. The user 130 may interact with the application 120 via the electronic device 110 and/or its attached devices. The application 120 may be a data processing application, such as a sample production application. The application 120 may be implemented in any suitable form, such as a stand-alone application, or a component or plug-in built into another application. The present disclosure is not limited in this respect.
In embodiments of the present disclosure, the electronic device 110 may generate the corresponding at least one sample from the at least one data record. As an example, data records 101-1, 101-2, …, 101-N, also referred to individually or collectively as data records 101, are shown in FIG. 1, and N is a positive integer. FIG. 1 also shows samples 102-1, 102-2, …, 102-N, also referred to individually or collectively as sample 102, corresponding to data records 101-1, 101-2 … 101-N, respectively. The disclosed embodiments do not limit the number of data records and samples.
The data record 101 may relate to any of a variety of types of data suitable for producing a sample, and may relate to a variety of suitable types of objects. As one example, the data record 101 may include an image and its descriptive information, such as the size, style, etc. of the image. As another example, the data record 101 may include text in natural language and its descriptive information, such as what language the text is in, the number of words of the text, the source, etc. As another example, the data record 101 may include a description of an event, such as what behavior the object a made at a certain time or what behavior the object a made to the object B, etc. Embodiments of the present disclosure are not limited in the type of data records.
Each data record 101 may include one or more fields, also referred to as a set of fields. Each field may be used to describe an aspect related to the data record 101 or an attribute of an object to which the data record 101 relates. By way of example, if the data record 101 relates to an image, the data record 101 may include a field for recording the size of the image, a field for recording the style of the image, and so on. Multiple data records for the same sample production task may be the same type of data record and include the same set of fields. It should be understood that, in embodiments of the present disclosure, the fact that a data record includes a field does not mean that the field necessarily has a meaningful value. For example, the field may be marked as not applicable ("N/A"). In some embodiments, the data record 101 may be data acquired in a recommendation scenario.
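The notion of a data record as a set of fields can be illustrated with a minimal sketch. The field names `image_size` and `image_style`, the sample values, and the helper `has_meaningful_value` are hypothetical and are not taken from the disclosure; only the "N/A" marker comes from the text above.

```python
# Hypothetical data records for an image-related sample production task.
# Field names and values are illustrative assumptions only.
NOT_APPLICABLE = "N/A"

records = [
    {"image_size": "1024x768", "image_style": "watercolor"},
    {"image_size": "640x480", "image_style": NOT_APPLICABLE},
]


def has_meaningful_value(record: dict, field: str) -> bool:
    """Return True if the field is present and not marked as not applicable."""
    return field in record and record[field] != NOT_APPLICABLE


# Records of one sample production task are assumed to share one set of fields.
field_sets = {frozenset(record) for record in records}
```

Here both records include the field `image_style`, but only the first one carries a meaningful value for it.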
In some embodiments, electronic device 110 may present a user with various interfaces related to data processing, such as interface 121. Through this interface 121, the user can realize various settings for data processing. For example, a user may configure a sample production task.
In environment 100, electronic device 110 may be any type of device having computing capabilities, including a terminal device or a server device. The terminal device may be any type of mobile terminal, fixed terminal, or portable terminal, including a mobile handset, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, media computer, multimedia tablet, Personal Communication System (PCS) device, personal navigation device, Personal Digital Assistant (PDA), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, game device, or any combination of the preceding, including accessories and peripherals for these devices. The server devices may include, for example, computing systems/servers, such as mainframes, edge computing nodes, electronic devices in a cloud environment, and so forth.
It should be understood that the structure and function of environment 100 are described for illustrative purposes only and are not meant to suggest any limitation as to the scope of the disclosure.
As mentioned briefly above, it is desirable in data processing to be able to retain the original values of some fields of interest in the produced samples as sample auxiliary information. The sample auxiliary information is a non-feature field in the sample that retains the original value of a field. The sample auxiliary information can provide additional value in subsequent uses of the sample. For example, in the training of machine learning models, sample auxiliary information enables an algorithm engineer to use the original values for operations such as sample screening, label editing, and debugging during the model training phase. Therefore, generating the sample auxiliary information efficiently plays an important role in sample production and subsequent sample use.
One current solution is based on an Interface Description Language (IDL) file in Protocol Buffers (PB) format. The IDL file in PB format has fixed semantics and structure, and defines the names and data types of all auxiliary information. Since the IDL file in PB format cannot be dynamically modified online, whenever new auxiliary information is to be added, a new field (for example, "A") needs to be defined in the IDL file and the modification released online. Thereafter, when sample auxiliary information is selected during sample creation, the value to be retained can be stored in the predefined field "A". Accordingly, the field "A" can also be used in the downstream model training process for filtering and analysis.
Reference is made to FIG. 2. In the interface 200, "page" is a preset item of auxiliary information. The fields included in the data record currently to be processed are displayed in the drop-down menu 210. If it is desired to add new sample auxiliary information, a field corresponding to the auxiliary information "page" needs to be selected from the drop-down menu 210.
Such a scheme for sample auxiliary information has a number of problems. First, updating (e.g., modifying or adding) the sample auxiliary information requires a development and release process. The fields in the IDL file in PB format all need to be preset before they can be used when creating samples. Once a new field needs to be used as auxiliary information in a service, the IDL file in PB format needs to be modified and must go through strict testing and release procedures. Second, such a development and release process has a long cycle and carries high risk. If an error occurs, the generation of samples and subsequent model training are hindered, and the development progress and usability of the machine learning model are affected. Third, in this scheme the auxiliary information is preset, and preset sample auxiliary information cannot meet diversified service requirements. When producing samples for different application scenarios, which fields will serve as sample auxiliary information is highly uncertain, and a preset IDL file in PB format can hardly meet all requirements.
Embodiments of the present disclosure propose a data processing scheme. In this approach, an input control is presented for specifying data fields for at least one data record, each of the at least one data record including a set of data fields. Thereafter, user input specifying a target data field in the set of data fields is received via the input control. Further, at least one sample respectively corresponding to the at least one data record is generated based on the target data field, the samples in the at least one sample having an encoded representation of the corresponding data record and a sample field corresponding to the target data field of the corresponding data record.
In accordance with embodiments of the present disclosure, a user (e.g., a sample producer) is allowed to specify, from the fields included in a data record and according to actual requirements, a field desired as auxiliary information. In this way, sample production is no longer limited to preset auxiliary information. On the one hand, the configuration efficiency of the auxiliary information can be improved, and the sample production cycle and the model training cycle can be shortened. On the other hand, the fields serving as sample auxiliary information can be set flexibly as needed, so that any required field can be retained in the sample as auxiliary information.
Some example embodiments of the present disclosure will be described below with continued reference to the accompanying drawings.
Example Process
FIG. 3 illustrates a flow chart of a process 300 of data processing according to some embodiments of the present disclosure. The process 300 may be implemented at the electronic device 110, for example, by the application 120. For example only, the process 300 is described below with reference to FIG. 1.
At block 310, the electronic device 110 presents an input control for specifying a data field for at least one data record 101. Each data record 101 includes a set of data fields. For example, each data record 101 may include the same set of data fields. Hereinafter, the values of the respective fields (also referred to as field values) in the data record 101 are also referred to as original values.
The input control may be any form of control that enables a user to specify data fields in a data record. For example, the input control may include an input box through which at least a portion of the name of a field may be entered. The user may enter the complete field name, or only a portion of it, in which case the data field matching the entered portion may be determined. As another example, the input control may be a selection control that includes a plurality of options, through which one or more fields of the data record may be selected.
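Matching a partially entered field name against the fields of a data record could look like the following minimal sketch. The disclosure does not specify a matching rule; case-insensitive substring matching is assumed here, and the function name `match_fields` and the field `item_price` are hypothetical.

```python
def match_fields(partial_name: str, field_names: list) -> list:
    """Return the field names that contain the entered text.

    Case-insensitive substring matching is an assumption made for this
    sketch; the source text does not prescribe a particular rule.
    """
    needle = partial_name.strip().lower()
    if not needle:
        return []
    return [name for name in field_names if needle in name.lower()]


# "item_source" and "meta_entity_id" appear in the text; "item_price" is made up.
fields = ["item_source", "item_price", "meta_entity_id"]
candidates = match_fields("item", fields)
```

An input box could then offer `candidates` to the user for confirmation, or accept the single match directly when only one field fits.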
In some embodiments, the data records 101 may be organized in multiple data tables. For example, each data table may include one or more fields in the set of fields. It will be appreciated that two different data tables may include the same field. In such an embodiment, to allow the user to specify the data field serving as auxiliary information, a selection control for the multiple data tables, also referred to as a first selection control, may be presented first. If a selection of one of the plurality of data tables is received through the first selection control, a selection control for one or more fields in the selected data table, also referred to as a second selection control, may then be presented.
An example is described below with reference to FIGS. 4 and 5. FIG. 4 illustrates an example interface 400 for configuring sample production tasks. An area 410 in the interface 400 is used to present configuration information and controls related to the auxiliary information. For example, a control 420 for adding a field as auxiliary information is shown in FIG. 4. If control 420 is triggered, an interface for specifying data fields, such as interface 500 shown in FIG. 5, may be presented.
In the example interface 500, a control 510 is displayed. In response to control 510 being triggered, a plurality of selection controls 501, 502, 503, and 504 are presented, which are directed to the first table, the second table, the third table, and the other tables, respectively. Illustratively, if the selection control 501 is triggered, i.e., a selection of the first table is received, a selection control 520 is presented for the field "item_source" in the first table. Thus, the field "item_source" may be selected.
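The two-level selection (first a data table, then a field within it) can be sketched as a simple lookup. The table contents below are hypothetical apart from the field "item_source" in the first table, and the function names are illustrative rather than part of the disclosed interface.

```python
# Hypothetical mapping from data tables to their fields. Only "item_source"
# in the first table is taken from the text; the rest is assumed.
tables = {
    "first table": ["item_source"],
    "second table": ["user_region"],
    "third table": ["item_price"],
}


def first_selection_options(tables: dict) -> list:
    """Options offered by the first selection control: the data tables."""
    return list(tables)


def second_selection_options(tables: dict, selected_table: str) -> list:
    """Options offered by the second selection control: fields of one table."""
    return list(tables[selected_table])


table_options = first_selection_options(tables)
field_options = second_selection_options(tables, "first table")
```

Selecting the first table yields the single option "item_source", mirroring the behaviour described for selection controls 501 and 520.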
The data tables may be divided on any suitable basis or criterion. For example, the division of the data tables may be determined based on the content described in the data record 101. In some embodiments, at least one data record 101 is used to record a behavior performed by a first object on a second object. As a non-limiting example, the data record records that company A has lowered the selling price of product B. In this example, the first object is company A, the second object is product B, and the behavior is lowering the selling price. As another example, in a recommendation scenario, the data record may record that the first object has selected a recommended second object.
In such an embodiment, the plurality of data tables may include a first data table for recording information related to the behavior. For example, the first data table may also be referred to as a behavior table, which may include any suitable fields related to the behavior, such as a field for recording the specific type of the behavior, a field for recording the time at which the behavior occurred, a field for recording the address at which the behavior occurred, etc.
Alternatively or additionally, the plurality of data tables may comprise a second data table for recording information related to the first object. Illustratively, the second data table may also be referred to as a first object table, which may include any suitable fields capable of describing the first object, such as fields for recording various properties of the first object.
Alternatively or additionally, the plurality of data tables may comprise a third data table for recording information relating to the second object. Illustratively, the third data table may also be referred to as a second object table (such as an item table), which may include any suitable fields capable of describing the second object, such as fields for recording various attributes of the second object. For the above example, the third data table may include a name field, a type field, a price field, and the like for the product.
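As a rough sketch of this organization, the "company A lowered the selling price of product B" example can be split into a behavior table, a first object table, and a second object table. All field names and values below are assumptions made for illustration; the text only names the kinds of fields (type, time, name, price, and so on).

```python
# One data record split across three hypothetical data tables.
behavior_table = {
    "behavior_type": "price_reduction",
    "behavior_time": "2023-07-01T10:00:00",
}
first_object_table = {"company_name": "A"}
second_object_table = {"name": "B", "type": "electronics", "price": 99.0}

data_tables = {
    "behavior": behavior_table,
    "first_object": first_object_table,
    "second_object": second_object_table,
}


def qualified_fields(data_tables: dict) -> list:
    """List (table, field) pairs.

    Qualifying a field by its table keeps fields apart when two tables
    happen to contain a field with the same name.
    """
    return [
        (table, field)
        for table, fields in data_tables.items()
        for field in fields
    ]


selectable = qualified_fields(data_tables)
```

A selection interface could present `selectable` grouped by table, so that the price field of the second object table can be chosen as auxiliary information.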
For the user to specify (e.g., select) the data fields, electronic device 110 may obtain information for the data fields in any suitable manner. For example, the electronic device 110 may extract or read information of the data fields from the data record. Embodiments of the disclosure are not limited in this respect.
With continued reference to FIG. 3, at block 320, the electronic device 110 receives, through the input control, user input specifying a target data field in the set of data fields. As used herein, the data field specified by the user through the input control is referred to as the target data field. That is, the user wishes the value of the target data field to be retained as auxiliary information.
As an example, the user may directly input the name of the target data field. As another example, the user may select the target data field through a selection control. For example, in the example of FIG. 5 above, if the user triggers selection control 520, the field "item_source" may be determined to be the target data field.
It should be understood that there may be multiple target data fields specified. The embodiments described below with reference to one target data field may be applied to any of a plurality of target data fields.
With continued reference to FIG. 3, at block 330, the electronic device 110 generates at least one sample 102 corresponding to at least one data record, respectively, based on the target data field. Each sample 102 has an encoded representation of the corresponding data record 101 and a sample field corresponding to a target data field of the corresponding data record 101. For example, sample 102-1 has an encoded representation of data record 101-1 and a sample field corresponding to a target data field of data record 101-1.
The encoded representation of the data record 101 may be a numerical representation generated by converting the original values of the data record 101 in any suitable manner (e.g., hashing or embedding). It will be appreciated that, relative to the original values, the encoded representation is an implicit representation that is friendly to machines (e.g., a machine learning model) but is not user friendly and is difficult to understand intuitively. In contrast, the sample field retained in the sample has an explicit, plaintext value that is readily understood by the user. For example, if the target data field is a commodity price, the sample field may also be a commodity price.
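The contrast between the implicit encoded representation and the plaintext sample field can be shown with a minimal sketch. Hashing each "field=value" pair into an integer bucket is only one possible encoding, chosen here as an assumption; the function names, the bucket count, and the sample data are hypothetical.

```python
import hashlib


def encode_record(record: dict, buckets: int = 2**20) -> list:
    """Hash each 'field=value' pair into an integer bucket.

    This stands in for the encoded representation; the actual encoding
    used in a real pipeline is not specified by the text.
    """
    encoded = []
    for name in sorted(record):
        digest = hashlib.sha256(f"{name}={record[name]}".encode("utf-8")).digest()
        encoded.append(int.from_bytes(digest[:8], "big") % buckets)
    return encoded


def generate_samples(records: list, target_field: str) -> list:
    """Build one sample per record: encoded features plus a readable sample field."""
    return [
        {"encoded": encode_record(record), target_field: record[target_field]}
        for record in records
    ]


# Hypothetical records; "item_price" plays the role of the commodity price.
records = [
    {"item_source": "search", "item_price": 9.9},
    {"item_source": "feed", "item_price": 19.9},
]
samples = generate_samples(records, "item_price")
```

Each sample then carries opaque integers under `encoded`, while `item_price` keeps its original, human-readable value for later screening or debugging.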
In some embodiments, for a corresponding pair of a sample 102 and a data record 101, the sample field has the same value as the target data field. That is, the original value of the target data field is preserved by the sample field. Alternatively, in some embodiments, a physically meaningful transformation may be applied to the original value of the target data field, and the transformed value is taken as the value of the sample field. It will be appreciated that such a physically meaningful transformation differs from the transformation that generates the encoded representation in that the transformed value remains understandable to the user. Such a transformation may include, for example, a unit conversion (for example, of temperature, length, or mass units), a coordinate transformation of a position, a change in the representation of time (for example, a conversion between 12-hour and 24-hour notation), and the like.
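Two such physically meaningful transformations are sketched below: a temperature unit conversion and a conversion from 12-hour to 24-hour time notation. The function names and the assumed input format ("HH:MM AM/PM") are illustrative choices, not part of the disclosure.

```python
def fahrenheit_to_celsius(value_f: float) -> float:
    """Unit conversion: the result is still a readable temperature."""
    return (value_f - 32.0) * 5.0 / 9.0


def to_24_hour(time_12h: str) -> str:
    """Convert a time such as '02:30 PM' to 24-hour notation ('14:30').

    The 'HH:MM AM/PM' input format is an assumption made for this sketch.
    """
    clock, period = time_12h.strip().split()
    hour_text, minute_text = clock.split(":")
    hour = int(hour_text) % 12  # 12 AM and 12 PM both start from 0
    if period.upper() == "PM":
        hour += 12
    return f"{hour:02d}:{minute_text}"


celsius = fahrenheit_to_celsius(212.0)
afternoon = to_24_hour("02:30 PM")
```

Unlike a hashed or embedded value, both results can still be read and interpreted directly by a person inspecting the sample.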
In some embodiments, the electronic device 110 may determine the configuration of the target data field based on the at least one data record 101, and then determine the configuration of the sample field based on the configuration of the target data field. The configuration of a field may include attributes such as the name, data type (such as string type, bigint type, int type, etc.), length, and precision of the field. Referring to the example of fig. 4, as shown in region 410, the field name "_meta_entity_id" in the sample and the field type "string" in the sample are determined based on the name and field type of the data field "meta_entity_id" in the data record.
In some embodiments, the name of the sample field may include the name of the corresponding target data field. In this way, in downstream use of the samples, the user can intuitively understand the meaning of each sample field for further processing, such as sample screening and the like.
In particular, in some embodiments, the name of the sample field may be formed by adding a preset symbol at a preset position of the name of the target data field. In the example of fig. 4, the symbol "_" is added before the name of the target data field, thereby generating the name of the sample field. For example, the symbol "_" is added before the name "meta_entity_id" of the data field to generate the name "_meta_entity_id" of the sample field. In such an embodiment, the original names of the fields in the data record are preserved while the sample fields remain distinguishable from them. In this way, the user can easily understand the meaning of each original field.
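The derivation of a sample field configuration from a target data field configuration can be sketched as below. The `FieldConfig` class, the function `derive_sample_field_config`, and the `position` parameter are assumptions made for illustration; only the "_" prefix and the "meta_entity_id" example come from fig. 4 as described above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FieldConfig:
    """Minimal field configuration: a name and a data type (other attributes omitted)."""
    name: str
    data_type: str

def derive_sample_field_config(target, symbol="_", position="prefix"):
    """Add a preset symbol at a preset position of the target field name."""
    if position == "prefix":
        new_name = symbol + target.name
    elif position == "suffix":
        new_name = target.name + symbol
    else:
        raise ValueError(f"unsupported position: {position}")
    # The data type is carried over unchanged from the target data field.
    return replace(target, name=new_name)

target = FieldConfig(name="meta_entity_id", data_type="string")
sample_cfg = derive_sample_field_config(target)
print(sample_cfg.name, sample_cfg.data_type)
```

Because the original name remains a substring of the sample field name, a downstream user can map each sample field back to its source field by inspection.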
In some embodiments, the identification information of the target data field and the configuration information of the sample field may also be presented in association prior to generating the sample, for example when configuring a sample production task. Continuing with the example of FIG. 4, in region 410, the identification information of the target data field is displayed in a column named "field name (Table name)"; in the same row, the configuration information of the sample field is displayed in columns named "field name in sample", "field type in sample", and the like. In this way, the user can conveniently review the configuration of the sample generation task to be performed.
In some embodiments, the identification information of the target data field includes at least one of: the name of the target data field, or the name of a data table that includes the target data field. For example, in fig. 4, "meta_entity_id" is the name of a data field, and "day_window" in parentheses thereafter is the name of a data table including the data field.
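One possible way to assemble a row that presents the identification information and the sample field configuration in association is sketched below. The function `association_row` is hypothetical; the column labels and the example values follow the description of region 410 in fig. 4.

```python
def association_row(field_name, table_name, sample_field_name, sample_field_type):
    """Build one display row pairing a target data field with its sample field."""
    return {
        # Identification information: field name followed by table name in parentheses.
        "field name (Table name)": f"{field_name} ({table_name})",
        # Configuration information of the corresponding sample field.
        "field name in sample": sample_field_name,
        "field type in sample": sample_field_type,
    }

row = association_row("meta_entity_id", "day_window", "_meta_entity_id", "string")
for column, value in row.items():
    print(f"{column}: {value}")
```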
In some embodiments, the sample fields and their corresponding values are stored as part of the sample as named features. That is, in such an embodiment, a named feature (Named Feature) may be used to save the auxiliary information of the sample. In machine learning, named features may refer to data elements with specific names and definitions that can be used for data preprocessing and model training in machine learning algorithms.
Table 1 compares the processing time of saving auxiliary information under three schemes: the above-described scheme based on an IDL file in PB format, the scheme based on example metadata (Example Meta), and the Named Feature scheme. Example Meta is metadata information, such as data items, fields, and data types, that instructs the machine learning model how to handle the data. In machine learning, data is typically provided to a model in the form of samples, where each sample includes several features and a label; Example Meta provides further description and explanation of these features and labels. As Table 1 shows, the Named Feature scheme has the lowest processing time of the three, with a particularly large advantage over Example Meta.
TABLE 1
Scheme                    Processing time (unit not specified)
IDL file in PB format     0.02913
Example Meta              0.083564
Named Feature             0.024884
The comparison in Table 1 shows that the processing time is reduced when auxiliary information is obtained from the Named Feature. The root cause is that the Named Feature scheme does not need to perform deserialization at the time of use; after traversal, the data can be obtained directly from memory. The PB-format IDL file scheme and the Example Meta scheme both require the data to be deserialized before it can be fetched. In particular, Example Meta requires a further traversal through an iterator, which incurs additional overhead.
Thus, in such embodiments, the auxiliary information may be obtained through the named feature. In this way, auxiliary information can be selected broadly from the fields of the various tables involved in the sample, and the original names of the fields can be used.
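The difference in access pattern described above can be illustrated with a deliberately simplified sketch. Here JSON stands in for a serialized metadata blob; this is an assumption made only so the example is self-contained, and it is not the PB or Example Meta format itself. The function names and sample layout are likewise hypothetical, and the snippet makes no claim about actual timings.

```python
import json

def read_from_named_feature(sample, name):
    """Direct in-memory lookup by name: no deserialization step is needed."""
    return sample["named_features"][name]

def read_from_serialized_meta(sample, name):
    """Deserialize the metadata blob first, then iterate to find the entry."""
    entries = json.loads(sample["meta_blob"])  # deserialization cost paid on each read
    for entry in entries:                      # additional traversal
        if entry["name"] == name:
            return entry["value"]
    raise KeyError(name)

# Hypothetical sample holding the same auxiliary information in both layouts.
sample = {
    "named_features": {"_meta_entity_id": "e-42"},
    "meta_blob": json.dumps([{"name": "_meta_entity_id", "value": "e-42"}]),
}

print(read_from_named_feature(sample, "_meta_entity_id"))
print(read_from_serialized_meta(sample, "_meta_entity_id"))
```

Both calls return the same value; the second path simply performs more work per read, which is the kind of overhead the comparison attributes to the deserialization-based schemes.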
In the embodiments of the present disclosure, any field appearing in a sample data source can be defined as auxiliary information. Compared with the conventional approach, the lead time for adding new auxiliary information is shortened from several days or longer, depending on complexity, to zero days; that is, no development or release work is needed at all. When new auxiliary information is to be used, it can be configured flexibly according to service needs with almost no waiting time for deployment. Thus, the cycle of generating samples and training models can be greatly shortened, and the related cost can be reduced.
Example apparatus and device
Fig. 6 illustrates a schematic block diagram of an apparatus 600 for data processing according to some embodiments of the present disclosure. The apparatus 600 may be implemented as or included in the electronic device 110. The various modules/components in apparatus 600 may be implemented in hardware, software, firmware, or any combination thereof.
As shown, the apparatus 600 includes an interface presentation module 610 configured to present an input control for specifying data fields for at least one data record, each of the at least one data record including a set of data fields.
The apparatus 600 further includes an input module 620 configured to receive user input specifying a target data field of a set of data fields via an input control.
The apparatus 600 further comprises a sample generation module 630 configured to generate at least one sample corresponding to the at least one data record, respectively, based on the target data field. The samples of the at least one sample have an encoded representation of the corresponding data record and a sample field corresponding to a target data field of the corresponding data record.
In some embodiments, the sample field has the same value as the target data field for the corresponding data record and sample.
In some embodiments, the apparatus 600 further comprises a determination module configured to determine a configuration of the target data field based on the at least one data record, and to determine a configuration of the sample field based on the configuration of the target data field.
In some embodiments, the configuration includes a name, and the determination module is further configured to add a preset symbol at a preset position of the name of the target data field to form the name of the sample field.
In some embodiments, the sample fields and their corresponding values are stored as part of the sample as named features.
In some embodiments, the interface presentation module 610 is further configured to present the first selection control for a plurality of data tables, wherein each data table of the plurality of data tables includes one or more data fields of a set of data fields. The input module 620 is further configured to receive a selection of a data table of the plurality of data tables via the first selection control. The interface presentation module 610 is further configured to present a second selection control for one or more data fields in the selected data table.
In some embodiments, the at least one data record is used to record behavior performed by a first object on a second object, and the plurality of data tables includes: a first data table for recording information related to the behavior, a second data table for recording information related to the first object, and a third data table for recording information related to the second object.
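The two-level selection described for modules 610 and 620 (first choose a data table, then choose a data field within it) might be modeled as follows. The table names, most field names, and the helper functions are hypothetical; the three tables mirror the behavior, first-object, and second-object tables described above.

```python
# Hypothetical data tables and the data fields each one contributes.
TABLES = {
    "behavior": ["event_time", "action_type"],  # first data table: the behavior
    "user": ["user_id", "user_age"],            # second data table: the first object
    "item": ["meta_entity_id", "price"],        # third data table: the second object
}

def first_selection_options(tables):
    """Options shown by the first selection control: the available data tables."""
    return sorted(tables)

def second_selection_options(tables, selected_table):
    """Options shown by the second selection control: fields of the chosen table."""
    return list(tables[selected_table])

def select_target_field(tables, selected_table, selected_field):
    """Validate the user's choice and return it as (table name, field name)."""
    if selected_field not in tables[selected_table]:
        raise ValueError(f"{selected_field!r} is not a field of {selected_table!r}")
    return selected_table, selected_field

print(first_selection_options(TABLES))
print(second_selection_options(TABLES, "item"))
target = select_target_field(TABLES, "item", "meta_entity_id")
print(target)
```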
In some embodiments, the interface presentation module 610 is further configured to present the identification information of the target data field and the configuration information of the sample field in association.
In some embodiments, the identification information includes at least one of: the name of the target data field, or the name of a data table that includes the target data field.
Fig. 7 illustrates a block diagram of an electronic device 700 in which one or more embodiments of the disclosure may be implemented. It should be understood that the electronic device 700 illustrated in fig. 7 is merely exemplary and should not be construed as limiting the functionality and scope of the embodiments described herein. The electronic device 700 shown in fig. 7 may be used to implement the electronic device 110 of fig. 1.
As shown in fig. 7, the electronic device 700 is in the form of a general-purpose electronic device. Components of electronic device 700 may include, but are not limited to, one or more processors or processing units 710, memory 720, storage 730, one or more communication units 740, one or more input devices 750, and one or more output devices 760. The processing unit 710 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 720. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to improve the parallel processing capabilities of electronic device 700.
Electronic device 700 typically includes a number of computer storage media. Such media may be any available media that are accessible by electronic device 700, including, but not limited to, volatile and non-volatile media, and removable and non-removable media. The memory 720 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage device 730 may be a removable or non-removable medium and may include machine-readable media such as flash drives, magnetic disks, or any other media that are capable of storing information and/or data and that can be accessed within electronic device 700.
The electronic device 700 may further include additional removable/non-removable, volatile/nonvolatile storage media. Although not shown in fig. 7, a magnetic disk drive for reading from or writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data medium interfaces. Memory 720 may include a computer program product 725 having one or more program modules configured to perform the various methods or acts of the various embodiments of the disclosure.
The communication unit 740 enables communication with other electronic devices through a communication medium. Additionally, the functionality of the components of the electronic device 700 may be implemented in a single computing cluster or in multiple computing machines capable of communicating over a communication connection. Thus, the electronic device 700 may operate in a networked environment using logical connections to one or more other servers, a network Personal Computer (PC), or another network node.
The input device 750 may be one or more input devices such as a mouse, keyboard, trackball, etc. The output device 760 may be one or more output devices such as a display, speakers, printer, etc. The electronic device 700 may also communicate with one or more external devices (not shown), such as storage devices, display devices, etc., through the communication unit 740, with one or more devices that enable a user to interact with the electronic device 700, or with any device (e.g., network card, modem, etc.) that enables the electronic device 700 to communicate with one or more other electronic devices, as desired. Such communication may be performed via an input/output (I/O) interface (not shown).
According to an exemplary implementation of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer-executable instructions, wherein the computer-executable instructions are executed by a processor to implement the method described above. According to an exemplary implementation of the present disclosure, there is also provided a computer program product tangibly stored on a non-transitory computer-readable medium and comprising computer-executable instructions that are executed by a processor to implement the method described above.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus, devices, and computer program products implemented according to the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various implementations of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of implementations of the present disclosure has been provided for illustrative purposes, is not exhaustive, and is not limited to the implementations disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various implementations described. The terminology used herein was chosen in order to best explain the principles of each implementation, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand each implementation disclosed herein.

Claims (12)

1. A data processing method, comprising:
presenting an input control for specifying data fields for at least one data record, each of the at least one data record comprising a set of data fields;
receiving, by the input control, user input specifying a target data field of the set of data fields; and
generating, based on the target data field, at least one sample respectively corresponding to the at least one data record, each sample of the at least one sample having an encoded representation of the corresponding data record and a sample field corresponding to the target data field of the corresponding data record.
2. The method of claim 1, wherein the sample field has the same value as the target data field for a corresponding data record and sample.
3. The method of claim 1, further comprising:
determining a configuration of the target data field based on the at least one data record; and
determining a configuration of the sample field based on the configuration of the target data field.
4. The method of claim 3, wherein the configuration comprises a name, and determining the configuration of the sample field comprises:
adding a preset symbol at a preset position of the name of the target data field to form the name of the sample field.
5. The method of claim 1, wherein the sample field and its corresponding value are stored as part of the sample as a named feature.
6. The method of claim 1, wherein presenting the input control comprises:
presenting a first selection control for a plurality of data tables, wherein each data table of the plurality of data tables includes one or more data fields of the set of data fields;
receiving, by the first selection control, a selection of a data table of the plurality of data tables; and
presenting a second selection control for the one or more data fields in the selected data table.
7. The method of claim 6, wherein the at least one data record is for recording behavior by a first object on a second object, and the plurality of data tables comprises:
a first data table for recording information relating to said behaviour,
a second data table for recording information related to the first object,
and a third data table for recording information related to the second object.
8. The method of claim 1, further comprising:
presenting, in association, identification information of the target data field and configuration information of the sample field.
9. The method of claim 8, wherein the identification information comprises at least one of:
the name of the target data field, or
the name of a data table comprising the target data field.
10. A data processing apparatus comprising:
an interface presentation module configured to present an input control for specifying data fields for at least one data record, each of the at least one data record comprising a set of data fields;
an input module configured to receive user input specifying a target data field of the set of data fields through the input control; and
a sample generation module configured to generate at least one sample respectively corresponding to the at least one data record based on the target data field, the samples in the at least one sample having an encoded representation of the corresponding data record and a sample field corresponding to the target data field of the corresponding data record.
11. An electronic device, comprising:
at least one processing unit; and
at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, which when executed by the at least one processing unit, cause the electronic device to perform the method of any one of claims 1 to 9.
12. A computer readable storage medium having stored thereon a computer program executable by a processor to implement the method of any of claims 1 to 9.
CN202310934615.0A 2023-07-27 2023-07-27 Data processing method, device, equipment and storage medium for recommending scene Pending CN116955821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310934615.0A CN116955821A (en) 2023-07-27 2023-07-27 Data processing method, device, equipment and storage medium for recommending scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310934615.0A CN116955821A (en) 2023-07-27 2023-07-27 Data processing method, device, equipment and storage medium for recommending scene

Publications (1)

Publication Number Publication Date
CN116955821A true CN116955821A (en) 2023-10-27

Family

ID=88457946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310934615.0A Pending CN116955821A (en) 2023-07-27 2023-07-27 Data processing method, device, equipment and storage medium for recommending scene

Country Status (1)

Country Link
CN (1) CN116955821A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination