CN112445499A - Derived variable determination method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112445499A
Authority
CN
China
Prior art keywords
data
programming language
data structure
processed
derived
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011359460.5A
Other languages
Chinese (zh)
Inventor
刘玉德
黄启军
陈瑞钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202011359460.5A
Publication of CN112445499A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/61 Installation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/76 Adapting program code to run in a different environment; Porting

Abstract

The invention discloses a method, a device, equipment and a storage medium for determining derived variables, which are applied to a client terminal of a first programming language, wherein the client terminal is provided with derived variable codes written in a second programming language, and the method comprises the following steps: acquiring data to be processed, and converting the data to be processed into first data of a first data structure, wherein the first data structure is suitable for a second programming language; and performing variable derivation on the first data according to the derived variable code to generate derived variables, wherein the derived variables are used for model training. Because the codes written by the second programming language are directly deployed in the client terminal of the first programming language, the derived variable codes can be directly called to generate the derived variables by converting the data to be processed into the data structure suitable for the second programming language, developers do not need to rewrite the derived variable codes by using the first programming language, and the efficiency of obtaining the derived variables is improved while the deployment process of the derived variable codes is simplified.

Description

Derived variable determination method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of financial technology (Fintech), in particular to a method, a device and equipment for determining derivative variables and a readable storage medium.
Background
With the continuous development of Internet financial technology, more and more technologies (such as distributed computing, blockchain and artificial intelligence) are applied in the financial field, which in turn places higher requirements on controlling financial risk. Constructing derived variables is a common data processing method in the field of financial risk models; because it can mine new variables and increase the dimensionality of the data, it is widely applied to risk control models.
In the prior art, risk modeling personnel often write derived variable code in the Python language. However, current risk control systems are mostly written in other programming languages, for example the JAVA language. Therefore, after the risk modeling personnel write the derived variable code in Python, developers need to rewrite its logic in JAVA, and only after the risk modeling personnel verify the rewrite manually is the JAVA version of the derived variable code deployed in the risk control system to obtain the derived variables of the data. This deployment process is cumbersome, which results in low efficiency of obtaining derived variables.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a storage medium for determining a derivative variable, and aims to solve the technical problems that the process of deploying derivative variable codes is complicated and the efficiency of obtaining the derivative variable is low in the prior art.
In order to achieve the above object, the present invention provides a derived variable determining method, applied to a client terminal in a first programming language, where a derived variable code written in a second programming language is deployed in the client terminal, the method including:
acquiring data to be processed;
converting the data to be processed into first data of a first data structure, wherein the first data structure is suitable for a second programming language;
and performing variable derivation on the first data of the first data structure according to the derived variable code to generate derived variables of the first data, wherein the derived variables are used for model training.
Optionally, converting the data to be processed into the first data of the first data structure includes:
converting the data to be processed into second data of a second data structure, wherein the second data structure is suitable for the first programming language;
second data of the second data structure is converted into first data of the first data structure.
Optionally, converting the data to be processed into second data of a second data structure, including:
converting data to be processed acquired through the Internet into a data object of a first programming language;
the data object in the first programming language is converted to second data in a second data structure.
Optionally, converting the second data of the second data structure into the first data of the first data structure includes:
converting the second data of the second data structure into third data of a third data structure, wherein the third data structure is suitable for a third programming language;
the third data of the third data structure is converted into the first data of the first data structure.
Optionally, converting the data to be processed acquired through the internet into a data object in the first programming language, including:
acquiring a mapping code corresponding to the data to be processed according to the data type of the data to be processed, wherein the mapping code is used for mapping the data to be processed into a data object of a first programming language;
and converting the data to be processed into a data object of a first programming language according to mapping codes, wherein the mapping codes are generated according to the data type of the data to be processed.
Optionally, converting the data object in the first programming language into second data in a second data structure, including:
if the memory database is allowed to be used, storing the data object of the first programming language into the memory database, and outputting second data of a second data structure through the memory database;
or,
and if the memory database is forbidden to be used, converting the data object of the first programming language into second data of a second data structure through the reflection mechanism of the first programming language.
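As an illustrative sketch only, the two alternatives above can be mimicked in Python. In the worked example of this document the first programming language is JAVA, so the Python analogue is an assumption made here for brevity. The class `CreditRecord`, the function names, the table layout and the choice of SQLite's in-memory mode are all hypothetical and do not come from the original text.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class CreditRecord:
    """Hypothetical data object; the field names are illustrative only."""
    user_id: str
    amount: float


def to_rows_via_memory_db(records):
    """Store the objects in an in-memory database and read them back as rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE record (user_id TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO record (user_id, amount) VALUES (?, ?)",
            [(r.user_id, r.amount) for r in records],
        )
        cursor = conn.execute("SELECT user_id, amount FROM record ORDER BY rowid")
        columns = [d[0] for d in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        conn.close()


def to_rows_via_reflection(records):
    """Inspect each object's declared fields at run time instead of using a database."""
    return [{f.name: getattr(r, f.name) for f in fields(r)} for r in records]


def to_second_data(records, memory_db_allowed):
    """Pick the conversion path according to whether the in-memory database may be used."""
    if memory_db_allowed:
        return to_rows_via_memory_db(records)
    return to_rows_via_reflection(records)
```

Both paths are meant to yield the same row-oriented result, so the caller does not need to know which one was taken.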
Optionally, after performing variable derivation on the first data of the first data structure according to the derived variable code, and generating a derived variable of the first data, the method further includes:
converting the data structure of the derived variable into a second data structure, the second data structure being adapted for use in the first programming language;
the derived variables of the second data structure are used for model training.
The invention also provides a derived variable determining device, which is applied to a client terminal of a first programming language, wherein the client terminal is deployed with a derived variable code written in a second programming language, and the device comprises:
the acquisition module is used for acquiring data to be processed;
the processing module is used for converting the data to be processed into first data of a first data structure, and the first data structure is suitable for a second programming language;
and the output module is used for carrying out variable derivation on the first data of the first data structure according to the derived variable code to generate derived variables of the first data, wherein the derived variables are used for model training.
Optionally, the processing module is specifically configured to: converting the data to be processed into second data of a second data structure, wherein the second data structure is suitable for the first programming language;
second data of the second data structure is converted into first data of the first data structure.
Optionally, the processing module is specifically configured to: converting data to be processed acquired through the Internet into a data object of a first programming language;
the data object in the first programming language is converted to second data in a second data structure.
Optionally, the processing module is specifically configured to: converting the second data of the second data structure into third data of a third data structure, wherein the third data structure is suitable for a third programming language;
the third data of the third data structure is converted into the first data of the first data structure.
Optionally, the processing module is specifically configured to: acquiring a mapping code corresponding to the data to be processed according to the data type of the data to be processed, wherein the mapping code is used for mapping the data to be processed into a data object of a first programming language;
and converting the data to be processed into a data object of a first programming language according to mapping codes, wherein the mapping codes are generated according to the data type of the data to be processed.
Optionally, the processing module is specifically configured to: if the memory database is allowed to be used, storing the data object of the first programming language into the memory database, and outputting second data of a second data structure through the memory database;
or,
and if the memory database is forbidden to be used, converting the data object of the first programming language into second data of a second data structure through the reflection mechanism of the first programming language.
Optionally, the processing module is further configured to: after the derived variables of the first data are generated,
converting the data structure of the derived variable into a second data structure, the second data structure being adapted for use in the first programming language;
the derived variables of the second data structure are used for model training.
The present invention also provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor;
the computer program, when being executed by a processor, realizes the steps of the method for determining derived variables according to any of the embodiments of the first aspect.
The present invention further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the derived variable determining method as provided in any of the embodiments of the first aspect.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the derived variable determination method as provided in any of the embodiments of the first aspect.
The method for determining the derived variable is applied to a client terminal of a first programming language, wherein a derived variable code written in a second programming language is deployed at the client terminal, and comprises the following steps: acquiring data to be processed, and converting the data to be processed into first data of a first data structure, wherein the first data structure is suitable for a second programming language; and performing variable derivation on the first data of the first data structure according to the derived variable code to generate derived variables of the first data, wherein the derived variables are used for model training. The method has the advantages that the codes written by the second programming language are directly deployed in the client terminal of the first programming language, and then the data structure of the data to be processed is converted into the data structure suitable for the second programming language, so that the client terminal can directly call the codes written by the second language to process the converted data, developers do not need to rewrite the logic of the derived variable codes by using the first programming language in the process, the deployment flow of the derived variable codes is simplified, and meanwhile, the efficiency of obtaining the derived variables is improved.
Drawings
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention;
Fig. 2 is a schematic flow chart of a derived variable determination method according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of a derived variable determination method according to another embodiment of the present invention;
Fig. 4 is a schematic flow chart of a derived variable determination method according to another embodiment of the present invention;
Fig. 5 is a flow chart illustrating data structure conversion according to another embodiment of the present invention;
Fig. 6 is a schematic diagram of data structure conversion according to another embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a derived variable determining apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Constructing derived variables is a common data processing method in the field of financial risk models and an essential step in applying data to risk control models. On the one hand, the original data is generally in an unstructured format, is scattered across sources, and is linked by primary keys, so it cannot be passed directly to a risk control model. On the other hand, the original data often contains a large amount of redundant, invalid data, and feeding it directly into the model reduces the model's accuracy. Therefore, the data is processed into derived variables in wide-table form with the redundant data removed, so that it can be used in the risk control model and a more accurate model can be obtained.
In the prior art, building derived variable code requires modeling personnel to have rich domain knowledge, and the code is usually written in a flexible language such as the Python language; when the variables are put into production, the code is then deployed in the risk control system.
However, current risk control systems are mostly written in other programming languages, for example the JAVA language. Taking these two programming languages as an example, after the modeling personnel write the derived variable code in Python, developers need to rewrite its logic in JAVA, and after manual verification by the risk modeling personnel, the rewritten JAVA code is deployed in the risk control system to obtain the derived variables. As a result, on the one hand, every update of the derived variables in the risk control system requires a long development cycle; on the other hand, the functionality of the rewritten derived variable code has to be repeatedly tested and checked by both modeling personnel and developers. This process is cumbersome, consumes a lot of manpower and time, and results in low efficiency of obtaining derived variables.
In view of this, the present invention provides a method, an apparatus, a device and a storage medium for determining derived variables. Derived variable code written in a second programming language is deployed directly in a client terminal whose risk control system is written in a first programming language. After the original data is acquired, its data structure is converted into a data structure suitable for the second programming language, so that the client terminal can directly call the code written in the second programming language to process the converted data and obtain the derived variables of the original data. In this process, developers do not need to rewrite the logic of the derived variable code in the first programming language, which simplifies the deployment of the derived variable code and improves the efficiency of obtaining derived variables.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention. As shown in Fig. 1, the scenario includes a client terminal 100, a database 101, a server 102 and a terminal device 103. A risk control system runs in the client terminal 100, and the constructed derived variable code is deployed in the risk control system; a training model is provided in the server 102 and is used for model training according to the derived variables.
Specifically, the client terminal 100 is in communication connection with the server 102 and the database 101, respectively, and is configured to send the derived variable to the server 102 or the database 101; the server 102 is in communication connection with the database 101, and is used for the server 102 to obtain the stored derived variables from the database 101; the client terminal 100 is in communication connection with the terminal device 103, and is used for the client terminal 100 to acquire the data to be processed sent by the terminal device 103.
It should be noted that the terminal device 103 is a data provider, and the present invention is not particularly limited to the type of the data provider, and for example, the data provider may be another bank terminal.
In some scenarios, the risk control system is developed in a first programming language, which the present invention does not specifically limit; for example, the first programming language may be the JAVA programming language, the C programming language, and so on. The derived variable code is written in a second programming language, which the present invention likewise does not specifically limit; for example, the second programming language may be the Python programming language.
In practical application, when the risk control system is used to generate derived variables, it acts as the active caller of the program and is responsible for starting the program and loading the data so as to generate the derived variables.
Specifically, the client terminal 100 sends an HTTP request to the terminal device 103 through the network, and after the terminal device 103 receives the request sent by the client terminal 100, it sends the data to be processed to the client terminal 100 through the network.
Further, the client terminal 100 calls the derived variable code through the risk control system, and the derived variable code generates the derived variables corresponding to the data to be processed.
Further, the generated derivative variables are input into the server 102, and model training is performed according to a training model, where the type of the training model is not specifically limited in the embodiments of the present invention, and the training model may be, for example, a prediction model.
In one alternative, after the derivative variables are generated, they may also be entered into the database 101 for storage, so that they can be called at the time of use, and the server 102 may retrieve the derivative variables from the database 101 for model training.
In the invention, because the code written by adopting the second programming language is directly deployed in the client terminal of the first programming language, and then the data structure of the data to be processed is converted into the data structure suitable for the second programming language, the client terminal can directly call the code written by the second language to process the converted data, and developers do not need to rewrite the logic of the derived variable code by using the first programming language in the process, thereby simplifying the deployment process of the derived variable code and improving the efficiency of obtaining the derived variable.
Fig. 2 is a schematic flow chart of a method for determining derived variables according to an embodiment of the present invention. The method is applied to a client terminal of a first programming language, the client terminal is deployed with a derivative variable code written in a second programming language, and as shown in fig. 2, the derivative variable determining method may include the following steps:
S201, acquiring data to be processed.
It should be noted that the first programming language and the second programming language are different programming languages, and the present invention does not limit their specific types. For example, in some scenarios the first programming language is the JAVA programming language and the second programming language is the Python programming language; in other application scenarios they may be other programming languages. The scheme of the present application is described by taking JAVA as the first programming language and Python as the second programming language as an example.
In this step, the client terminal may be any terminal on which a risk control system is deployed, which is not particularly limited in the embodiment of the present invention; for example, the client terminal may be a bank terminal.
In practical application, the client terminal may send an http request to the data provider, and the data provider sends the data to be processed to the client terminal through the network to obtain the data to be processed.
The data to be processed may also be any kind of data, and the embodiment of the present invention is not particularly limited, and for example, the client terminal is taken as a bank terminal, and the data to be processed may be credit data of a user. It should be noted that, because the data providers are different, and the data format and the data structure of the to-be-processed data provided by each data provider are different, the present invention is not limited to the data format and the data structure of the to-be-processed data.
Optionally, in the process of data interaction, the data to be processed of the interaction may be encrypted. For example, in this step, the data provider may encrypt the data to be processed and send the encrypted data to the client terminal, so as to ensure the security of the data and meet the requirement of privacy protection calculation.
S202, converting the data to be processed into first data of a first data structure.
Wherein the first data structure is adapted for a second programming language.
It should be noted that, in the derived variable determining method provided by the present invention, since the deployed derived variable code is written based on the second programming language, and the data format of the acquired to-be-processed data may be difficult to apply, after the to-be-processed data is acquired, the data structure of the to-be-processed data needs to be converted into a data structure compatible with the second programming language, so that the derived variable code can normally run.
Specifically, taking the second programming language as the Python language as an example, the first data of the first data structure may be data of the DataFrame data structure in the Pandas library of the Python language. When the second programming language is another language, the first data structure is a data structure corresponding to that language.
Continuing the above example, on the one hand, the Pandas library is not limited to reading local offline files; it can also read data from a database online and write the data back to the database after processing. The data can therefore be read in real time, which increases the flexibility of derived variable determination while covering most typical use cases in fields such as finance, statistics and social science.
On the other hand, the DataFrame data structure is a tabular data structure and is composed of a plurality of columns of data arranged in a certain sequence, and the cells of the DataFrame data structure can store data of various data types such as numerical values, character strings and the like, so that the processing requirements of the data of different data types can be met.
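A minimal sketch of the row-to-table conversion, using only the Python standard library. A plain dict of equal-length columns stands in for the DataFrame here; that stand-in, the function name `records_to_table` and the sample field names are assumptions for illustration, not part of the original text.

```python
def records_to_table(records):
    """Convert row records (a list of dicts) into a column-oriented table.

    The result maps each column name to a list of cell values, in first-seen
    column order; cells missing from a record are filled with None.
    """
    columns = []
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)
    return {name: [record.get(name) for record in records] for name in columns}


# Cells in one table may hold different data types (strings, numbers, None).
records = [
    {"user_id": "u1", "amount": 120.5, "channel": "online"},
    {"user_id": "u2", "amount": 30},
]
table = records_to_table(records)
```

Where Pandas is available, a dict of equal-length column lists like this one can be passed to the `pandas.DataFrame` constructor to obtain the tabular structure described above.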
S203, performing variable derivation on the first data according to the derived variable code to generate a derived variable of the first data.
Wherein the derived variables are used for model training.
In an alternative, the derived variable code may be a data management and Analysis language (SAS) code.
In practical application, the derived variable code is written according to the data to be processed, and in the present invention, the derived variable code may be written in various ways, and the embodiment of the present invention is not particularly limited.
Taking the data to be processed as credit investigation data of the user as an example, on one hand, a modeling worker can perform summation, averaging, counting and other processing on the information according to data such as transaction times, transaction amount, transaction date and the like in the credit investigation data and by combining modeling experience, so as to write derivative variable codes.
By the method, information in the data to be processed can be fully explored, and the derived variable codes can be designed according to requirements, so that the reliability of the derived variable codes is improved.
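The following is a hedged sketch of what hand-written derived variable code of this kind might look like: it sums, averages and counts transaction amounts per user. The field names (`user_id`, `amount`, `date`), the derived variable names and the function `derive_variables` are hypothetical; the original text does not specify any of them.

```python
from collections import defaultdict


def derive_variables(transactions):
    """Aggregate raw transactions into one wide-table row per user."""
    amounts_by_user = defaultdict(list)
    for tx in transactions:
        amounts_by_user[tx["user_id"]].append(tx["amount"])

    derived = {}
    for user_id, amounts in amounts_by_user.items():
        total = sum(amounts)
        derived[user_id] = {
            "tx_count": len(amounts),                # counting
            "tx_amount_sum": total,                  # summation
            "tx_amount_mean": total / len(amounts),  # averaging
        }
    return derived


transactions = [
    {"user_id": "u1", "amount": 100.0, "date": "2020-01-03"},
    {"user_id": "u1", "amount": 50.0, "date": "2020-01-09"},
    {"user_id": "u2", "amount": 20.0, "date": "2020-02-01"},
]
derived = derive_variables(transactions)
```

Each user ends up with a single row of new variables, which matches the wide-table form that the model consumes.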
On the other hand, the derived variable code may be automatically generated by a code generation tool, specifically, the generation logic of the derived variable code is determined according to the type of the data to be processed, and the derived variable code is generated according to the generation logic.
By the method, the generation efficiency of the derived variable code can be improved, so that the efficiency of obtaining the derived variable is improved.
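One way such automatic generation could work is sketched below, under stated assumptions: a table of generation logic keyed by data type is turned into Python source text, which is then executed to define the derivation function. The table `GENERATION_LOGIC`, the data type names and the generated function name `derive` are invented for illustration; the original text does not describe the code generation tool in this detail.

```python
# Hypothetical generation logic: data type -> (derived variable name, expression).
GENERATION_LOGIC = {
    "personal_credit": [
        ("tx_count", "len(values)"),
        ("tx_amount_sum", "sum(values)"),
    ],
    "enterprise_credit": [
        ("tx_amount_max", "max(values)"),
    ],
}


def generate_derived_variable_code(data_type):
    """Return Python source text defining a `derive(values)` function."""
    rules = GENERATION_LOGIC[data_type]
    lines = ["def derive(values):", "    return {"]
    for name, expression in rules:
        lines.append(f"        {name!r}: {expression},")
    lines.append("    }")
    return "\n".join(lines) + "\n"


source = generate_derived_variable_code("personal_credit")
namespace = {}
exec(source, namespace)  # run the generated source so that derive() is defined
result = namespace["derive"]([100.0, 50.0])
```

Because the expressions come from a fixed table rather than from user input, executing the generated source is safe in this sketch; a real tool would need to control what may appear in the generation logic.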
Further, the derived variable code is run in the current operating environment to perform variable derivation on the first data of the first data structure and generate the derived variables corresponding to the first data, wherein the current operating environment may be an offline environment.
Optionally, after generating the derivative variables, on one hand, the derivative variables may be directly sent to a model training system for model training.
On the other hand, the derived variables can be stored in a database for calling in the subsequent model training process.
The method for determining the derived variable provided by the embodiment of the invention is applied to a client terminal of a first programming language, wherein a derived variable code written by a second programming language is deployed at the client terminal, and the method comprises the following steps: acquiring data to be processed, and converting the data to be processed into first data of a first data structure, wherein the first data structure is suitable for a second programming language; and performing variable derivation on the first data according to the derived variable codes to generate derived variables of the first data, wherein the derived variable codes are written based on a second programming language, and the derived variables are used for model training. The method has the advantages that the codes written by the second programming language are directly deployed in the client terminal of the first programming language, and then the data structure of the data to be processed is converted into the data structure suitable for the second programming language, so that the client terminal can directly call the codes written by the second language to process the converted data, developers do not need to rewrite the logic of the derived variable codes by using the first programming language in the process, the deployment flow of the derived variable codes is simplified, and meanwhile, the efficiency of obtaining the derived variables is improved.
Fig. 3 is a schematic flow chart of a derived variable determination method according to another embodiment of the present invention. As shown in fig. 3, the derived variable determining method is applied to a client terminal in a first programming language, and the client terminal is deployed with a derived variable code written in a second programming language, and the method may include the following steps:
S301, obtaining data to be processed through the Internet.
In practical application, the client terminal may send an http request to the data provider, and the data provider sends the to-be-processed data to the client terminal through the network after receiving the request, so that the client terminal obtains the to-be-processed data.
S302, converting the data to be processed acquired through the Internet into a data object of a first programming language.
For ease of understanding, the JAVA language is taken as an example of the first programming language. Please refer to fig. 4, which is a flowchart illustrating a derived variable determining method according to another embodiment of the present invention. As shown in fig. 4, after the to-be-processed data is acquired through the Internet, it is converted into a data object in the first programming language, here a data object in the JAVA language.
It should be noted that, because the data providers are different, the data formats of the obtained to-be-processed data are also different, and by this step, the obtained to-be-processed data can be converted into a data object in the first programming language, so that the current client terminal can directly call the data object. The following specifically describes the conversion process of step S302 with reference to steps S3021 and S3022:
S3021, acquiring a mapping code corresponding to the data to be processed according to the data type of the data to be processed.
The mapping code is used for mapping the data to be processed into a data object of a first programming language, and the mapping code is generated according to the data type of the data to be processed.
Specifically, a set of mapping codes implementing the mapping relationship between the HTTP data source and the JAVA data object is generated according to the type of the data to be processed through an Object Relational Mapping (ORM) technique. The embodiment of the present invention does not particularly limit the type of the data to be processed. For example, the types of data to be processed may be: personal credit data, enterprise credit data, and the like.
Correspondingly, the data mapping code may be: a personal credit data mapping code or an enterprise credit data mapping code.
Furthermore, after the data to be processed is obtained through the HTTP request, the type of the data to be processed is determined, and the data mapping code is determined according to that type.
Illustratively, when the type of the data to be processed is personal credit data, the data mapping code is determined to be the personal credit data mapping code.
S3022, converting the data to be processed into a data object of the first programming language according to the mapping code.
Further, the data mapping code is executed in the current operating environment to generate a data object in the JAVA language corresponding to the data to be processed. It may be understood that the current operating environment may be an offline environment.
Through the above steps, the data to be processed can be converted into a data object of the first programming language, so that the current operating environment can directly call the data object.
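As an illustration of steps S3021 and S3022, the following minimal sketch selects a mapping routine by data type and applies it to a received payload. It is written in Python only to keep the examples in this description in one language, whereas the embodiment describes JAVA data objects. The class names, the payload layout (a JSON body with a "type" field) and the registry MAPPING_CODES are all assumptions made for the example and do not appear in the original disclosure.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PersonalCreditRecord:
    """Hypothetical data object for personal credit data."""
    name: str
    loan_amounts: List[str]


@dataclass
class EnterpriseCreditRecord:
    """Hypothetical data object for enterprise credit data."""
    company: str
    credit_lines: List[str]


def map_personal(payload: dict) -> PersonalCreditRecord:
    # Mapping code for the assumed personal credit payload layout.
    return PersonalCreditRecord(payload["name"], [str(v) for v in payload["loans"]])


def map_enterprise(payload: dict) -> EnterpriseCreditRecord:
    # Mapping code for the assumed enterprise credit payload layout.
    return EnterpriseCreditRecord(payload["company"], [str(v) for v in payload["lines"]])


# One mapping routine per data type (S3021: look up by data type).
MAPPING_CODES: Dict[str, Callable[[dict], object]] = {
    "personal_credit": map_personal,
    "enterprise_credit": map_enterprise,
}


def to_data_object(raw_body: str) -> object:
    """Determine the data type, pick the mapping code, and run it (S3022)."""
    payload = json.loads(raw_body)
    mapper = MAPPING_CODES[payload["type"]]
    return mapper(payload)
```

A call such as to_data_object('{"type": "personal_credit", "name": "A", "loans": [100, 250]}') would return a PersonalCreditRecord in this sketch. A lookup table keeps the type decision separate from the mapping logic, so supporting a new data provider only requires registering one more routine.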
S303, converting the data object of the first programming language into second data of a second data structure.
In one embodiment, the second data of the second data structure is applicable to the first programming language, and specifically, when the first programming language is JAVA, the second data of the second data structure may be a data set represented by a HashMap data structure in the JAVA language.
In practical applications, the data set of the HashMap data structure can be represented as: HashMap<String, HashMap<String, ArrayList<String>>>.
Each key-value pair in the outer HashMap represents a table, namely each key-value pair is in the form of "table name -> table content"; the inner HashMap<String, ArrayList<String>> holds the contents of a single table, i.e., each key-value pair is in the form of "field name -> field contents".
In practical application, each key in a data set of the HashMap data structure corresponds to a unique value. After a data object in the JAVA language is converted into a data set of the HashMap data structure, the value corresponding to a key can be accessed quickly through that key, which increases the speed of data query and modification and thereby improves the performance of the system in determining derived variables.
In practical applications, there are various ways to convert the data object in JAVA language into the data set represented by the HashMap data structure, and the following detailed description is provided with reference to steps S3031 to S3033:
S3031, judging whether the in-memory database is allowed to be used.
Specifically, after obtaining the data object in the first programming language, it is necessary to determine whether the current data processing environment allows the in-memory database to be used.
S3032, if the memory database is allowed to be used, storing the data object of the first programming language into the memory database, and outputting the second data of the second data structure through the memory database.
Still by way of example, with continuing reference to fig. 4, in an alternative, if the current operating environment allows the use of the in-memory database, the data objects in JAVA language are stored in the in-memory database, all tables in the data objects in JAVA language are obtained from the in-memory database, and the tables are integrated into the data set represented by the HashMap data structure.
By this method, the data is cached and managed in the in-memory database, which makes the data manageable and helps guarantee its safety and reliability.
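The in-memory database path of step S3032 can be sketched as follows. The embodiment does not name a particular in-memory database, so this example swaps in SQLite's in-memory mode from the Python standard library purely as an assumption, and the function name tables_to_dataset and the sample table are hypothetical. The sketch reads every table back and integrates the tables into the nested "table name -> field name -> field contents" layout described above.

```python
import sqlite3
from typing import Dict, List


def tables_to_dataset(conn: sqlite3.Connection) -> Dict[str, Dict[str, List[str]]]:
    """Collect every table in the database into table -> field -> values."""
    dataset: Dict[str, Dict[str, List[str]]] = {}
    table_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    for table in table_names:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [description[0] for description in cursor.description]
        rows = cursor.fetchall()
        # Store each field as a column of string values.
        dataset[table] = {
            column: [str(row[index]) for row in rows]
            for index, column in enumerate(columns)
        }
    return dataset


# Example: store one table in memory, then read the whole database back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (amount TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?)", [("100", "open"), ("250", "closed")]
)
dataset = tables_to_dataset(conn)
```

Storing fields as columns rather than rows matches the inner "field name -> field contents" mapping, so no further reshaping is needed before the data structure conversion in the later steps.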
S3033, if the use of the memory database is forbidden, the data object in the first programming language is converted into the second data in the second data structure through the reflection mechanism of the first programming language.
In another alternative, still taking the JAVA language as the first programming language as an example, if the current operating environment prohibits the use of the in-memory database, the data object in the JAVA language is converted into the data set represented by the HashMap data structure through the reflection mechanism in the JAVA language.
The reflection mechanism of a programming language refers to the language's ability to dynamically obtain the attributes of a class at runtime. It is described below by taking the reflection mechanism of the JAVA language as an example:
It should be noted that the class definition of the data object in the JAVA language is:
main table class:
class member variable 1: field data of the current table;
class member variable 2: a sub-table, wherein the structure of the sub-table is the same as that of the main table.
The definition of the dataset represented by the HashMap data structure is:
main table name -> main table data
sub-table name -> sub-table data
Specifically, a data object in JAVA language is used as input data of the reflection mechanism, and class information of the data object in JAVA language is obtained to determine whether the data object is a table object.
Further, if the data object is determined to be a table object, the table name in the data object is obtained, all member variables of the data object are traversed, the member variables holding field data are collected and combined into the field data of the current table, and a data set represented by a HashMap data structure is formed according to all the field data of the current table.
Furthermore, if a member variable is determined to be a sub-table, the table processing logic is called recursively on it, and the recursion ends after all the member variables of the data object have been accessed.
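The recursive traversal described above can be sketched as follows. The embodiment uses the reflection mechanism of the JAVA language; this example uses Python's dataclasses introspection as an analogue, only to keep the examples in one language. The classes MainTable and SubTable, the table_name attribute and the function flatten are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Dict, List, Optional


@dataclass
class SubTable:
    """Hypothetical sub-table; it has the same shape as the main table."""
    table_name: str
    overdue_days: List[str] = field(default_factory=list)


@dataclass
class MainTable:
    """Hypothetical main table holding field data and one sub-table."""
    table_name: str
    amount: List[str] = field(default_factory=list)
    sub_table: Optional[SubTable] = None


def flatten(obj, dataset: Optional[Dict[str, Dict[str, List[str]]]] = None):
    """Walk a table object and collect table name -> field name -> values."""
    if dataset is None:
        dataset = {}
    if not is_dataclass(obj):
        # Not a table object: nothing to collect.
        return dataset
    table: Dict[str, List[str]] = {}
    for member in fields(obj):
        value = getattr(obj, member.name)
        if member.name == "table_name":
            continue
        if is_dataclass(value):
            # Member is a sub-table: apply the same table logic recursively.
            flatten(value, dataset)
        elif isinstance(value, list):
            # Member holds field data of the current table.
            table[member.name] = [str(item) for item in value]
    dataset[obj.table_name] = table
    return dataset
```

For instance, flatten(MainTable("loan", ["100", "250"], SubTable("overdue", ["3"]))) would produce one entry for the main table and one for the sub-table. Because member names are discovered at runtime, the traversal does not need to be rewritten when a table class gains new fields.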
Through the reflection mechanism of the first programming language, all properties and methods of any class can be known, any method and property of any object can be called, and the property of the object can be changed, so that the dynamic loading access can be realized in the running process of the program, and the flexibility of the program is increased.
It should be noted that, in some scenarios, the first programming language may be another type of programming language, for example the C language; in that case, the reflection mechanism of that language is used to convert the data object of that language into the data set of the corresponding data structure. Specifically, if the first programming language is the C language, the second data of the second data structure is a data set represented by the dict data structure, that is, this step converts the data object in the C language into a data set represented by a dict data structure.
S304, converting the second data of the second data structure into the first data of the first data structure.
In practical applications, since the second data of the second data structure is applicable to the first programming language, and the derived variable code is written based on the second programming language, the second data of the second data structure needs to be converted into the first data of the first data structure applicable to the second programming language, so as to realize the calling of the derived variable code.
Referring to fig. 5, fig. 5 is a schematic flow chart illustrating data structure conversion according to another embodiment of the present invention.
Still by way of example, after the dataset represented by the HashMap data structure is generated, it is converted into a dataset represented by a dict data structure applicable to the Python language. The following describes the data conversion process in detail with reference to steps S3041 and S3042:
S3041, converting the second data of the second data structure into third data of a third data structure.
The third data structure is applicable to a third programming language, where the third programming language can be directly invoked by both the first programming language and the second programming language. The type of the third programming language is not specifically limited in the embodiment of the present invention, and the third programming language may be, for example, the C language or the C++ language.
Specifically, the data set represented by the HashMap data structure is converted into the data set represented by the dict data structure in the C language or the C++ language.
In this step, the data types in the data set represented by the dict data structure in the C language or the C++ language include: the str, dict and list data types. These correspond respectively to the data types in the data set represented by the HashMap data structure: the String, HashMap and ArrayList data types.
From this correspondence, it can be derived that the dict data structure can be expressed as a data set of the form dict<str, dict<str, list<str>>>.
Each key-value pair in the outer dict represents a table, namely each key-value pair is in the form of "table name -> table content"; the inner dict<str, list<str>> holds the contents of a single table, i.e., each key-value pair is in the form of "field name -> field contents".
In this step, as to the above data structure conversion manner, the embodiment of the present application is not particularly limited, for example, a corresponding data structure conversion code may be developed to implement conversion from the second data of the second data structure to the third data of the third data structure.
S3042, converting the third data of the third data structure into the first data of the first data structure.
Further, the data set represented by the dict data structure in the third programming language is converted into the data set represented by the dict data structure in the second programming language.
It should be noted that the data set represented by the dict data structure in the second programming language may be represented as a data set of the form dict<str, dict<str, list<str>>>, wherein each key-value pair in the outer dict represents a table, namely each key-value pair is in the form of "table name -> table content"; the inner dict<str, list<str>> holds the contents of a single table, i.e., each key-value pair is in the form of "field name -> field contents".
Further, with continued reference to fig. 5, after the dataset represented by the dict data structure is obtained, each table in the dataset is converted into the first data of the first data structure in the second programming language, for example, data of the DataFrame data structure of the Pandas library in the Python language. In particular, corresponding data structure conversion codes may be developed to realize the conversion of the third data of the third data structure into the first data of the first data structure.
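The per-table conversion at the end of this step can be sketched with the standard library alone. The sketch below takes the nested dict layout described above and reshapes each table from columns into row records, which is the kind of reshaping a tabular structure such as a DataFrame performs; where the Pandas library is available, each inner table could instead be passed to pandas.DataFrame. The function names table_to_rows and dataset_to_rows are hypothetical, and this is an illustrative sketch rather than the conversion code of the embodiment.

```python
from typing import Dict, List

Dataset = Dict[str, Dict[str, List[str]]]


def table_to_rows(table: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Turn one table of field name -> values into a list of row records."""
    lengths = {len(values) for values in table.values()}
    if len(lengths) > 1:
        raise ValueError("all fields of a table must have the same number of values")
    names = list(table)
    # zip(*columns) walks the columns in parallel, yielding one row at a time.
    return [dict(zip(names, row)) for row in zip(*table.values())]


def dataset_to_rows(dataset: Dataset) -> Dict[str, List[Dict[str, str]]]:
    """Convert every table in the dataset independently."""
    return {name: table_to_rows(table) for name, table in dataset.items()}
```

Checking that every field has the same number of values before reshaping guards against silently truncated rows, since zip stops at the shortest column.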
The principle and beneficial effects of the conversion process are similar to those of S202 in the embodiment shown in fig. 2, and reference may be made to the embodiment shown in fig. 2, which is not repeated herein.
In this scheme, the JAVA language and the Python language have no interface for calling each other directly. However, both the JAVA language and the Python language can directly call the C language or the C++ language by loading a dynamic link library. Therefore, the second data of the second data structure is first converted into third data of a third data structure suitable for the C language or the C++ language, and the third data of the third data structure is then converted into the first data of the first data structure, so that the client terminal based on the JAVA language can call the derived variable code written in the Python language.
It should be noted that, in some scenarios, the first programming language may be another type of programming language, for example the C language. In that case, the second data of the second data structure is a data set represented by the dict data structure, and the second data of the second data structure is converted directly into the first data of the first data structure without first being converted into the third data of the third data structure.
S305, performing variable derivation on the first data of the first data structure according to the derived variable code to generate derived variables of the first data.
Specifically, the client terminal written based on the JAVA language operates the derivative variable code written based on the Python language by calling the Python interpreter, and performs variable derivative on the first data of the first data structure to generate a derivative variable of the first data.
The following describes, with reference to a specific embodiment, a process in which a client terminal in JAVA language invokes a Python interpreter to generate a derived variable through data structure conversion in the present invention:
Referring to fig. 4, first, after the second data of the second data structure (a HashMap data structure suitable for the JAVA language) is obtained, the second data is converted by a data structure conversion code into third data of a dict data structure suitable for the C language or the C++ language, and the third data is then converted by a data structure conversion code into first data of a dict data structure suitable for the Python language;
further, the client terminal assigns the first data of the dict data structure to the Python interpreter by a setValue method;
furthermore, the client terminal executes the derived variable code in the Python interpreter by an execute method, and performs variable derivation on the first data of the dict data structure to generate a derived variable corresponding to the first data.
And finally, the client terminal acquires the derivative variable corresponding to the first data from the Python interpreter by a getValue method.
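The setValue, execute and getValue sequence above can be illustrated with a small stand-in. The real bridge in the embodiment runs inside a JAVA client and reaches the Python interpreter through C or C++; the class InterpreterBridge below is only a hypothetical pure-Python imitation of that calling pattern, and the variable names dataset and derived, as well as the sample derived variable code, are assumptions made for the example.

```python
class InterpreterBridge:
    """Hypothetical stand-in for a handle on an embedded Python interpreter."""

    def __init__(self):
        # Namespace in which the derived variable code runs.
        self._namespace = {}

    def set_value(self, name, value):
        """Counterpart of setValue: assign input data to the interpreter."""
        self._namespace[name] = value

    def execute(self, source):
        """Counterpart of execute: run derived variable code as Python source."""
        exec(source, self._namespace)

    def get_value(self, name):
        """Counterpart of getValue: read a result back from the interpreter."""
        return self._namespace[name]


# Illustrative derived variable code: total and count of loan amounts.
DERIVED_VARIABLE_CODE = """
amounts = [float(a) for a in dataset["loan"]["amount"]]
derived = {
    "derived": {
        "loan_total": [str(sum(amounts))],
        "loan_count": [str(len(amounts))],
    }
}
"""

bridge = InterpreterBridge()
bridge.set_value("dataset", {"loan": {"amount": ["100", "250"]}})
bridge.execute(DERIVED_VARIABLE_CODE)
result = bridge.get_value("derived")
```

Passing data in and out by variable name keeps the derived variable code unaware of the host language: the same Python source can be developed offline and deployed unchanged.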
According to this scheme, the client terminal written in the JAVA language can convert data, by way of the C language or the C++ language, into data of a first data structure suitable for the Python language, and then pass the data from JAVA to the Python interpreter to call the derived variable code written in the Python language. Developers therefore do not need to rewrite the logic of the derived variable code in the first programming language, which simplifies the deployment process of the derived variable code and improves the efficiency of obtaining the derived variables.
S306, converting the data structure of the derived variable into a second data structure.
Wherein the second data structure is adapted for use in the first programming language and the derived variables of the second data structure are used for model training.
Specifically, with continuing reference to fig. 5, the derived variables output in step S305 are in the form of a data set represented by a dict data structure, which is suitable for the second programming language. Since the operating environment of the client terminal provided by the present invention is written in the first programming language, the data structure of the derived variables needs to be converted into the second data structure, for example, the data set represented by the dict data structure is converted into the data set represented by the HashMap data structure, so as to implement the output of the derived variables.
It should be noted that, in this step, the process of converting the data set represented by the dict data structure into the data set represented by the HashMap data structure follows the same process and principle as converting the data set represented by the HashMap data structure into the data set represented by the dict data structure, and is not described here again. Illustratively, the dataset represented by the dict data structure (first data structure) is first converted by the data structure conversion code into data of the dict data structure (third data structure) applicable to the C language or the C++ language, and the data of that dict data structure is then converted into the dataset represented by the HashMap data structure (second data structure), so as to output the dataset of the derived variables represented by the HashMap data structure.
Further, after generating the derived variables, in one aspect, the derived variables may be sent directly to a model training system for model training.
Alternatively, the derived variables may be stored in a database for later invocation in model training.
The method provided by the embodiment of the invention converts the second data of the second data structure into the first data of the first data structure, and realizes calls between the first programming language and the second programming language by taking the third programming language as an interface. The derived variable code written in the second programming language is directly deployed in the risk control system written in the first programming language, so that the derived variable code does not need to be rewritten in the first programming language, which simplifies the deployment process of the derived variable code and improves the efficiency of obtaining the derived variables.
In addition, it should be noted that the execution sequence of the steps in the embodiments of the present invention is not limited to the sequence defined by the above serial numbers, and those skilled in the art may perform any configuration according to the specific application requirement and design requirement.
Fig. 7 is a schematic structural diagram of a derived variable determining apparatus according to an embodiment of the present invention. As shown in fig. 7, the derived variable determining apparatus 700 is applied to a client terminal in a first programming language, where a derived variable code written in a second programming language is deployed, and may include:
an obtaining module 701, configured to obtain data to be processed;
a processing module 702, configured to convert data to be processed into first data of a first data structure, where the first data structure is applicable to a second programming language;
the output module 703 is configured to perform variable derivation on the first data of the first data structure according to the derived variable code, and generate a derived variable of the first data, where the derived variable is used for model training.
The derived variable determining apparatus provided in this embodiment may be configured to execute the technical solution provided in any of the foregoing method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Optionally, the processing module 702 is specifically configured to: converting the data to be processed into second data of a second data structure, wherein the second data structure is suitable for the first programming language;
second data of the second data structure is converted into first data of the first data structure.
Optionally, the processing module 702 is specifically configured to:
converting data to be processed acquired through the Internet into a data object of a first programming language;
the data object in the first programming language is converted to second data in a second data structure.
Optionally, the processing module 702 is specifically configured to: converting the second data of the second data structure into third data of a third data structure, wherein the third data structure is suitable for a third programming language;
the third data of the third data structure is converted into the first data of the first data structure.
Optionally, the processing module 702 is specifically configured to: acquiring a mapping code corresponding to the data to be processed according to the data type of the data to be processed, wherein the mapping code is used for mapping the data to be processed into a data object of a first programming language;
and converting the data to be processed into a data object of a first programming language according to mapping codes, wherein the mapping codes are generated according to the data type of the data to be processed.
Optionally, the processing module 702 is specifically configured to: if the memory database is allowed to be used, storing the data object of the first programming language into the memory database, and outputting second data of a second data structure through the memory database;
or,
if the memory database is forbidden to be used currently, the data object of the first programming language is converted into second data of a second data structure through a reflection mechanism of the first programming language.
Optionally, the processing module 702 is further configured to: after generating the derived variables of the first data, converting the data structure of the derived variables into a second data structure, the second data structure being applicable to the first programming language;
the derived variables of the second data structure are used for model training.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 8, the electronic device provided by the present invention may include:
a memory 801, a processor 802, and a derivative variable determination program stored on the memory 801 and executable on the processor 802;
the derived variable determination program, when executed by the processor 802, implements the steps of the derived variable determination method as described in any of the previous embodiments.
Alternatively, the memory 801 may be separate or integrated with the processor 802.
For the implementation principle and the technical effect of the electronic device provided by this embodiment, reference may be made to the foregoing embodiments, which are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium, where a derived variable determining program is stored on the computer-readable storage medium, and when being executed by a processor, the derived variable determining program implements the steps of the derived variable determining method according to any of the foregoing embodiments.
Embodiments of the present invention further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the derived variable determining method as provided in any of the foregoing embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device) or a processor execute part of the steps of the methods according to the embodiments of the present invention.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may reside as discrete components in an electronic device or host device.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (11)

1. A derived variable determination method is applied to a client terminal of a first programming language, and derived variable codes written in a second programming language are deployed on the client terminal, and the method comprises the following steps:
acquiring data to be processed;
converting the data to be processed into first data of a first data structure, wherein the first data structure is suitable for the second programming language;
and performing variable derivation on the first data according to the derived variable code to generate derived variables of the first data, wherein the derived variables are used for model training.
2. The method of claim 1, wherein converting the data to be processed into first data of a first data structure comprises:
converting the data to be processed into second data of a second data structure, the second data structure being applicable to the first programming language;
converting the second data of the second data structure into the first data of the first data structure.
3. The method of claim 2, wherein converting the data to be processed into second data of a second data structure comprises:
converting the data to be processed acquired through the Internet into a data object of a first programming language;
and converting the data object of the first programming language into second data of a second data structure.
4. The method of claim 2 or 3, wherein converting the second data of the second data structure into the first data of the first data structure comprises:
converting the second data of the second data structure into third data of a third data structure, wherein the third data structure is adapted for a third programming language;
converting the third data of the third data structure into the first data of the first data structure.
5. The method of claim 4, wherein converting the data to be processed, obtained via the internet, into data objects in a first programming language comprises:
acquiring a mapping code corresponding to the data to be processed according to the data type of the data to be processed, wherein the mapping code is used for mapping the data to be processed into a data object of a first programming language;
and converting the data to be processed into a data object of a first programming language according to the mapping code, wherein the mapping code is generated according to the data type of the data to be processed.
6. The method of claim 5, wherein converting the data object in the first programming language to the second data in the second data structure comprises:
if the memory database is allowed to be used, storing the data object of the first programming language into the memory database, and outputting second data of a second data structure through the memory database;
or,
and if the memory database is forbidden to be used, converting the data object of the first programming language into second data of a second data structure through a reflection mechanism of the first programming language.
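The two branches of claim 6 can be illustrated as follows. This sketch assumes SQLite's `:memory:` mode as the in-memory database and Python's dataclass introspection as the reflection mechanism; the claims name neither, and the `Record` class and function names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class Record:
    """Hypothetical data object of the first programming language."""
    user_id: str
    amount: float


def via_memory_db(objects: List[Record]) -> List[Dict[str, Any]]:
    """Store the objects in an in-memory database and read them back as rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.row_factory = sqlite3.Row
        conn.execute("CREATE TABLE records (user_id TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO records VALUES (?, ?)",
            [(o.user_id, o.amount) for o in objects],
        )
        rows = conn.execute(
            "SELECT user_id, amount FROM records ORDER BY rowid"
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()


def via_reflection(objects: List[Record]) -> List[Dict[str, Any]]:
    """Read each object's declared fields by introspection instead of using a database."""
    return [{f.name: getattr(o, f.name) for f in fields(o)} for o in objects]


def objects_to_second_data(objects: List[Record], allow_memory_db: bool) -> List[Dict[str, Any]]:
    """Choose the conversion path according to whether the in-memory database is allowed."""
    return via_memory_db(objects) if allow_memory_db else via_reflection(objects)
```

Both paths are expected to yield the same rows, so the flag changes only how the conversion is carried out, not its result.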
7. The method according to any one of claims 1 to 6, wherein, after performing variable derivation on the first data according to the derived variable code to generate the derived variables of the first data, the method further comprises:
converting the data structure of the derived variable into a second data structure, the second data structure being applicable to a first programming language;
the derived variables of the second data structure are used for model training.
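The back-conversion of claim 7 might look like the sketch below, assuming the derived variables are held column by column and the second data structure is a list of row dictionaries. Both structures and the function name `columns_to_rows` are assumptions made for illustration.

```python
from typing import Any, Dict, List


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert column-oriented derived variables into row-oriented records."""
    names = list(columns)
    # All columns must have the same length, otherwise rows cannot be formed.
    if len({len(columns[name]) for name in names}) > 1:
        raise ValueError("all columns must have the same length")
    return [dict(zip(names, values))
            for values in zip(*(columns[name] for name in names))]
```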
8. A derived variable determining apparatus applied to a client terminal of a first programming language, the client terminal being deployed with derived variable code written in a second programming language, the apparatus comprising:
the acquisition module is used for acquiring data to be processed;
the processing module is used for converting the data to be processed into first data of a first data structure, and the first data structure is suitable for the second programming language;
and the output module is used for performing variable derivation on the first data of the first data structure according to a derived variable code to generate a derived variable of the first data, wherein the derived variable is used for model training.
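The three modules of claim 8 can be pictured as a simple composition. In this sketch each module is modeled as a plain callable; the class name `DerivedVariableApparatus` and the `run` method are hypothetical and are not taken from the claim.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DerivedVariableApparatus:
    """Composes acquisition, processing, and output modules in sequence."""
    acquire: Callable[[], Any]      # acquisition module: obtains the data to be processed
    process: Callable[[Any], Any]   # processing module: converts it into the first data
    output: Callable[[Any], Any]    # output module: derives variables from the first data

    def run(self) -> Any:
        """Run the three modules in order and return the derived variables."""
        return self.output(self.process(self.acquire()))
```

For example, `DerivedVariableApparatus(lambda: [1, 2], lambda d: {"x": d}, lambda t: {"x_sum": sum(t["x"])}).run()` chains the three stages on toy data.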
9. An electronic device, characterized in that the electronic device comprises:
a memory, a processor, and a computer program stored on the memory and executable on the processor;
the computer program, when executed by the processor, implements the steps of the derived variable determination method as set forth in any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a derived variable determination program which, when executed by a processor, implements the steps of the derived variable determination method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program implements the method of any of claims 1 to 7 when executed by a processor.
CN202011359460.5A 2020-11-27 2020-11-27 Derived variable determination method, device, equipment and storage medium Pending CN112445499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011359460.5A CN112445499A (en) 2020-11-27 2020-11-27 Derived variable determination method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011359460.5A CN112445499A (en) 2020-11-27 2020-11-27 Derived variable determination method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112445499A (en) 2021-03-05

Family

ID=74737823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011359460.5A Pending CN112445499A (en) 2020-11-27 2020-11-27 Derived variable determination method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112445499A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117406965A (en) * 2023-10-26 2024-01-16 苏州爱医斯坦智能科技有限公司 Visual output method, device, equipment and medium of artificial intelligent model

Similar Documents

Publication Publication Date Title
US10839011B2 (en) Application programing interface document generator
US11334692B2 (en) Extracting a knowledge graph from program source code
US20230057335A1 (en) Deployment of self-contained decision logic
US10169034B2 (en) Verification of backward compatibility of software components
US20210049137A1 (en) Building and managing data-processign attributes for modeled data sources
US11720527B2 (en) API for implementing scoring functions
US11163906B2 (en) Adaptive redaction and data releasability systems using dynamic parameters and user defined rule sets
US10496744B2 (en) Domain-specific lexically-driven pre-parser
US9110659B2 (en) Policy to source code conversion
CN103559118A (en) Security auditing method based on aspect oriented programming (AOP) and annotation information system
CN113076104A (en) Page generation method, device, equipment and storage medium
US20160098563A1 (en) Signatures for software components
CN111414350A (en) Service generation method and device
AU2017276243B2 (en) System And Method For Generating Service Operation Implementation
CN112445499A (en) Derived variable determination method, device, equipment and storage medium
US20220366056A1 (en) Computer security using zero-trust principles and artificial intelligence for source code
US20220083334A1 (en) Generation of equivalent microservices to replace existing object-oriented application
CN112925523A (en) Object comparison method, device, equipment and computer readable medium
CN114416530A (en) Byte code modification method and device, computer equipment and storage medium
US10769376B2 (en) Domain-specific lexical analysis
US11074407B2 (en) Cognitive analysis and dictionary management
US11062221B1 (en) Extensible data structures for rule based systems
CN116821220A (en) Interface request sample generation method, device, storage medium and equipment
CN114661714A (en) Data query method and device and electronic equipment
CN117408238A (en) Activity task management method, system, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination