CN118277470A - Data conversion method and device, electronic equipment and storage medium - Google Patents

Data conversion method and device, electronic equipment and storage medium

Info

Publication number
CN118277470A
Authority
CN
China
Prior art keywords
data
historical
structured
original
element value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410478253.3A
Other languages
Chinese (zh)
Inventor
汤云凡
姜磊
周悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202410478253.3A
Publication of CN118277470A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The specification discloses a data conversion method, a device, an electronic device and a storage medium, which can automatically analyze historical data of different data formats in advance to generate an analysis template with strong adaptability, so that a server can automatically convert a large amount of acquired semi-structured original data into structured data supported by a system according to the analysis template, so as to manage each original data, update a risk list to be maintained in real time, and further reduce the maintenance cost of the risk list.

Description

Data conversion method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data conversion method, a data conversion device, an electronic device, and a storage medium.
Background
With the development of computer technologies such as the internet, big data, artificial intelligence and the like, protection of personal privacy data of users by various financial institutions is also increasingly emphasized.
Typically, each financial institution performs risk control on user-initiated services by maintaining a risk list, that is, it applies risk control to services performed by users on the risk list. The risk list is compiled from risk control data obtained through various channels (such as risk control data published by other financial institutions). However, the obtained risk control data is usually semi-structured and suffers from problems such as irregular naming and inconsistent formats, which makes it difficult to organize and makes maintaining the risk list costly for financial institutions.
Therefore, how to reduce the maintenance cost of the risk list is a problem to be solved.
Disclosure of Invention
The specification provides a data conversion method, a data conversion device, electronic equipment and a storage medium, so as to solve the problem of high maintenance cost of a risk list in the prior art.
The technical scheme adopted in the specification is as follows:
The specification provides a data conversion method, which includes:
Acquiring original data and data source information corresponding to the original data, wherein the original data is semi-structured data;
Determining a target analysis template from the predetermined analysis templates according to the data source information, wherein the target analysis template comprises a corresponding relation between each element value contained in the original data and each attribute contained in the historical structured data;
and analyzing the original data according to the target analysis template to convert the original data into structured data, and executing tasks according to the structured data.
Optionally, determining the parsing template specifically includes:
traversing elements of each hierarchy included in the history data to determine element values included in the history data;
Determining an attribute corresponding to each element value from all the attributes contained in the structured data corresponding to the historical data according to each element value;
and generating an analysis template corresponding to the historical data according to the path of each element value in the historical data and the attribute corresponding to each element value, wherein the path of each element value in the historical data is used for representing all ancestor element values corresponding to the element value.
Optionally, for each element value, determining an attribute corresponding to the element value from the attributes contained in the specified structured data, specifically including:
And inputting path information of a path of each element value in the historical data into a preset classification model for determining a path characteristic representation corresponding to the element value through the classification model, and outputting a corresponding attribute of the element value in each attribute contained in the historical structured data according to the path characteristic representation.
Optionally, generating the parsing template corresponding to the historical data according to the path of each element value in the historical data and the attribute corresponding to each element value specifically includes:
Aggregating the element values with the same father path in the history data to obtain an aggregated element value;
and generating an analysis template corresponding to the historical data according to the path of the aggregated element values in the historical data and the attribute corresponding to the aggregated element values.
Optionally, according to the target parsing template, parsing the original data to convert the original data into structured data, including:
Analyzing the original data according to the target analysis template to convert the original data into initial structured data;
Judging whether the initial structured data accords with a predetermined data cleaning rule;
if yes, carrying out data cleaning on the initial structured data according to a specified mode to obtain structured data after the original data are converted, wherein the specified mode comprises the following steps: at least one of merging, converting, splitting, deleting.
Optionally, determining the data cleansing rule specifically includes:
Acquiring historical data and historical data source information corresponding to the historical data;
determining an analysis template corresponding to the historical data according to the historical data source information, taking the analysis template as a historical analysis template, analyzing the historical data according to the historical analysis template, converting the historical data into historical structured data, and displaying the historical structured data to a designated user;
And determining a data cleaning rule according to the adjustment operation executed by the appointed user on the historical structured data.
Optionally, according to the target parsing template, parsing the original data to convert the original data into structured data, including:
Analyzing the original data according to the target analysis template to convert the original data into initial structured data;
Determining a cleaning sample with the association degree higher than a preset threshold value between the cleaning sample and the original data from prestored cleaning samples for carrying out data cleaning on the historical data, and taking the cleaning sample as a target cleaning sample;
Inputting the target cleaning sample and the initial structured data into a preset cleaning model, so as to carry out data cleaning on the initial structured data according to the target cleaning sample through the cleaning model, and obtaining structured data after the original data is converted.
Optionally, the original data includes: original risk control data, wherein the original risk control data comprises user information of a risk user;
the method further comprises:
performing service risk control according to the structured data.
The present specification provides a data conversion apparatus including:
the acquisition module is used for acquiring original data and data source information corresponding to the original data, wherein the original data is semi-structured data;
The analysis module is used for determining a target analysis template from all the predetermined analysis templates according to the data source information, wherein the target analysis template comprises a corresponding relation between each element value contained in the original data and each attribute contained in the historical structured data;
And the execution module is used for analyzing the original data according to the target analysis template so as to convert the original data into structured data and executing tasks according to the structured data.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the above data conversion method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the data conversion method described above when executing the program.
The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:
According to the data conversion method provided by the specification, the original data and the data source information corresponding to the original data are obtained, wherein the original data are semi-structured data, a target analysis template is determined from all predetermined analysis templates according to the data source information, the target analysis template comprises a corresponding relation between each element value contained in the original data and each attribute contained in the historical structured data, the original data are analyzed according to the target analysis template, so that the original data are converted into the structured data, and task execution is carried out according to the structured data.
According to the method, the historical data of different data formats can be automatically analyzed in advance to generate the analysis template with strong adaptability, so that the server can automatically convert a large amount of acquired semi-structured original data into structured data supported by the system according to the analysis template, so that each original data is managed, a risk list to be maintained is updated in real time, and further the maintenance cost of the risk list can be reduced.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate exemplary embodiments of the specification and, together with their description, serve to explain the specification; they are not intended to limit the specification unduly. In the drawings:
Fig. 1 is a schematic flow chart of a data conversion method provided in the present specification;
Fig. 2 is a schematic diagram of a data conversion process provided in the present specification;
Fig. 3 is a schematic diagram of a data conversion device provided in the present specification;
Fig. 4 is a schematic diagram of an electronic device corresponding to Fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a data conversion method provided in the present specification, including the following steps:
S100: Acquire original data and data source information corresponding to the original data, wherein the original data is semi-structured data.
In the process of managing and maintaining a database system, data is usually obtained from a plurality of different systems or data sources, and the data may arrive in a variety of semi-structured forms, such as JavaScript Object Notation (JSON) or eXtensible Markup Language (XML). An administrator of the database system therefore has to organize the semi-structured data and convert it into the structured form maintained by the database system before it can be stored, which makes the database system costly to maintain.
Based on the above, in this description, the service platform may respond to an operation instruction of a manager in a management and maintenance process, obtain original data and data source information corresponding to the original data from various different types of data sources, and further may perform automatic conversion on the original data, so as to obtain and store structured data consistent with historical structured data maintained by the service platform.
For example: in a service risk control scenario, various data sources (such as financial institutions and related institutions) usually publish user information of risk users at regular intervals. The published information may also be of other kinds; for example, region information of a risk region may be published, in which case service risk control needs to be performed on service requests from that region according to the region information. A risk control service platform can acquire the user information and perform service risk control according to the user information of the risk users.
For another example: in the social media platform, the social media platform can acquire text data corresponding to articles published by other social media platforms and can display the text data to the user in response to a request of the user, but because the formats adopted by text data corresponding to articles published by different social media platforms are often different, the social media platform can convert and store the original data after acquiring the text data corresponding to the articles as the original data, so that the text data can be displayed to the user after receiving the request of the user.
It should be noted that the semi-structured data may be in a form that has a more distinct organization than unstructured data, but does not follow a strict relational database schema or a fixed format as structured data. The semi-structured data contains identifiable tags or some inherent logical structure that allows information to be parsed and understood in a loose, flexible manner.
The above structured data may refer to data stored in a fixed format and structure, which is characterized by orderly data organization and well-defined format, wherein the structured data may be generally stored in a table form consisting of rows and columns, each column (field) has a specific data type, such as an integer, a floating point number, a character string, a date, etc., and the value of each row of data in each column must conform to the data type defined by the column.
To facilitate understanding, the semi-structured data and the structured data are described below using the user information of a risk user in a service risk control scenario as an example.
For example: in a service risk control scenario, the structured data maintained by the service platform may consist of a plurality of fields such as identification code, name, gender and date of birth, and each piece of data to be saved must be converted into this format before it is saved. The user information of risk users published by the various data sources, however, may take many forms, for example: {"risk user": {"name": "John Doe", "work": {"type": "IT", "unit address": "New York"}, "native place": "xx City, xx Province", "identification code": "12345"}}.
As can be seen from the foregoing, the semi-structured data obtained by the service platform is not stored in predefined table columns, but each field has a distinct key (e.g., identification code, name) and the data has an internal hierarchical structure, so the semi-structured data can be read by a machine.
Therefore, after the service platform obtains the original data and the data source information corresponding to the original data, the original data can be analyzed based on the keys contained in the original data, so as to obtain and store the structured data consistent with the historical structured data maintained by the service platform.
In the present specification, the execution body for implementing the data conversion method may be a designated device such as a server provided on a service platform, or a terminal device such as a desktop computer or a notebook computer. For convenience of description, the data conversion method provided in the present specification is described below with the server as the execution body.
S102: Determine a target analysis template from the predetermined analysis templates according to the data source information, wherein the target analysis template comprises a corresponding relation between each element value contained in the original data and each attribute contained in the historical structured data.
In this specification, after the server obtains the original data, it may determine, from the preset analysis templates, an analysis template that matches with the data source information of the original data (where the data source information includes a data source identifier, an original data batch, etc.), as a target analysis template. The target parsing template includes a correspondence between each element value included in the original data and each attribute included in the historical structured data, where the attribute is a field or column of the structured data maintained by the service platform, in other words, the target parsing template includes a correspondence between each element value included in the original data and each field or column of the historical structured data.
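The template lookup described above can be sketched as a keyed registry. This is a minimal illustration only: the `TEMPLATES` registry, the `source_id` key and the `select_template` function are hypothetical names, and matching on the data source identifier alone is an assumption (the specification also mentions the original data batch as part of the data source information).

```python
from typing import Optional

# Hypothetical registry: data source identifier -> parsing template.
# Each template maps a path in the semi-structured data to an attribute
# (a field or column) of the structured data.
TEMPLATES: dict[str, dict[tuple[str, ...], str]] = {
    "source_a": {
        ("risk user", "name"): "name",
        ("risk user", "identification code"): "identification_code",
    },
}


def select_template(source_info: dict) -> Optional[dict[tuple[str, ...], str]]:
    """Return the template registered for this data source, or None if absent."""
    return TEMPLATES.get(source_info.get("source_id"))


target = select_template({"source_id": "source_a", "batch": "2024-04"})
missing = select_template({"source_id": "unknown"})
```

Returning `None` for an unknown source leaves the caller free to decide what to do when no template matches.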
In the foregoing, each element value included in the original data is a field value of the semi-structured data. For example, name: "Mr. Zhang" is a field, where "name" is the key, i.e. the field name, and "Mr. Zhang" is the field value, i.e. the element value.
It should be noted that the foregoing parsing templates may be generated by the server in advance. The server traverses the elements of each level included in the historical data to determine the element values it contains, determines, for each element value, the corresponding attribute from the attributes included in the structured data corresponding to the historical data, and generates a parsing template for the historical data according to the path of each element value in the historical data and the attribute corresponding to each element value. For each element value, the path of the element value in the historical data represents all ancestor element values of that element value.
The historical data may be original data previously obtained by the server.
As can be seen from the above, each parsing template includes a correspondence between a path corresponding to an element value and an attribute corresponding to the element value, so in a subsequent practical application, the server may determine, according to the correspondence between the path corresponding to the element value and the attribute corresponding to the element value included in the target parsing template, a correspondence between each element value included in the original data and each attribute included in the historical structured data.
It should be noted that the server may traverse the elements of each level included in the historical data in various ways, for example by depth-first traversal or breadth-first traversal; preferably, the server uses depth-first traversal.
The hierarchy here refers to the nesting of the tags used in the semi-structured data. For example, in {"risk user": {"name": "John Doe", "work": {"type": "IT", "unit address": "New York"}, "native place": "xx City, xx Province", "identification code": "12345"}}, "risk user" is the top level and contains fields such as name, work, native place and identification code. The work field is a sub-level below "risk user" and contains fields such as type and unit address. For the "type" field, the path is risk user - work - type, i.e. the path of a field contains all ancestor fields of that field.
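The depth-first traversal and the notion of a path can be illustrated with a short sketch. The function name `iter_paths` and the choice to represent a path as a tuple of keys are assumptions made for this example; the specification does not prescribe a representation.

```python
import json
from typing import Any, Iterator


def iter_paths(node: Any, path: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], Any]]:
    """Depth-first walk that yields (path, element value) for every leaf."""
    if isinstance(node, dict):
        for key, child in node.items():
            # Descend one level; the key becomes part of the child's path.
            yield from iter_paths(child, path + (key,))
    elif isinstance(node, list):
        for child in node:
            # List items share the path of the list itself in this sketch.
            yield from iter_paths(child, path)
    else:
        yield path, node


raw = json.loads(
    '{"risk user": {"name": "John Doe",'
    ' "work": {"type": "IT", "unit address": "New York"},'
    ' "native place": "xx City, xx Province",'
    ' "identification code": "12345"}}'
)
leaves = list(iter_paths(raw))
```

Each yielded path lists all ancestor keys of the element value, so the "type" field is reported with the path `("risk user", "work", "type")`.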
The method for determining the attribute corresponding to the element value from the attributes contained in the structured data corresponding to the historical data may be that the attribute with the consistent field name corresponding to the element value is selected from the attributes contained in the structured data corresponding to the historical data and used as the attribute corresponding to the element value.
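A minimal sketch of this name-based matching follows. The function `match_attribute` is a hypothetical helper, and comparing names case-insensitively after trimming whitespace is an assumption; the specification only requires that the names be consistent.

```python
from typing import Optional


def match_attribute(path: tuple[str, ...], attributes: list[str]) -> Optional[str]:
    """Return the attribute whose name equals the element's field name, if any."""
    field_name = path[-1].strip().lower()  # the last path component is the field name
    for attribute in attributes:
        if attribute.strip().lower() == field_name:
            return attribute
    return None


columns = ["identification code", "name", "gender", "date of birth"]
matched = match_attribute(("risk user", "Name"), columns)
unmatched = match_attribute(("risk user", "target user name"), columns)
```

The second call returns `None`, which is the case a name-only comparison cannot resolve.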
In an actual application scenario, different data sources often name the same field differently, so for some element values of the original data no attribute whose name matches the field name of the element value can be found among the attributes contained in the structured data corresponding to the historical data. For example, the name of a risk user may be labelled "target user name" in some data sources.
Based on this, the server may further input, for each element value, the path information of the path of the element value in the historical data into a preset classification model, so that the classification model determines a path feature representation for the element value and, according to the path feature representation, outputs the attribute corresponding to the element value among the attributes contained in the historical structured data.
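The specification does not describe the internals of the classification model. As a stand-in only, the sketch below uses a bag-of-words vector as the "path feature representation" and a nearest-neighbour lookup by cosine similarity over a few labelled paths. The `LABELED` examples, the function names and the choice of technique are all assumptions for illustration, not the patented model.

```python
import math
from collections import Counter


def featurize(path: tuple[str, ...]) -> Counter:
    """Toy path feature representation: counts of the words along the path."""
    tokens: list[str] = []
    for part in path:
        tokens.extend(part.lower().split())
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical labelled paths taken from previously converted historical data.
LABELED = [
    (("risk user", "name"), "name"),
    (("risk user", "target user name"), "name"),
    (("risk user", "date of birth"), "date_of_birth"),
]


def classify(path: tuple[str, ...]) -> str:
    """Return the attribute of the most similar labelled path (1-nearest neighbour)."""
    features = featurize(path)
    best = max(LABELED, key=lambda example: cosine(features, featurize(example[0])))
    return best[1]
```

A trained model would replace both `featurize` and `classify`; the point of the sketch is only that the whole path, not just the field name, feeds the decision.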
It should be noted that the above attributes may include first-level attributes, second-level attributes, and so on. A first-level attribute is an attribute of the entity corresponding to the historical structured data (an entity is an object or concept described by a series of related attributes; for example, a risk user is an entity). A second-level attribute is an attribute of an entity contained within the entity corresponding to the historical structured data. Third-level attributes, fourth-level attributes and so on may also exist.
For example: the entity of the risk user may include attributes such as name, address, gender, etc., and for the attribute of the address, the attribute may further include sub-attributes such as country, province, city, county, etc., and these sub-attributes are secondary attributes, and the name, address, gender are primary attributes.
In an actual application scenario, some element values may belong to the same field even though their field names differ. For example, the name of one user may be recorded as alias: "Mr. Zhang" and that of another as formal name: "Mr. Zhang". Both values are user names, but the corresponding field names, "alias" and "formal name", are different.
Based on the above, the server may further aggregate the element values with the same parent path in the history data to obtain an aggregated element value, and generate an analysis template corresponding to the history data according to the path of the aggregated element value in the history data and the attribute corresponding to the aggregated element value.
The parent path is the path of the parent element of an element value. For example, in name: {surname: "Zhang", given name: "Mr."}, "Zhang" and "Mr." together make up one complete name and therefore need to be aggregated; the parent path of both is "name".
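The aggregation of element values that share a parent path can be sketched as follows. The function `aggregate_siblings`, the `merge_parents` argument that names which parents to aggregate, and the sample values are hypothetical; the specification does not say how the server decides which siblings form one value or how they are joined.

```python
from typing import Any


def aggregate_siblings(
    leaves: list[tuple[tuple[str, ...], Any]],
    merge_parents: set[tuple[str, ...]],
    separator: str = " ",
) -> dict[tuple[str, ...], Any]:
    """Join the values of leaves whose parent path is listed in merge_parents."""
    result: dict[tuple[str, ...], Any] = {}
    for path, value in leaves:
        parent = path[:-1]
        if parent in merge_parents:
            # Append to the aggregated value stored under the parent path.
            if parent in result:
                result[parent] = result[parent] + separator + str(value)
            else:
                result[parent] = str(value)
        else:
            result[path] = value
    return result


# Illustrative values only.
sample_leaves = [
    (("risk user", "name", "surname"), "Zhang"),
    (("risk user", "name", "given name"), "San"),
    (("risk user", "identification code"), "12345"),
]
aggregated = aggregate_siblings(sample_leaves, {("risk user", "name")})
```

After aggregation the template can map the single path `("risk user", "name")` to the name attribute instead of mapping each part separately.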
S104: Analyze the original data according to the target analysis template to convert the original data into structured data, and execute tasks according to the structured data.
In this specification, the server may parse the original data according to the target parsing template to convert the original data into structured data, and perform task execution according to the structured data.
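Applying a template amounts to following each recorded path into the original data and writing the value found there into the mapped column. The sketch below assumes a template is a dictionary from path tuples to column names; `get_by_path`, `parse_with_template` and the column names are hypothetical.

```python
from typing import Any, Optional


def get_by_path(data: Any, path: tuple[str, ...]) -> Optional[Any]:
    """Follow the keys in path through nested dictionaries; None if any key is missing."""
    node = data
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def parse_with_template(raw: dict, template: dict[tuple[str, ...], str], columns: list[str]) -> dict:
    """Build one structured row; columns without a mapped value stay None."""
    row: dict[str, Any] = {column: None for column in columns}
    for path, attribute in template.items():
        row[attribute] = get_by_path(raw, path)
    return row


raw_record = {"risk user": {"name": "John Doe", "identification code": "12345"}}
template = {
    ("risk user", "name"): "name",
    ("risk user", "identification code"): "identification_code",
}
row = parse_with_template(raw_record, template, ["identification_code", "name", "gender"])
```

Columns that the source does not provide, such as `gender` here, are left empty rather than guessed.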
The task to be executed may be determined by the actual task scenario. For example, in a service risk control scenario, the server may perform service risk control according to the structured data. For another example, in a search recommendation scenario, the server may present the structured data to a user in response to the user's request.
In addition, while the server converts the original data, some content that is substantially identical may not be recognized as such, so the same content is stored more than once and wastes storage space.
For example: the original data obtained from different data sources may all contain the date-of-birth field of a risk user, but because the sources use different formats, values that represent the same date of birth are treated as different data and stored separately. For instance, "December 12, 1990" and "1990.12.12" are substantially identical but differ in format.
Based on this, the server may parse the original data according to the target parsing template, so as to convert the original data into initial structured data, and further may determine whether the initial structured data meets a predetermined data cleaning rule, if yes, perform data cleaning on the initial structured data according to a specified manner, to obtain structured data after converting the original data, where the specified manner includes: at least one of merging, converting, splitting, deleting.
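A rule of the "converting" kind can be sketched for the date-of-birth example. The list of accepted formats, the ISO 8601 target format and the column name `date_of_birth` are assumptions chosen for this illustration; the specification leaves the concrete rules to be derived from user adjustments.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats; month names in "%B" depend on the active locale.
DATE_FORMATS = ("%Y-%m-%d", "%Y.%m.%d", "%B %d, %Y")


def normalize_date(text: str) -> Optional[str]:
    """Return the date in ISO 8601 form, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_row(row: dict) -> dict:
    """Apply the conversion rule only when the row meets the rule's condition."""
    cleaned = dict(row)
    value = cleaned.get("date_of_birth")
    if isinstance(value, str):  # rule condition: a textual date is present
        iso = normalize_date(value.strip())
        if iso is not None:
            cleaned["date_of_birth"] = iso
    return cleaned
```

Once both spellings normalize to the same string, duplicate records can be detected by plain equality.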
It should be noted that the data cleaning rule may be determined as follows: historical data and the historical data source information corresponding to the historical data are obtained; an analysis template corresponding to the historical data is determined according to the historical data source information and used as a historical analysis template; the historical data is analyzed according to the historical analysis template so that it is converted into historical structured data; the historical structured data is displayed to a designated user; and the data cleaning rule is determined according to the adjustment operations the designated user performs on the historical structured data.
In a practical application scenario, there may also be complex data cleaning tasks that can only be handled by recognizing semantics. For example: the original data obtained from different data sources may all contain the address field of a risk user, but the address is abbreviated differently in each, so the values are treated as different data and stored separately. For instance, "Erqi Square, Erqi District, Zhengzhou City, Henan Province" and "Erqi Square, Zhengzhou City, Henan Province" are substantially the same address but omit different parts.
At this time, the server may parse the original data according to the target parsing template to convert the original data into initial structured data, determine, from among the pre-stored cleaning samples for cleaning the data of each historical data, a cleaning sample with a degree of association with the original data higher than a preset threshold value, as a target cleaning sample, input the target cleaning sample and the initial structured data into a preset cleaning model, so as to clean the data of the initial structured data according to the target cleaning sample through the cleaning model, and obtain the structured data after converting the original data.
The degree of association between each cleaning sample and the original data may be determined according to a similarity between a data source corresponding to each cleaning sample and a data source corresponding to the original data, a similarity between an entity corresponding to each cleaning sample and an entity corresponding to the original data, and so on.
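One way to combine the two similarities into a single degree of association is a weighted sum, sketched below. The equal weights, the 0.8 threshold, the use of `difflib.SequenceMatcher` on plain string labels and the dictionary layout of a cleaning sample are all assumptions for illustration; the specification does not fix how the similarities are computed or combined.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def association(sample: dict, raw_meta: dict, w_source: float = 0.5, w_entity: float = 0.5) -> float:
    """Weighted sum of data source similarity and entity similarity."""
    return (
        w_source * similarity(sample["source"], raw_meta["source"])
        + w_entity * similarity(sample["entity"], raw_meta["entity"])
    )


def select_samples(samples: list[dict], raw_meta: dict, threshold: float = 0.8) -> list[dict]:
    """Keep the cleaning samples whose degree of association exceeds the threshold."""
    return [sample for sample in samples if association(sample, raw_meta) > threshold]


# Hypothetical cleaning samples and metadata of the incoming original data.
samples = [
    {"id": 1, "source": "bank_a", "entity": "risk user"},
    {"id": 2, "source": "news_site", "entity": "article"},
]
selected = select_samples(samples, {"source": "bank_a", "entity": "risk user"})
```

The selected samples would then be passed to the cleaning model together with the initial structured data.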
In addition, the server may input the initial structured data into a preset association degree determination model, so as to determine, through that model, the degree of association between the original data and each pre-stored cleaning sample used for cleaning the historical data.
Each cleaning sample includes: historical data, the data obtained by cleaning that historical data, text information describing the steps used to clean the historical data, and the like. The text information describing the cleaning steps is generated by inputting the recorded text data of the cleaning steps into a preset large language model (LLM), which generates the text information from the input record.
It should be noted that the two data cleaning methods may be used alone or together, that is, the server may adopt any one of the data cleaning methods to clean the initial structured data to obtain the structured data after converting the original data. Of course, the server may also clean the initial structured data by the first data cleaning method to obtain structured data after data cleaning by the first data cleaning method, and further may clean the structured data after data cleaning by the first data cleaning method according to the second data cleaning method to obtain structured data after conversion of the original data.
Further, for ease of understanding, the following will describe in detail the process of performing data conversion on the raw data by the above-described data conversion method, as shown in fig. 2.
Fig. 2 is a schematic diagram of a data conversion process provided in the present specification.
As can be seen from Fig. 2, the server may convert original data into structured data through a data conversion system, which comprises a template generation module, a data analysis module, a rule cleaning module, a complex cleaning module, and a spot check module.
Specifically, after obtaining the original data, the server may determine whether the predetermined analysis templates include one that matches the data source information of the original data. If so, the matching analysis template is used as the target analysis template, and the data analysis module converts the original data into initial structured data according to the target analysis template. The initial structured data is then input to the rule cleaning module, which cleans it according to preset data cleaning rules to obtain structured data cleaned by the rule cleaning module.
Further, the server may then input the structured data cleaned by the rule cleaning module, together with the target cleaning sample corresponding to the original data, into the complex cleaning module. The complex cleaning module cleans that structured data according to the target cleaning sample through a preset cleaning model, yielding the structured data converted from the original data.
If none of the predetermined analysis templates matches the data source information of the original data, the server may input the original data into the template generation module, which traverses the original data and generates a corresponding analysis template to serve as the target analysis template for the original data.
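The flow described above can be sketched roughly as follows. This is a minimal illustration rather than the implementation of the specification: the names `generate_template`, `parse` and `convert`, the flat key-to-attribute template shape, and the caching of a newly generated template are all assumptions made for the example, and the two cleaning stages are passed in as plain callables.

```python
from typing import Callable, Dict

Record = Dict[str, str]
Template = Dict[str, str]  # element key -> attribute name (assumed shape)


def generate_template(raw: dict) -> Template:
    # Hypothetical stand-in for the template generation module:
    # maps every top-level key to an attribute of the same name.
    return {key: key for key in raw}


def parse(raw: dict, template: Template) -> Record:
    # Data analysis module: copy each mapped element into its attribute.
    return {attr: str(raw[key]) for key, attr in template.items() if key in raw}


def convert(
    raw: dict,
    source: str,
    templates: Dict[str, Template],
    rule_clean: Callable[[Record], Record],
    complex_clean: Callable[[Record], Record],
) -> Record:
    template = templates.get(source)
    if template is None:
        # No template matches this data source: generate one.
        template = generate_template(raw)
        templates[source] = template  # assumption: keep it for later reuse
    initial = parse(raw, template)
    # Rule cleaning first, then complex (model-based) cleaning.
    return complex_clean(rule_clean(initial))


templates = {"bank_a": {"nm": "name"}}
strip_values = lambda record: {k: v.strip() for k, v in record.items()}
identity = lambda record: record

known = convert({"nm": " Alice "}, "bank_a", templates, strip_values, identity)
fresh = convert({"id": 7}, "new_source", templates, strip_values, identity)
```

Here `known` goes through the existing template for `bank_a`, while `fresh` triggers template generation because `new_source` has no template yet.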
Further, the server may sample historically cleaned original data through the spot check module to obtain sample original data, and display the sample original data together with its converted structured data to designated personnel so that they can optimize them. Data cleaning rules and cleaning samples are then generated from the recorded text of the optimization operations that the designated personnel perform on the sample original data and its converted structured data.
From the above, it can be seen that the server can automatically parse historical data in different data formats in advance to generate highly adaptable parsing templates, so that the server can automatically convert large volumes of acquired semi-structured original data into structured data supported by the system according to those templates. This allows each piece of original data to be managed and the risk list under maintenance to be updated in real time, thereby reducing the maintenance cost of the risk list.
Based on the same concept as the data conversion method provided above for one or more embodiments, the present specification further provides a corresponding data conversion device, as shown in Fig. 3.
Fig. 3 is a schematic diagram of a data conversion device provided in the present specification, including:
The acquiring module 301 is configured to acquire original data and data source information corresponding to the original data, where the original data is semi-structured data;
The parsing module 302 is configured to determine, according to the data source information, a target parsing template from predetermined parsing templates, where the target parsing template includes a correspondence between each element value included in the original data and each attribute included in the historical structured data;
The execution module 303 is configured to parse the original data according to the target parsing template, so as to convert the original data into structured data, and to execute a task according to the structured data.
Optionally, the apparatus further comprises: a determining module 304;
The determining module 304 is specifically configured to: traverse the elements at each level of the historical data to determine the element values contained in the historical data; determine, for each element value, the attribute corresponding to that element value from the attributes contained in the structured data corresponding to the historical data; and generate a parsing template corresponding to the historical data according to the path of each element value in the historical data and the attribute corresponding to each element value, wherein the path of an element value in the historical data represents all ancestor element values of that element value.
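As a rough illustration of this traversal, the sketch below walks nested historical data, records the path of every leaf element, and maps each path to an attribute. The names `walk` and `build_template`, the slash-joined path strings, and the `classify` callable (a simple lookup standing in for whatever attribute-matching step is actually used) are assumptions made for the example.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

Path = Tuple[str, ...]


def walk(node: object, prefix: Path = ()) -> Iterator[Tuple[Path, object]]:
    """Yield (path, value) for every leaf element, depth first.

    The path lists the keys or indices of all ancestors plus the leaf itself.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, prefix + (str(key),))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from walk(child, prefix + (str(index),))
    else:
        yield prefix, node


def build_template(
    history: object, classify: Callable[[Path], Optional[str]]
) -> Dict[str, str]:
    """Map each leaf path to the attribute chosen by `classify`."""
    template: Dict[str, str] = {}
    for path, _value in walk(history):
        attribute = classify(path)
        if attribute is not None:  # paths with no matching attribute are skipped
            template["/".join(path)] = attribute
    return template


history = {"user": {"name": "Zhang San", "ids": ["A1"]}, "note": "n/a"}
lookup = {("user", "name"): "user_name", ("user", "ids", "0"): "id_number"}
template = build_template(history, lookup.get)
```

In this example `template` maps `user/name` to `user_name` and `user/ids/0` to `id_number`, and the `note` element is left out because the lookup has no attribute for it.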
Optionally, the determining module 304 is specifically configured to, for each element value, input path information of the path of the element value in the historical data into a preset classification model, so as to determine, through the classification model, a path feature representation corresponding to the element value, and to output, according to the path feature representation, the attribute corresponding to the element value among the attributes contained in the historical structured data.
Optionally, the determining module 304 is specifically configured to aggregate the element values that share the same parent path in the historical data to obtain aggregated element values, and to generate a parsing template corresponding to the historical data according to the paths of the aggregated element values in the historical data and the attributes corresponding to the aggregated element values.
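One possible reading of this aggregation step is sketched below: leaf elements whose paths share the same parent are collapsed into a single entry keyed by the parent path, so that, for instance, the items of a list map to one attribute. The function name `aggregate_by_parent` and the choice to leave an element without siblings under its own path are assumptions of this sketch, not details given in the specification.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def aggregate_by_parent(leaves: List[Tuple[Path, str]]) -> Dict[Path, List[str]]:
    """Collapse sibling leaves into one aggregated entry per parent path."""
    groups: Dict[Path, List[Tuple[Path, str]]] = {}
    for path, value in leaves:
        groups.setdefault(path[:-1], []).append((path, value))

    aggregated: Dict[Path, List[str]] = {}
    for parent, members in groups.items():
        if len(members) == 1:
            # A leaf with no siblings keeps its own path.
            path, value = members[0]
            aggregated[path] = [value]
        else:
            # Siblings are merged under their shared parent path.
            aggregated[parent] = [value for _path, value in members]
    return aggregated


leaves = [
    (("user", "ids", "0"), "A1"),
    (("user", "ids", "1"), "B2"),
    (("user", "name"), "Zhang San"),
]
aggregated = aggregate_by_parent(leaves)
```

Here the two identifiers under `user/ids` end up in one aggregated entry, so a template built from `aggregated` would need only one path-to-attribute mapping for them.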
Optionally, the parsing module 302 is specifically configured to: parse the original data according to the target parsing template to convert the original data into initial structured data; determine whether the initial structured data matches a predetermined data cleaning rule; and if so, clean the initial structured data in a specified manner to obtain the structured data converted from the original data, wherein the specified manner comprises at least one of merging, converting, and deleting.
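A rule-based cleaning pass of this kind might look like the following sketch, in which each rule pairs a condition with a merge, convert, or delete action. The `CleaningRule` class, the three example rules, and the field names `first_name`, `last_name`, `date` and `note` are hypothetical and chosen only to show one rule of each kind.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class CleaningRule:
    matches: Callable[[Record], bool]  # does the record meet the rule's condition?
    apply: Callable[[Record], Record]  # the cleaning action to perform


def merge_name(record: Record) -> Record:
    # Merging: combine two attributes into one.
    rest = {k: v for k, v in record.items() if k not in ("first_name", "last_name")}
    rest["name"] = f'{record["first_name"]} {record["last_name"]}'
    return rest


def convert_date(record: Record) -> Record:
    # Converting: normalise the date separator.
    return {**record, "date": record["date"].replace("/", "-")}


def delete_empty(record: Record) -> Record:
    # Deleting: drop attributes with empty values.
    return {k: v for k, v in record.items() if v != ""}


RULES: List[CleaningRule] = [
    CleaningRule(lambda r: "first_name" in r and "last_name" in r, merge_name),
    CleaningRule(lambda r: "/" in r.get("date", ""), convert_date),
    CleaningRule(lambda r: "" in r.values(), delete_empty),
]


def rule_clean(record: Record, rules: List[CleaningRule]) -> Record:
    """Apply, in order, every rule whose condition the record meets."""
    for rule in rules:
        if rule.matches(record):
            record = rule.apply(record)
    return record


cleaned = rule_clean(
    {"first_name": "Ada", "last_name": "Lovelace", "date": "2024/04/18", "note": ""},
    RULES,
)
```

A record that meets none of the conditions passes through `rule_clean` unchanged.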
Optionally, the determining module 304 is specifically configured to: obtain historical data and historical data source information corresponding to the historical data; determine, according to the historical data source information, the parsing template corresponding to the historical data as a historical parsing template; parse the historical data according to the historical parsing template to convert the historical data into historical structured data, and display the historical structured data to a designated user; and determine a data cleaning rule according to the adjustment operations performed by the designated user on the historical structured data.
Optionally, the parsing module 302 is specifically configured to: parse the original data according to the target parsing template to convert the original data into initial structured data; determine, from pre-stored cleaning samples used for cleaning historical data, a cleaning sample whose degree of association with the original data is higher than a preset threshold, as a target cleaning sample; and input the target cleaning sample and the initial structured data into a preset cleaning model, so that the cleaning model cleans the initial structured data according to the target cleaning sample to obtain the structured data converted from the original data.
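The selection of target cleaning samples can be illustrated with the sketch below. The specification does not say how the degree of association is computed, so token-level Jaccard similarity is used here purely as an assumed stand-in; the function names, the sample dictionary layout and the threshold value are likewise hypothetical, and the cleaning model itself is not shown.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (assumed measure of association)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_target_samples(
    raw_text: str, samples: List[Dict[str, str]], threshold: float
) -> List[Dict[str, str]]:
    """Return the samples whose association with raw_text exceeds the threshold,
    most strongly associated first."""
    scored = [(jaccard(raw_text, sample["raw"]), sample) for sample in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sample for score, sample in scored if score > threshold]


samples = [
    {"raw": "name Zhang San id A1", "cleaned": "name=Zhang San; id=A1"},
    {"raw": "amount 100 currency CNY", "cleaned": "amount=100; currency=CNY"},
]
targets = select_target_samples("name Li Si id B2", samples, threshold=0.2)
```

The selected samples would then be passed to the cleaning model together with the initial structured data, for example as worked examples of how similar data was cleaned before.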
Optionally, the original data includes original risk control data, wherein the original risk control data includes user information of risk users;
the apparatus further comprises: a business risk control module 305;
The business risk control module 305 is specifically configured to perform business risk control according to the structured data.
The present specification also provides a computer-readable storage medium storing a computer program operable to perform the data conversion method provided in Fig. 1 above.
The present specification also provides, in Fig. 4, a schematic structural diagram of an electronic device corresponding to Fig. 1. At the hardware level, as shown in Fig. 4, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it to implement the data conversion method of Fig. 1. Of course, this specification does not exclude other implementations, such as logic devices or a combination of hardware and software; that is, the execution subject of the processing flows is not limited to logic units and may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can be readily obtained simply by logically programming the method flow into an integrated circuit using one of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Alternatively, the means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described with its functions divided into various units. Of course, when implementing this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
This specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing is merely an example of the present specification and is not intended to limit the present specification. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (11)

1. A data conversion method, comprising:
acquiring original data and data source information corresponding to the original data, wherein the original data is semi-structured data;
determining a target analysis template from predetermined analysis templates according to the data source information, wherein the target analysis template comprises a correspondence between each element value contained in the original data and each attribute contained in historical structured data;
and parsing the original data according to the target analysis template to convert the original data into structured data, and executing a task according to the structured data.
2. The method of claim 1, wherein determining an analysis template comprises:
traversing the elements at each level of historical data to determine the element values contained in the historical data;
determining, for each element value, the attribute corresponding to the element value from the attributes contained in the structured data corresponding to the historical data;
and generating an analysis template corresponding to the historical data according to the path of each element value in the historical data and the attribute corresponding to each element value, wherein the path of an element value in the historical data represents all ancestor element values of that element value.
3. The method according to claim 2, wherein determining, for each element value, the attribute corresponding to the element value from the attributes contained in the specified structured data specifically comprises:
for each element value, inputting path information of the path of the element value in the historical data into a preset classification model, so as to determine, through the classification model, a path feature representation corresponding to the element value, and to output, according to the path feature representation, the attribute corresponding to the element value among the attributes contained in the historical structured data.
4. The method of claim 2, wherein generating an analysis template corresponding to the historical data according to the path of each element value in the historical data and the attribute corresponding to each element value specifically comprises:
aggregating the element values that share the same parent path in the historical data to obtain aggregated element values;
and generating an analysis template corresponding to the historical data according to the paths of the aggregated element values in the historical data and the attributes corresponding to the aggregated element values.
5. The method of claim 1, wherein parsing the original data according to the target analysis template to convert the original data into structured data comprises:
parsing the original data according to the target analysis template to convert the original data into initial structured data;
determining whether the initial structured data matches a predetermined data cleaning rule;
and if so, cleaning the initial structured data in a specified manner to obtain the structured data converted from the original data, wherein the specified manner comprises at least one of merging, converting, splitting, and deleting.
6. The method of claim 5, wherein determining a data cleaning rule comprises:
acquiring historical data and historical data source information corresponding to the historical data;
determining, according to the historical data source information, the analysis template corresponding to the historical data as a historical analysis template, parsing the historical data according to the historical analysis template to convert the historical data into historical structured data, and displaying the historical structured data to a designated user;
and determining a data cleaning rule according to the adjustment operations performed by the designated user on the historical structured data.
7. The method of claim 1, wherein parsing the original data according to the target analysis template to convert the original data into structured data comprises:
parsing the original data according to the target analysis template to convert the original data into initial structured data;
determining, from pre-stored cleaning samples used for cleaning historical data, a cleaning sample whose degree of association with the original data is higher than a preset threshold, as a target cleaning sample;
and inputting the target cleaning sample and the initial structured data into a preset cleaning model, so that the cleaning model cleans the initial structured data according to the target cleaning sample to obtain the structured data converted from the original data.
8. The method of claim 1, wherein the original data comprises original risk control data, and the original risk control data comprises user information of risk users;
the method further comprises:
performing business risk control according to the structured data.
9. A data conversion apparatus comprising:
an acquisition module, configured to acquire original data and data source information corresponding to the original data, wherein the original data is semi-structured data;
an analysis module, configured to determine a target analysis template from predetermined analysis templates according to the data source information, wherein the target analysis template comprises a correspondence between each element value contained in the original data and each attribute contained in historical structured data;
and an execution module, configured to parse the original data according to the target analysis template so as to convert the original data into structured data, and to execute a task according to the structured data.
10. A computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-8.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-8 when the program is executed.
CN202410478253.3A 2024-04-18 2024-04-18 Data conversion method and device, electronic equipment and storage medium Pending CN118277470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410478253.3A CN118277470A (en) 2024-04-18 2024-04-18 Data conversion method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410478253.3A CN118277470A (en) 2024-04-18 2024-04-18 Data conversion method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118277470A true CN118277470A (en) 2024-07-02

Family

ID=91648604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410478253.3A Pending CN118277470A (en) 2024-04-18 2024-04-18 Data conversion method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118277470A (en)

Legal Events

Date Code Title Description
PB01 Publication