CN114610808A - Data storage method, data storage device, electronic equipment and medium

Info

Publication number: CN114610808A
Application number: CN202210262752.XA
Authority: CN
Inventors: 李磊; 曹俊鹏
Original assignee: Boc Financial Technology Co ltd
Current assignee: Boc Financial Technology Co ltd
Priority date: 2022-03-17
Filing date: 2022-03-17
Publication date: 2022-06-10

Abstract

The application discloses a data storage method, a data storage device, an electronic device and a medium, which can be applied to the financial field or other fields. Different files whose file names contain the same file identifier need to be stored in the same data table, so a correspondence between file identifiers and data table names is preset. For the different files that share a file identifier, no per-file correspondence with a data table name is required; the correspondence between the file identifier and the data table name is set only once, which simplifies the operation. After a file to be stored is obtained, a target file identifier is obtained from the file name of the file, and the target data table name corresponding to the target file identifier is looked up in the correspondence. A loading command statement is then obtained based on the character encoding type obtained from the file, the field separator used to separate different fields, and the target data table name, and the structured data in the file is stored into the data table having the target data table name based on the loading command statement.

Description

Data storage method, data storage device, electronic equipment and medium

Technical Field

The present application relates to the field of data analysis and storage technologies, and in particular, to a data storage method and apparatus, an electronic device, and a medium.

Background

The file types of files in different data sources, or in the same data source, may differ, and the structured data in different files of the same file type also differs to some extent. For example, file 1 of the Excel type includes 4 fields: age, student number, class and score; file 2 of the Excel type includes 5 fields: salary, years of service, employee number, name and marital status.

When the structured data contained in different data sources, or in different files of the same data source, is stored into data tables of the data mart's database, the correspondence between each file and a data table must be preset so that the structured data in the file can be stored into the corresponding data table in the data mart. For example, the structured data in file 1 includes 15, 10001, class 01 and 95, so it can be stored in a data table with the 4 fields age, student number, class and score; the structured data in file 2 includes 50W, 7, 256012, Zhang San and married, so it can be stored in a data table with the 5 fields salary, years of service, employee number, name and marital status.

In summary, a corresponding data table must be configured for every file, which makes the operation cumbersome.

Disclosure of Invention

In view of the above, the present application provides a data storage method, an apparatus, an electronic device and a medium.

In order to achieve the above purpose, the present application provides the following technical solutions:

according to a first aspect of the embodiments of the present disclosure, there is provided a data storage method, including:

acquiring a file to be stored;

acquiring a target file type to which the file belongs;

searching a target configuration table corresponding to a target file type from a preset corresponding relation between the file type and the configuration table, wherein a loading command template and a plurality of file dimensions to be identified corresponding to the target file type are stored in the target configuration table, the loading command template comprises placeholders which are required to be replaced by the file dimensions respectively, and the file dimensions comprise: the character encoding type of the file, field separators used for separating different fields in the file, the file name of the file and the data table name of a data table used for storing the fields contained in the file;

acquiring the file name of the file;

searching a target character string which does not correspond to the preset common meaning in the file name from the preset corresponding relation between the character string and the preset common meaning;

determining the target character string as a target file identifier of the file;

searching a target data table name corresponding to the target file identifier from a preset corresponding relation between the file identifier and the data table name;

acquiring a character coding type corresponding to the file;

analyzing the file through the character coding type to obtain field separators for separating different fields in the file;

replacing the corresponding placeholder in the loading command template with the character encoding type, the field separator, the file name and the target data table name to obtain a loading command statement;

and storing the fields in the file to the data table with the target data table name through the loading command statement.

According to a second aspect of embodiments of the present disclosure, there is provided a data storage device including:

the first acquisition module is used for acquiring a file to be stored;

the second acquisition module is used for acquiring the target file type of the file;

the first searching module is configured to search a target configuration table corresponding to a target file type from a preset corresponding relationship between the file type and the configuration table, where the target configuration table stores a loading command template and a plurality of file dimensions to be identified corresponding to the target file type, the loading command template includes placeholders that the plurality of file dimensions respectively need to be replaced, and the plurality of file dimensions include: the character encoding type of the file, field separators used for separating different fields in the file, the file name of the file and the data table name of a data table used for storing the fields contained in the file;

the third acquisition module is used for acquiring the file name of the file;

the second searching module is used for searching a target character string which does not correspond to the preset common meaning in the file name from the preset corresponding relation between the character string and the preset common meaning;

the first determining module is used for determining that the target character string is the target file identifier of the file;

the third searching module is used for searching a target data table name corresponding to the target file identifier from the preset corresponding relation between the file identifier and the data table name;

the fourth acquisition module is used for acquiring the character coding type corresponding to the file;

the analysis module is used for analyzing the file according to the character coding type to obtain field separators used for separating different fields in the file;

a first replacing module, configured to replace the corresponding placeholder in the load command template with the character encoding type, the field separator, the file name, and the target data table name, so as to obtain a load command statement;

and the storage module is used for storing the fields in the file to the data table with the target data table name through the loading command statement.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the data storage method of the first aspect.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the data storage method according to the first aspect.

According to the above technical solution, different files whose file names contain the same file identifier, that is, files whose structured data has the same structure, need to be stored in the same data table, so a correspondence between file identifiers and data table names is preset. After a file to be stored is obtained, the target configuration table corresponding to the target file type to which the file belongs is looked up in the preset correspondence between file types and configuration tables; the target configuration table stores a loading command template and a plurality of file dimensions to be identified for the target file type. The file name of the file is then acquired, and a target character string in the file name that does not correspond to any preset common meaning is found using the preset correspondence between character strings and preset common meanings; such a target character string uniquely identifies the structure of the structured data contained in the file, and is therefore determined to be the target file identifier of the file. The target data table name corresponding to the target file identifier is looked up in the preset correspondence between file identifiers and data table names. The character encoding type corresponding to the file is acquired, and the file is parsed using the character encoding type to obtain the field separator that separates different fields in the file. The corresponding placeholders in the loading command template are replaced with the character encoding type, the field separator, the file name and the target data table name to obtain a loading command statement, and the fields in the file are stored into the data table having the target data table name through the loading command statement.
In the present application, the correspondence with a data table name needs to be set only once per file identifier. Because the file names of many files contain the same file identifier, this avoids having to set a correspondence between file and data table name for every individual file, which keeps the operation simple. In addition, the loading command statement is generated automatically without manual operation, which increases the speed at which the structured data contained in files is stored into data tables.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a diagram illustrating a hardware architecture according to an embodiment of the present application;

FIG. 2 is a flowchart of a data storage method according to an embodiment of the present disclosure;

FIG. 3 is a diagram illustrating structure data contained in a file of a plain text type according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a data storage device according to an embodiment of the present application;

FIG. 5 is a block diagram illustrating an apparatus for an electronic device in accordance with an example embodiment.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application provides a data storage method and device, electronic equipment and a medium. Before introducing the technical solutions provided in the embodiments of the present application, a hardware architecture related to the present application is described.

Fig. 1 is a schematic diagram of a hardware architecture according to an embodiment of the present application.

The hardware architecture includes: an electronic device 11 and one or more servers 12.

The electronic device 11 may be any electronic product capable of interacting with a user in one or more ways, such as through a keyboard, a touch pad, a touch screen, a remote controller, a voice interaction device, or a handwriting device; examples include a mobile phone, a notebook computer, a tablet computer, a palmtop computer, a personal computer, a wearable device, a smart television, and the like.

It should be noted that fig. 1 is only an example, the types of the electronic devices may be various and are not limited to the computers in fig. 1, and the number of the servers 12 may be one or more and is not limited to two shown in fig. 1.

The server 12 may be a single server, a server cluster composed of a plurality of servers, or a cloud computing server center. The server 12 may include a processor, memory, and a network interface, among others.

Illustratively, different servers 12 correspond to different data sources.

For example, the electronic device 11 may obtain different files belonging to the same file type or files belonging to different file types from the same server; for example, the electronic device 11 may obtain files belonging to the same file type or different file types from different servers, respectively.

File types include, but are not limited to: text/plain: plain text type; text/html (Hypertext Markup Language): HTML text type; application/pdf (Portable Document Format): PDF document type; application/msword: Word document type; image/png (Portable Network Graphics): PNG picture type; image/jpeg (Joint Photographic Experts Group): JPEG picture type; application/x-tar: TAR file type; application/x-gzip: GZIP file type; JSON (JavaScript Object Notation) text type.

Illustratively, the server may transmit the file to the electronic device 11 over a network using FTP (File Transfer Protocol) or SFTP (SSH File Transfer Protocol).

The electronic device 11 may execute the data storage method provided by the present application.

Those skilled in the art will appreciate that the above described electronic devices and servers are merely examples, and that other existing or future electronic devices or servers, as may be suitable for use with the present disclosure, are also included within the scope of the present disclosure and are hereby incorporated by reference.

The data storage method according to the present application is described below with reference to the hardware architecture.

As shown in fig. 2, which is a flowchart of a data storage method provided in this embodiment of the present application, the method may be applied to the electronic device 11 described above, and in the implementation process, the method includes the following steps S201 to S211.

Step S201: acquire the file to be stored.

Illustratively, the file may be obtained from a server.

Illustratively, files obtained from the server may be stored to a specified directory in the data mart. The directory can be scanned at regular time, and whether the file to be stored exists or not can be judged through a file list under the directory.
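The directory check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `find_new_files` and the `already_stored` set are hypothetical, and the periodic scheduling (for example a timer or cron job) is left out.

```python
import os
import tempfile


def find_new_files(directory, already_stored):
    """Return the names of regular files in `directory` that are not yet stored."""
    pending = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Sub-directories are ignored; only plain files are candidates.
            if entry.is_file() and entry.name not in already_stored:
                pending.append(entry.name)
    return sorted(pending)


# A temporary directory stands in for the specified directory in the data mart.
with tempfile.TemporaryDirectory() as landing_dir:
    for name in ("0134690d.i32.gz.202006101", "100006d.cap.gz.201507010"):
        open(os.path.join(landing_dir, name), "w").close()
    print(find_new_files(landing_dir, already_stored={"100006d.cap.gz.201507010"}))
    # prints: ['0134690d.i32.gz.202006101']
```

Calling such a function on a fixed schedule would yield the list of files that still have to go through steps S202 onward.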

Illustratively, the data contained in the file is structured data, which is described below by way of example.

For example, a file may include multiple records separated by record separators, each record including one or more fields. The record separator may differ between files of different file types. The following description takes a file of the plain text type as an example.

Fig. 3 is a schematic diagram of structural data included in a file belonging to a plain text type according to an embodiment of the present application.

As can be seen from fig. 3, the file includes M records, and the record separators between different records are carriage returns; i.e. one line in the text shown in fig. 3 is a record.

In the embodiment of the present application, records contained in a file need to be stored in a data table in a database. For example, table 1 is a table that already stores records contained in a file, each record including 6 fields.

TABLE 1

Step S202: acquire the target file type to which the file belongs.

For example, the file type of the file may be obtained according to the file name of the file in combination with the file command.

For example, the character encoding type of the file may be obtained according to the file name of the file in combination with the file command.

For example, for a file named "alms_pub_prc.sql", the command "file -i alms_pub_prc.sql" returns "text/plain; charset=utf-8", where text/plain indicates that the file is of the plain text type and utf-8 is the character encoding type of the file.
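As a rough sketch of how the file type and character encoding type could be read from the output of the file command, the snippet below parses one output line. It assumes the usual MIME-style output of the form "name: type; charset=value"; the function name `parse_file_command_output` is hypothetical, and actually invoking the command (for example through a subprocess call) is omitted.

```python
def parse_file_command_output(output):
    """Split 'name: text/plain; charset=utf-8' into (file_type, charset)."""
    _, _, description = output.strip().partition(": ")
    file_type, _, params = description.partition(";")
    charset = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "charset":
            charset = value
    return file_type.strip(), charset


print(parse_file_command_output("alms_pub_prc.sql: text/plain; charset=utf-8"))
# prints: ('text/plain', 'utf-8')
```

The first element would feed step S202 (file type) and the second step S208 (character encoding type).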

Step S203: search the preset correspondence between file types and configuration tables for the target configuration table corresponding to the target file type.

The target configuration table stores a loading command template and a plurality of file dimensions to be identified corresponding to the target file type, the loading command template includes placeholders that the plurality of file dimensions respectively need to be replaced, and the plurality of file dimensions include: the character encoding type of the file, field separators in the file for separating different fields, the file name of the file, and the data table name of a data table for storing the fields contained in the file.
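One possible in-memory shape for the correspondence between file types and configuration tables is sketched below. All names (`ConfigTable`, `CONFIG_BY_FILE_TYPE`, `find_config_table`, and the placeholder names inside the template strings) are hypothetical, and the templates are simplified stand-ins rather than the syntax of any particular database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigTable:
    load_command_template: str   # template text containing placeholders
    file_dimensions: tuple       # dimensions that must be identified for the file


# Hypothetical preset correspondence: file type -> configuration table.
CONFIG_BY_FILE_TYPE = {
    "text/plain": ConfigTable(
        load_command_template=(
            "LOAD DATA INFILE '$file_name' INTO TABLE $table_name "
            "CHARACTER SET $charset FIELDS TERMINATED BY '$field_separator'"
        ),
        file_dimensions=("charset", "field_separator", "file_name", "table_name"),
    ),
    "application/x-gzip": ConfigTable(
        load_command_template=(
            "LOAD DATA INFILE '$file_name' INTO TABLE $table_name "
            "CHARACTER SET $charset FIELDS TERMINATED BY '$field_separator' "
            "FILE_FORMAT GZIP"
        ),
        file_dimensions=("charset", "field_separator", "file_name", "table_name"),
    ),
}


def find_config_table(file_type):
    """Look up the configuration table preset for a file type."""
    try:
        return CONFIG_BY_FILE_TYPE[file_type]
    except KeyError:
        raise LookupError(f"no configuration table preset for {file_type!r}") from None


print(find_config_table("text/plain").file_dimensions)
```

Keeping the template and the list of dimensions together means a new file type can be supported by adding one entry rather than changing code.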

The configuration tables corresponding to different file types are different. The configuration table will be described below by way of example.

In different application scenarios, file dimensions corresponding to placeholders included in the loading command template may be different, and the loading command template in the following example is only an example, and the number of placeholders included in the loading command template and the file dimensions corresponding to the placeholders are not limited.

Illustratively, the load command template may be as follows: LOAD DATA INFILE $host$file INTO TABLE $tablebname CHARACTER SET $charset FIELDS TERMINATED BY $terminateChar DATA FORMAT $dataFormat FILE_FORMAT $filetype MAX_BAD_RECORDS $badLine.

So that a person skilled in the art can quickly locate the placeholders in the loading command template, each placeholder above is marked with "$"; in practical applications, the placeholders in the loading command template may be marked with "$", with other symbols, or with no symbol at all.

Illustratively, each placeholder may be represented by a preset character, in which case the number of preset characters equals the number of placeholders; the preset character used above is "$".

In an alternative implementation, in order to clarify the correspondence between the placeholder and the file dimension, the placeholder may include a string of variables that characterize the file dimension, for example, the string of variables of the data table name is "tablebname".

The file dimensions corresponding to the placeholders included in the loading command template are described below.

Illustratively, the file dimension for $host includes: the IP (Internet Protocol) address of the device where the file is located and the storage path of the file.

For example, if the electronic device has obtained a file from the server, the device where the file is located may be the electronic device; for example, if the electronic device has not obtained the file from the server, but only obtained the file name of the file, the device where the file is located is the server.

For example, if the IP address of the device where the file is located is 21.86.77.31 and the file is stored on that device under the path share/data/data/20200610/, the file dimension corresponding to $host is 21.86.77.31//share/data/data/20200610/.

Illustratively, if the device where the file is located is a server, the file dimension corresponding to $host further includes the transmission protocol between the server and the electronic device; in this case, the file dimension corresponding to $host is sftp://almsftp:alms100!ftp@21.86.77.31//share/data/data/20200610/.
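The two forms of the host dimension can be assembled as in the sketch below. The function name `build_host_dimension` is hypothetical, and reading the part before "@" as a user name and password pair is an assumption made only for illustration; the values mirror the example above.

```python
def build_host_dimension(ip, path, protocol=None, user=None, password=None):
    """Build the host dimension from an IP address and a storage path.

    When `protocol` is given, the file is assumed to still be on the server,
    so the transmission protocol and login details are prepended.
    """
    location = f"{ip}//{path}"
    if protocol is None:
        return location
    return f"{protocol}://{user}:{password}@{location}"


print(build_host_dimension("21.86.77.31", "share/data/data/20200610/"))
# prints: 21.86.77.31//share/data/data/20200610/
print(build_host_dimension("21.86.77.31", "share/data/data/20200610/",
                           protocol="sftp", user="almsftp", password="alms100!ftp"))
# prints: sftp://almsftp:alms100!ftp@21.86.77.31//share/data/data/20200610/
```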

Illustratively, the file dimension for $file includes: the file name.

Assuming a file name of "0134690d.i32.gz.202006101", the file dimension for $file is 0134690d.i32.gz.202006101.

Illustratively, the file dimension corresponding to $tablebname includes: the data table name of the data table used to store the fields contained in the file.

Still taking fig. 3 and table 1 as an example, table 1 is a data table storing fields included in the file shown in fig. 3.

Illustratively, the file dimension for $charset includes: the character encoding type of the file.

Exemplary character encoding types include, but are not limited to: Unicode, ASCII (American Standard Code for Information Interchange), GBK (Chinese Internal Code Specification), GB2312, UTF-8, DOS (Disk Operating System).

The file must be parsed using its character encoding type so that its content can be read correctly. For example, the plain text file shown in fig. 3 needs to be decoded using the UTF-8 character encoding type to obtain the structured data it contains, so that the fields in the file can be stored in table 1.
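To illustrate why the character encoding type matters, the sketch below decodes raw file bytes with a given encoding name. The helper name `decode_file_bytes` is hypothetical; the sample record is the one used in this description.

```python
import codecs


def decode_file_bytes(raw, charset):
    """Decode the raw bytes of a file using its character encoding type."""
    codecs.lookup(charset)  # raises LookupError if the encoding name is unknown
    return raw.decode(charset)


raw_utf8 = "Tom1|25|boy|English|50w|7\n".encode("utf-8")
print(decode_file_bytes(raw_utf8, "utf-8"), end="")

# Bytes written as GBK are read back correctly only when GBK is used to decode.
raw_gbk = "张三|married".encode("gbk")
print(decode_file_bytes(raw_gbk, "gbk"))
```

Decoding with the wrong encoding either raises an error or produces unreadable text, in which case the field separator and fields could not be recovered reliably.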

Illustratively, the file dimension for $terminateChar includes: the field separator.

For example, different fields in the structured data contained in the file are separated by field separators. When the fields in the file are stored into the data table, each record is split on the field separator to obtain its fields, and the fields are stored into the data table in the order in which they appear in the file.

Still taking fig. 3 and table 1 as an example, if the field separator is "|" and the first record in the file is "Tom1|25|boy|English|50w|7", then 6 fields are obtained from the first record, in order: Tom1, 25, boy, English, 50w, 7. The 6 fields are stored in the data table in that order, yielding the data table shown in table 1.

Illustratively, the method of obtaining the field separator includes the following steps A11 through A12.

Step A11: acquire each record contained in the file.

Step A12: determine the character with the highest frequency of occurrence across the records as the field separator.

Still taking the M records shown in fig. 3 as an example, the character with the highest frequency of appearance in the M records is "|", and therefore "|" is a field separator.
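Steps A11 and A12 can be sketched as a frequency count over all records. One assumption is added here that the description does not state: letters, digits and whitespace are excluded from the candidates, since otherwise a common letter could outnumber the separator in text-heavy records. The function name `detect_field_separator` is hypothetical.

```python
from collections import Counter


def detect_field_separator(records):
    """Return the most frequent non-alphanumeric character across all records."""
    counts = Counter()
    for record in records:
        # Assumption: separators are punctuation-like, never letters or digits.
        counts.update(ch for ch in record if not ch.isalnum() and not ch.isspace())
    if not counts:
        raise ValueError("no separator candidate found")
    return counts.most_common(1)[0][0]


records = ["Tom1|25|boy|English|50w|7", "hana1|20|boy|English|25w|1"]
print(detect_field_separator(records))  # prints: |
```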

Illustratively, the field separators may be different in different files, and the specific case may be determined according to the actual situation.

Illustratively, the file dimension for $dataFormat includes: the date format type of the date contained in the file name.

For example, the date format type may be Y%m%d, where "%" may be empty, "-", or "/".

Still taking the file name "0134690d.i32.gz.202006101" as an example, the date format type of the date contained in the file name is Ymd.

Illustratively, the file dimension corresponding to $filetype includes: the file type.

The file may be parsed based on the file type and the character encoding type to obtain the structure data contained by the file.

Illustratively, the file dimension for $badLine includes: the error record threshold.

It is understood that a file may include a plurality of records, and every record should include the same number of fields so that all records can be stored in the same data table. Suppose, however, that in the file shown in fig. 3 the 4th record includes 9 fields while every other record includes 6 fields. For any record other than the 4th, the 6 fields can be stored in table 1 in the order in which they appear in the record. For the 4th record, an error occurs when its 9 fields are stored in table 1, because table 1 has only 6 fields. The 4th record is therefore called an error record, and the other records are called correct records.

In the process of storing the structured data contained in the file into the data table, error records can be skipped, and only the fields contained in correct records are stored into the data table.

It is understood that if the number of error records is large, the file itself is considered to have a problem; an error record threshold is therefore set.
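The handling of error records can be sketched as below. The function name `load_correct_records` is hypothetical, and treating the file as faulty only when the number of error records exceeds (rather than reaches) the threshold is an assumption.

```python
def load_correct_records(records, separator, expected_fields, max_bad_records):
    """Split records into fields, skipping records whose field count is wrong."""
    rows, error_lines = [], []
    for line_no, record in enumerate(records, start=1):
        fields = record.split(separator)
        if len(fields) == expected_fields:
            rows.append(fields)
        else:
            error_lines.append(line_no)  # error record: skipped, not stored
    # Assumption: the file is rejected once error records exceed the threshold.
    if len(error_lines) > max_bad_records:
        raise ValueError(
            f"{len(error_lines)} error records exceed the threshold {max_bad_records}"
        )
    return rows, error_lines


records = [
    "Tom1|25|boy|English|50w|7",
    "a|b|c|d|e|f|g|h|i",            # 9 fields: an error record
    "hana1|20|boy|English|25w|1",
]
rows, error_lines = load_correct_records(records, "|", expected_fields=6, max_bad_records=100)
print(len(rows), error_lines)  # prints: 2 [2]
```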

Step S204: acquire the file name of the file.

Step S205: using the preset correspondence between character strings and preset common meanings, search the file name for a target character string that does not correspond to any preset common meaning.

For example, the commonality and distinctiveness of file names in the various data sources may be analyzed to obtain the meaning of the string representations contained in the file names. The following examples are given.

Illustratively, file names may include, but are not limited to: file extensions (e.g., ZIP), date timestamps (e.g., ".201507010"), and region names (e.g., HUBEI denotes the Hubei region). The "preset correspondence between character strings and preset common meanings" may then be as shown in table 2.

TABLE 2

Character string	Preset common meaning
.ZIP.	Compressed package extension
.Doc.	Word file extension
.gz.	gzip file extension
…	…
Y%m%d	Date timestamp
HUBEI	Geographical name of Hubei
…	…

After the file name is obtained, the character strings included in the file name can be analyzed based on table 2 to obtain the target character string in the file name, which does not correspond to the preset common meaning. The following description will be given by taking an example of a file name "r 100006d.cap.gz.201507010".

If table 2 identifies ".gz." as a gzip file extension and "201507010" as a date timestamp, while "100006d.cap" does not correspond to any preset common meaning, then "100006d.cap" is determined to be the target character string.
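The lookup against table 2 can be sketched by splitting the file name on "." and discarding every part that has a preset common meaning. This is only one way to realize the idea: the names `COMMON_MEANINGS`, `DATE_STAMP` and `extract_file_identifier` are hypothetical, and modelling the date timestamp as a run of 8 or 9 digits is an assumption based on the examples in this description.

```python
import re

# Hypothetical stand-in for table 2: character string -> preset common meaning.
COMMON_MEANINGS = {
    "zip": "compressed package extension",
    "doc": "Word file extension",
    "gz": "gzip file extension",
    "hubei": "geographical name of Hubei",
}

# Assumption: a date timestamp is a dot-separated part made of 8 or 9 digits.
DATE_STAMP = re.compile(r"\d{8,9}")


def extract_file_identifier(file_name):
    """Return the parts of the file name that have no preset common meaning."""
    remaining = []
    for part in file_name.split("."):
        if part.lower() in COMMON_MEANINGS or DATE_STAMP.fullmatch(part):
            continue
        remaining.append(part)
    return ".".join(remaining)


print(extract_file_identifier("100006d.cap.gz.201507010"))   # prints: 100006d.cap
print(extract_file_identifier("0134690d.i32.gz.202006101"))  # prints: 0134690d.i32
```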

Step S206: determine the target character string as the target file identifier of the file.

It will be appreciated that in actual practice, files generated at different times contain the same number of fields in the records for the same object, e.g., the same product or the same application scenario, and that the meaning of the fields in the records in the files generated at different times are correspondingly the same.

For example, a company generates file 1 and file 2 at different times; file 1 contains the record "Tom1|25|boy|English|50w|7", and file 2 contains the record "hana1|20|boy|English|25w|1". The 6 fields of the records in both file 1 and file 2 have the following meanings, in order: name, age, gender, nationality, annual salary, years of service. Because corresponding fields in the two files have the same meaning, the structured data in file 1 and file 2 can be stored in the same data table.

In practical applications, in order to enable a user to quickly recognize whether a plurality of files are files generated at different times for the same object, the target file identifications are the same among the file names of the files generated at different times for the same object. Based on this, the correspondence relationship of the file identification and the data table name may be set in advance. That is, the data table names corresponding to different files with the same file identification are the same. The data table names corresponding to files with different file identifications may be the same or different.

Step S207: search the preset correspondence between file identifiers and data table names for the target data table name corresponding to the target file identifier.

It can be understood that the structured data contained in different files of the same file type may differ greatly. In the present application there are two reasons for this: first, the number of fields in the records contained in different files differs; second, the meanings of the fields in the records contained in different files differ.

For example, for reason one, if the records in file 3 have 3 fields and the records in file 4 have 5 fields, the structured data in file 3 needs to be stored in a data table having at least 3 fields, and the structured data in file 4 needs to be stored in a data table having at least 5 fields. That is, the data tables storing the structured data of file 3 and file 4 are very likely to be different.

For example, for reason two, the 3 fields in the records of file 3 mean name, student number and score, whereas the 5 fields in the records of file 4 mean name, gender, annual salary, years of service and nationality; clearly, the meanings of the fields in the records of file 3 and file 4 differ.

However, different files whose file names contain the same file identifier have the same number of fields and the same field meanings, so the structured data in such files can be stored in the same data table. The correspondence between file identifiers and data table names can therefore be set in advance.
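A minimal sketch of the preset correspondence between file identifiers and data table names follows. The mapping name and the function `find_table_name` are hypothetical, and pairing the identifier "0134690d.i32" with the table name T_ODS_BANCS_CUSVAA_L_D is purely illustrative.

```python
# Hypothetical preset correspondence: file identifier -> data table name.
TABLE_NAME_BY_FILE_IDENTIFIER = {
    "0134690d.i32": "T_ODS_BANCS_CUSVAA_L_D",  # illustrative pairing only
}


def find_table_name(file_identifier):
    """Look up the data table name preset for a file identifier."""
    try:
        return TABLE_NAME_BY_FILE_IDENTIFIER[file_identifier]
    except KeyError:
        raise LookupError(
            f"no data table name preset for file identifier {file_identifier!r}"
        ) from None


# Files generated on different dates share one identifier, hence one entry.
print(find_table_name("0134690d.i32"))
```

A single entry therefore serves every file whose name contains that identifier, regardless of the date timestamp in the name.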

Step S208: acquire the character encoding type corresponding to the file.

Since the file needs to be parsed by the character encoding type to obtain the structure data included in the file, the character encoding type of the file needs to be obtained.

Step S209: parse the file using the character encoding type to obtain the field separator used to separate different fields in the file.

It will be appreciated that the structural data contained in the file is obtained by parsing the file by character encoding type.

Illustratively, there are various implementations of step S209, and the present application provides, but is not limited to, the following method including the following steps B11 through B13.

Step B11: searching for a target record separator corresponding to the target file type from a preset correspondence between file types and record separators, wherein a record separator is used for separating two adjacent records and a record comprises one or more fields.

Step B12: acquiring the records contained in the file based on the target record separator.

Step B13: determining the character with the highest occurrence frequency in each record as the field separator.

Still taking fig. 3 as an example, the character with the highest frequency of occurrence in each record is "|", so "|" is a field separator.
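Steps B11 to B13 can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not the implementation of the application: the function names `split_records` and `detect_field_separator` are hypothetical, the record separator is assumed to be a line feed, and, as an added assumption, letters, digits and whitespace are excluded from the candidate characters so that ordinary field content is not mistaken for a separator.

```python
from collections import Counter


def split_records(text, record_separator="\n"):
    """Step B12: split the file content into records, dropping empty ones."""
    return [record for record in text.split(record_separator) if record]


def detect_field_separator(records):
    """Step B13: pick the character that is most frequent in the records.

    Assumption (not from the source): alphanumeric and whitespace characters
    are never field separators, so they are ignored when counting.
    """
    votes = Counter()
    for record in records:
        counts = Counter(
            ch for ch in record if not ch.isalnum() and not ch.isspace()
        )
        if counts:
            # Each record casts one vote for its most frequent candidate.
            votes[counts.most_common(1)[0][0]] += 1
    if not votes:
        raise ValueError("no field separator candidate found")
    return votes.most_common(1)[0][0]


# Hypothetical sample content with two records.
sample = "Tom1|25|boy|English\nAnn2|31|girl|French\n"
records = split_records(sample)
separator = detect_field_separator(records)
```

Voting per record, rather than counting over the whole file at once, keeps a single unusual record from dominating the result.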

Step S210: replacing the corresponding placeholders in the loading command template with the character encoding type, the field separator, the file name and the target data table name to obtain a loading command statement.

Assume the load command template is as follows: LOAD DATA infile 【$host$filename】 into table 【$tabebname】 CHARACTER SET 【$charset】 fields terminated by 【$terminateChar】 DATE FORMAT 【$dateFormat】 FILE_FORMAT 【$filetype】 max_bad_records 【$badLine】.

Suppose that the file dimensions obtained from the file are as follows: the file dimension corresponding to $host (including the transport protocol, the IP address of the server storing the file, and the storage path of the file on the server) is sftp://almsftp:alms100!ftp@21.86.77.31//share/data/data/20200610/; the file dimension corresponding to $filename (i.e., the file name) is 0134690D.i32.gz.202006101; the file dimension corresponding to $tabebname (i.e., the name of the data table for storing the structure data contained in the file) is T_ODS_BANCS_CUSVAA_L_D; the file dimension corresponding to $charset (i.e., the character encoding type) is GBK; the file dimension corresponding to $terminateChar (i.e., the field separator) is |; the file dimension corresponding to $dateFormat (i.e., the date format type) is %Y%m%d; the file dimension corresponding to $filetype (i.e., the file type) is GZIP; the file dimension corresponding to $badLine (i.e., the error record threshold) is 100.

The corresponding placeholder can be replaced by the file dimension based on the preset corresponding relationship between the placeholder and the file dimension, and the obtained loading command statement is as follows:

LOAD DATA infile 【sftp://almsftp:alms100!ftp@21.86.77.31//share/data/data/20200610/0134690D.i32.gz.202006101】 into table 【T_ODS_BANCS_CUSVAA_L_D】 CHARACTER SET 【GBK】 fields terminated by 【|】 DATE FORMAT 【%Y%m%d】 FILE_FORMAT 【GZIP】 max_bad_records 【100】.

To allow those skilled in the art to quickly find, in the load command statement, the file dimensions that replaced the placeholders, each file dimension is marked with "【】". In practical applications, the file dimensions in the load command statement need not be marked with "【】"; they may be marked with other symbols, such as single quotation marks, or not marked with any character.

It can be understood that the above load command statement is specific to the MySQL database. Statements for different databases differ, and the file dimensions they require may differ as well, so the load command template may be set for the specific database in use.
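The placeholder replacement of step S210 can be sketched in Python as follows. This is a hedged illustration only: the function name `render_load_statement`, the `$name` placeholder syntax as written here, and the host value are assumptions made for the example, and the marking symbols are omitted. The statement text simply mirrors the example above and is not checked against any particular database.

```python
import re

# Hypothetical template text mirroring the example above.
LOAD_TEMPLATE = (
    "LOAD DATA infile $host$filename into table $tabebname "
    "CHARACTER SET $charset fields terminated by $terminateChar "
    "DATE FORMAT $dateFormat FILE_FORMAT $filetype max_bad_records $badLine"
)

# A placeholder is "$" followed by an identifier-like name.
PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def render_load_statement(template, dimensions):
    """Replace every placeholder with its file dimension in a single pass."""

    def substitute(match):
        name = match.group(1)
        if name not in dimensions:
            raise KeyError(f"no file dimension for placeholder ${name}")
        return str(dimensions[name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical file dimensions; the host is a made-up address.
dimensions = {
    "host": "sftp://files.example.invalid/data/20200610/",
    "filename": "0134690D.i32.gz.202006101",
    "tabebname": "T_ODS_BANCS_CUSVAA_L_D",
    "charset": "GBK",
    "terminateChar": "|",
    "dateFormat": "%Y%m%d",
    "filetype": "GZIP",
    "badLine": 100,
}
statement = render_load_statement(LOAD_TEMPLATE, dimensions)
```

Replacing all placeholders in one pass means a value that itself contains a "$" is not substituted a second time, and a missing file dimension is reported instead of being left silently in the statement.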

Step S211: storing the fields in the file into the data table with the target data table name through the loading command statement.

In the data storage method provided by the embodiment of the application, different files whose file names contain the same file identifier, that is, files with the same data structure, need to be stored in the same data table; therefore, the correspondence between file identifiers and data table names needs to be preset. After a file to be stored is obtained, the method proceeds as follows: a target configuration table corresponding to the target file type to which the file belongs is searched from a preset correspondence between file types and configuration tables, the target configuration table storing a loading command template and a plurality of file dimensions to be identified corresponding to the target file type; the file name of the file is acquired; a target character string in the file name that does not correspond to any preset common meaning is searched based on a preset correspondence between character strings and preset common meanings, wherein such a target character string uniquely identifies the structure of the structure data contained in the file; the target character string is determined as the target file identifier of the file; a target data table name corresponding to the target file identifier is searched from a preset correspondence between file identifiers and data table names; the character encoding type corresponding to the file is acquired; the file is parsed according to the character encoding type to obtain the field separator for separating different fields in the file; the corresponding placeholders in the loading command template are replaced with the character encoding type, the field separator, the file name and the target data table name to obtain a loading command statement; and the fields in the file are stored into the data table with the target data table name through the loading command statement.

In the present application, the correspondence between a file identifier and a data table name only needs to be set once per file identifier. Since a plurality of files may contain the same file identifier in their file names, there is no need to set a correspondence between each individual file and a data table name, which simplifies the operation. Moreover, the loading command statement is generated automatically without manual operation, which increases the speed of storing the structure data contained in the file into the data table.

It can be understood that the correspondence between file identifiers and data table names needs to be preset. If, for a file to be stored, no correspondence has been preset between the file identifier contained in its file name and a data table name, relevant personnel need to be prompted to set that correspondence. If no data table for storing the structure data contained in the file exists in the database, relevant personnel also need to create the data table themselves, so the structure data contained in the file may not be stored into a data table in time. Based on this, the present application also provides the following method for automatically creating a data table, which includes steps C11 to C17.

Step C11: searching for a target record separator corresponding to the target file type from a preset correspondence between file types and record separators, wherein a record separator is used for separating two adjacent records and a record comprises one or more fields.

The record separators corresponding to different file types may be the same or different. As shown in fig. 3, the record separator corresponding to the common file type is a carriage return, that is, each row is one record.

Step C12: acquiring the records contained in the file based on the target record separator.

Step C13: determining, based on the field separator, the target number of fields included in a record and the target fields included in the record.

It will be appreciated that different records in the same file should contain the same number of fields. In that case, step C13 may use the number of fields contained in any record and the fields contained in that record.

However, errors may occur, so that a few records in the same file contain a different number of fields. For example, a file includes 100 records, of which 99 records each contain 4 fields and 1 record contains 7 fields; the 1 record is regarded as an error record and the 99 records as correct records. The record mentioned in step C13 should be a correct record.

Illustratively, step C13 specifically includes the following steps D11 to D15.

Step D11: acquiring, based on the field separator, the number of fields contained in each record in the file.

As shown in fig. 3, the field separator is "|", and the first record contains 6 fields.

Step D12: dividing records with the same number of fields into the same record set.

Step D13: determining, from the record sets, the target record set containing the largest number of records.

For example, the file includes 100 records, of which 99 records each have 4 fields and 1 record has 7 fields. Then the 99 records are in record set 1, the 1 record is in record set 2, and the target record set is record set 1.

Step D14: determining the fields contained in any record in the target record set as the target fields.

The 4 fields in any record contained in record set 1 are the target fields.

Step D15: determining the number of fields contained in any record in the target record set as the target number of fields.
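Steps D11 to D15 amount to grouping records by field count and taking the largest group. A minimal Python sketch follows; the function name `find_target_fields` and the sample data are hypothetical, and the first record of the target record set is used as "any record".

```python
from collections import defaultdict


def find_target_fields(records, field_separator):
    """Return (target number of fields, target fields) per steps D11 to D15."""
    record_sets = defaultdict(list)
    for record in records:
        fields = record.split(field_separator)    # step D11
        record_sets[len(fields)].append(fields)   # step D12
    if not record_sets:
        raise ValueError("the file contains no records")
    # Step D13: the record set holding the most records is the target set.
    target_count, target_set = max(
        record_sets.items(), key=lambda item: len(item[1])
    )
    # Steps D14 and D15: take the fields of any record in the target set.
    return target_count, target_set[0]


# Hypothetical file: 99 correct records with 4 fields, 1 error record with 7.
records = ["Tom1|25|boy|English"] * 99 + ["1|2|3|4|5|6|7"]
target_count, target_fields = find_target_fields(records, "|")
```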

Step C14: searching for a target table building template corresponding to the target number of fields from a pre-stored correspondence between numbers of fields and table building templates, wherein the target table building template comprises placeholders to be replaced by the target fields, as many as the target number of fields, and a placeholder to be replaced by the data table name.

The table building template is described below by way of a specific example. Taking the table building template corresponding to a field number of 4 as an example, the template is: create table [table_jd123] [fields1], [fields2], [fields3], [fields4].

To allow those skilled in the art to quickly see the placeholders in the table building template, the placeholders are marked with "[ ]"; in practical applications, the placeholders may be marked with other characters or not marked with any character.

Step C15: determining the target data table name corresponding to the file based on the target file identifier.

For example, the target file identifier may itself be used as the target data table name corresponding to the file; alternatively, a target data table name may be randomly assigned to the target file identifier.

Step C16: replacing the corresponding placeholders in the target table building template with the target data table name and the target fields to obtain a table building statement.

Specifically, the target data table name replaces the placeholder [table_jd123] in the table building template, and the 4 fields replace, in the order in which they appear in the record, the placeholders to be replaced by fields in the table building template.

Suppose the record in the file contains the following 4 fields in order: Tom1, 25, boy, English. Then Tom1, 25, boy and English replace [fields1], [fields2], [fields3] and [fields4] in the table building template in sequence, yielding the table building statement: create table [target data table name] [Tom1], [25], [boy], [English].

Step C17: creating, through the table building statement, a data table named with the target data table name.

The data table named with the target data table name can be created automatically through steps C11 to C17 described above. A loading command statement can then be generated automatically without manually creating a data table, which increases the speed of storing the structure data in the file into the data table in the database.

After the data table with the target data table name as the name is automatically created, the corresponding relationship between the target data table name and the file identifier needs to be automatically set, so that the target data table name corresponding to the target file identifier is searched directly from the preset corresponding relationship between the file identifier and the data table name next time, and the data table does not need to be repeatedly created for the same file identifier.
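The template lookup and replacement of steps C14 to C16 can be sketched in Python as follows. This is an illustration under assumptions: `table_template_for` stands in for the pre-stored correspondence between field numbers and table building templates by generating the template text on demand, `T_DEMO` is a made-up table name, and the output only mirrors the textual form of the example above; it is not claimed to be a statement that a database would accept as is.

```python
def table_template_for(field_count):
    """Step C14 stand-in: return the template for the given field number."""
    columns = ", ".join(f"[fields{i}]" for i in range(1, field_count + 1))
    return f"create table [table_jd123] {columns}"


def render_create_statement(template, table_name, fields):
    """Step C16: fill in the table name, then the fields in record order."""
    statement = template.replace("[table_jd123]", table_name)
    for index, field in enumerate(fields, start=1):
        statement = statement.replace(f"[fields{index}]", field)
    return statement


target_fields = ["Tom1", "25", "boy", "English"]
template = table_template_for(len(target_fields))
# Hypothetical table name; per step C15 it is derived from the file identifier.
statement = render_create_statement(template, "T_DEMO", target_fields)
```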

It can be understood that the number of fields may differ among the records contained in a file. When the records in the file are stored, error records need to be skipped; if the file contains too many error records, the file may be problematic and the structure data in it need not be stored. For this reason, the present application further provides the following method, which includes steps E11 to E12.

Step E11: determining the total number of records contained in the file.

The plurality of file dimensions further includes an error record threshold, and the load command template further includes a placeholder to be replaced by the error record threshold.

Step E12: replacing the corresponding placeholder in the loading command template with the product of the total number of records and a preset proportion.

The product serves as the error record threshold and indicates that, if the total number of target records contained in the file is greater than or equal to the error record threshold, the fields contained in the file are prohibited from being stored into the data table with the target data table name, wherein a target record is a record whose number of fields is not equal to the target number of fields.
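The threshold rule of steps E11 to E12 can be illustrated with a short Python sketch. The function names and the sample proportion of 0.25 are hypothetical; in the method above the comparison is carried out by the database through the loading command statement, whereas this sketch evaluates the same rule directly.

```python
def error_record_threshold(total_records, preset_proportion):
    """Step E12: the product of the total record count and the proportion."""
    return total_records * preset_proportion


def should_reject_file(records, field_separator, target_field_count,
                       preset_proportion):
    """True if the error records reach the threshold, so nothing is stored."""
    threshold = error_record_threshold(len(records), preset_proportion)
    # A target (error) record is one whose field count differs from the target.
    error_count = sum(
        1
        for record in records
        if len(record.split(field_separator)) != target_field_count
    )
    return error_count >= threshold


# Hypothetical file: 99 correct records and 1 error record, proportion 0.25.
records = ["Tom1|25|boy|English"] * 99 + ["1|2|3|4|5|6|7"]
threshold = error_record_threshold(len(records), 0.25)
rejected = should_reject_file(records, "|", 4, 0.25)
```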

It can be understood that the file may further include definition information related to the file type of the file. For example, a file of a plain text type may include a column header, and a file of a JSON type may include object or array structure information or header information. Such information does not need to be stored in the data table, so the number of lines occupied by the definition information in the file needs to be identified; these lines need to be ignored both when the structure data in the file is stored into the data table and when the total number of records is calculated. Generally, if a file contains definition information, the definition information is in the first few lines of the file. Based on this, the embodiments of the present application also provide the following method, which includes steps F11 to F12.

Step F11: parsing the file to obtain the number of lines occupied by the definition information.

Wherein the plurality of file dimensions further comprise the number of lines occupied by the definition information, the definition information being used for specifying the structure of the file, and the load command template further includes a placeholder to be replaced by the number of lines occupied by the definition information.

For example, the load command template may be: LOAD DATA infile 【$host$filename】 into table 【$tabebname】 CHARACTER SET 【$charset】 fields terminated by 【$terminateChar】 DATE FORMAT 【$dateFormat】 FILE_FORMAT 【$filetype】 max_bad_records 【$badLine】, followed by a trailing clause containing the placeholder 【$line】 to be replaced by the number of lines.

Step F12: replacing the corresponding placeholder in the loading command template with the number of lines, wherein the number of lines indicates that the data in those lines is prohibited from being loaded into the data table with the target data table name.

It can be understood that, after a loading command statement has been obtained for a target file identifier, a correspondence between the target file identifier and the loading command statement may be constructed. If a file to be stored whose file identifier is the target file identifier is obtained again, the loading command statement corresponding to the target file identifier can be looked up directly from the pre-constructed correspondence between file identifiers and loading command statements, so that the structure data in the file can be stored directly into the data table with the target data table name based on the loading command statement, which increases the storage speed of the structure data in the file. To this end, the embodiment of the application also provides the following method, which includes steps G11 to G12.

Step G11: constructing a correspondence between the target file identifier and the loading command statement.

Before step S207 is executed, the method further includes:

Step G12: searching the pre-constructed correspondence between file identifiers and loading command statements for a loading command statement corresponding to the target file identifier.

If no loading command statement corresponding to the target file identifier is found, step S207 is executed; if a loading command statement corresponding to the target file identifier is found, step S211 is executed.
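The lookup of steps G11 to G12 behaves like a cache keyed by file identifier. A minimal Python sketch follows; the class name `LoadStatementCache`, the callback `build_statement` (standing in for steps S207 to S210), and the identifier `CUSVAA` are all hypothetical.

```python
class LoadStatementCache:
    """Correspondence between file identifiers and loading command statements."""

    def __init__(self, build_statement):
        # build_statement stands in for steps S207 to S210 (hypothetical).
        self._build_statement = build_statement
        self._statements = {}

    def get(self, file_identifier):
        # Step G12: look for an existing statement for this identifier.
        statement = self._statements.get(file_identifier)
        if statement is None:
            # Not found: generate the statement, then record it (step G11).
            statement = self._build_statement(file_identifier)
            self._statements[file_identifier] = statement
        return statement


calls = []


def build_statement(file_identifier):
    calls.append(file_identifier)
    return f"LOAD DATA ... into table T_{file_identifier}"


cache = LoadStatementCache(build_statement)
first = cache.get("CUSVAA")
second = cache.get("CUSVAA")  # served from the correspondence, no rebuild
```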

The method has been described in detail in the above embodiments. The method of the present application can be implemented by various types of apparatuses; therefore an apparatus is also disclosed in the present application, and specific embodiments are described in detail below.

Fig. 4 is a schematic diagram of a data storage device provided in an embodiment of the present application. The data storage device includes: a first obtaining module 401, a second obtaining module 402, a first searching module 403, a third obtaining module 404, a second searching module 405, a first determining module 406, a third searching module 407, a fourth obtaining module 408, a parsing module 409, a first replacing module 410, and a storage module 411, wherein:

a first obtaining module 401, configured to obtain a file to be stored;

a second obtaining module 402, configured to obtain a target file type to which the file belongs;

a first searching module 403, configured to search a target configuration table corresponding to a target file type from a preset corresponding relationship between a file type and a configuration table, where the target configuration table stores a loading command template and a plurality of file dimensions to be identified corresponding to the target file type, the loading command template includes placeholders that the plurality of file dimensions respectively need to be replaced, and the plurality of file dimensions include: the character encoding type of the file, field separators used for separating different fields in the file, the file name of the file and the data table name of a data table used for storing the fields contained in the file;

a third obtaining module 404, configured to obtain a file name of the file;

a second searching module 405, configured to search, from a preset correspondence between a character string and a preset common meaning, a target character string in the file name that does not correspond to the preset common meaning;

a first determining module 406, configured to determine that the target character string is a target file identifier of the file;

a third searching module 407, configured to search, from a preset correspondence between a file identifier and a data table name, a target data table name corresponding to the target file identifier;

a fourth obtaining module 408, configured to obtain a character encoding type corresponding to the file;

the parsing module 409 is configured to parse the file according to the character encoding type to obtain field separators for separating different fields in the file;

a first replacing module 410, configured to replace a corresponding placeholder in the load command template with the character encoding type, the field separator, the file name, and the target data table name, so as to obtain a load command statement;

the storage module 411 is configured to store the fields in the file to the data table with the target data table name through the load command statement.

In an optional implementation manner, the device further includes:

a fourth searching module, configured to search a target record separator corresponding to a target file type from a preset correspondence between a file type and a record separator if a target data table name corresponding to the target file identifier is not searched from a preset correspondence between a file identifier and a data table name, where the record separator is used to separate two adjacent records, and the record includes one or more fields;

a fifth obtaining module, configured to obtain records included in the file based on the target record delimiter;

a second determining module, configured to determine, based on the field separator, the target fields included in the record and the target number of fields included in the record;

a fifth searching module, configured to search for a target table building template corresponding to the target number of fields from a pre-stored correspondence between numbers of fields and table building templates, where the target table building template includes placeholders to be replaced by the target fields, as many as the target number of fields, and a placeholder to be replaced by a data table name;

a third determining module, configured to determine, based on the target file identifier, a target data table name corresponding to the file;

a second replacing module, configured to replace the corresponding placeholders in the target table building template with the target data table name and the target fields to obtain a table building statement;

a creating module, configured to create, through the table building statement, a data table named with the target data table name.

In an optional implementation manner, the second determining module includes:

a first obtaining unit, configured to obtain, based on the field separator, field numbers of fields included in respective records included in the file;

a dividing unit for dividing the records with the same number of fields into the same record set;

a first determining unit, configured to determine a target record set containing the largest number of records from the record sets;

a second determining unit, configured to determine that a field included in any record in the target record set is the target field;

and the third determining unit is used for determining the field number of the fields contained in any record in the target record set as the target field number.

In an optional implementation, the plurality of file dimensions further include an error record threshold, and the load command template further includes a placeholder to be replaced by the error record threshold; the device further includes:

a fourth determining module, configured to determine the total number of records contained in the file;

a third replacing module, configured to replace the corresponding placeholder in the load command template with the product of the total number of records and a preset proportion, where the product serves as the error record threshold and indicates that, if the total number of target records included in the file is greater than or equal to the error record threshold, the fields included in the file are prohibited from being stored into the data table with the target data table name, a target record being a record whose number of fields is not equal to the target number of fields.

In an optional implementation manner, the parsing module includes:

a second obtaining unit, configured to obtain each record included in the file;

a fourth determining unit, configured to determine that the character with the highest occurrence frequency in each record is the field separator.

In an optional implementation, the plurality of file dimensions further include the number of lines occupied by definition information, the definition information being used for specifying the structure of the file, and the load command template further includes a placeholder to be replaced by the number of lines occupied by the definition information; the device further includes:

an information parsing module, configured to parse the file and obtain the number of lines occupied by the definition information;

a fourth replacing module, configured to replace the corresponding placeholder in the load command template with the number of lines, where the number of lines indicates that the data in those lines is prohibited from being loaded into the data table with the target data table name.

In an optional implementation manner, the device further includes:

the building module is used for building the corresponding relation between the target file identification and the loading command statement;

before the third searching module is executed, the device further includes:

a sixth searching module, configured to search, from a correspondence between a pre-constructed file identifier and a loading command statement, whether a loading command statement corresponding to the target file identifier exists;

a triggering module, configured to trigger the third searching module if no loading command statement corresponding to the target file identifier is found.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs its operation has been described in detail in the embodiments of the method and will not be elaborated here.

The electronic device includes, but is not limited to: a processor 51, a memory 52, a network interface 53, an I/O controller 54, and a communication bus 55.

It should be noted that, as those skilled in the art will appreciate, the structure of the electronic device shown in fig. 5 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than those shown in fig. 5, may combine some components, or may have a different arrangement of components.

The following describes each component of the electronic device in detail with reference to fig. 5:

the processor 51 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 52 and calling data stored in the memory 52, thereby performing overall monitoring of the electronic device. Processor 51 may include one or more processing units; illustratively, the processor 51 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 51.

The processor 51 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, or the like.

The memory 52 may include internal memory, such as a Random-Access Memory (RAM) 521 and a Read-Only Memory (ROM) 522, and may also include a mass storage device 523, such as at least one disk storage. Of course, the electronic device may also include hardware required for other services.

The memory 52 is used for storing the executable instructions of the processor 51. The processor 51 has the following functions: acquiring a file to be stored;

acquiring a target file type to which the file belongs;

acquiring the file name of the file;

acquiring a character coding type corresponding to the file;

A wired or wireless network interface 53 is configured to connect the electronic device to a network.

The processor 51, the memory 52, the network interface 53, and the I/O controller 54 may be connected to each other by the communication bus 55, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc.

In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the data storage method.

In an exemplary embodiment, the disclosed embodiments provide a storage medium comprising instructions, such as the memory 52 comprising instructions, executable by the processor 51 of the electronic device to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer readable storage medium is also provided, which can be directly loaded into the internal memory of a computer, such as the memory 52, and contains software code; after being loaded and executed by the computer, the computer program can implement the data storage method.

The data storage method, the data storage device, the electronic equipment and the data storage medium can be used in the financial field or other fields, for example, can be used in a data mart application scenario or a data warehouse in the financial field. The other fields are arbitrary fields other than the financial field, for example, the electric power field. The foregoing is merely an example, and does not limit the application fields of the data storage method, apparatus, electronic device and medium provided by the present invention.

Note that the features described in the embodiments in the present specification may be replaced with or combined with each other. Since the device or system embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant points, refer to the corresponding description of the method embodiments.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of storing data, comprising:

acquiring a file to be stored;

acquiring a target file type to which the file belongs;

acquiring the file name of the file;

acquiring a character coding type corresponding to the file;

2. The data storage method according to claim 1, wherein, if no target data table name corresponding to the target file identifier is found in the preset correspondence between file identifiers and data table names, the method further comprises:

searching a preset correspondence between file types and record separators for a target record separator corresponding to the target file type, wherein the target record separator is used for separating two adjacent records, and each record comprises one or more fields;

acquiring records contained in the file based on the target record separator;

determining, based on the field separators, the fields contained by the record and a target number of fields, the target number of fields being the number of fields contained by the record;

searching a pre-stored correspondence between numbers of fields and table building templates for a target table building template corresponding to the target number of fields, wherein the target table building template comprises placeholders to be replaced by the target number of fields and a placeholder to be replaced by a data table name;

determining a target data table name corresponding to the file based on the target file identifier;

replacing the corresponding placeholders in the target table building template with the target data table name and the target number of fields, to obtain a table building statement;

and creating, by means of the table building statement, a data table named with the target data table name.
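The template-filling step of claim 2 can be illustrated with a minimal sketch. The claims do not specify a template syntax, so the template strings, the `{table}` and `{f1}`-style placeholders, the `VARCHAR(255)` column type, and the names `TABLE_TEMPLATES` and `build_create_statement` are all assumptions made for illustration only.

```python
# Hypothetical table building templates, keyed by number of fields.
# Placeholder names and the VARCHAR(255) column type are assumptions.
TABLE_TEMPLATES = {
    2: "CREATE TABLE {table} ({f1} VARCHAR(255), {f2} VARCHAR(255));",
    3: "CREATE TABLE {table} ({f1} VARCHAR(255), {f2} VARCHAR(255), "
       "{f3} VARCHAR(255));",
}


def build_create_statement(table_name, fields):
    """Fill the template that matches the number of fields.

    Raises KeyError if no template is stored for that field count.
    Names are inserted verbatim, so they are assumed to be trusted.
    """
    template = TABLE_TEMPLATES[len(fields)]
    values = {f"f{i}": name for i, name in enumerate(fields, start=1)}
    return template.format(table=table_name, **values)


statement = build_create_statement("t_score", ["age", "score"])
```

Keying the templates by field count mirrors the correspondence described in the claim; a real system would presumably also validate or quote the identifiers before executing the statement.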

3. The data storage method of claim 2, wherein the determining, based on the field separators, of the fields contained by the record and the target number of fields comprises:

acquiring, based on the field separator, the number of fields contained in each record contained in the file;

dividing the records with the same number of fields into the same record set;

determining a target record set containing the maximum number of records from the record sets;

determining a field contained in any record in the target record set as the target field;

determining the number of fields contained in any record in the target record set as the target number of fields.
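The grouping described in claim 3 can be sketched as follows. This is only one possible reading: the function name `target_fields` is hypothetical, the first record of the largest set stands in for "any record", and ties between equally large sets are resolved in favor of the set encountered first, which the claim does not address.

```python
from collections import defaultdict


def target_fields(records, field_separator):
    """Return (fields, field_count) taken from the largest record set.

    Records are grouped by their number of fields; the group holding the
    most records is the target record set. Raises ValueError when
    `records` is empty.
    """
    record_sets = defaultdict(list)
    for record in records:
        fields = record.split(field_separator)
        record_sets[len(fields)].append(fields)
    # max() keeps the first maximal item, so ties go to the earlier set.
    count, largest_set = max(record_sets.items(), key=lambda kv: len(kv[1]))
    return largest_set[0], count


fields, count = target_fields(["a,b,c", "1,2,3", "x,y"], ",")
```

Here two records have three fields and one has two, so the three-field set is selected and its first record supplies the target fields.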

4. The data storage method of claim 2 or 3, wherein the plurality of file dimensions further comprises an error record threshold, and the load command template further comprises a placeholder to be replaced by the error record threshold; the method further comprising:

determining a total number of records contained in the file;

replacing the corresponding placeholder in the load command template with the product of the total number of records and a preset proportion, wherein the product serves as the error record threshold and indicates that, if the total number of target records contained in the file is greater than or equal to the error record threshold, the fields contained in the file are prohibited from being stored in the data table with the target data table name, a target record being a record whose number of fields is not equal to the target number of fields.
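The threshold arithmetic of claim 4 can be shown with a short sketch. In the claimed method the threshold is written into the load command and presumably enforced by the loading tool; the Python check below only illustrates the comparison. The function names are hypothetical, and no rounding is applied to the product because the claim does not mention any.

```python
def error_record_threshold(total_records, proportion):
    """Threshold = total number of records x preset proportion."""
    return total_records * proportion


def load_is_prohibited(records, field_separator, target_field_count, proportion):
    """True when malformed records reach the error record threshold.

    A record is malformed (a "target record" in the claim's wording) when
    its number of fields differs from the target number of fields.
    """
    threshold = error_record_threshold(len(records), proportion)
    malformed = sum(
        1
        for record in records
        if len(record.split(field_separator)) != target_field_count
    )
    return malformed >= threshold


sample = ["a,b", "c,d", "e", "f,g"]  # one malformed record out of four
```

With the four sample records, a proportion of 0.5 gives a threshold of 2.0, so a single malformed record does not block the load, while a proportion of 0.25 gives a threshold of 1.0 and does.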

5. The data storage method of any of claims 1 to 3, wherein parsing the file to obtain field separators comprises:

acquiring each record contained in the file;

and determining the character with the highest frequency of occurrence in each record as the field separator.
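The frequency heuristic of claim 5 can be sketched as below. The claim leaves open how per-record results are combined, so this sketch assumes a simple vote: each non-empty record nominates its most frequent character, and the character nominated most often is returned. The function name `detect_field_separator` is hypothetical.

```python
from collections import Counter


def detect_field_separator(records):
    """Guess the field separator as the most frequent character.

    Each non-empty record votes for its own most frequent character; the
    character with the most votes wins. Raises ValueError when there is
    nothing to inspect.
    """
    votes = Counter()
    for record in records:
        if record:
            char, _ = Counter(record).most_common(1)[0]
            votes[char] += 1
    if not votes:
        raise ValueError("no non-empty records to inspect")
    return votes.most_common(1)[0][0]


separator = detect_field_separator(["a|b|c|d", "e|f|g|h"])
```

The heuristic can misfire when a data character occurs more often than the separator within a record (for example, long runs of the same digit), so a practical implementation might restrict the candidates to punctuation or control characters.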

6. The data storage method according to any one of claims 1 to 3, wherein the plurality of file dimensions further comprises the number of lines occupied by definition information, the definition information specifying the structure of the file, and the load command template further comprises a placeholder to be replaced by the number of lines occupied by the definition information; the method further comprising:

analyzing the file to obtain the number of lines occupied by the definition information;

and replacing the corresponding placeholder in the load command template with the number of lines, wherein the number of lines indicates that the data in those lines is prohibited from being loaded into the data table with the target data table name.
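A load command template with a placeholder for the skipped lines might look like the sketch below. The claims do not name a database, so the use of MySQL's `LOAD DATA INFILE` syntax is an assumption, as are the placeholder names, the file path, and the function `build_load_statement`.

```python
# Hypothetical load command template using MySQL LOAD DATA syntax.
# The claims do not name a database; this choice is an assumption.
LOAD_TEMPLATE = (
    "LOAD DATA INFILE '{path}' INTO TABLE {table} "
    "CHARACTER SET {charset} "
    "FIELDS TERMINATED BY '{separator}' "
    "IGNORE {skip_lines} LINES;"
)


def build_load_statement(path, table, charset, separator, skip_lines):
    """Replace each placeholder with the value parsed from the file.

    `skip_lines` is the number of leading lines occupied by definition
    information; those lines are not loaded. Values are inserted
    verbatim, so no escaping is performed in this sketch.
    """
    return LOAD_TEMPLATE.format(
        path=path,
        table=table,
        charset=charset,
        separator=separator,
        skip_lines=skip_lines,
    )


statement = build_load_statement(
    "/data/score_20220317.txt", "t_score", "utf8mb4", "|", 1
)
```

In this example one leading definition line is skipped; a file without definition information would pass `0`.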

7. The data storage method of any of claims 1 to 3, further comprising:

constructing a corresponding relation between the target file identification and a loading command statement;

before executing the step of searching the preset correspondence between file identifiers and data table names for the target data table name corresponding to the target file identifier, the method further comprises:

searching a pre-constructed correspondence between file identifiers and loading command statements for a loading command statement corresponding to the target file identifier;

if no loading command statement corresponding to the target file identifier is found, executing the step of searching the preset correspondence between file identifiers and data table names for the target data table name corresponding to the target file identifier.
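The lookup order of claim 7 amounts to consulting a cache of loading command statements before falling back to the table-name lookup. The sketch below assumes plain dictionaries for both correspondences; `resolve_load_statement` and the `build_statement` callback are hypothetical names, and the callback stands in for the parsing and template-filling steps.

```python
def resolve_load_statement(file_id, statement_cache, table_names, build_statement):
    """Return the loading command statement for a file identifier.

    The pre-constructed correspondence (statement_cache) is searched
    first. Only on a miss is the table name looked up and a new statement
    built, which is then recorded for later files with the same
    identifier.
    """
    cached = statement_cache.get(file_id)
    if cached is not None:
        return cached
    table_name = table_names[file_id]  # KeyError if the identifier is unknown
    statement = build_statement(table_name)
    statement_cache[file_id] = statement
    return statement


built = []


def fake_build(table_name):
    # Stand-in builder that records how often it is invoked.
    built.append(table_name)
    return f"LOAD INTO {table_name}"


cache = {}
first = resolve_load_statement("score", cache, {"score": "t_score"}, fake_build)
second = resolve_load_statement("score", cache, {"score": "t_score"}, fake_build)
```

The second call is served from the cache, so the builder runs only once per file identifier.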

8. A data storage device, comprising:

the first acquisition module is used for acquiring a file to be stored;

the third acquisition module is used for acquiring the file name of the file;

the third searching module is used for searching the target data table name corresponding to the target file identifier from the preset corresponding relation between the file identifier and the data table name;

the fourth obtaining module is used for obtaining the character coding type corresponding to the file;

9. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the data storage method of any of claims 1 to 7.

10. A computer-readable storage medium, wherein instructions stored in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the data storage method of any one of claims 1 to 7.