CN106649890A - Data storage method and device - Google Patents

Info

Publication number
CN106649890A
Authority
CN
China
Prior art keywords
data
vector
input
classification model
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710066733.9A
Other languages
Chinese (zh)
Other versions
CN106649890B (en)
Inventor
程力
王云
仇瑜
马超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tax Cloud Network Technology Services Ltd
Original Assignee
Tax Cloud Network Technology Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tax Cloud Network Technology Services Ltd
Priority to CN201710066733.9A
Publication of CN106649890A
Application granted
Publication of CN106649890B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The invention discloses a data storage method and device. One embodiment of the data storage method includes: acquiring characteristic information of data to be stored, wherein the characteristic information includes at least one of a name of a data table item in a data table to which the data belongs, statistical characteristic information indicating statistical characteristics of the data, and keywords; converting the characteristic information into an input vector of a data classification model, inputting the input vector into the data classification model, and acquiring an output vector indicating the type of the data, wherein the data classification model is generated by training on training samples in a supervised manner, and the training samples include the characteristic information of stored data and the labeled type of the stored data; and storing the data in a storage area corresponding to the type. The data storage method can save storage space and can store data rapidly.

Description

Data storage method and device
Technical field
The present application relates to the field of computer technology, in particular to the field of Internet technology, and more particularly to a data storage method and device.
Background
Data storage covers the collection, storage, retrieval, processing, conversion and transmission of data. In existing data storage, especially in the finance and tax fields, data features and the data types corresponding to those features are usually first defined manually according to business needs, and the data is then stored accordingly to facilitate subsequent financial accounting.
However, existing data storage systems applied to the finance and tax fields, first, lack the ability to analyze and process unstructured data. Second, because different financial accounting systems differ considerably, the data features to be stored and the matching rules have to be defined repeatedly for each accounting system. This makes data storage more cumbersome and costly, occupies a large amount of storage space, and reduces the efficiency with which the data is used.
Summary of the invention
The purpose of the present application is to propose an improved data storage method and device to solve the technical problems mentioned in the background section above.
In a first aspect, the present application provides a data storage method, the method comprising: acquiring characteristic information of data to be stored, the characteristic information including at least one of the following: the name of the data table item in the data table to which the data belongs, statistical characteristic information indicating statistical characteristics of the data, and keywords; converting the characteristic information into an input vector of a data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data, the data classification model being generated by training in advance on training samples in a supervised manner, the training samples including: the characteristic information of stored data and the labeled type of the stored data; and storing the data in the storage region corresponding to the type.
In some embodiments, the data classification model is a decision tree model.
In some embodiments, the data is data in a data table, and the characteristic information includes: the name of the data table item in the data table to which the data belongs, and statistical characteristic information; and converting the characteristic information into an input vector of the data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data includes: generating a data table feature vector corresponding to the characteristic information, the data table feature vector including: a component representing the name of the data table item in the data table to which the data belongs, and a component representing the statistical characteristic information; generating a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector; and inputting the first input vector into the data classification model to obtain an output vector indicating the type of the data.
In some embodiments, the statistical characteristic information includes: association information indicating the association relationships between the data table items, the mean of the length of the data, the maximum of the length of the data, the minimum of the length of the data, and the types of the characters in the data.
In some embodiments, the data is text data, and the characteristic information is keywords; and converting the characteristic information into an input vector of the data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data includes: generating a keyword feature vector corresponding to the characteristic information, wherein each keyword corresponds to one component of the keyword feature vector; generating a second input vector of the data classification model that contains, in order, a zero vector and the keyword feature vector; and inputting the second input vector into the data classification model to obtain an output vector indicating the type of the data.
In a second aspect, the present application provides a data storage device, the device comprising: an acquiring unit, configured to acquire characteristic information of data to be stored, the characteristic information including at least one of the following: the name of the data table item in the data table to which the data belongs, statistical characteristic information indicating statistical characteristics of the data, and keywords; an input unit, configured to convert the characteristic information into an input vector of a data classification model and input it into the data classification model to obtain an output vector indicating the type of the data, the data classification model being generated by training in advance on training samples in a supervised manner, the training samples including: the characteristic information of stored data and the labeled type of the stored data; and a storage unit, configured to store the data in the storage region corresponding to the type.
In some embodiments, the data classification model is a decision tree model.
In some embodiments, the data is data in a data table, and the characteristic information includes: the name of the data table item in the data table to which the data belongs, and statistical characteristic information; and the input unit includes: a data table feature vector generating subunit, configured to generate a data table feature vector corresponding to the characteristic information, the data table feature vector including: a component representing the name of the data table item in the data table to which the data belongs, and a component representing the statistical characteristic information; a first input vector generating subunit, configured to generate a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector; and an output vector generating subunit, configured to input the first input vector into the data classification model to obtain an output vector indicating the type of the data.
In some embodiments, the statistical characteristic information includes: association information indicating the association relationships between the data table items, the mean of the length of the data, the maximum of the length of the data, the minimum of the length of the data, and the types of the characters in the data.
In some embodiments, the data is text data, the characteristic information is keywords, and the input unit includes: a keyword feature vector generating subunit, configured to generate a keyword feature vector corresponding to the characteristic information, wherein each keyword corresponds to one component of the keyword feature vector; a second input vector generating subunit, configured to generate a second input vector of the data classification model that contains, in order, a zero vector and the keyword feature vector; and an output vector generating subunit, configured to input the second input vector into the data classification model to obtain an output vector indicating the type of the data.
The data storage method and device provided by the present application acquire the characteristic information of the data to be stored, convert the characteristic information into an input vector, input it into a data classification model trained by supervised learning, and store the data in the storage region corresponding to the data type indicated by the vector output from the data classification model. The data is thereby classified effectively by data type, and storage space in the data storage regions is saved.
Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flow chart of one embodiment of the data storage method according to the present application;
Fig. 3 is a flow chart of another embodiment of the data storage method according to the present application;
Fig. 4 is a schematic structural diagram of one embodiment of the data storage device according to the present application;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Detailed description of the embodiments
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the data storage method or data storage device of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, data accounting applications, financial statement applications, search applications, instant messaging tools, mailbox clients and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
The server 105 may be a server that provides various services, for example a back-end data processing server that provides data support for the applications running on the terminal devices 101, 102, 103, or a server that collects data from various data sources. The back-end data processing server may analyze and process the data obtained from the data sources, store the processing result, and feed it back to the terminal devices.
It should be noted that the data storage method provided by the embodiments of the present application is generally executed by the server 105, and accordingly the data storage device is generally arranged in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the data storage method according to the present application is shown. The data storage method comprises the following steps:
Step 201, acquire the characteristic information of the data to be stored.
In this embodiment, the electronic device on which the data storage method runs (for example, the server shown in Fig. 1) may obtain the data source information of the data to be stored through a wired or wireless connection, and obtain the data to be stored according to the data source information. Here, a data source refers to the original medium that provides the required data, or a database supported by a storage device. Data source information refers to the information needed to establish a database connection. When the data to be stored is obtained according to the data source information, it may be obtained from a network, a database, or an application related to a financial system.
When the data to be stored is obtained from a database, the electronic device may supply the correct data source name to the server supporting the database, find the corresponding database connection, and then obtain the data to be stored from the corresponding data source.
When the data to be stored is obtained from an enterprise's financial system, the data source information may include internal and external financial information, where the internal information may include various business processing data and various document data, and the external information may include various laws and regulations, market information and so on.
In this embodiment, after the server obtains the data to be stored from the data source, it may further acquire the characteristic information of the data to be stored, which includes at least one of the following: the name of the data table item in the data table to which the data to be stored belongs, statistical characteristic information indicating statistical characteristics of the data, and keywords. Here, the data table may be set up in the database to hold the data to be stored. Each data table may be given a name, for example department name, funds or employee. The statistical characteristics may be, for example, the quantity of the data or the length of the data. When the data to be stored is text data, the characteristic information may be keywords that indicate the content of the text. For example, when the text data is "research funding of department A", the keywords may be "department A" and "research funding".
In some optional implementations of this embodiment, the statistical characteristic information includes association information indicating the association relationships between the data table items, the mean of the length of the data, the maximum of the length of the data, the minimum of the length of the data, and the types of the characters in the data.
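The length and character-type statistics listed above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the four coarse character classes and the sample values are hypothetical and are not specified by the application.

```python
from statistics import mean


def char_types(values):
    """Return the sorted set of coarse character classes seen in the values.

    The four classes used here are an assumption made for illustration.
    """
    kinds = set()
    for value in values:
        for ch in value:
            if ch.isdigit():
                kinds.add("digit")
            elif ch.isalpha():
                kinds.add("letter")
            elif ch.isspace():
                kinds.add("space")
            else:
                kinds.add("symbol")
    return sorted(kinds)


def statistical_features(values):
    """Compute length statistics and character types for one table column."""
    lengths = [len(v) for v in values]
    return {
        "count": len(values),
        "mean_length": mean(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "char_types": char_types(values),
    }


# Hypothetical values of a "department wage" column.
features = statistical_features(["1200.50", "980", "15000.00"])
print(features)
```

The association information between data table items is not covered by this sketch, because the application does not say how it is encoded.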
As an example, the server first obtains the data to be stored from multiple data sources. The server may then further obtain the name of the data table item in the database table to which each piece of data to be stored belongs. For example, the data table item of one piece of data to be stored is named "department wage", and the data table item of another piece of data to be stored is named "performance pay". The server may also acquire the statistical characteristic information of the data to be stored; for example, it may obtain the mean data length of the "department wage" data, and the minimum and maximum data length of the "performance pay" data.
Step 202, convert the characteristic information into an input vector of the data classification model, input it into the data classification model, and obtain an output vector indicating the type of the data.
In this embodiment, based on the characteristic information of the data to be stored acquired in step 201, the server may build a multi-dimensional vector representing multiple features of the data to be stored and use it as the input vector of the data classification model. The input vector includes a component representing the name of the data table item, a statistical characteristic component representing the statistical characteristics of the data, and a characteristic component representing the keywords. The input vector is then input into the data classification model to obtain an output vector indicating the type of the data to be stored. The output vector may include a type component for each preset data type and a component for the matching degree between the data to be stored and each data type. The matching degree between the data to be stored and a data type expresses the strength of their correspondence. Generally, the higher the matching degree, the greater the probability that the data to be stored belongs to that data type.
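A minimal sketch of how such an output vector might be read, assuming it holds one matching degree per preset data type and that the type with the highest degree is chosen. The type names, their order and the sample values are hypothetical.

```python
# Hypothetical preset data types, one per component of the output vector.
DATA_TYPES = ["string", "numeric", "date_time", "currency"]


def predicted_type(output_vector, data_types=DATA_TYPES):
    """Return the data type with the highest matching degree, and that degree."""
    if len(output_vector) != len(data_types):
        raise ValueError("output vector must have one component per data type")
    best_index = max(range(len(output_vector)), key=output_vector.__getitem__)
    return data_types[best_index], output_vector[best_index]


# Hypothetical matching degrees produced by a classification model.
label, degree = predicted_type([0.05, 0.80, 0.10, 0.05])
print(label, degree)
```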
The types of data may include string data types for representing the names of all kinds of things, such as department names and document titles; data types for representing numbers, such as integers, floating-point numbers, positive numbers and negative numbers; data types for representing dates and times; data types for representing currency; and so on.
The data classification model may be used to describe the correspondence between the data to be stored (for example, data in a data table) and the type of the data (for example, a numeric data type). The data classification model is trained by a machine learning method in a supervised manner, using as training samples the characteristic information of stored data, the labeled type of the stored data that matches that characteristic information, and the matching degree between the characteristic information of the stored data and the type of the stored data.
The supervised learning may be carried out as follows:
First, with the stored data as training samples, the server acquires the characteristic information of the stored data. For example, when the stored data is data in a database, which contains multiple data tables, the server may obtain the name of the data table item of the stored data, the types of the characters of the stored data, and so on. When the stored data is text data, the server may obtain the keywords of the stored data as the characteristic information.
Then, a data type label is set for the stored data. For example, the label may be a numeric data type, a data type representing a date, a data type representing text, and so on.
Next, based on the data type labels and the characteristic information of the stored data, the matching degree between the data type of the stored data and the characteristic information of the stored data is established. Because each stored data sample has at least one piece of characteristic information and corresponds to one data type label, the server may calculate, according to a set algorithm, the matching degree between the data type of the stored data and the characteristic information of the stored data.
Finally, using a machine learning method, the data classification model is trained on the characteristic information of the stored data, the labeled type of the stored data that matches that characteristic information, and the matching degree between the characteristic information and the type of the stored data.
The machine learning method may include methods such as neural networks and genetic algorithms.
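The supervised procedure above can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid classifier, which is not one of the methods named in the application (decision trees, neural networks, genetic algorithms); it only shows the shape of training on labeled feature vectors and then classifying a new one. The two features and all sample values are hypothetical.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """Train on (feature_vector, type_label) pairs.

    Returns a model mapping each label to the centroid of its vectors.
    """
    grouped = defaultdict(list)
    for vector, label in samples:
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(model, vector):
    """Return the label whose centroid is closest to the given vector."""
    return min(model, key=lambda label: dist(model[label], vector))


# Hypothetical labeled samples: (mean length, share of digit characters).
training_samples = [
    ([6.0, 1.0], "numeric"),
    ([8.0, 0.9], "numeric"),
    ([12.0, 0.0], "string"),
    ([20.0, 0.1], "string"),
]
model = train(training_samples)
result = classify(model, [7.0, 0.95])
print(result)
```

A production model would also need to output matching degrees rather than a single label; the distance to each centroid could be turned into such a score, but that mapping is left out here.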
This step is illustrated using "department name" as an example of data to be stored. The term "department name" is named differently in different application scenarios: one system may call it "部门", another may call it "department", and yet another may name it "depart", but all of them belong to the category "department name". Therefore, in a given system, when the data to be stored is any one of the above, the characteristic information related to that name acquired in step 201 can be converted into an input vector of the data classification model and input into the data classification model for matching, yielding an output vector indicating the type of the data to be stored. The server can determine from the output vector that the type of the data to be stored is "department name".
Step 203, store the data in the storage region corresponding to the type of the data indicated by the output vector.
In this embodiment, the type to which the data belongs can be determined from the output vector of the data classification model obtained in step 202, and the data can then be stored in the storage region corresponding to that type. So that data can be managed uniformly and effectively on the server or client, storage regions are usually set up according to the different data types. After the server determines the type of the data to be stored from the output vector, it may first check whether that data type is already provided for in the preset storage regions. If so, the data to be stored can be stored directly in the storage region corresponding to the type; if not, the server can create a new storage region and store the data there.
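The lookup-or-create behaviour of step 203 might look like the following sketch. It assumes an in-memory dictionary stands in for the storage regions; the class name, the method name and the sample data are hypothetical.

```python
class TypedStore:
    """In-memory stand-in for storage regions keyed by data type."""

    def __init__(self, preset_types=()):
        # One empty region per preset data type.
        self.regions = {data_type: [] for data_type in preset_types}

    def store(self, data, data_type):
        """Store data in the region for data_type, creating it if missing.

        Returns True when a new region had to be created.
        """
        created = data_type not in self.regions
        if created:
            self.regions[data_type] = []
        self.regions[data_type].append(data)
        return created


store = TypedStore(preset_types=["department name"])
first = store.store("Department A", "department name")  # region already exists
second = store.store("2017-02-07", "date")  # a new region is created
print(first, second, store.regions)
```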
The data storage method provided by the embodiments of the present application acquires the characteristic information of the data to be stored, converts the characteristic information into an input vector of a pre-trained data classification model, inputs it into the data classification model to obtain an output vector indicating the type of the data, and finally stores the data in the storage region corresponding to the data type indicated by the data classification model. The data to be stored is thereby classified effectively, which improves the storage efficiency of the data and saves data storage space.
With further reference to Fig. 3, a flow 300 of another embodiment of the data storage method is shown. The flow 300 of the data storage method comprises the following steps:
Step 301, acquire the characteristic information of the data to be stored.
Existing data can be divided into many different types. According to whether the data can be expressed logically with a two-dimensional table structure, data can be divided into structured data and unstructured data. Structured data, also called row data, can be represented with a uniform structure, for example numbers, symbols and traditional data models. Unstructured data is data whose field lengths are variable and in which the record of each field may be composed of repeatable or non-repeatable subfields; unstructured data includes video, audio, documents, text images, all kinds of forms, images, office documents and so on. A financial system contains a large amount of data in data tables, that is, structured data, whose characteristic information can be represented by the length of the data, the types of the characters in the data, and so on. It also contains a large amount of text data, whose characteristic information can be represented by keywords.
In this embodiment, the electronic device on which the data storage method runs (for example, the server shown in Fig. 1) may acquire the characteristic information of the data to be stored through a wired or wireless connection. When the data to be stored is data in a data table, its characteristic information includes at least one of the following: the name of the data table item in the data table to which the data belongs, and statistical characteristic information indicating statistical characteristics of the data. The statistical characteristic information further includes association information indicating the association relationships between the data table items, the mean of the length of the data, the maximum of the length of the data, the minimum of the length of the data, and the types of the characters in the data. When the data to be stored is text data, its characteristic information includes keywords.
In this embodiment, when the data to be stored is text data, a natural language processing method or a recurrent neural network model may be used to perform word segmentation on the text data and thereby determine the keywords in the text data.
Step 302, generate the data table feature vector corresponding to the characteristic information.
In this embodiment, based on the characteristic information of the data to be stored in the data table acquired in step 301, the server may generate a data table feature vector from that characteristic information. The data table feature vector includes a component representing the name of the data table item in the data table to which the data belongs and a component representing the statistical characteristic information. As an example, in one system the data to be stored "B" is "employee information". Employee information such as "gender" and "age" may be stored in the data table "basic employee information", and may also be stored by establishing a primary and foreign key relationship with the data table "department information". The feature vector corresponding to the data to be stored "B" then consists of a component indicating the name of the table item of the data table to which the "employee information" data belongs, a component indicating the association relationship with "department information", and a component indicating the average length of the employee information data.
Step 303, generate a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector.
The input vector of the data classification model may comprise two parts: the feature vector of structured data and the feature vector of unstructured data. In a typical financial system, the input vector of the data classification model mainly comprises a data table feature vector and a keyword feature vector. When the data to be stored is data table data, that is, structured data, the keyword feature vector can be represented as a zero vector. When the data to be stored is text data, that is, unstructured data, the data table feature vector can be represented as a zero vector.
In this embodiment, given that the data to be stored was determined in step 301 to be data in a data table, and given the feature vector of the data in the data table determined in step 302, the server may further generate the first input vector of the data classification model. The first input vector contains, in order, the data table feature vector determined in step 302 and a zero vector.
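The two input-vector layouts described above can be sketched with plain lists. The fixed dimensions, the function names and the sample component values are assumptions made for illustration; the application does not fix vector sizes or the encoding of each component.

```python
# Hypothetical fixed sizes of the two halves of the model input.
TABLE_DIMS = 3
KEYWORD_DIMS = 4


def first_input_vector(table_features):
    """Data table feature vector followed by a zero vector (structured data)."""
    if len(table_features) != TABLE_DIMS:
        raise ValueError("unexpected data table feature vector length")
    return list(table_features) + [0.0] * KEYWORD_DIMS


def second_input_vector(keyword_features):
    """Zero vector followed by the keyword feature vector (text data)."""
    if len(keyword_features) != KEYWORD_DIMS:
        raise ValueError("unexpected keyword feature vector length")
    return [0.0] * TABLE_DIMS + list(keyword_features)


# Hypothetical components: table item name id, association flag, mean length.
table_input = first_input_vector([1.0, 0.0, 6.0])
text_input = second_input_vector([1.0, 1.0, 0.0, 0.0])
print(table_input)
print(text_input)
```

Padding with zeros keeps both kinds of data in one input space of the same length, so a single model can accept either.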
Step 304, generate the keyword feature vector corresponding to the characteristic information.
In this embodiment, when the data to be stored is text data, the characteristic information of the text data is keywords, so in this step the keyword information corresponding to the text data can be used to generate a keyword feature vector, in which each keyword corresponds to one component. In this embodiment, a vector space model may be used to generate the keyword feature vector; the vector space model is existing, well-known technology and is not described further here. As an example, a system may contain a large amount of unstructured text data such as documents and contracts. When the data to be stored is "C company contract", the server generates, from the acquired keywords "C company" and "contract" of the "C company contract", a keyword component corresponding to "C company" and a keyword component corresponding to "contract".
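A minimal sketch of a keyword feature vector in the spirit of a vector space model, assuming a fixed vocabulary and plain term counts as weights. The vocabulary and the choice of raw counts (rather than, say, TF-IDF weights) are assumptions; the application does not specify the weighting.

```python
# Hypothetical fixed vocabulary; each term owns one vector component.
VOCABULARY = ["C company", "contract", "invoice", "research funding"]


def keyword_feature_vector(keywords, vocabulary=VOCABULARY):
    """Return one component per vocabulary term: how often it was extracted."""
    return [float(keywords.count(term)) for term in vocabulary]


# Keywords extracted from the text "C company contract".
vector = keyword_feature_vector(["C company", "contract"])
print(vector)
```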
Step 305: generate the second input vector of the data classification model, which contains, in order, a zero vector and the keyword feature vector.
In this embodiment, having determined in step 301 that the data to be stored is text data, and having determined the keyword feature vector of the text data in step 304, the server may further generate the second input vector of the data classification model. The second input vector contains, in order, a zero vector and the keyword feature vector determined in step 304.
Step 306: input the input vector into the data classification model to obtain an output vector indicating the type of the data.
In this embodiment, the server may input the first input vector determined in step 303 and the second input vector determined in step 305 separately into the data classification model to obtain an output vector indicating the type of the data. The output vector may include a component for each preset data type and a component for the degree of matching between the data to be stored and that type. Here, the data classification model may first determine from the input vector whether the data to be stored is data in a data table or text data, and may then process the two kinds of data separately, generating the output vector from the first input vector or from the second input vector as the case may be. For example, when the server inputs the input vector generated for data to be stored "X" into the data classification model, the model may determine from the data table feature components and the zero vector in the input vector that "X" is data in a data table, and at the same time determine that its data type is "data type related to numbers"; the model therefore outputs the output vector corresponding to "data type related to numbers". As another example, when the server inputs the input vector generated for data to be stored "Y", the model may determine from the zero vector and the keyword feature components in the input vector that "Y" is text data, and at the same time determine that its data type is "character type"; the model therefore outputs the output vector corresponding to "character type".
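The two decisions described in this paragraph can be sketched as follows. The sketch assumes, purely for illustration, that the table part occupies the first `TABLE_DIM` components of the input vector and that the output vector is represented as a mapping from preset type name to matching degree; neither representation is prescribed by the patent, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

TABLE_DIM = 4  # assumed length of the data table part of the input vector


def input_kind(input_vector: List[float], table_dim: int = TABLE_DIM) -> str:
    """Tell table data from text data by checking which part is all zeros."""
    table_part, keyword_part = input_vector[:table_dim], input_vector[table_dim:]
    if any(table_part) and not any(keyword_part):
        return "table"
    if any(keyword_part) and not any(table_part):
        return "text"
    raise ValueError("exactly one part of the input vector must be non-zero")


def best_type(output_vector: Dict[str, float]) -> Tuple[str, float]:
    """Return the preset type with the highest matching degree."""
    name = max(output_vector, key=output_vector.get)
    return name, output_vector[name]


kind_x = input_kind([1.0, 6.5, 9.0, 4.0, 0.0, 0.0, 0.0])
kind_y = input_kind([0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0])
type_x = best_type({"numeric": 0.9, "character": 0.1})
```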
In this embodiment, the data classification model is obtained in advance by supervised training on training samples. Optionally, the data classification model is a decision tree model. Machine learning with decision tree models is a widely studied and applied known technique and is not described further here.
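Since the patent leaves the training procedure to known techniques, the following is only a minimal sketch of supervised decision tree training, using Gini impurity and threshold splits in plain Python. The training samples, type labels, and function names are invented for the example; a production system would more likely rely on an established library.

```python
from collections import Counter
from typing import List, Tuple

Sample = Tuple[List[float], str]  # (input vector, labeled type)


def gini(samples: List[Sample]) -> float:
    """Gini impurity of the labels in `samples`."""
    counts = Counter(label for _, label in samples)
    total = len(samples)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def build_tree(samples: List[Sample], depth: int = 0, max_depth: int = 3):
    """Return a leaf (type name) or a node (feature index, threshold, left, right)."""
    labels = Counter(label for _, label in samples)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or depth == max_depth:
        return majority
    best = None  # (weighted impurity, feature index, threshold, left, right)
    for i in range(len(samples[0][0])):
        for threshold in sorted({x[i] for x, _ in samples}):
            left = [s for s in samples if s[0][i] <= threshold]
            right = [s for s in samples if s[0][i] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if best is None or score < best[0]:
                best = (score, i, threshold, left, right)
    if best is None:  # no split separates the samples
        return majority
    _, i, threshold, left, right = best
    return (i, threshold,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict(tree, x: List[float]) -> str:
    """Walk from the root to a leaf and return the predicted type."""
    while not isinstance(tree, str):
        i, threshold, left, right = tree
        tree = left if x[i] <= threshold else right
    return tree


# Invented training samples: [table part (2) | keyword part (2)] with labeled types.
training = [
    ([1.0, 6.0, 0.0, 0.0], "numeric"),
    ([1.0, 8.0, 0.0, 0.0], "numeric"),
    ([0.0, 0.0, 1.0, 1.0], "character"),
    ([0.0, 0.0, 1.0, 0.0], "character"),
]
tree = build_tree(training)
```

Calling `predict(tree, vector)` on a new input vector then returns the type learned from the labeled samples.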
Step 307: store the data in the storage region corresponding to the type of the data indicated by the output vector.
In this embodiment, the type to which the data belongs can be determined from the output vector of the data classification model obtained in step 306, and the data is then stored in the storage region corresponding to that type.
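Step 307 amounts to a lookup from the predicted type to a storage region. A minimal sketch, assuming the output vector is a mapping from type name to matching degree and using an in-memory dictionary in place of real storage (the region names and the `store` function are hypothetical):

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical mapping from data type to storage region name.
REGION_FOR_TYPE = {"numeric": "region_numeric", "character": "region_text"}

# In-memory stand-in for the real storage regions.
storage: Dict[str, List[Any]] = defaultdict(list)


def store(data: Any, output_vector: Dict[str, float]) -> str:
    """Store `data` in the region of the best-matching type and return that region."""
    data_type = max(output_vector, key=output_vector.get)
    region = REGION_FOR_TYPE[data_type]
    storage[region].append(data)
    return region


region = store("X", {"numeric": 0.9, "character": 0.1})
```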
As can be seen from Fig. 3, compared with the embodiment corresponding to Fig. 2, the flow 300 of the data storage method in this embodiment divides the data to be stored into structured data and unstructured data, that is, data in a data table and text data. The two kinds of data are input separately into the data classification model for matching, and the model processes them separately, yielding an output vector indicating the type of the data in the data table and an output vector indicating the type of the text data. The data is thereby classified more quickly and effectively, which speeds up data storage and reduces the space needed to store the data.
With further reference to Fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a data storage device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in various electronic devices.
As shown in Fig. 4, the data storage device 400 of this embodiment includes an acquisition unit 401, an input unit 402, and a storage unit 403. The acquisition unit 401 is configured to obtain feature information of data to be stored, the feature information including at least one of the following: the name of the data table entry in the data table to which the data belongs, statistical feature information indicating statistical features of the data, and keywords. The input unit 402 is configured to convert the feature information into an input vector of the data classification model and input it into the data classification model to obtain an output vector indicating the type of the data; the data classification model is generated in advance by supervised training on training samples, and the training samples include the feature information of stored data and the labeled type of the stored data. The storage unit 403 is configured to store the data in the storage region corresponding to the type.
In this embodiment, for the specific processing performed by the acquisition unit 401, the input unit 402, and the storage unit 403 of the data storage device 400, and for the technical effects they bring, reference may be made to the descriptions of step 201, step 202, and step 203 in the embodiment corresponding to Fig. 2, respectively; they are not repeated here.
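The division of work among the three units can be sketched as a small class. This is a simplified illustration only: the class name, the callables passed to it, and the stub logic are all hypothetical, and feature extraction and vector conversion are collapsed into a single `acquire` callable for brevity.

```python
from typing import Any, Callable, Dict, List

OutputVector = Dict[str, float]  # assumed form: data type -> matching degree


class DataStorageDevice:
    """Hypothetical composition of units 401, 402 and 403."""

    def __init__(self,
                 acquire: Callable[[Any], List[float]],
                 classify: Callable[[List[float]], OutputVector]) -> None:
        self.acquire = acquire                    # role of acquisition unit 401
        self.classify = classify                  # role of input unit 402
        self.regions: Dict[str, List[Any]] = {}   # role of storage unit 403

    def store(self, data: Any) -> str:
        """Extract features, classify them, and file the data under the chosen type."""
        features = self.acquire(data)
        output_vector = self.classify(features)
        data_type = max(output_vector, key=output_vector.get)
        self.regions.setdefault(data_type, []).append(data)
        return data_type


# Stub callables standing in for real feature extraction and a trained model.
device = DataStorageDevice(
    acquire=lambda data: [float(len(str(data)))],
    classify=lambda v: {"short": 1.0, "long": 0.0} if v[0] < 5 else {"short": 0.0, "long": 1.0},
)
stored_type = device.store("abc")
```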
In some optional implementations of this embodiment, the data is data in a data table, the feature information includes the name of the data table entry in the data table to which the data belongs and statistical feature information, and the input unit 402 includes: a data table feature vector generation subunit 4021, configured to generate the data table feature vector corresponding to the feature information, the data table feature vector including a component representing the name of the data table entry in the data table to which the data belongs and a component representing the statistical feature information; a first input vector generation subunit 4022, configured to generate the first input vector of the data classification model, which contains, in order, the data table feature vector and a zero vector; and an output vector generation subunit 4025, configured to input the first input vector into the data classification model to obtain an output vector indicating the type of the data.
In some optional implementations of this embodiment, the statistical feature information includes: association information indicating the association relationships between data tables, the average length of the data, the maximum length of the data, the minimum length of the data, and the types of characters in the data.
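The length and character-type statistics listed above can be computed directly from the values of a data table column. The sketch below is illustrative: the function names, the three character-type labels, and the sample column are assumptions, the values are assumed to be a non-empty list of strings, and the association information between data tables is left out because the patent does not describe how it is encoded.

```python
from statistics import mean
from typing import Dict, List


def char_type(values: List[str]) -> str:
    """Coarse character type of a column: all digits, all letters, or mixed."""
    joined = "".join(values)
    if joined.isdigit():
        return "digit"
    if joined.isalpha():
        return "alpha"
    return "mixed"


def statistical_features(values: List[str]) -> Dict[str, object]:
    """Length statistics and character type for a non-empty list of string values."""
    lengths = [len(v) for v in values]
    return {
        "mean_length": mean(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "char_type": char_type(values),
    }


features = statistical_features(["1200", "35", "980"])
```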
In some optional implementations of this embodiment, the data is text data, the feature information is keywords, and the input unit 402 includes: a keyword feature vector generation subunit 4023, configured to generate the keyword feature vector corresponding to the feature information, in which each keyword corresponds to one component; a second input vector generation subunit 4024, configured to generate the second input vector of the data classification model, which contains, in order, a zero vector and the keyword feature vector; and an output vector generation subunit 4025, configured to input the second input vector into the data classification model to obtain an output vector indicating the type of the data.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 suitable for implementing the server of the embodiments of the present application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the method of the present application are performed.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including an acquisition unit, an input unit, and a storage unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that obtains feature information of data to be stored".
In another aspect, the present application also provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the one included in the device described in the above embodiments, or it may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: obtain feature information of data to be stored, the feature information including at least one of the following: the name of the data table entry in the data table to which the data belongs, statistical feature information indicating statistical features of the data, and keywords; convert the feature information into an input vector of a data classification model and input it into the data classification model to obtain an output vector indicating the type of the data, the data classification model being generated in advance by supervised training on training samples, the training samples including the feature information of stored data and the labeled type of the stored data; and store the data in the storage region corresponding to the type.
The above description covers only the preferred embodiments of the present application and explains the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (10)

1. A data storage method, characterized in that the method comprises:
obtaining feature information of data to be stored, the feature information including at least one of the following: the name of the data table entry in the data table to which the data belongs, statistical feature information indicating statistical features of the data, and keywords;
converting the feature information into an input vector of a data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data, wherein the data classification model is generated in advance by supervised training on training samples, and the training samples include: the feature information of stored data and the labeled type of the stored data;
storing the data in the storage region corresponding to the type.
2. The method according to claim 1, characterized in that the data classification model is a decision tree model.
3. The method according to claim 2, characterized in that the data is data in a data table, and the feature information includes: the name of the data table entry in the data table to which the data belongs and statistical feature information; and
converting the feature information into an input vector of the data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data comprises:
generating the data table feature vector corresponding to the feature information, the data table feature vector including: a component representing the name of the data table entry in the data table to which the data belongs, and a component representing the statistical feature information;
generating the first input vector of the data classification model, which contains, in order, the data table feature vector and a zero vector;
inputting the first input vector into the data classification model to obtain an output vector indicating the type of the data.
4. The method according to claim 3, characterized in that the statistical feature information includes: association information indicating the association relationships between data tables, the average length of the data, the maximum length of the data, the minimum length of the data, and the types of characters in the data.
5. The method according to claim 2, characterized in that the data is text data, and the feature information is keywords; and
converting the feature information into an input vector of the data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data comprises:
generating the keyword feature vector corresponding to the feature information, wherein each keyword corresponds to one component in the keyword feature vector;
generating the second input vector of the data classification model, which contains, in order, a zero vector and the keyword feature vector;
inputting the second input vector into the data classification model to obtain an output vector indicating the type of the data.
6. A data storage device, characterized in that the device comprises:
an acquisition unit, configured to obtain feature information of data to be stored, the feature information including at least one of the following: the name of the data table entry in the data table to which the data belongs, statistical feature information indicating statistical features of the data, and keywords;
an input unit, configured to convert the feature information into an input vector of a data classification model and input it into the data classification model to obtain an output vector indicating the type of the data, wherein the data classification model is generated in advance by supervised training on training samples, and the training samples include: the feature information of stored data and the labeled type of the stored data;
a storage unit, configured to store the data in the storage region corresponding to the type.
7. The device according to claim 5, characterized in that the data classification model is a decision tree model.
8. The device according to claim 7, characterized in that the data is data in a data table, the feature information includes: the name of the data table entry in the data table to which the data belongs and statistical feature information, and the input unit comprises:
a data table feature vector generation subunit, configured to generate the data table feature vector corresponding to the feature information, the data table feature vector including: a component representing the name of the data table entry in the data table to which the data belongs, and a component representing the statistical feature information;
a first input vector generation subunit, configured to generate the first input vector of the data classification model, which contains, in order, the data table feature vector and a zero vector;
an output vector generation subunit, configured to input the first input vector into the data classification model to obtain an output vector indicating the type of the data.
9. The device according to claim 8, characterized in that the statistical feature information includes: association information indicating the association relationships between data tables, the average length of the data, the maximum length of the data, the minimum length of the data, and the types of characters in the data.
10. The device according to claim 7, characterized in that the data is text data, the feature information is keywords, and the input unit comprises:
a keyword feature vector generation subunit, configured to generate the keyword feature vector corresponding to the feature information, wherein each keyword corresponds to one component in the keyword feature vector;
a second input vector generation subunit, configured to generate the second input vector of the data classification model, which contains, in order, a zero vector and the keyword feature vector;
an output vector generation subunit, configured to input the second input vector into the data classification model to obtain an output vector indicating the type of the data.
CN201710066733.9A 2017-02-07 2017-02-07 Data storage method and device Expired - Fee Related CN106649890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710066733.9A CN106649890B (en) 2017-02-07 2017-02-07 Data storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710066733.9A CN106649890B (en) 2017-02-07 2017-02-07 Data storage method and device

Publications (2)

Publication Number Publication Date
CN106649890A true CN106649890A (en) 2017-05-10
CN106649890B CN106649890B (en) 2020-07-14

Family

ID=58845975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710066733.9A Expired - Fee Related CN106649890B (en) 2017-02-07 2017-02-07 Data storage method and device

Country Status (1)

Country Link
CN (1) CN106649890B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866333A (en) * 2009-12-24 2010-10-20 金蝶软件(中国)有限公司 Worksheet self-defining method and adapter engine
CN102033964A (en) * 2011-01-13 2011-04-27 北京邮电大学 Text classification method based on block partition and position weight
CN102073704A (en) * 2010-12-24 2011-05-25 华为终端有限公司 Text classification processing method, system and equipment
US8903182B1 (en) * 2012-03-08 2014-12-02 Google Inc. Image classification
CN104881424A (en) * 2015-03-13 2015-09-02 国家电网公司 Regular expression-based acquisition, storage and analysis method of power big data
CN106126502A (en) * 2016-07-07 2016-11-16 四川长虹电器股份有限公司 A kind of emotional semantic classification system and method based on support vector machine

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019024231A1 (en) * 2017-08-04 2019-02-07 平安科技(深圳)有限公司 Automatic data matching method, electronic device and computer-readable storage medium
CN107578014A (en) * 2017-09-06 2018-01-12 上海寒武纪信息科技有限公司 Information processor and method
CN107578014B (en) * 2017-09-06 2020-11-03 上海寒武纪信息科技有限公司 Information processing apparatus and method
CN109951509A (en) * 2017-12-21 2019-06-28 航天信息股份有限公司 A kind of cloud storage dispatching method, device, electronic equipment and storage medium
CN108427725B (en) * 2018-02-11 2021-08-03 华为技术有限公司 Data processing method, device and system
WO2019153735A1 (en) * 2018-02-11 2019-08-15 华为技术有限公司 Data processing method, device and system
CN108427725A (en) * 2018-02-11 2018-08-21 华为技术有限公司 Data processing method, device and system
WO2019196210A1 (en) * 2018-04-10 2019-10-17 平安科技(深圳)有限公司 Data analysis method, computer readable storage medium, terminal device and apparatus
CN108563783A (en) * 2018-04-25 2018-09-21 张艳 A kind of financial analysis management system and method based on big data
CN108763952A (en) * 2018-05-03 2018-11-06 阿里巴巴集团控股有限公司 A kind of data classification method, device and electronic equipment
CN108763952B (en) * 2018-05-03 2022-04-05 创新先进技术有限公司 Data classification method and device and electronic equipment
CN109144999A (en) * 2018-08-02 2019-01-04 东软集团股份有限公司 A kind of data positioning method, device and storage medium, program product
CN109144999B (en) * 2018-08-02 2021-06-08 东软集团股份有限公司 Data positioning method, device, storage medium and program product
CN112732601A (en) * 2018-08-28 2021-04-30 中科寒武纪科技股份有限公司 Data preprocessing method and device, computer equipment and storage medium
CN109271356A (en) * 2018-09-03 2019-01-25 中国平安人寿保险股份有限公司 Log file formats processing method, device, computer equipment and storage medium
CN111611418A (en) * 2019-02-25 2020-09-01 阿里巴巴集团控股有限公司 Data storage method and data query method
CN112988884A (en) * 2019-12-17 2021-06-18 中国移动通信集团陕西有限公司 Big data platform data storage method and device
CN112988884B (en) * 2019-12-17 2024-04-12 中国移动通信集团陕西有限公司 Big data platform data storage method and device
CN111626057B (en) * 2020-07-28 2020-10-30 南京中孚信息技术有限公司 Official document judgment method and judgment system based on named entity
CN111626057A (en) * 2020-07-28 2020-09-04 南京中孚信息技术有限公司 Official document judgment method and judgment system based on named entity
CN111881869A (en) * 2020-08-04 2020-11-03 浪潮云信息技术股份公司 Hierarchical storage method and system based on gesture data
CN111881869B (en) * 2020-08-04 2023-04-18 浪潮云信息技术股份公司 Hierarchical storage method and system based on gesture data
CN112199694A (en) * 2020-09-30 2021-01-08 杭州云链趣链数字科技有限公司 Standardized bill processing method and device, electronic device and storage medium
CN113515680A (en) * 2021-04-20 2021-10-19 建信金融科技有限责任公司 Financial monitoring data processing method and device
CN116432238A (en) * 2023-06-05 2023-07-14 全中半导体(深圳)有限公司 Data storage method and device and storage chip
CN116432238B (en) * 2023-06-05 2023-09-08 全中半导体(深圳)有限公司 Data storage method and device and storage chip

Also Published As

Publication number Publication date
CN106649890B (en) 2020-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200714

Termination date: 20220207