Summary of the invention
The present invention provides a method and a device for reading and querying file data, and a readable storage medium, which are characterized by fast reading of file data.
The present invention provides a method for reading file data, comprising: obtaining text information; extracting required keywords from the text information; determining whether a key-value database contains value information corresponding to the extracted keywords; if it is determined that the key-value database contains value information corresponding to the extracted keywords, saving the keywords and the corresponding value information into an index file; and loading the index file into memory.
In one embodiment, saving the keywords and the corresponding value information into the index file comprises: adding the keywords and the corresponding value information to a data structure; computing the hash value of each keyword; sorting the data structure by hash value; and storing the sorted data structure into the index file.
In one embodiment, storing the sorted data structure into the index file comprises: storing the sorted data structure into the index file in binary format.
In one embodiment, the index file comprises at least file header information, a first-level index, and core data; wherein the file header information includes a first-level index address; the first-level index includes the hash value of each keyword, a keyword index, and a value index; the core data includes keyword data and value information data; the first-level index address points to the first-level index, the keyword index points to the keyword data, and the value index points to the value information data.
In one embodiment, the first-level index further includes the byte length of the keyword and the byte length of the value information.
Another aspect of the present invention provides a method for querying file data, comprising: obtaining a keyword to be queried; computing the hash value of the keyword to be queried; determining whether the hash value of the keyword to be queried exists in an index file; and, if it is determined that the hash value of the keyword to be queried exists in the index file, obtaining the value information corresponding to the keyword to be queried.
In one embodiment, before obtaining the value information corresponding to the keyword to be queried, the method further comprises: if it is determined that the hash value of the keyword to be queried exists in the index file, further determining whether the keyword to be queried is identical to the keyword associated with that hash value in the index file; and, if the keyword to be queried is identical to the keyword associated with that hash value in the index file, obtaining the value information corresponding to the keyword to be queried.
In one embodiment, whether the hash value of the keyword to be queried exists in the index file is determined by binary search.
Another aspect of the present invention provides a device for reading file data, the device comprising: an obtaining module, configured to obtain text information; an extraction module, configured to extract required keywords from the text information; a judgment module, configured to determine whether a key-value database contains value information corresponding to the extracted keywords; and a storage module, configured to save the keywords and the corresponding value information into an index file if the judgment module determines that the key-value database contains value information corresponding to the extracted keywords.
Another aspect of the present invention provides a device for querying file data, the device comprising: an obtaining module, configured to obtain a keyword to be queried; a computing module, configured to compute the hash value of the keyword to be queried; a judgment module, configured to determine whether the hash value of the keyword to be queried exists in an index file; and a query module, configured to obtain the value information corresponding to the keyword to be queried if the judgment module determines that the hash value of the keyword to be queried exists in the index file.
Another aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed, perform the above method for reading file data.
Another aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed, perform the above method for querying file data.
With the method and device for reading and querying file data and the readable storage medium according to the embodiments of the present invention, a large amount of data is first stored in an index file, and the index file is then loaded into memory in a single pass. Compared with the traditional approach of reading a large amount of data into memory line by line, this scheme speeds up reading and thus shortens the data read time.
It should be understood that the teachings of the present invention need not achieve all of the beneficial effects described above; rather, a particular technical solution may achieve a particular technical effect, and other embodiments of the present invention may also achieve beneficial effects not mentioned above.
Detailed description of the embodiments
To make the objects, features, and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a method for reading file data according to an embodiment of the present invention.
As shown in Fig. 1, a method for reading file data comprises:
Step 101: obtain text information;
Step 102: extract the required keywords from the text information;
Step 103: determine whether the key-value database contains value information corresponding to the extracted keywords;
Step 104: if it is determined that the key-value database contains value information corresponding to the extracted keywords, save the keywords and the corresponding value information into an index file;
Step 105: load the index file into memory.
In this embodiment, text information is first obtained in step 101. Ways of obtaining it include, but are not limited to, the following: obtaining the text information over a network, entering it directly with a keyboard, entering it by OCR (optical character recognition), or entering it by speech recognition.
Next, in step 102, the required keywords are extracted from the text, and all extracted keywords together form a keyword library. The required keywords are information of a single category; for example, they may be all place names, personal names, or numbers in a document. Ways of extracting them include, but are not limited to, the following: exhaustively enumerating all character combinations in the text information and matching each of them against a dictionary containing all entries of that category, where any combination found in the dictionary is a required keyword; or extracting the required keywords from the text with an artificial-intelligence natural language algorithm.
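The dictionary-matching extraction of step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that "character combinations" means contiguous substrings, and the function name `extract_keywords` and the `max_len` bound are hypothetical choices made here to keep the enumeration small.

```python
def extract_keywords(text, dictionary, max_len=8):
    """Return substrings of `text` found in `dictionary`, in order of first occurrence.

    Assumption: candidate keywords are contiguous substrings of at most
    `max_len` characters; the source does not fix either detail.
    """
    found = []
    seen = set()
    for start in range(len(text)):
        last_end = min(len(text), start + max_len)
        for end in range(start + 1, last_end + 1):
            candidate = text[start:end]
            if candidate in dictionary and candidate not in seen:
                seen.add(candidate)
                found.append(candidate)
    return found


# Hypothetical dictionary of names for one category.
names = {"Alice", "Paris", "Berlin"}
keyword_library = extract_keywords("Alice met Bob in Paris", names)
```

Keeping the dictionary in a set makes each membership test constant time on average, so the cost is dominated by the number of candidate substrings.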
Next, in step 103, it is determined whether the extracted keywords exist in the key-value database. The key-value database stores keyword information of the same category together with the value information corresponding to each keyword, so that the corresponding value information can be obtained from the keyword information. Specifically, the determination checks whether the keywords in the key-value database include the required keywords in the keyword library.
Next, in step 104, if it is determined that the keywords in the key-value database include a required keyword in the keyword library, the corresponding value information is obtained according to that keyword, and the required keyword and its corresponding value information are saved into the index file in a predetermined format.
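Steps 103 and 104 amount to keeping only those extracted keywords for which the key-value database holds value information. A minimal sketch, assuming the database can be modeled as an in-memory Python dict (the source does not name a specific database, and `select_indexable_pairs` is a hypothetical helper):

```python
def select_indexable_pairs(keyword_library, kv_store):
    """Keep only the extracted keywords that have value information in the store."""
    return {kw: kv_store[kw] for kw in keyword_library if kw in kv_store}


# Hypothetical data: a whitelist/blacklist store keyed by name.
kv_store = {"Alice": "whitelist", "Bob": "blacklist"}
pairs = select_indexable_pairs(["Alice", "Dave"], kv_store)
```

Only the resulting pairs would go on to be written into the index file; keywords without value information are dropped.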
Finally, in step 105, the index file containing the required keywords and their corresponding value information is loaded into memory in a single pass.
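Loading the index file in a single pass, as opposed to reading it line by line, can be sketched as below. The file name `sample.idx` and the helper names are assumptions for illustration; the payload is placeholder bytes rather than a real index.

```python
import os
import tempfile
from pathlib import Path


def load_index(path):
    """Read the whole index file into memory with a single read call."""
    return Path(path).read_bytes()


def demo():
    # Write placeholder bytes to a temporary file, then load them back at once.
    payload = b"\x01\x02\x03 index bytes"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.idx")
        Path(path).write_bytes(payload)
        return load_index(path)


loaded = demo()
```

For very large files a memory-mapped view (Python's `mmap` module) would be an alternative way to make the whole file addressable without line-by-line parsing; the source does not say which mechanism is used.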
With this scheme, a large amount of data is first stored in an index file, and the index file is then loaded into memory in a single pass. Compared with the traditional approach of reading a large amount of data into memory line by line, this scheme speeds up file reading and thus shortens the data read time. It can be applied to scenarios such as determining whether the names or numbers in a document appear on a whitelist or blacklist of the business concerned.
In one embodiment, saving the keywords and the corresponding value information into the index file comprises:
adding the keywords and the corresponding value information to a data structure;
computing the hash value of each keyword;
sorting the data structure by hash value;
storing the sorted data structure into the index file.
In this embodiment, the keywords and the corresponding value information are stored in a data structure, which is preferably an array or a HashMap. Within the data structure, the corresponding value information can be obtained from a keyword.
Next, the hash value of each keyword in the data structure is computed and the data structure is sorted by hash value, in either ascending or descending order. Finally, the sorted data structure is stored into the index file.
With this scheme, the extracted keywords and the corresponding value information are first stored in the data structure and sorted by the hash value of each keyword. When value information later needs to be looked up in the index file by keyword, a binary search can be used, which improves query efficiency.
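The hash-and-sort step can be sketched as follows. The source does not name a hash function; CRC32 from Python's `zlib` module is used here purely as an assumption, chosen because it gives the same result in every process. (Python's built-in `hash()` is randomized per process for strings, so it would not be suitable for values persisted to a file.) The names `key_hash` and `build_sorted_entries` are hypothetical.

```python
import zlib


def key_hash(keyword):
    """Stable 32-bit hash of a keyword (CRC32 is an assumed choice)."""
    return zlib.crc32(keyword.encode("utf-8"))


def build_sorted_entries(pairs):
    """Return (hash, keyword, value) tuples sorted in ascending hash order."""
    entries = [(key_hash(k), k, v) for k, v in pairs.items()]
    entries.sort(key=lambda entry: entry[0])
    return entries


entries = build_sorted_entries({"Alice": "allow", "Bob": "deny", "Carol": "allow"})
hashes = [entry[0] for entry in entries]
```

Because `hashes` is sorted, a later lookup can locate a hash in O(log n) comparisons instead of scanning every entry.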
In one embodiment, storing the sorted data structure into the index file comprises:
storing the sorted data structure into the index file in binary format.
In this embodiment, the data structure is stored into the index file in binary format. Because the contents are then not directly human-readable, this has an obfuscating effect that offers a measure of protection for the file; it is not encryption in the cryptographic sense.
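Binary storage of a single keyword/value record can be sketched with Python's `struct` module. The record layout below (two little-endian 16-bit length fields followed by the UTF-8 bytes) is an assumption for illustration only; the source does not specify field widths or byte order.

```python
import struct


def pack_record(keyword, value):
    """Encode one record as: key length, value length, key bytes, value bytes."""
    key_bytes = keyword.encode("utf-8")
    value_bytes = value.encode("utf-8")
    return struct.pack("<HH", len(key_bytes), len(value_bytes)) + key_bytes + value_bytes


def unpack_record(buf):
    """Decode a record produced by pack_record."""
    key_len, value_len = struct.unpack_from("<HH", buf, 0)
    key_end = 4 + key_len
    keyword = buf[4:key_end].decode("utf-8")
    value = buf[key_end:key_end + value_len].decode("utf-8")
    return keyword, value


record = pack_record("ab", "c")
```

Storing explicit byte lengths lets a reader slice fields directly out of the buffer without scanning for delimiters.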
Fig. 2 is a structural diagram of the index file in a method for reading file data according to an embodiment of the present invention.
Fig. 3 is a detailed structural diagram of the index file in a method for reading file data according to an embodiment of the present invention.
As shown in Figs. 2 and 3, in one embodiment the index file comprises at least file header information, a first-level index, and core data; wherein the file header information includes a first-level index address; the first-level index includes the hash value of each keyword, a keyword index, and a value index; the core data includes keyword data and value information data; the first-level index address points to the first-level index, the keyword index points to the keyword data, and the value index points to the value information data.
In this embodiment, the index file contains file header information, a first-level index, and core data. The file header information includes at least the first-level index address. The first-level index includes at least the hash value and the keyword index and value index associated with that hash value. The core data includes at least the keyword data and the value information data.
The first-level index address in the file header information points to the first-level index; the keyword index in the first-level index points to the keyword in the core data; and the value information index in the first-level index points to the value information data in the core data.
With this scheme, at query time the corresponding keyword index and value information index are obtained from the hash value; the corresponding keyword is then obtained through the keyword index, and the corresponding value information through the value information index.
In one embodiment, the first-level index further includes a keyword byte length and a value information byte length. In this embodiment, the keyword byte length stores the byte length of the keyword, and the value information byte length stores the byte length of the value information.
Further, the file header information also includes the file type, the number of keywords, the file version number, and the keyword start index address, which record details of the file.
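One possible byte layout for such an index file is sketched below. Everything concrete here is an assumption made for illustration: the field widths, the little-endian byte order, the `b"KVIX"` file-type tag, the CRC32 hash, and the omission of the keyword start index address from the header. Only the three-part structure (header, first-level index, core data) and the roles of the fields come from the description above.

```python
import struct
import zlib

# Header: file type, file version, keyword count, first-level index address.
HEADER = struct.Struct("<4sHII")
# Index entry: hash, keyword offset, keyword length, value offset, value length.
ENTRY = struct.Struct("<IIHIH")


def build_index(pairs):
    """Serialize keyword/value pairs as header + first-level index + core data."""
    records = sorted(
        (zlib.crc32(k.encode("utf-8")), k.encode("utf-8"), v.encode("utf-8"))
        for k, v in pairs.items()
    )
    index_addr = HEADER.size
    data_addr = index_addr + ENTRY.size * len(records)
    index = bytearray()
    core = bytearray()
    for hash_value, key_bytes, value_bytes in records:
        key_offset = data_addr + len(core)
        core += key_bytes
        value_offset = data_addr + len(core)
        core += value_bytes
        index += ENTRY.pack(hash_value, key_offset, len(key_bytes),
                            value_offset, len(value_bytes))
    header = HEADER.pack(b"KVIX", 1, len(records), index_addr)
    return bytes(header + index + core)


pairs = {"Alice": "allow", "Bob": "deny"}
blob = build_index(pairs)
```

Because every index entry has the same fixed size, the i-th entry sits at `index_addr + i * ENTRY.size`, which is what makes a binary search over the on-disk (or in-memory) index possible without parsing the core data.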
Fig. 4 is a schematic structural diagram of a device for reading file data according to an embodiment of the present invention.
As shown in Fig. 4, based on the method for reading file data provided above, the present invention also provides a device for reading file data, the device comprising:
an obtaining module 401, configured to obtain text information;
an extraction module 402, configured to extract the required keywords from the text information;
a judgment module 403, configured to determine whether the key-value database contains value information corresponding to the extracted keywords;
a storage module 404, configured to save the keywords and the corresponding value information into an index file if the judgment module determines that the key-value database contains value information corresponding to the extracted keywords;
a loading module 405, configured to load the index file into memory.
In this embodiment, text information is first obtained by the obtaining module 401. Ways of obtaining it include, but are not limited to, the following: obtaining the text information over a network, entering it directly with a keyboard, entering it by OCR (optical character recognition), or entering it by speech recognition.
Next, the extraction module 402 extracts the required keywords from the text, and all extracted keywords together form a keyword library. The required keywords are information of a single category; for example, they may be all place names, personal names, or numbers in a document. Ways of extracting them include, but are not limited to, the following: exhaustively enumerating all character combinations in the text information and matching each of them against a dictionary containing all entries of that category, where any combination found in the dictionary is a required keyword; or extracting the required keywords from the text with an artificial-intelligence natural language algorithm.
Next, the judgment module 403 determines whether the extracted keywords exist in the key-value database. The key-value database stores all keyword information of the same category together with the value information corresponding to each keyword, so that the corresponding value information can be obtained from the keyword information. Specifically, the determination checks whether the keywords in the key-value database include the required keywords in the keyword library.
If the judgment module 403 determines that the keywords in the key-value database include a required keyword in the keyword library, the storage module 404 stores that keyword and its corresponding value information into the index file.
Further, when the keywords and the corresponding value information are stored into the index file, they are first stored in a data structure, which in this embodiment is preferably an array or a HashMap. Within the data structure, the corresponding value information can be obtained from a keyword.
Next, the hash value of each keyword in the data structure is computed and the data structure is sorted by hash value, in either ascending or descending order. Finally, the sorted data structure is stored into the index file in a predetermined format.
Finally, the loading module 405 loads the binary-format index file into memory in a single pass.
With this scheme, a large amount of data is first stored in an index file, and the index file is then loaded into memory in a single pass. Compared with the traditional approach of reading a large amount of data into memory line by line, this scheme speeds up file reading without compromising file security, and thus shortens the data read time.
Fig. 5 is a schematic flowchart of a method for querying file data according to an embodiment of the present invention. As shown in Fig. 5, based on the method for reading file data described above, the present invention also provides a method for querying file data, comprising:
Step 501: obtain a keyword to be queried;
Step 502: compute the hash value of the keyword to be queried;
Step 503: determine whether the hash value of the keyword to be queried exists in the index file;
Step 504: if it is determined that the hash value of the keyword to be queried exists in the index file, obtain the value information corresponding to the keyword to be queried.
In this embodiment, at query time the keyword to be queried is first obtained in step 501, and its hash value is computed in step 502. Step 503 then determines whether that hash value exists in the index file. Finally, in step 504, if the hash value is found in the index file, the value information index associated with the hash value is looked up and the value information is obtained through that index. With this scheme, the value information corresponding to a keyword can be queried quickly.
In one embodiment, before obtaining the value information corresponding to the keyword to be queried, the method further comprises: if it is determined that the hash value of the keyword to be queried exists in the index file, further determining whether the keyword to be queried is identical to the keyword associated with that hash value in the index file; and, if the two are identical, obtaining the value information corresponding to the keyword to be queried.
In this embodiment, before the value information corresponding to the keyword to be queried is obtained, it is first determined whether the keyword associated with the hash value in the index file is identical to the keyword to be queried. If it is, the value information index associated with the hash value is looked up and the value information it points to is obtained. Because different keywords can share the same hash value, this check prevents a hash collision from returning the value information of a different keyword, which improves query accuracy.
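Why the keyword comparison matters can be shown with a deliberately weak hash. The sketch below is illustrative only: `toy_hash` is a hypothetical function chosen so that "ab" and "ba" collide, and the index is modeled as an in-memory list of (hash, keyword, value) tuples rather than a binary file.

```python
import bisect


def toy_hash(keyword):
    # Deliberately weak hash (an assumption for illustration) so that
    # collisions are easy to produce: "ab" and "ba" hash to the same value.
    return sum(keyword.encode("utf-8")) % 256


def build(pairs):
    """Return (hash, keyword, value) tuples sorted by hash."""
    return sorted((toy_hash(k), k, v) for k, v in pairs.items())


def query(entries, keyword):
    """Find the hash, then confirm the stored keyword really matches."""
    target = toy_hash(keyword)
    hashes = [entry[0] for entry in entries]
    i = bisect.bisect_left(hashes, target)
    while i < len(entries) and entries[i][0] == target:
        if entries[i][1] == keyword:
            return entries[i][2]
        i += 1
    return None


entries = build({"ab": "1", "cd": "2"})
```

Here `query(entries, "ba")` finds a matching hash but no matching keyword and returns `None`; without the comparison it would wrongly return the value stored for "ab".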
In one embodiment, whether the hash value of the keyword to be queried exists in the index file is determined by binary search. In this embodiment, using binary search for this determination makes the query fast and efficient.
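The binary search over the sorted hash values can be sketched as follows, assuming the hashes have already been read into an ascending list (the function name `find_hash` is hypothetical):

```python
def find_hash(sorted_hashes, target):
    """Return the position of `target` in an ascending list, or -1 if absent."""
    lo, hi = 0, len(sorted_hashes) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_hashes[mid] == target:
            return mid
        if sorted_hashes[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1


position = find_hash([3, 8, 15, 42], 15)
```

Each iteration halves the remaining range, so a lookup among n hashes takes on the order of log2(n) comparisons. Python's standard `bisect` module provides the same search without hand-written loop bounds.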
With this scheme, the value information corresponding to a keyword can be retrieved from the index file quickly and efficiently at query time.
Fig. 6 is a schematic structural diagram of a device for querying file data according to an embodiment of the present invention.
As shown in Fig. 6, based on the method for querying file data described above, the present invention also provides a device for querying file data, the device comprising:
an obtaining module 601, configured to obtain a keyword to be queried;
a computing module 602, configured to compute the hash value of the keyword to be queried;
a judgment module 603, configured to determine whether the hash value of the keyword to be queried exists in the index file;
a query module 604, configured to obtain the value information corresponding to the keyword to be queried if the judgment module determines that the hash value of the keyword to be queried exists in the index file.
In this embodiment, the keyword to be queried is first obtained by the obtaining module 601, and its hash value is computed by the computing module 602. The judgment module 603 then uses binary search to determine whether that hash value exists in the index file. If the judgment module 603 determines that it does, the query module 604 obtains the value information corresponding to the keyword to be queried from the index file. With this scheme, the value information corresponding to a keyword can be queried quickly.
Based on the method for reading file data provided above, the present invention also provides a computer-readable storage medium storing computer-executable instructions, the instructions being used to: first obtain text information and extract the required keywords from the text; then determine whether the extracted keywords exist in the key-value database; if the keywords in the key-value database include the extracted keywords, obtain the corresponding value information according to the keywords and add the keywords and the corresponding value information to a data structure; compute the hash values of all keywords in the data structure and sort by hash value; then store the sorted data structure, in binary format, into an index file of a predetermined format; and finally load the index file into memory in a single pass. With this scheme, a large amount of data is first stored in an index file, and the index file is then loaded into memory in a single pass. Compared with the traditional approach of reading a large amount of data into memory line by line, this scheme speeds up file reading without compromising file security, and thus shortens the data read time.
Based on the method for querying file data provided above, the present invention also provides a computer-readable storage medium storing computer-executable instructions which, when executed, are used to: first obtain a keyword to be queried; then compute the hash value of the keyword to be queried; determine by binary search whether that hash value exists in the index file; if it does, further determine whether the keyword to be queried is identical to the keyword associated with that hash value in the index file; and, if the two are identical, obtain the value information corresponding to the keyword to be queried. With this scheme, the value information corresponding to a keyword can be retrieved from the index file quickly and efficiently at query time.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means two or more, unless expressly and specifically defined otherwise.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.