Summary of the invention
In view of this, the technical problem to be solved by the present invention is to provide a power information retrieval method and device based on cloud storage, which can retrieve power information and improve retrieval speed.
To this end, the embodiments of the invention adopt the following technical solution:
A power information retrieval method based on cloud storage comprises:
Performing word segmentation on each document in a preset power database according to the segmentation dictionary of a preset electric power thesaurus, and establishing an index database from the resulting segmentation results, wherein the power database and the index database are stored in a cloud storage system;
Obtaining a user's query, and determining from the query its keywords and the logical relation between the keywords;
Looking up, in the index database, the document set corresponding to each keyword;
Processing the document sets according to the logical relation between the keywords to obtain the document information corresponding to the query;
Presenting the document information to the user.
The method further comprises:
In response to a click operation by the user, obtaining from the power database the document corresponding to the document information the user clicked, and presenting the document to the user.
Establishing the index database from the segmentation results comprises:
Determining, from the segmentation results, the set of documents that contain each term, and establishing the index database with the term as the primary field.
The method further comprises: sorting the document information corresponding to the query according to the relevance between each corresponding document and the query; accordingly,
The document information presented to the user is the sorted document information.
A power information retrieval system based on cloud storage comprises:
An establishing unit, configured to perform word segmentation on each document in a preset power database according to the segmentation dictionary of a preset electric power thesaurus, and to establish an index database from the resulting segmentation results;
A cloud storage system, configured to store the power database and the index database in the cloud;
A determining unit, configured to obtain a user's query and to determine from the query its keywords and the logical relation between the keywords;
A lookup unit, configured to look up, in the index database, the document set corresponding to each keyword;
A processing unit, configured to process the document sets according to the logical relation between the keywords to obtain the document information corresponding to the query;
A first presentation unit, configured to present the document information to the user.
The system further comprises: a second presentation unit, configured to obtain from the power database, in response to a click operation by the user, the document corresponding to the document information the user clicked, and to present the document to the user.
The establishing unit comprises:
A segmentation subunit, configured to perform word segmentation on each document in the preset power database according to the segmentation dictionary of the preset electric power thesaurus;
An establishing subunit, configured to determine, from the segmentation results, the set of documents that contain each term, and to establish the index database with the term as the primary field.
The system further comprises:
A sorting unit, configured to sort the document information corresponding to the query according to the relevance between each corresponding document and the query.
The technical effects of the above technical solution are analyzed as follows:
A power database based on power information is established; each document in the preset power database is segmented according to the segmentation dictionary of the preset electric power thesaurus; an index database is established from the segmentation results and stored in the cloud storage system. A user can then enter a query composed of terms from the electric power thesaurus to retrieve power information documents, which provides retrieval dedicated to power information. Moreover, because both the power database and the index database are stored in the cloud storage system, the retrieval speed for power information can be improved.
Embodiment
The implementation of the cloud-storage-based method and device for full-text retrieval of power keywords according to the embodiments of the invention is described below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a power information retrieval method based on cloud storage according to an embodiment of the invention. As shown in Fig. 1, the method comprises:
A database based on power information is set up in advance. The database may contain power information resources of all kinds, ranging from the most time-sensitive power news to less time-sensitive resources such as books and handbooks. Specifically, these may include power news, power newspapers, power periodicals, conference proceedings, technical standards, laws and regulations, scientific and technical reports, investigation reports, scientific and technological achievements, books and handbooks, and so on. These power information resources may also be divided into categories; the specific categories and their number may be set as needed in practice and are not limited here.
Step 101: perform word segmentation on each document in the preset power database according to the segmentation dictionary of the preset electric power thesaurus, and establish an index database from the resulting segmentation results; the index database is stored in the cloud storage system;
Step 102: obtain the user's query, and determine from the query its keywords and the logical relation between the keywords;
Step 103: look up, in the index database, the document set corresponding to each keyword;
Step 104: process the document sets according to the logical relation between the keywords to obtain the document information corresponding to the query;
Step 105: present the document information to the user.
In the retrieval method of the embodiment shown in Fig. 1, a power database based on power information is established; each document in the preset power database is segmented according to the segmentation dictionary of the preset electric power thesaurus; an index database is established from the segmentation results and stored in the cloud storage system. A user can then enter a query composed of terms from the electric power thesaurus to retrieve power information documents, which provides retrieval dedicated to power information. Moreover, because both the power database and the index database are stored in the cloud storage system, the retrieval speed for power information can be improved.
Building on Fig. 1, the power information retrieval method based on cloud storage according to the embodiment of the invention is described in more detail with reference to Fig. 2. As shown in Fig. 2, the method comprises:
Step 201: a power database based on power information is set up in advance, the database containing various documents based on power information resources; the power database is stored in a cloud storage system.
The cloud storage system may be implemented with any existing cloud storage system and is not described further here.
Step 202: each document in the power database is segmented according to the segmentation dictionary of the preset electric power thesaurus; the segmentation results are stored in an inverted index structure to form the index database, which is stored in the cloud storage system.
The index database contains terms and, for each term, the document set formed by the documents that contain it. The document set may record only the document information of each document, for example the document identifier (ID) and/or the document title.
Segmenting each document in the power database according to the segmentation dictionary of the preset electric power thesaurus comprises:
Successively matching the character strings in the document to be segmented against the entries of the segmentation dictionary; if a match is found, storing the entry and the information corresponding to the entry, and then continuing to match the character string that follows the entry, until the end of the document.
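The matching procedure above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment`, the longest-match-first policy, and the choice to record each entry's character offset as its "corresponding information" are all assumptions, since the text does not fix them. The unspaced ASCII string stands in for unsegmented document text.

```python
def segment(text, dictionary):
    """Scan text left to right, matching substrings against dictionary entries.

    Returns a list of (entry, start_offset) pairs. Longest-match-first is an
    assumption; the source only says that matching resumes after each hit.
    """
    max_len = max((len(entry) for entry in dictionary), default=0)
    matches = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                matches.append((candidate, i))
                i += size  # continue with the string following the entry
                break
        else:
            i += 1  # no entry starts here; move on one character
    return matches


# Hypothetical dictionary entries and document text.
dictionary = {"transformer", "powertransformer", "grounding"}
result = segment("powertransformergroundingfault", dictionary)
```

With this sample input, "powertransformer" is preferred over the shorter entry "transformer", and the trailing "fault" is skipped because it is not in the dictionary.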
The segmentation result obtained from word segmentation is a forward index structure: it takes the document as the primary field and records the terms contained in each document; that is, each document corresponds to a term set made up of the terms the document contains. The inverted index structure, by contrast, takes the term as the primary field and records all documents that contain the term; that is, each term corresponds to a document set made up of all documents containing that term. For example, suppose the forward index structure contains: document 1, term set {t1, t2, t3}; document 2, term set {t1, t2, t4}. The corresponding inverted index structure is then: term t1, document set {1, 2}; term t2, document set {1, 2}; term t3, document set {1}; term t4, document set {2}.
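The conversion from the forward structure to the inverted structure can be sketched as below, using the same two-document example. The function name `build_inverted_index` and the use of in-memory dictionaries and sets are illustrative assumptions; the text stores the index in a cloud storage system and does not prescribe a data layout.

```python
from collections import defaultdict


def build_inverted_index(forward_index):
    """Turn {doc_id: set of terms} into {term: set of doc_ids}."""
    inverted = defaultdict(set)
    for doc_id, terms in forward_index.items():
        for term in terms:
            inverted[term].add(doc_id)
    return dict(inverted)


# Forward index from the example in the text.
forward_index = {
    1: {"t1", "t2", "t3"},
    2: {"t1", "t2", "t4"},
}
inverted_index = build_inverted_index(forward_index)
```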
Step 203: obtain the query entered by the user, and determine from the query its keywords and the logical relation between the keywords.
The query entered by the user may be an entry or a sentence. In this case, the keywords of the query generally need to be determined by word segmentation; the specific segmentation method is not described further here. Terms in the segmentation result that do not help retrieval (function words, for example) generally need to be deleted. Specifically, a dictionary of terms to be deleted may be set in advance, and the terms obtained by segmentation are compared with the terms in that dictionary to obtain the final keywords of the query. In this case, the logical relation between the keywords may be that the document contains all of the keywords.
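A minimal sketch of the deletion step, assuming the query has already been segmented into a list of terms. The name `extract_keywords`, the sample deletion dictionary, and the removal of repeated terms are assumptions made for illustration; the text does not list the actual dictionary contents.

```python
# Hypothetical deletion dictionary; the real contents are not given in the text.
DELETION_DICTIONARY = {"the", "of", "a"}


def extract_keywords(terms, deletion_dictionary=DELETION_DICTIONARY):
    """Drop terms found in the deletion dictionary, keeping first-seen order."""
    seen = set()
    keywords = []
    for term in terms:
        if term in deletion_dictionary or term in seen:
            continue
        seen.add(term)
        keywords.append(term)
    return keywords


keywords = extract_keywords(["the", "maintenance", "of", "transformer"])
```

The surviving keywords would then be combined with the default relation described above, namely that a matching document contains all of them.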
Alternatively, rules for how the user enters a query may be set in advance. For example, it may be required that different search keywords be joined by logical connectives such as "and", "or", "*" and "?" to indicate the logical relation between them. In that case, the term between two adjacent connectives is taken directly as a keyword of the query, and the logical relation between the keywords is determined by the logical connectives in the query.
The relation each connective represents may be set as needed. In practice, the connectives generally have the following meanings:
and: combines several keywords so that the results contain all of the keywords, regardless of their order and position. For example, the results of searching for education and technology must contain both education and technology.
or: combines several keywords so that the results contain at least one of the keywords, regardless of their order and position. For example, the results of searching for education or technology must contain at least one of education and technology.
*: a wildcard that matches multiple characters, but only English letters and digits. For example, aero* retrieves all documents containing a word beginning with aero (such as aerospace or aerobes). Note that the system does not support left truncation; that is, "*" cannot be placed at the beginning of an expression.
?: a wildcard that matches a single character, but only English letters and digits. For example, since each ? stands for exactly one character, aero??? retrieves all documents containing a 7-character word beginning with aero (such as aerobic or aerobes).
Other connectives are not described here and may be chosen as appropriate in practice.
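The two wildcards can be illustrated by expanding a pattern against the terms of the index. This is a sketch under stated assumptions: the names `wildcard_to_regex` and `expand_wildcard` are hypothetical, "*" is taken to match zero or more letters or digits, and the vocabulary is a plain in-memory set.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern using * and ? into a compiled regular expression."""
    if pattern.startswith("*"):
        # Left truncation is not supported, as stated in the text.
        raise ValueError("'*' cannot be placed at the beginning of an expression")
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[A-Za-z0-9]*")  # assumption: zero or more characters
        elif ch == "?":
            parts.append("[A-Za-z0-9]")   # exactly one letter or digit
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def expand_wildcard(pattern, vocabulary):
    """Return the sorted terms in vocabulary that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return sorted(term for term in vocabulary if regex.fullmatch(term))


vocabulary = {"aerospace", "aerobes", "aerobic", "aero", "engine"}
prefix_matches = expand_wildcard("aero*", vocabulary)
fixed_length_matches = expand_wildcard("aero???", vocabulary)
```

Each matched term can then be looked up in the index database like any other keyword, and the resulting document sets merged.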
Step 204: look up, in the index database, the document set corresponding to each keyword.
Each keyword needs to match a term; that is, a keyword entered by the user must also be a term contained in the segmentation dictionary of the electric power thesaurus.
Step 205: process the document sets corresponding to the keywords according to the logical relation between the keywords, to obtain the document information corresponding to the query;
When processing according to the logical relation, for example, if the logical relation between the keywords is and, the intersection of the document sets of the two keywords is computed; if the logical relation between the keywords is or, the union of the document sets of the two keywords is computed; and so on, which is not described further here.
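The set operations for and and or can be sketched as follows. The function name `evaluate_query`, the whitespace-separated query format with single-token keywords, and strict left-to-right evaluation without operator precedence are assumptions for illustration only.

```python
def evaluate_query(query, index):
    """Evaluate 'k1 and k2 or k3' left to right against {term: set of doc_ids}."""
    tokens = query.split()
    if not tokens:
        return set()
    if len(tokens) % 2 == 0:
        raise ValueError("query must alternate keywords and connectives")
    result = set(index.get(tokens[0], set()))
    for connective, keyword in zip(tokens[1::2], tokens[2::2]):
        docs = index.get(keyword, set())
        if connective == "and":
            result &= docs   # intersection of the two document sets
        elif connective == "or":
            result |= docs   # union of the two document sets
        else:
            raise ValueError(f"unsupported connective: {connective}")
    return result


# Inverted index from the earlier example in the text.
index = {"t1": {1, 2}, "t2": {1, 2}, "t3": {1}, "t4": {2}}
both = evaluate_query("t1 and t3", index)
either = evaluate_query("t3 or t4", index)
```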
Further, the user may specify in advance, when entering the query, information such as the category of the documents to be retrieved. In that case, a step of further filtering the obtained document information by such information (for example, category) is also included between step 205 and step 206; it is not described further here.
In addition, in practice the retrieval results may be analyzed and mined according to the user's behavior log and the segmentation dictionary of the electric power thesaurus, and the retrieval results may be optimized on the basis of that analysis. For example, when the user enters the query "power transformer", the related data in the segmentation dictionary can be read to obtain closely related power-industry terms such as single transformer, subway transformer, transformer with split winding, dry-type transformer, converter transformer, step-down transformer, grounding transformer, shell-type transformer, connecting transformer, substation transformer, gas-insulated transformer, traction transformer, three-winding transformer, core-type transformer, oil-immersed transformer, rectifier transformer and autotransformer. The user behavior log is then analyzed to produce a comprehensive analysis result, and the retrieval results are optimized in combination with that result. The user thus obtains retrieval results optimized on the basis of the user's own behavior log, the segmentation dictionary and the query, so that the documents finally returned better meet the user's actual needs. The specific implementation is not described further here.
Step 206: sort the document information corresponding to the query according to the relevance between the document corresponding to each piece of document information and the query;
This step may specifically comprise:
Calculating the relevance between each document and the query;
Sorting the document information in non-ascending order of relevance.
The relevance between a document and the query may be determined from the number of times the keywords of the query occur in the document.
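One way to read this ranking rule is sketched below: the relevance of a document is the total number of occurrences of the query keywords in its term list. The function name `rank_documents`, the in-memory term lists, and the tie-break by document identifier are assumptions; the text only requires a non-ascending order of relevance.

```python
def rank_documents(doc_terms, keywords):
    """Order doc_ids by total keyword occurrences, highest first.

    doc_terms maps doc_id to the list of terms in the document (with repeats).
    Ties are broken by ascending doc_id, which is an illustrative choice.
    """
    def relevance(doc_id):
        terms = doc_terms[doc_id]
        return sum(terms.count(keyword) for keyword in keywords)

    return sorted(doc_terms, key=lambda doc_id: (-relevance(doc_id), doc_id))


# Hypothetical segmented documents.
doc_terms = {
    1: ["transformer", "fault", "transformer"],
    2: ["transformer"],
    3: ["fault", "fault", "transformer", "transformer"],
}
ranking = rank_documents(doc_terms, ["transformer", "fault"])
```

In this sample, document 3 has four keyword occurrences, document 1 has three, and document 2 has one, so they are ranked in that order.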
Step 207: present the sorted document information to the user.
In addition, after the document information has been presented, the user may start a secondary search, in which other keywords are used to further narrow the document information obtained for the first query, so as to reach the desired retrieval results. The secondary search is carried out in a similar way to the retrieval method described above; the only difference is that only the documents corresponding to the document information in the existing retrieval results need to be searched.
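The secondary search can be sketched as a restriction of the earlier result set. The function name `secondary_search` and the choice to require every new keyword (an implicit and) are assumptions; the text leaves the logical relation of the additional keywords open.

```python
def secondary_search(previous_doc_ids, new_keywords, index):
    """Keep only previously returned documents that also contain every new keyword."""
    result = set(previous_doc_ids)
    for keyword in new_keywords:
        result &= index.get(keyword, set())
    return result


# Inverted index from the earlier example; {1, 2} stands for the first result set.
index = {"t1": {1, 2}, "t2": {1, 2}, "t3": {1}, "t4": {2}}
narrowed = secondary_search({1, 2}, ["t3"], index)
```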
Step 208: in response to a click operation by the user, obtain from the power database the document corresponding to the document information the user selected, and present the document to the user.
The document may be presented by opening a new page, by letting the user download it, or in other ways, which are not limited here.
Corresponding to the above retrieval method, an embodiment of the invention also provides a power information retrieval system based on cloud storage. As shown in Fig. 3, the system may comprise:
An establishing unit 310, configured to perform word segmentation on each document in a preset power database according to the segmentation dictionary of a preset electric power thesaurus, and to establish an index database from the resulting segmentation results;
A cloud storage system 320, configured to store the power database and the index database in the cloud;
A determining unit 330, configured to obtain a user's query and to determine from the query its keywords and the logical relation between the keywords;
A lookup unit 340, configured to look up, in the index database, the document set corresponding to each keyword;
A processing unit 350, configured to process the document sets according to the logical relation between the keywords to obtain the document information corresponding to the query;
A first presentation unit 360, configured to present the document information to the user.
Preferably, as shown in Fig. 3, the system may further comprise:
A second presentation unit 370, configured to obtain from the power database, in response to a click operation by the user, the document corresponding to the document information the user clicked, and to present the document to the user.
Preferably, the establishing unit 310 may comprise:
A segmentation subunit, configured to perform word segmentation on each document in the preset power database according to the segmentation dictionary of the preset electric power thesaurus;
An establishing subunit, configured to determine, from the segmentation results, the set of documents that contain each term, and to establish the index database with the term as the primary field.
Preferably, as shown in Fig. 3, the system may further comprise:
A sorting unit 380, configured to sort the document information corresponding to the query according to the relevance between each corresponding document and the query.
In the retrieval system shown in Fig. 3, a power database based on power information is established; each document in the preset power database is segmented according to the segmentation dictionary of the preset electric power thesaurus; an index database is established from the segmentation results and stored in the cloud storage system. A user can then enter a query composed of terms from the electric power thesaurus to retrieve power information documents, which provides retrieval dedicated to power information. Moreover, because both the power database and the index database are stored in the cloud storage system, the retrieval speed for power information can be improved.
A person of ordinary skill in the art will appreciate that the processes of the methods in the above embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a readable storage medium and, when executed, performs the corresponding steps of the above methods. The storage medium may be, for example, ROM/RAM, a magnetic disk or an optical disc.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention.