CN108959207A

CN108959207A - Data information storage method and system based on similarity

Info

Publication number: CN108959207A
Application number: CN201810709543.9A
Authority: CN
Inventors: 孙英辉; 姚天
Original assignee: Wuhu Wisdom Big Data Operation Co Ltd
Current assignee: Wuhu Wisdom Big Data Operation Co Ltd
Priority date: 2018-07-02
Filing date: 2018-07-02
Publication date: 2018-12-07

Abstract

Disclosed are a similarity-based data information storage method and system. The method may include: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the keywords to obtain multiple known strings; computing, from the summary string and each known string, the similarity coefficient of that known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is below the threshold, to obtain a known string set; taking the known string with the largest similarity coefficient in the set as the comparison string; and taking the field to which the comparison string belongs as the field of the information to be stored. By comparing the summary string with known strings, ranking the keywords, and computing similarity, the invention classifies the stored information and improves the efficiency and precision of storage and lookup.

Description

Data information storage method and system based on similarity

Technical field

The present invention relates to the field of information technology, and more particularly to a similarity-based data information storage method and system.

Background

Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a given time frame. It is a massive, fast-growing, and diverse information asset that requires new processing modes to deliver stronger decision-making power, insight discovery, and process optimization. It has five main characteristics: volume, velocity, variety, value, and veracity. At present, however, big data queries are mostly performed manually and are inefficient. It is therefore necessary to develop a similarity-based data information storage method and system.

The information disclosed in this background section is intended only to aid understanding of the general background of the invention, and should not be taken as an acknowledgment or any form of suggestion that this information constitutes prior art already known to those skilled in the art.

Summary of the invention

The present invention proposes a similarity-based data information storage method and system that, by comparing a summary string with known strings, ranking keywords, and computing similarity, classifies the stored information and improves the efficiency and precision of storage and lookup.

According to one aspect of the invention, a similarity-based data information storage method is proposed. The method may include: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; computing, from the summary string and each known string, the similarity coefficient corresponding to that known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is below the similarity threshold, to obtain a known string set; within the known string set, taking the known string with the largest similarity coefficient as the comparison string; and taking the field to which the comparison string belongs as the field of the information to be stored.

Preferably, each known string contains at least one of the keywords.

Preferably, the method further includes: ranking the multiple keywords by importance, and assigning an emphasis factor to each keyword.

Preferably, the similarity coefficient is:

$$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$$

where $F_j$ denotes the similarity coefficient of the $j$-th known string, $j \in [1, M]$, and $M$ is the number of known strings; $w_i$ denotes a keyword that the known string shares with the summary string, and $A_i$ denotes the emphasis factor of that keyword; $i \in [1, N]$, and $N$ is the number of keywords.
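Formula (1) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes that $w_i$ acts as a 0/1 indicator (1 when keyword $i$ occurs in the known string, 0 otherwise) and that "occurs" means a plain substring match. The function name `similarity_coefficient` and the keyword names in the usage note are hypothetical.

```python
from typing import Mapping


def similarity_coefficient(known: str, emphasis: Mapping[str, float]) -> float:
    """Formula (1): add up the emphasis factors A_i of the keywords found in `known`.

    Assumption: w_i is treated as 1 when keyword i is a substring of the
    known string and 0 otherwise; the source does not spell out the matching rule.
    """
    return sum((a_i for keyword, a_i in emphasis.items() if keyword in known), 0.0)
```

For instance, with hypothetical factors `{"alpha": 0.5, "beta": 0.25, "gamma": 0.25}`, the known string `"alpha gamma"` shares two keywords with the summary string and receives 0.5 + 0.25.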

According to another aspect of the invention, a similarity-based data information storage system is proposed, on which a computer program is stored, wherein the program, when executed by a processor, performs the following steps: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; computing, from the summary string and each known string, the similarity coefficient corresponding to that known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is below the similarity threshold, to obtain a known string set; within the known string set, taking the known string with the largest similarity coefficient as the comparison string; and taking the field to which the comparison string belongs as the field of the information to be stored.

Preferably, the similarity coefficient is:

$$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$$

The method and apparatus of the present invention have other features and advantages, which will be apparent from, or are set forth in detail in, the accompanying drawings incorporated herein and the following detailed description; the drawings and the detailed description together serve to explain specific principles of the invention.

Brief description of the drawings

The above and other objects, features, and advantages of the invention will become more apparent from the following more detailed description of exemplary embodiments, taken in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same parts.

Fig. 1 shows a flow chart of the steps of the similarity-based data information storage method according to the present invention.

Detailed description

The present invention is described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art.

In this embodiment, the similarity-based data information storage method according to the present invention may include: step 101, obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; step 102, searching on the multiple keywords to obtain multiple known strings; step 103, computing, from the summary string and each known string, the similarity coefficient corresponding to that known string; step 104, setting a similarity threshold and deleting the known strings whose similarity coefficient is below the similarity threshold, to obtain a known string set; step 105, within the known string set, taking the known string with the largest similarity coefficient as the comparison string; and step 106, taking the field to which the comparison string belongs as the field of the information to be stored.

In one example, each known string contains at least one keyword.

In one example, the method further includes: ranking the multiple keywords by importance, and assigning an emphasis factor to each keyword.

In one example, the similarity coefficient is:

$$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$$

Specifically, the similarity-based data information storage method according to the present invention may proceed as follows. A summary string corresponding to the information to be stored is obtained, and multiple keywords are extracted from it by analysis. The keywords are ranked by importance, and each keyword is assigned an emphasis factor. Searching on the keywords returns multiple known strings, each of which contains at least one keyword. The keywords that each known string shares with the summary string, together with their emphasis factors, are substituted into formula (1) to compute the similarity coefficient of that known string. A similarity threshold is set, and known strings whose similarity coefficient is below the threshold are deleted, which reduces the workload and yields the known string set. Within the known string set, the known string with the largest similarity coefficient is taken as the comparison string, and the field to which the comparison string belongs is taken as the field of the information to be stored.
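The flow described above, from retrieval through field assignment, can be sketched end to end as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation: keyword matching is a plain substring test, the store of known strings is a hypothetical in-memory mapping from known string to field, the field labels ("field A" and so on) are placeholders, and `Decimal` is used only so that a coefficient exactly equal to the threshold compares reliably. The keywords, factors, and known strings are borrowed from the application example in this document.

```python
from decimal import Decimal
from typing import Mapping, Optional, Tuple


def assign_field(
    emphasis: Mapping[str, Decimal],
    known_fields: Mapping[str, str],
    threshold: Decimal,
) -> Optional[Tuple[str, str]]:
    """Return (comparison string, inherited field), or None if nothing survives."""
    # Retrieval: keep known strings that contain at least one keyword.
    candidates = [s for s in known_fields if any(k in s for k in emphasis)]

    # Formula (1): add up the emphasis factors of the shared keywords.
    scores = {
        s: sum((a for k, a in emphasis.items() if k in s), Decimal(0))
        for s in candidates
    }

    # Threshold: delete strings whose coefficient is below the threshold.
    kept = {s: f for s, f in scores.items() if f >= threshold}
    if not kept:
        return None  # the source does not say what happens when the set is empty

    # Comparison string: the survivor with the largest coefficient.
    comparison = max(kept, key=lambda s: kept[s])
    # The information to be stored inherits the comparison string's field.
    return comparison, known_fields[comparison]


# Hypothetical store: the field labels are placeholders, not from the source.
emphasis = {
    "Huawei": Decimal("0.3"),
    "P20": Decimal("0.25"),
    "128GB": Decimal("0.25"),
    "aurora color": Decimal("0.1"),
    "6GB": Decimal("0.1"),
}
known_fields = {
    "Huawei P20 black 6GB 64GB": "field A",
    "Huawei Mate10": "field B",
    "P20Pro": "field C",
}
print(assign_field(emphasis, known_fields, Decimal("0.3")))
```

Returning `None` when no known string survives the threshold is a design choice made for this sketch; a real system would need its own rule for that case, such as creating a new field.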

By comparing the summary string with known strings, ranking the keywords, and computing similarity, the method classifies the stored information and improves the efficiency and precision of storage and lookup.

Application example

To make the scheme of the embodiments and its effect easier to understand, a concrete application example is given below. Those skilled in the art should understand that the example serves only to aid understanding of the invention, and that none of its specific details is intended to limit the invention in any way.

The similarity-based data information storage method according to the present invention includes the following.

According to the information to be stored, the summary string obtained is "Huawei P20 (aurora color, 6GB, 128GB)". Analysis extracts 5 keywords from the summary string, which are ranked by importance as Huawei, P20, 128GB, aurora color, 6GB, and each keyword is assigned an emphasis factor: 0.3 for Huawei, 0.25 for P20, 0.25 for 128GB, 0.1 for aurora color, and 0.1 for 6GB. Searching on the 5 keywords returns 3 known strings: "Huawei P20 black 6GB 64GB", "Huawei Mate10", and "P20Pro". Substituting the keywords that each known string shares with the summary string, together with their emphasis factors, into formula (1) gives a similarity coefficient of 0.65 for "Huawei P20 black 6GB 64GB", 0.3 for "Huawei Mate10", and 0.25 for "P20Pro". The similarity threshold is set to 0.3, and known strings whose similarity coefficient is below the threshold are deleted to obtain the known string set. Within that set, the known string with the largest similarity coefficient, "Huawei P20 black 6GB 64GB", is taken as the comparison string, and the field to which the comparison string belongs is taken as the field of the information to be stored.
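The arithmetic behind the three coefficients can be written out term by term. The numbering of the known strings as 1 to 3 is an assumption made for illustration, as is the reading that "P20Pro" counts as containing the keyword P20, which its coefficient of 0.25 implies.

```latex
\begin{aligned}
F_1 &= A_{\text{Huawei}} + A_{\text{P20}} + A_{\text{6GB}} = 0.3 + 0.25 + 0.1 = 0.65
  && \text{(Huawei P20 black 6GB 64GB)} \\
F_2 &= A_{\text{Huawei}} = 0.3
  && \text{(Huawei Mate10)} \\
F_3 &= A_{\text{P20}} = 0.25
  && \text{(P20Pro)}
\end{aligned}
```

With the threshold at 0.3, only $F_3$ falls below it and is deleted. $F_2$ equals the threshold and therefore stays in the known string set, where $F_1$ is the largest coefficient.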

In conclusion the present invention simultaneously calculates similitude by comparison summary character string and known strings, sort key word, The information of storage is classified, the efficiency and precision of storage with lookup is promoted.

It will be understood by those skilled in the art that above to the purpose of the description of embodiments of the present invention only for illustratively The beneficial effect for illustrating embodiments of the present invention is not intended to for embodiments of the present invention to be limited to given any show Example.

According to an embodiment of the present invention, a similarity-based data information storage system is provided, on which a computer program is stored, wherein the program, when executed by a processor, performs the following steps: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; computing, from the summary string and each known string, the similarity coefficient corresponding to that known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is below the similarity threshold, to obtain a known string set; within the known string set, taking the known string with the largest similarity coefficient as the comparison string; and taking the field to which the comparison string belongs as the field of the information to be stored.

In one example, each known string contains at least one keyword.

In one example, the similarity coefficient is:

$$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$$

By comparing the summary string with known strings, ranking the keywords, and computing similarity, the present invention classifies the stored information and improves the efficiency and precision of storage and lookup.

Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims

1. A similarity-based data information storage method, comprising:

obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string;

searching on the multiple keywords to obtain multiple known strings;

computing, from the summary string and each known string, the similarity coefficient corresponding to that known string;

setting a similarity threshold, and deleting the known strings whose similarity coefficient is below the similarity threshold, to obtain a known string set;

within the known string set, taking the known string with the largest similarity coefficient as the comparison string; and

taking the field to which the comparison string belongs as the field of the information to be stored.

2. The similarity-based data information storage method according to claim 1, wherein each known string contains at least one of the keywords.

3. The similarity-based data information storage method according to claim 1, further comprising: ranking the multiple keywords by importance, and assigning an emphasis factor to each keyword.

4. The similarity-based data information storage method according to claim 3, wherein the similarity coefficient is:

$$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$$

where $F_j$ denotes the similarity coefficient of the $j$-th known string, $j \in [1, M]$, and $M$ is the number of known strings; $w_i$ denotes a keyword that the known string shares with the summary string, and $A_i$ denotes the emphasis factor of that keyword; $i \in [1, N]$, and $N$ is the number of keywords.

5. A similarity-based data information storage system on which a computer program is stored, wherein the program, when executed by a processor, performs the following steps:

searching on the multiple keywords to obtain multiple known strings;

computing, from the summary string and each known string, the similarity coefficient corresponding to that known string,

6. The similarity-based data information storage system according to claim 5, wherein each known string contains at least one of the keywords.

7. The similarity-based data information storage system according to claim 5, further comprising: ranking the multiple keywords by importance, and assigning an emphasis factor to each keyword.

8. The similarity-based data information storage system according to claim 7, wherein the similarity coefficient is:

$$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$$