CN108959207A - Data information storage method and system based on similarity - Google Patents
- Publication number
- CN108959207A (application number CN201810709543.9A)
- Authority
- CN
- China
- Prior art keywords
- similarity
- character string
- strings
- keyword
- coefficient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A similarity-based data information storage method and system are disclosed. The method may include: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the keywords to obtain multiple known strings; computing, from the summary string and each known string, a similarity coefficient for that known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is below the threshold, to obtain a set of known strings; within that set, taking the known string with the largest similarity coefficient as the comparison string; and assigning the field to which the comparison string belongs as the field of the information to be stored. By comparing the summary string with known strings, ranking the keywords, and computing similarity, the invention classifies the stored information and improves the efficiency and precision of storage and lookup.
Description
Technical field
The present invention relates to the field of information technology, and more particularly to a similarity-based data information storage method and system.
Background
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a given time. It is a massive, fast-growing, and diversified information asset that requires new processing models in order to deliver stronger decision-making power, insight discovery, and process optimization. It has five major characteristics: volume, velocity, variety, value, and veracity. At present, however, big data queries are mostly performed manually, which is inefficient. It is therefore necessary to develop a similarity-based data information storage method and system.
The information disclosed in this background section is intended only to deepen understanding of the general background of the invention, and should not be taken as an acknowledgement or any form of suggestion that this information constitutes prior art already known to those skilled in the art.
Summary of the invention
The invention proposes a similarity-based data information storage method and system which, by comparing the summary string with known strings, ranking the keywords, and computing similarity, classify the stored information and improve the efficiency and precision of storage and lookup.
According to one aspect of the invention, a similarity-based data information storage method is proposed. The method may include: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; performing a calculation based on the summary string and each of the known strings to obtain the similarity coefficient corresponding to each known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings; in the set of known strings, taking the known string with the largest similarity coefficient as the comparison string; and taking the field to which the comparison string belongs as the field of the information to be stored.
Preferably, each of the known strings includes at least one of the keywords.
Preferably, the method further includes: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
Preferably, the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$
where $F_j$ is the similarity coefficient of the j-th known string, j ranges over [1, M], M is the number of known strings, $w_i$ denotes a keyword that the known string shares with the summary string, $A_i$ is the emphasis factor corresponding to that keyword, i ranges over [1, N], and N is the number of keywords.
According to another aspect of the invention, a similarity-based data information storage system is proposed, on which a computer program is stored, wherein the following steps are performed when the program is executed by a processor: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; performing a calculation based on the summary string and each of the known strings to obtain the similarity coefficient corresponding to each known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings; in the set of known strings, taking the known string with the largest similarity coefficient as the comparison string; and taking the field to which the comparison string belongs as the field of the information to be stored.
Preferably, each of the known strings includes at least one of the keywords.
Preferably, the steps further include: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
Preferably, the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$
where $F_j$ is the similarity coefficient of the j-th known string, j ranges over [1, M], M is the number of known strings, $w_i$ denotes a keyword that the known string shares with the summary string, $A_i$ is the emphasis factor corresponding to that keyword, i ranges over [1, N], and N is the number of keywords.
The method and apparatus of the invention have other features and advantages that will be apparent from, or are set out in detail in, the accompanying drawings incorporated herein and the detailed description that follows, which together serve to explain certain principles of the invention.
Brief description of the drawings
The above and other objects, features, and advantages of the invention will become more apparent from the more detailed description of exemplary embodiments given in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same components.
Fig. 1 shows a flowchart of the steps of the similarity-based data information storage method according to the invention.
Detailed description
The invention is described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention will be thorough and complete, and will fully convey its scope to those skilled in the art.
Fig. 1 shows a flowchart of the steps of the similarity-based data information storage method according to the invention.
In this embodiment, the similarity-based data information storage method according to the invention may include: step 101, obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; step 102, searching on the multiple keywords to obtain multiple known strings; step 103, performing a calculation based on the summary string and each known string to obtain the similarity coefficient corresponding to each known string; step 104, setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings; step 105, in the set of known strings, taking the known string with the largest similarity coefficient as the comparison string; and step 106, taking the field to which the comparison string belongs as the field of the information to be stored.
In one example, each known string includes at least one of the keywords.
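The retrieval condition above (every known string contains at least one keyword) can be sketched as a simple filter. This is only an illustration under stated assumptions: the source does not say how the search is carried out, so the case-insensitive substring test, the function names `contains_keyword` and `retrieve_candidates`, and the last catalogue entry are all hypothetical. The keywords and the first three known strings are borrowed from the application example later in this document.

```python
def contains_keyword(known_string, keyword):
    """Case-insensitive substring match (an assumed matching rule)."""
    return keyword.lower() in known_string.lower()


def retrieve_candidates(catalogue, keywords):
    """Keep only the known strings that contain at least one keyword."""
    return [
        s for s in catalogue
        if any(contains_keyword(s, k) for k in keywords)
    ]


keywords = ["Huawei", "P20", "128GB", "aurora color", "6GB"]
catalogue = [
    "Huawei P20 black 6GB 64GB",
    "Huawei Mate10",
    "P20Pro",
    "Brand X tablet",  # hypothetical entry that shares no keyword
]
candidates = retrieve_candidates(catalogue, keywords)
```

A linear scan like this is only practical for small catalogues; a real system would more likely use an index, which the source does not describe.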
In one example, the method further includes: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
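In one example, the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$
where $F_j$ is the similarity coefficient of the j-th known string, j ranges over [1, M], M is the number of known strings, $w_i$ denotes a keyword that the known string shares with the summary string, $A_i$ is the emphasis factor corresponding to that keyword, i ranges over [1, N], and N is the number of keywords. Judging from the application example below, $w_i$ effectively acts as an indicator: a keyword contributes its emphasis factor to the sum only when it also appears in the known string.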
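Formula (1) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes that w_i is 1 when the keyword occurs in the known string and 0 otherwise, and that "occurs" means a case-insensitive substring match. The function name `similarity_coefficient` is hypothetical; the keywords and emphasis factors are those of the application example later in this document.

```python
def similarity_coefficient(known_string, emphasis):
    """Formula (1): sum the emphasis factors A_i of the shared keywords.

    `emphasis` maps each keyword of the summary string to its factor A_i.
    w_i is taken to be 1 when the keyword occurs in `known_string`
    (case-insensitive substring test, an assumption) and 0 otherwise.
    """
    text = known_string.lower()
    return sum(
        a_i * (1 if keyword.lower() in text else 0)
        for keyword, a_i in emphasis.items()
    )


emphasis = {"Huawei": 0.3, "P20": 0.25, "128GB": 0.25,
            "aurora color": 0.1, "6GB": 0.1}
score = similarity_coefficient("Huawei P20 black 6GB 64GB", emphasis)
```

Here `score` comes out at about 0.65 (up to floating-point rounding), since only Huawei, P20, and 6GB are shared. Substring matching is a blunt rule: it would also count `6GB` as present in a string containing `16GB`, so a production matcher would need something stricter.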
Specifically, the similarity-based data information storage method according to the invention may proceed as follows. A summary string is obtained from the information to be stored and analyzed to extract its keywords. The keywords are ranked by importance and each is assigned an emphasis factor. A search on the keywords returns multiple known strings, each of which includes at least one keyword. For each known string, the keywords it shares with the summary string and their emphasis factors are substituted into formula (1) to compute its similarity coefficient. A similarity threshold is set and the known strings whose similarity coefficient is below it are deleted, which reduces the workload and yields the set of known strings. Within that set, the known string with the largest similarity coefficient is taken as the comparison string, and the field to which the comparison string belongs is assigned as the field of the information to be stored.
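The last three steps (threshold filtering, choosing the comparison string, and inheriting its field) can be sketched as follows, taking the similarity coefficients as already computed. The function name `classify`, the `fields` mapping, the field label "mobile phones", and the behavior when no string survives the threshold are all assumptions made for illustration; the coefficients are those of the application example later in this document.

```python
def classify(scores, fields, threshold):
    """Steps 104-106: filter by threshold, pick the best match, copy its field.

    scores:  known string -> similarity coefficient (already computed)
    fields:  known string -> field it is stored under (hypothetical mapping)
    Returns (comparison_string, field), or (None, None) if nothing survives;
    the source does not say what happens in that case.
    """
    # Step 104: delete known strings whose coefficient is below the threshold.
    kept = {s: f for s, f in scores.items() if f >= threshold}
    if not kept:
        return None, None
    # Step 105: the known string with the largest coefficient.
    comparison = max(kept, key=kept.get)
    # Step 106: the new information inherits that string's field.
    return comparison, fields[comparison]


scores = {"Huawei P20 black 6GB 64GB": 0.65, "Huawei Mate10": 0.3, "P20Pro": 0.25}
fields = {  # field labels are invented for illustration
    "Huawei P20 black 6GB 64GB": "mobile phones",
    "Huawei Mate10": "mobile phones",
    "P20Pro": "mobile phones",
}
comparison, field = classify(scores, fields, threshold=0.3)
```

Because the source deletes only coefficients that are less than the threshold, a string scoring exactly at the threshold is kept. If two strings tie for the maximum, `max` returns the first one encountered; the source does not specify a tie-breaking rule.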
By comparing the summary string with known strings, ranking the keywords, and computing similarity, this method classifies the stored information and improves the efficiency and precision of storage and lookup.
Application example
To aid understanding of the scheme of the embodiments of the invention and its effects, a concrete application example is given below. Those skilled in the art should understand that the example serves only to facilitate understanding of the invention, and that none of its specific details is intended to limit the invention in any way.
The similarity-based data information storage method according to the invention proceeds as follows.
According to the information to be stored, the summary string obtained is "Huawei P20 (aurora color, 6GB, 128GB)". Through analysis, 5 keywords are extracted from the summary string and ranked by importance: Huawei, P20, 128GB, aurora color, 6GB. Each keyword is assigned an emphasis factor: Huawei 0.3, P20 0.25, 128GB 0.25, aurora color 0.1, and 6GB 0.1. Searching on the 5 keywords returns 3 known strings: "Huawei P20 black 6GB 64GB", "Huawei Mate10", and "P20Pro". Substituting the keywords each known string shares with the summary string, together with their emphasis factors, into formula (1) gives a similarity coefficient of 0.65 for "Huawei P20 black 6GB 64GB", 0.3 for "Huawei Mate10", and 0.25 for "P20Pro". The similarity threshold is set to 0.3, and the known strings whose similarity coefficient is below the threshold are deleted (here, "P20Pro"), giving the set of known strings. Within that set, the known string with the largest similarity coefficient, "Huawei P20 black 6GB 64GB", is taken as the comparison string, and the field to which it belongs is assigned as the field of the information to be stored.
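Written out with formula (1), the three coefficients in this example are obtained as follows. The subscripts 1 to 3 simply number the known strings in the order they are listed, which is a labeling choice made here rather than something stated in the source.

```latex
\begin{aligned}
F_1 &= \underbrace{0.3}_{\text{Huawei}} + \underbrace{0.25}_{\text{P20}} + \underbrace{0.1}_{\text{6GB}} = 0.65 && \text{(Huawei P20 black 6GB 64GB)} \\
F_2 &= \underbrace{0.3}_{\text{Huawei}} = 0.3 && \text{(Huawei Mate10)} \\
F_3 &= \underbrace{0.25}_{\text{P20}} = 0.25 && \text{(P20Pro)}
\end{aligned}
```

With the threshold at 0.3, only $F_3 = 0.25$ falls below it, so "P20Pro" is deleted. "Huawei Mate10" sits exactly at the threshold and is kept, and $F_1 = 0.65$ is the maximum of the remaining set.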
In summary, by comparing the summary string with known strings, ranking the keywords, and computing similarity, the invention classifies the stored information and improves the efficiency and precision of storage and lookup.
Those skilled in the art should understand that the above description of embodiments of the invention is intended only to illustrate their beneficial effects by way of example, and is not intended to limit the embodiments of the invention to any of the examples given.
According to an embodiment of the invention, a similarity-based data information storage system is provided, on which a computer program is stored, wherein the following steps are performed when the program is executed by a processor: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; performing a calculation based on the summary string and each of the known strings to obtain the similarity coefficient corresponding to each known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings; in the set of known strings, taking the known string with the largest similarity coefficient as the comparison string; and taking the field to which the comparison string belongs as the field of the information to be stored.
In one example, each known string includes at least one of the keywords.
In one example, the steps further include: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
In one example, the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$
where $F_j$ is the similarity coefficient of the j-th known string, j ranges over [1, M], M is the number of known strings, $w_i$ denotes a keyword that the known string shares with the summary string, $A_i$ is the emphasis factor corresponding to that keyword, i ranges over [1, N], and N is the number of keywords.
By comparing the summary string with known strings, ranking the keywords, and computing similarity, the system classifies the stored information and improves the efficiency and precision of storage and lookup.
Those skilled in the art should understand that the above description of embodiments of the invention is intended only to illustrate their beneficial effects by way of example, and is not intended to limit the embodiments of the invention to any of the examples given.
The embodiments of the invention have been described above. The description is exemplary, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
Claims (8)
1. A similarity-based data information storage method, comprising:
obtaining a summary string corresponding to information to be stored, and extracting multiple keywords from the summary string;
searching on the multiple keywords to obtain multiple known strings;
performing a calculation based on the summary string and each of the known strings, to obtain the similarity coefficient corresponding to each known string;
setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings;
in the set of known strings, taking the known string with the largest similarity coefficient as a comparison string; and
taking the field to which the comparison string belongs as the field of the information to be stored.
2. The similarity-based data information storage method according to claim 1, wherein each of the known strings includes at least one of the keywords.
3. The similarity-based data information storage method according to claim 1, further comprising: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
4. The similarity-based data information storage method according to claim 3, wherein the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$
where $F_j$ is the similarity coefficient of the j-th known string, j ranges over [1, M], M is the number of known strings, $w_i$ denotes a keyword that the known string shares with the summary string, $A_i$ is the emphasis factor corresponding to that keyword, i ranges over [1, N], and N is the number of keywords.
5. A similarity-based data information storage system, on which a computer program is stored, wherein the following steps are performed when the program is executed by a processor:
obtaining a summary string corresponding to information to be stored, and extracting multiple keywords from the summary string;
searching on the multiple keywords to obtain multiple known strings;
performing a calculation based on the summary string and each of the known strings, to obtain the similarity coefficient corresponding to each known string;
setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings;
in the set of known strings, taking the known string with the largest similarity coefficient as a comparison string; and
taking the field to which the comparison string belongs as the field of the information to be stored.
6. The similarity-based data information storage system according to claim 5, wherein each of the known strings includes at least one of the keywords.
7. The similarity-based data information storage system according to claim 5, wherein the steps further comprise: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
8. The similarity-based data information storage system according to claim 7, wherein the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i \qquad (1)$
where $F_j$ is the similarity coefficient of the j-th known string, j ranges over [1, M], M is the number of known strings, $w_i$ denotes a keyword that the known string shares with the summary string, $A_i$ is the emphasis factor corresponding to that keyword, i ranges over [1, N], and N is the number of keywords.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810709543.9A CN108959207A (en) | 2018-07-02 | 2018-07-02 | Data information storage method and system based on similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810709543.9A CN108959207A (en) | 2018-07-02 | 2018-07-02 | Data information storage method and system based on similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108959207A true CN108959207A (en) | 2018-12-07 |
Family
ID=64484568
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810709543.9A Withdrawn CN108959207A (en) | 2018-07-02 | 2018-07-02 | Data information storage method and system based on similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108959207A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113643582A (en) * | 2021-10-14 | 2021-11-12 | 南京极域信息科技有限公司 | Multi-source wireless interactive feedback system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102722709B (en) | Method and device for identifying garbage pictures | |
CN106202248B (en) | Phrase-based search in information retrieval system | |
CN105045875B (en) | Personalized search and device | |
CN103984738A (en) | Role labelling method based on search matching | |
CN108897842A (en) | Computer readable storage medium and computer system | |
CN102867049B (en) | Chinese PINYIN quick word segmentation method based on word search tree | |
CN106682012A (en) | Commodity object information searching method and device | |
CN103810168A (en) | Search application method, device and terminal | |
CN102955912B (en) | Method and server for identifying application malicious attribute | |
CN102982076A (en) | Multi-dimensionality content labeling method based on semanteme label database | |
CN101980210A (en) | Marked word classifying and grading method and system | |
CN109800416A (en) | A kind of power equipment title recognition methods | |
CN103177022A (en) | Method and device of malicious file search | |
CN102855245A (en) | Image similarity determining method and image similarity determining equipment | |
CN112836029A (en) | Graph-based document retrieval method, system and related components thereof | |
CN108319518A (en) | File fragmentation sorting technique based on Recognition with Recurrent Neural Network and device | |
CN105512301A (en) | User grouping method based on social content | |
CN103294820A (en) | WEB page classifying method and system based on semantic extension | |
CN103902599A (en) | Fuzzy search method and fuzzy search device | |
CN105678244A (en) | Approximate video retrieval method based on improvement of editing distance | |
CN103870489B (en) | Chinese personal name based on search daily record is from extending recognition methods | |
CN108959207A (en) | Data information storage method and system based on similarity | |
CN116032741A (en) | Equipment identification method and device, electronic equipment and computer storage medium | |
CN107590233B (en) | File management method and device | |
CN107301186A (en) | A kind of recognition methods of invalid data and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | | Application publication date: 2018-12-07 |