CN108959207A - Data information storage method and system based on similarity - Google Patents

Data information storage method and system based on similarity

Info

Publication number
CN108959207A
Authority
CN
China
Prior art keywords
similarity
character string
strings
keyword
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810709543.9A
Other languages
Chinese (zh)
Inventor
孙英辉
姚天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhu Wisdom Big Data Operation Co Ltd
Original Assignee
Wuhu Wisdom Big Data Operation Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-07-02
Filing date
2018-07-02
Publication date
2018-12-07
Application filed by Wuhu Wisdom Big Data Operation Co Ltd
Priority to CN201810709543.9A
Publication of CN108959207A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a similarity-based data information storage method and system. The method may include: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the keywords to obtain multiple known strings; computing, for each known string, a similarity coefficient with respect to the summary string; setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the threshold, to obtain a set of known strings; within that set, taking the known string with the largest similarity coefficient as the comparison string; and using the field (category) to which the comparison string belongs as the field of the information to be stored. By comparing the summary string with known strings, ranking the keywords, and computing similarity, the invention classifies the stored information and improves the efficiency and accuracy of storage and lookup.

Description

Data information storage method and system based on similarity
Technical field
The present invention relates to the field of information technology, and more particularly to a similarity-based data information storage method and system.
Background art
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a certain time frame. It is a massive, fast-growing, and diverse information asset that requires new processing models to deliver stronger decision-making power, insight discovery, and process optimization capability, and it has five major characteristics: volume, velocity, variety, value, and veracity. However, most current big data queries are performed manually, which is inefficient. It is therefore necessary to develop a similarity-based data information storage method and system.
The information disclosed in this Background section is intended only to aid understanding of the general background of the invention, and should not be taken as an acknowledgement or any form of suggestion that this information constitutes prior art already known to those skilled in the art.
Summary of the invention
The invention proposes a similarity-based data information storage method and system that compares the summary string with known strings, ranks the keywords, and computes similarity, so that stored information is classified and the efficiency and accuracy of storage and lookup are improved.
According to one aspect of the invention, a similarity-based data information storage method is proposed. The method may include: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; computing, for each known string, a similarity coefficient between the summary string and that known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings; within the set of known strings, taking the known string with the largest similarity coefficient as the comparison string; and using the field (category) to which the comparison string belongs as the field of the information to be stored.
Preferably, each known string contains at least one of the keywords.
Preferably, the method further includes: ranking the multiple keywords by importance and assigning an emphasis factor (weight) to each keyword.
Preferably, the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i$    (1)
where $F_j$ denotes the similarity coefficient of the $j$-th known string, $j \in [1, M]$, and $M$ is the number of known strings; $w_i$ indicates whether the $i$-th keyword of the summary string also appears in the known string (1 if it does and 0 otherwise, which is the reading consistent with the application example); $A_i$ denotes the emphasis factor of that keyword; $i \in [1, N]$; and $N$ is the number of keywords.
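As a rough illustration of formula (1), the following Python sketch treats $w_i$ as a 0/1 indicator and sums the emphasis factors of the keywords found in a known string. The function name `similarity_coefficient`, the dictionary layout, the sample keywords, and the use of plain substring matching are all assumptions made for illustration; the patent does not specify how a keyword match is detected.

```python
def similarity_coefficient(emphasis_factors, known_string):
    """Evaluate formula (1), F_j = sum_i A_i * w_i, for one known string.

    emphasis_factors maps each keyword of the summary string to its
    emphasis factor A_i. w_i is taken as 1 when the keyword occurs in
    known_string and 0 otherwise (substring matching is an assumption).
    """
    return sum(
        a_i * (1 if keyword in known_string else 0)
        for keyword, a_i in emphasis_factors.items()
    )


# Hypothetical keywords and emphasis factors, for illustration only.
factors = {"red": 0.5, "cotton": 0.3, "shirt": 0.2}

# "cotton" and "shirt" match, "red" does not, so the score is 0.3 + 0.2.
score = similarity_coefficient(factors, "blue cotton shirt")
```

Substring matching is the simplest choice but can over-match (for example, a keyword "6GB" would also be found inside "16GB"), so a real system would probably compare tokens instead.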
According to another aspect of the invention, a similarity-based data information storage system is proposed, on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; computing, for each known string, a similarity coefficient between the summary string and that known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings; within the set of known strings, taking the known string with the largest similarity coefficient as the comparison string; and using the field to which the comparison string belongs as the field of the information to be stored.
Preferably, each known string contains at least one of the keywords.
Preferably, the steps further include: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
Preferably, the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i$    (1)
where $F_j$ denotes the similarity coefficient of the $j$-th known string, $j \in [1, M]$, and $M$ is the number of known strings; $w_i$ indicates whether the $i$-th keyword of the summary string also appears in the known string (1 if it does and 0 otherwise); $A_i$ denotes the emphasis factor of that keyword; $i \in [1, N]$; and $N$ is the number of keywords.
The method and apparatus of the invention have other features and advantages that will be apparent from, or are set forth in detail in, the accompanying drawings incorporated herein and the following detailed description, which together serve to explain certain principles of the invention.
Brief description of the drawings
The above and other objects, features, and advantages of the invention will become more apparent from the more detailed description of exemplary embodiments of the invention taken in conjunction with the accompanying drawings, in which identical reference numerals generally denote identical parts.
Fig. 1 shows a flowchart of the steps of the similarity-based data information storage method according to the present invention.
Detailed description of the embodiments
The present invention is described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art.
Fig. 1 shows a flowchart of the steps of the similarity-based data information storage method according to the present invention.
In this embodiment, the similarity-based data information storage method according to the present invention may include:
Step 101: obtain a summary string corresponding to the information to be stored, and extract multiple keywords from the summary string;
Step 102: search on the multiple keywords to obtain multiple known strings;
Step 103: compute, for each known string, a similarity coefficient between the summary string and that known string;
Step 104: set a similarity threshold and delete the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings;
Step 105: within the set of known strings, take the known string with the largest similarity coefficient as the comparison string;
Step 106: use the field to which the comparison string belongs as the field of the information to be stored.
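Steps 102 to 106 can be sketched in Python roughly as follows, using toy data modeled on the application example in this description. The function name `classify`, the in-memory dictionary of known strings, the field labels, and substring matching are hypothetical choices for illustration. Step 101 (keyword extraction and weighting) is represented only by a hand-written dictionary, because the patent does not say how keywords are extracted.

```python
def classify(emphasis_factors, known_fields, threshold):
    """Return the field of the best-matching known string, or None."""
    # Step 102: retrieve the known strings containing at least one keyword.
    candidates = [
        s for s in known_fields
        if any(keyword in s for keyword in emphasis_factors)
    ]
    # Step 103: similarity coefficient per formula (1), with w_i as 0/1.
    scores = {
        s: sum(a for keyword, a in emphasis_factors.items() if keyword in s)
        for s in candidates
    }
    # Step 104: delete strings whose coefficient is strictly below the threshold.
    kept = {s: f for s, f in scores.items() if f >= threshold}
    if not kept:
        return None  # the patent does not say what happens in this case
    # Step 105: the highest-scoring string becomes the comparison string.
    comparison = max(kept, key=kept.get)
    # Step 106: its field is assigned to the information to be stored.
    return known_fields[comparison]


# Step 101 (stand-in): keywords and emphasis factors of the summary string.
factors = {"Huawei": 0.3, "P20": 0.25, "128GB": 0.25, "aurora color": 0.1, "6GB": 0.1}

# Hypothetical store mapping each known string to the field it belongs to.
known = {
    "Huawei P20 black 6GB 64GB": "smartphones/P20",
    "Huawei Mate10": "smartphones/Mate",
    "P20Pro": "smartphones/P20 Pro",
    "Office chair": "furniture",
}

field = classify(factors, known, threshold=0.3)
```

Because the patent deletes only strings whose coefficient is less than the threshold, a string scoring exactly at the threshold is kept in this sketch.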
In one example, each known string contains at least one keyword.
In one example, the method further includes: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
In one example, the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i$    (1)
where $F_j$ denotes the similarity coefficient of the $j$-th known string, $j \in [1, M]$, and $M$ is the number of known strings; $w_i$ indicates whether the $i$-th keyword of the summary string also appears in the known string (1 if it does and 0 otherwise); $A_i$ denotes the emphasis factor of that keyword; $i \in [1, N]$; and $N$ is the number of keywords.
Specifically, the similarity-based data information storage method according to the present invention may proceed as follows. A summary string is obtained from the information to be stored, and multiple keywords are extracted from it by analysis. The keywords are ranked by importance, and each keyword is assigned an emphasis factor. A search on the keywords then returns multiple known strings, each of which contains at least one keyword. The keywords that a known string shares with the summary string, together with their emphasis factors, are substituted into formula (1) to compute the similarity coefficient of each known string. A similarity threshold is set, and the known strings whose similarity coefficient is less than the threshold are deleted, which reduces the workload and yields the set of known strings. Within that set, the known string with the largest similarity coefficient is taken as the comparison string, and the field to which the comparison string belongs is used as the field of the information to be stored.
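The patent does not state how importance is measured or how emphasis factors are derived from the ranking. One possible scheme, shown purely as an assumption, is to start from caller-supplied importance scores, sort the keywords in descending order, and normalize the scores so that the emphasis factors sum to 1. The function name `assign_emphasis_factors` and the sample scores are hypothetical.

```python
def assign_emphasis_factors(importance):
    """Rank keywords by importance and derive normalized emphasis factors.

    importance maps keyword -> non-negative importance score. This input
    is hypothetical: the patent does not say how importance is measured.
    Returns a list of (keyword, emphasis_factor) pairs, most important first.
    """
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("at least one keyword needs a positive importance")
    # Sort by score, highest first; ties keep their original order.
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    # Normalize so that the emphasis factors sum to 1 (an assumption).
    return [(keyword, score / total) for keyword, score in ranked]


# Hypothetical importance scores, for illustration only.
ranked_factors = assign_emphasis_factors({"6GB": 2, "Huawei": 6, "P20": 5})
```

Normalizing keeps every similarity coefficient between 0 and 1, which makes a fixed similarity threshold easier to interpret across summary strings with different numbers of keywords.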
By comparing the summary string with known strings, ranking the keywords, and computing similarity, the method classifies the stored information and improves the efficiency and accuracy of storage and lookup.
Application example
To aid understanding of the solution of the embodiments of the invention and its effects, a specific application example is given below. Those skilled in the art will understand that the example is only intended to aid understanding of the invention, and none of its specific details is intended to limit the invention in any way.
The similarity-based data information storage method according to the present invention includes:
From the information to be stored, the summary string "Huawei P20 (aurora color, 6GB, 128GB)" is obtained. Analysis extracts 5 keywords from the summary string, which are ranked by importance as Huawei, P20, 128GB, aurora color, 6GB. Each keyword is assigned an emphasis factor: 0.3 for Huawei, 0.25 for P20, 0.25 for 128GB, 0.1 for aurora color, and 0.1 for 6GB. Searching on the 5 keywords returns 3 known strings: "Huawei P20 black 6GB 64GB", "Huawei Mate10", and "P20Pro". Substituting the keywords that each known string shares with the summary string, together with their emphasis factors, into formula (1) gives a similarity coefficient of 0.65 for "Huawei P20 black 6GB 64GB", 0.3 for "Huawei Mate10", and 0.25 for "P20Pro". The similarity threshold is set to 0.3, and the known strings whose similarity coefficient is less than the threshold (here "P20Pro", at 0.25) are deleted, giving the set of known strings. Within that set, the known string with the largest similarity coefficient, "Huawei P20 black 6GB 64GB", is taken as the comparison string, and the field to which the comparison string belongs is used as the field of the information to be stored.
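The three coefficients in this example can be reproduced term by term from formula (1), listing the keywords in ranked order (Huawei, P20, 128GB, aurora color, 6GB) and taking $w_i$ as 1 when the keyword appears in the known string and 0 otherwise. The 0/1 reading of $w_i$, and the assumption that "P20" counts as matching inside "P20Pro", are inferred from the stated results rather than spelled out in the patent.

```latex
\begin{aligned}
F_1 &= 0.3\cdot 1 + 0.25\cdot 1 + 0.25\cdot 0 + 0.1\cdot 0 + 0.1\cdot 1 = 0.65
  && \text{(Huawei P20 black 6GB 64GB)}\\
F_2 &= 0.3\cdot 1 + 0.25\cdot 0 + 0.25\cdot 0 + 0.1\cdot 0 + 0.1\cdot 0 = 0.3
  && \text{(Huawei Mate10)}\\
F_3 &= 0.3\cdot 0 + 0.25\cdot 1 + 0.25\cdot 0 + 0.1\cdot 0 + 0.1\cdot 0 = 0.25
  && \text{(P20Pro)}
\end{aligned}
```

With the threshold at 0.3, only $F_3 = 0.25$ is less than the threshold, so "P20Pro" is deleted. $F_2 = 0.3$ is not less than the threshold and stays in the set, and $F_1 = 0.65$ is the maximum.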
In summary, by comparing the summary string with known strings, ranking the keywords, and computing similarity, the invention classifies the stored information and improves the efficiency and accuracy of storage and lookup.
Those skilled in the art will understand that the above description of embodiments of the invention serves only to illustrate their beneficial effects by way of example, and is not intended to limit the embodiments of the invention to any given example.
An embodiment of the invention provides a similarity-based data information storage system, on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps: obtaining a summary string corresponding to the information to be stored, and extracting multiple keywords from the summary string; searching on the multiple keywords to obtain multiple known strings; computing, for each known string, a similarity coefficient between the summary string and that known string; setting a similarity threshold and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings; within the set of known strings, taking the known string with the largest similarity coefficient as the comparison string; and using the field to which the comparison string belongs as the field of the information to be stored.
In one example, each known string contains at least one keyword.
In one example, the steps further include: ranking the multiple keywords by importance and assigning an emphasis factor to each keyword.
In one example, the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i$    (1)
where $F_j$ denotes the similarity coefficient of the $j$-th known string, $j \in [1, M]$, and $M$ is the number of known strings; $w_i$ indicates whether the $i$-th keyword of the summary string also appears in the known string (1 if it does and 0 otherwise); $A_i$ denotes the emphasis factor of that keyword; $i \in [1, N]$; and $N$ is the number of keywords.
By comparing the summary string with known strings, ranking the keywords, and computing similarity, the invention classifies the stored information and improves the efficiency and accuracy of storage and lookup.
Those skilled in the art will understand that the above description of embodiments of the invention serves only to illustrate their beneficial effects by way of example, and is not intended to limit the embodiments of the invention to any given example.
Embodiments of the invention have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (8)

1. A similarity-based data information storage method, comprising:
obtaining a summary string corresponding to information to be stored, and extracting multiple keywords from the summary string;
searching on the multiple keywords to obtain multiple known strings;
computing, for each of the known strings, a similarity coefficient between the summary string and that known string;
setting a similarity threshold, and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings;
within the set of known strings, taking the known string with the largest similarity coefficient as a comparison string;
using the field to which the comparison string belongs as the field of the information to be stored.
2. The similarity-based data information storage method according to claim 1, wherein each of the known strings contains at least one of the keywords.
3. The similarity-based data information storage method according to claim 1, further comprising: ranking the multiple keywords by importance, and assigning an emphasis factor to each keyword.
4. The similarity-based data information storage method according to claim 3, wherein the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i$    (1)
where $F_j$ denotes the similarity coefficient of the $j$-th known string, $j \in [1, M]$, and $M$ is the number of known strings; $w_i$ indicates whether the $i$-th keyword of the summary string also appears in the known string; $A_i$ denotes the emphasis factor of that keyword; $i \in [1, N]$; and $N$ is the number of keywords.
5. A similarity-based data information storage system, on which a computer program is stored, wherein the program, when executed by a processor, implements the following steps:
obtaining a summary string corresponding to information to be stored, and extracting multiple keywords from the summary string;
searching on the multiple keywords to obtain multiple known strings;
computing, for each of the known strings, a similarity coefficient between the summary string and that known string;
setting a similarity threshold, and deleting the known strings whose similarity coefficient is less than the similarity threshold, to obtain a set of known strings;
within the set of known strings, taking the known string with the largest similarity coefficient as a comparison string;
using the field to which the comparison string belongs as the field of the information to be stored.
6. The similarity-based data information storage system according to claim 5, wherein each of the known strings contains at least one of the keywords.
7. The similarity-based data information storage system according to claim 5, wherein the steps further comprise: ranking the multiple keywords by importance, and assigning an emphasis factor to each keyword.
8. The similarity-based data information storage system according to claim 7, wherein the similarity coefficient is:
$F_j = \sum_{i=1}^{N} A_i w_i$    (1)
where $F_j$ denotes the similarity coefficient of the $j$-th known string, $j \in [1, M]$, and $M$ is the number of known strings; $w_i$ indicates whether the $i$-th keyword of the summary string also appears in the known string; $A_i$ denotes the emphasis factor of that keyword; $i \in [1, N]$; and $N$ is the number of keywords.
CN201810709543.9A 2018-07-02 2018-07-02 Data information storage method and system based on similarity Withdrawn CN108959207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810709543.9A CN108959207A (en) 2018-07-02 2018-07-02 Data information storage method and system based on similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810709543.9A CN108959207A (en) 2018-07-02 2018-07-02 Data information storage method and system based on similarity

Publications (1)

Publication Number Publication Date
CN108959207A true CN108959207A (en) 2018-12-07

Family

ID=64484568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810709543.9A Withdrawn CN108959207A (en) 2018-07-02 2018-07-02 Data information storage method and system based on similarity

Country Status (1)

Country Link
CN (1) CN108959207A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643582A (en) * 2021-10-14 2021-11-12 南京极域信息科技有限公司 Multi-source wireless interactive feedback system

Similar Documents

Publication Publication Date Title
CN102722709B (en) Method and device for identifying garbage pictures
CN106202248B (en) Phrase-based search in information retrieval system
CN105045875B (en) Personalized search and device
CN103984738A (en) Role labelling method based on search matching
CN108897842A (en) Computer readable storage medium and computer system
CN102867049B (en) Chinese PINYIN quick word segmentation method based on word search tree
CN106682012A (en) Commodity object information searching method and device
CN103810168A (en) Search application method, device and terminal
CN102955912B (en) Method and server for identifying application malicious attribute
CN102982076A (en) Multi-dimensionality content labeling method based on semanteme label database
CN101980210A (en) Marked word classifying and grading method and system
CN109800416A (en) A kind of power equipment title recognition methods
CN103177022A (en) Method and device of malicious file search
CN102855245A (en) Image similarity determining method and image similarity determining equipment
CN112836029A (en) Graph-based document retrieval method, system and related components thereof
CN108319518A (en) File fragmentation sorting technique based on Recognition with Recurrent Neural Network and device
CN105512301A (en) User grouping method based on social content
CN103294820A (en) WEB page classifying method and system based on semantic extension
CN103902599A (en) Fuzzy search method and fuzzy search device
CN105678244A (en) Approximate video retrieval method based on improvement of editing distance
CN103870489B (en) Chinese personal name based on search daily record is from extending recognition methods
CN108959207A (en) Data information storage method and system based on similarity
CN116032741A (en) Equipment identification method and device, electronic equipment and computer storage medium
CN107590233B (en) File management method and device
CN107301186A (en) A kind of recognition methods of invalid data and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20181207