CN105373605A

CN105373605A - Batch storage method and system for data files

Info

Publication number: CN105373605A
Application number: CN201510767586.9A
Authority: CN
Inventors: 高万林; 赵龙; 任延昭; 陈雪瑞; 段晶洁; 李静
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2015-11-11
Filing date: 2015-11-11
Publication date: 2016-03-02

Abstract

The invention provides a batch storage method and system for data files. The method comprises the following steps: collecting multiple pieces of text data and the multimedia data matched with each piece of text data; performing duplicate checking on all the text data to obtain multiple groups of duplicated text data and the correspondingly matched multimedia data; retaining one piece of text data from each group of duplicated text data, and identifying and storing it together with the originally unique text data; and assigning the multimedia data matched with each piece of text data the same identification code and storing it by category. The method and system store a large number of scientific and technological achievement texts, pictures, and audio and video data. After the text data has been checked for duplicates and the duplicates deleted, each piece of text data is identified, and the pictures and audio or video data matched with it are given the same identification code and stored in their respective databases, so that the data files are stored effectively.

Description

Batch storage method and system for data files

Technical field

The present invention relates to the field of computer applications, and in particular to a batch storage method and system for data files used in managing agricultural scientific and technological achievements.

Background art

In recent years the state has paid close attention every year to the rural economy, rural development and the rural population, and investment in agriculture has risen steadily. Research institutes and universities are pursuing further agricultural research and development, and each year this work yields a large volume of scientific and technological achievement data. The achievement data takes many forms, including text, pictures, audio and video. How to store and manage such large quantities of unorganized achievement data effectively, and how to screen and import it, have become key factors limiting the rapid storage of achievement data. A more effective way of importing the data is therefore needed.

Summary of the invention

The invention provides a batch storage method and system for data files, to solve the prior-art problem of importing large batches of unorganized, redundant data.

In one aspect, the invention provides a batch storage method for data files, the method comprising the following steps:

collecting multiple pieces of text data and the multimedia data matched with each piece of text data;

performing duplicate checking on all the text data to obtain multiple groups of mutually duplicated text data and the correspondingly matched multimedia data;

retaining one piece of text data from each group of mutually duplicated text data, and identifying and storing it together with the originally unique text data;

assigning the multimedia data matched with each piece of text data the same identification code as that text data, and storing it by category.

Further, performing duplicate checking on all the text data comprises: checking, according to multiple entry conditions of the text data, the text content under each entry condition for duplicates.

Further, the method comprises a step of processing the groups of mutually duplicated text data:

retaining, in each group, the text data entered first and the multimedia data matched with that text data;

deleting the remaining text data and the multimedia data matched with the remaining text data.

Further, the multimedia data comprises picture data, audio data and/or video data.

Further, the entry conditions are the scientific and technological achievement title, author, institution, research start and end dates, and body text.

Further, the conditions under which the duplicate check judges text data to be mutually duplicated comprise:

the ratio of the number of consecutively repeated characters to the number of characters in the shorter title is greater than a preset ratio;

and/or,

the authors are all identical, or at least one author is identical;

and/or,

the institutions are all identical, or at least one institution is identical;

and/or,

the research start and end dates are identical or overlap;

and/or,

for each paragraph of the body text, the ratio of the number of consecutively repeated characters to the total number of characters in that paragraph is greater than a preset ratio.
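The title condition can be read as the length of the longest run of consecutive characters shared by two titles, divided by the length of the shorter title. The following is a minimal sketch under that reading. The function names and the default preset ratio of 0.6 are illustrative assumptions and do not come from the patent, which leaves the preset ratio unspecified.

```python
def longest_common_run(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended at the previous character of both strings.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def title_repeat_ratio(title_a: str, title_b: str) -> float:
    """Consecutively repeated characters divided by the shorter title's length."""
    shorter = min(len(title_a), len(title_b))
    if shorter == 0:
        return 0.0
    return longest_common_run(title_a, title_b) / shorter


def titles_repeat(title_a: str, title_b: str, preset_ratio: float = 0.6) -> bool:
    """True when the ratio exceeds the preset ratio (0.6 is an assumed value)."""
    return title_repeat_ratio(title_a, title_b) > preset_ratio
```

Two identical titles give a ratio of 1.0, and an empty title gives 0.0 so that it never triggers the condition. The same ratio could be applied paragraph by paragraph to the body text, using the paragraph's total character count as the denominator.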

Further, the method also comprises a data search step, comprising:

searching the text data in the text database by multi-field retrieval, and judging whether the searched-for text data exists;

if it exists, determining the identification code of the found text data, searching the multimedia database for the corresponding multimedia data using this identification code, and then displaying the search results;

otherwise, displaying no search results.

In another aspect, the invention provides a batch storage system for data files, comprising:

an acquisition module, for collecting multiple pieces of text data and the multimedia data matched with each piece of text data;

a text data duplicate-check judgment module, for performing duplicate checking on all the text data to obtain multiple groups of mutually duplicated text data and the correspondingly matched multimedia data;

a coding storage module, for retaining one piece of text data from each group of mutually duplicated text data and identifying and storing it together with the originally unique text data, and at the same time assigning the multimedia data matched with each piece of text data the same identification code and storing it by category.

Further, the system also comprises an entry condition storage module, for editing or storing the entry conditions of the text data.

Further, the system also comprises a duplicated text data processing module, for retaining the text data entered first in each group of mutually duplicated text data and deleting the remaining text data.

As the above technical solution shows, the invention stores a large amount of text data and multimedia data. After the text data has been checked for duplicates and the duplicates deleted, each piece of text data is identified, the multimedia data matched with the text data is given the same identification code, and the data is then stored in the respective categorized databases, so that the data files are stored effectively. In addition, during a search the text data is searched first to determine the matching text data, and the multimedia data is then retrieved by its identification code, so that the data files are searched and displayed effectively.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the storage method according to Embodiment 1 of the invention;

Fig. 2 is a flow chart of one specific implementation of the storage method of Embodiment 1 of the invention;

Fig. 3 is a structural block diagram of the storage system according to Embodiment 2 of the invention.

Detailed description

Specific embodiments of the invention are described in further detail below with reference to the drawings and examples. The following embodiments illustrate the invention but do not limit its scope.

Fig. 1 shows a batch storage method for data files provided by Embodiment 1 of the invention. The method comprises the following steps:

1. collecting multiple pieces of text data and the multimedia data matched with each piece of text data;

2. performing duplicate checking on all the text data to obtain multiple groups of mutually duplicated text data and the correspondingly matched multimedia data, where the multimedia data can comprise picture data, audio data and video data;

3. retaining one piece of text data from each group of mutually duplicated text data, and identifying and storing it together with the originally unique text data;

4. assigning the multimedia data matched with each piece of text data the same identification code as that text data, and storing it by category.

Fig. 2 shows a specific embodiment of the above storage method:

S1: collecting multiple pieces of text data and the picture data, audio data and/or video data matched with each piece of text data;

S2: checking, according to multiple entry conditions of the text data, the text content of all the text data under each entry condition for duplicates, and judging whether duplicated text data exists;

S21: if multiple groups of mutually duplicated text data exist, retaining, in each group, the text data entered first and the picture data, audio data and/or video data matched with it, and deleting the remaining text data and the picture data, audio data and/or video data matched with that remaining text data;

S22: otherwise, retaining all the text data, picture data, audio data and video data;

S3: identifying the one piece of text data retained from each group of mutually duplicated text data, together with the originally unique text data, and storing them in the text database, where originally unique text data means text data that has no duplicates and is therefore unique;

S4: assigning the picture data, audio data and/or video data matched with each identified piece of text data the same identification code, and storing them in the picture database, audio database and video database respectively.

To explain the above method further: when collecting data files, the storage method of this embodiment can enter all the agricultural scientific and technological achievement text data according to an Excel template preset by the system. During entry, each piece of text data is given a unique number. After entry, the template containing all the text data is submitted to the system, which performs duplicate checking on all the text data. During duplicate checking, the text content of all the text data under each entry condition is checked for duplicates according to the multiple entry conditions of the text data. An entry condition is the criterion under which content is entered during the entry process. Checking the text content under each entry condition examines the achievement text data from every angle, so that no text data that may be duplicated is missed, which improves the accuracy of duplicate checking. In this embodiment, the entry conditions can be the scientific and technological achievement title, author, institution, research start and end dates, and body text.

Duplicate checking compares the respective contents in the order title, author, institution, research start and end dates, body text, and during the comparison the preset judgment criteria are used to decide whether duplicated text data exists. The preset judgment criteria can be: the ratio of the number of consecutively repeated characters to the number of characters in the shorter title is greater than a preset ratio; the authors are all identical, or at least one author is identical; the institutions are all identical, or at least one institution is identical; the research start and end dates are identical or overlap; for each paragraph of the body text, the ratio of the number of consecutively repeated characters to the total number of characters in that paragraph is greater than a preset ratio. Text data can be judged to be mutually duplicated when one or more of these criteria are met. Checking the achievement text data from the smallest judgment point ensures that not a single possibly duplicated piece of text data is missed, which improves the accuracy of duplicate checking.
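The judgment criteria for title, author, institution and research period can be sketched as a single pairwise check. This is an illustrative sketch rather than the patented implementation: the `Achievement` class, the function names, the 0.6 preset ratio and the `min_hits` parameter are all assumptions, and the body-text criterion is left out for brevity. The title ratio uses `difflib.SequenceMatcher.find_longest_match` from the Python standard library to find the longest shared contiguous run.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Achievement:
    number: int               # unique number assigned at entry
    title: str
    authors: frozenset
    institutions: frozenset   # the "unit" entry condition
    start: date               # research start date
    end: date                 # research end date


def run_ratio(a: str, b: str) -> float:
    """Longest shared contiguous run divided by the shorter string's length."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    return match.size / shorter


def matched_criteria(x: Achievement, y: Achievement, preset_ratio: float = 0.6) -> set:
    """Return the names of the criteria that the pair (x, y) satisfies."""
    hits = set()
    if run_ratio(x.title, y.title) > preset_ratio:
        hits.add("title")
    if x.authors & y.authors:                    # at least one author in common
        hits.add("author")
    if x.institutions & y.institutions:          # at least one institution in common
        hits.add("institution")
    if x.start <= y.end and y.start <= x.end:    # identical or overlapping periods
        hits.add("period")
    return hits


def is_repeat(x: Achievement, y: Achievement, min_hits: int = 1) -> bool:
    """The 'one or more criteria' wording corresponds to min_hits=1."""
    return len(matched_criteria(x, y)) >= min_hits
```

With `min_hits=1`, any single match is enough to flag a pair, including overlapping research periods alone. The text does not say how strictly the criteria should be combined beyond "one or more", so `min_hits` is exposed as a tunable assumption.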

If multiple groups of mutually duplicated text data are judged to exist, then in each group (which contains at least two pieces of text data) the text data entered first, together with the picture data, audio data and/or video data matched with it, is retained, and the remaining text data, together with the picture data, audio data and/or video data matched with that remaining text data, is deleted. In this way only one piece of each set of identical text data is left, which avoids data redundancy.
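Forming the groups and retaining the first-entered record of each group can be sketched as follows. The sketch assumes that the pairwise judgment is available as a function `is_repeat` (a hypothetical name) and that records are listed in entry order. Union-find is used here as one possible way to build the groups; the text does not name a grouping algorithm.

```python
def group_repeats(records, is_repeat):
    """Partition records (given in entry order) into groups linked by is_repeat."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_repeat(records[i], records[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    # The smaller index becomes the root, so each root is the earliest entry.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    return list(groups.values())


def retain_first(groups, media):
    """Keep the first-entered record of each group and only the media matched with it.

    media maps a record to its list of multimedia files; records must be hashable.
    """
    kept = [group[0] for group in groups]
    kept_media = {record: media.get(record, []) for record in kept}
    return kept, kept_media
```

A group of size one is simply a record with no duplicates, so the same `retain_first` call covers both the retained duplicates and the originally unique records.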

The retained duplicated text data and the text data that has no duplicates are identified according to a preset identification method, so that each piece of text data is unique. Text data may be accompanied by picture, audio or video data. Therefore, to preserve the integrity and correspondence of the whole scientific and technological achievement, the corresponding picture data, audio data or video data is given an identification code consistent with that of the text data. All the identified data can then be stored in the respective corresponding databases.
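The identification and categorized storage step can be sketched with the standard-library `sqlite3` module. Everything specific here is an assumption made for illustration: the table names, the `ACH000001`-style code format and the use of one SQLite table per database. The text only requires that each retained text record receives a unique identification code and that its matched media carry the same code.

```python
import sqlite3

# Hypothetical mapping from media kind to the table that stands in for its database.
MEDIA_TABLES = {"picture": "picture_db", "audio": "audio_db", "video": "video_db"}


def store_batch(conn, records):
    """Store deduplicated records given as (title, [(media_kind, path), ...]) tuples.

    Returns the identification codes assigned, in input order.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS text_db (code TEXT PRIMARY KEY, title TEXT)")
    for table in MEDIA_TABLES.values():
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (code TEXT, path TEXT)")

    # Continue numbering after the records already stored (assumes none were removed).
    start = conn.execute("SELECT COUNT(*) FROM text_db").fetchone()[0]
    codes = []
    for offset, (title, media) in enumerate(records, start=1):
        code = f"ACH{start + offset:06d}"
        conn.execute("INSERT INTO text_db VALUES (?, ?)", (code, title))
        for kind, path in media:
            # Each media file carries the same code as the text record it matches.
            conn.execute(f"INSERT INTO {MEDIA_TABLES[kind]} VALUES (?, ?)", (code, path))
        codes.append(code)
    conn.commit()
    return codes
```

Because the code is the only link between the text table and the media tables, a later lookup needs nothing more than the code of the text record it found.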

A specific example is given with the following table:

No.	Scientific and technological achievement title	Author	Institution	Research start and end dates	Body text
1	Wine-growing technology	C	P University	2014.05.06-2015.03.08	(omitted)
2	Peach-apricot grafting technology	E, F	L Company	2013.01.15-2015.08.16	(omitted)
3	High-yield wine-growing method	C, B	P University, G Research Institute	2014.09.13-2015.01.20	(omitted)
4	Corn variety	A	Q Research Institute	2013.12.01-2014.12.10	(omitted)
5	Corn variety	A	Q Research Institute	2013.12.01-2014.12.10	(omitted)

In the table, duplicate checking shows that the text data numbered 1 and 3 meet the preset judgment criteria for title, author, institution and research start and end dates, and that their body text also meets the preset judgment criteria. The text data numbered 1 and 3 are therefore two mutually duplicated pieces of text data: the text data numbered 1 is retained and the text data numbered 3 is deleted.

In the table, duplicate checking shows that the text data numbered 4 and 5 meet the preset judgment criteria for title, author, institution and research start and end dates, and that their body text also meets the preset judgment criteria. The text data numbered 4 and 5 are therefore two mutually duplicated pieces of text data: the text data numbered 4 is retained and the text data numbered 5 is deleted.

In the table, duplicate checking shows that the text data numbered 2 has no mutually duplicated text data, so the text data numbered 2 continues to be retained.
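The outcome described for the table can be checked with a short worked example. The rule used below (shared author, shared institution and overlapping research period all required) is an illustrative assumption chosen because the table omits the body text and the titles are translations; it is not the patent's full set of criteria. Ascending record number is assumed to be the entry order.

```python
from datetime import date

# Rows of the table: number -> (authors, institutions, start date, end date).
ROWS = {
    1: ({"C"}, {"P University"}, date(2014, 5, 6), date(2015, 3, 8)),
    2: ({"E", "F"}, {"L Company"}, date(2013, 1, 15), date(2015, 8, 16)),
    3: ({"C", "B"}, {"P University", "G Research Institute"}, date(2014, 9, 13), date(2015, 1, 20)),
    4: ({"A"}, {"Q Research Institute"}, date(2013, 12, 1), date(2014, 12, 10)),
    5: ({"A"}, {"Q Research Institute"}, date(2013, 12, 1), date(2014, 12, 10)),
}


def repeats(a: int, b: int) -> bool:
    """Illustrative rule: shared author, shared institution and overlapping period."""
    (auth_a, inst_a, start_a, end_a) = ROWS[a]
    (auth_b, inst_b, start_b, end_b) = ROWS[b]
    return (
        bool(auth_a & auth_b)
        and bool(inst_a & inst_b)
        and start_a <= end_b
        and start_b <= end_a
    )


kept, deleted = [], []
for number in sorted(ROWS):  # ascending number stands in for entry order
    if any(repeats(number, earlier) for earlier in kept):
        deleted.append(number)   # duplicate of an earlier, retained record
    else:
        kept.append(number)

print(kept, deleted)  # [1, 2, 4] [3, 5]
```

Record 2 overlaps the other records in time but shares no author or institution with them, which is why it is retained under this rule.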

The invention also comprises a step of searching the text data, comprising:

searching the text data in the text database by multi-field retrieval, and judging whether the searched-for text data exists;

if it exists, determining the identification code of the found text data, searching the picture database, audio database and/or video database for the corresponding picture data, audio data and/or video data using this identification code, and then displaying the search results;

otherwise, displaying no search results.
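The search step can be sketched as a two-stage lookup: match text records on the supplied fields, then fetch the media that carry the same identification code. The schema, table names and searchable fields below are assumptions made for the sketch, and substring matching with SQL `LIKE` is only one possible interpretation of multi-field retrieval.

```python
import sqlite3

# Assumed schema: one text table plus one table per media database.
SCHEMA = """
CREATE TABLE text_db (code TEXT PRIMARY KEY, title TEXT, author TEXT, institution TEXT);
CREATE TABLE picture_db (code TEXT, path TEXT);
CREATE TABLE audio_db (code TEXT, path TEXT);
CREATE TABLE video_db (code TEXT, path TEXT);
"""
MEDIA_TABLES = ("picture_db", "audio_db", "video_db")
SEARCHABLE = {"title", "author", "institution"}


def search(conn, **fields):
    """Match text records on every given field, then fetch media by identification code."""
    if not fields or not set(fields) <= SEARCHABLE:
        raise ValueError(f"search fields must be drawn from {sorted(SEARCHABLE)}")
    # Field names are checked against SEARCHABLE above; values are bound as parameters.
    where = " AND ".join(f"{name} LIKE ?" for name in fields)
    params = [f"%{value}%" for value in fields.values()]
    rows = conn.execute(f"SELECT code, title FROM text_db WHERE {where}", params).fetchall()

    results = []
    for code, title in rows:
        media = {
            table: [path for (path,) in conn.execute(
                f"SELECT path FROM {table} WHERE code = ?", (code,))]
            for table in MEDIA_TABLES
        }
        results.append({"code": code, "title": title, "media": media})
    return results  # an empty list means there is nothing to display
```

A database created with `conn.executescript(SCHEMA)` can be queried as `search(conn, title="Corn", author="A")`; when no text record matches, the function returns an empty list and no media tables are touched.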

On the basis of the above storage method, the invention provides a batch storage system for data files. As shown in Fig. 3, the system comprises:

an acquisition module, for collecting multiple pieces of text data and the picture data, audio data and/or video data matched with each piece of text data;

a text data duplicate-check judgment module, for performing duplicate checking on all the text data to obtain multiple groups of mutually duplicated text data and the correspondingly matched multimedia data;

a coding storage module, for retaining one piece of text data from each group of mutually duplicated text data and identifying and storing it together with the originally unique text data, and at the same time assigning the multimedia data matched with each piece of text data the same identification code and storing it by category.

Further, the system also comprises an entry condition storage module, for storing or editing the entry conditions of the text data and for providing the entry conditions to the acquisition module during entry.

As the above technical solutions show, the invention stores a large amount of text data and multimedia data. After the text data has been checked for duplicates and the duplicates deleted, each piece of text data is identified, the multimedia data matched with the text data is given the same identification code, and the data is then stored in the respective categorized databases, so that the data files are stored effectively. In addition, during a search the text data is searched first to determine the matching text data, and the multimedia data is then retrieved by its identification code, so that the data files are searched and displayed effectively.

In addition, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order. These words can be interpreted as names.

Those of ordinary skill in the art will appreciate that the above embodiments merely illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the claims of the invention.

Claims

1. A batch storage method for data files, characterized in that the method comprises the following steps:

2. The storage method according to claim 1, characterized in that performing duplicate checking on all the text data comprises: checking, according to multiple entry conditions of the text data, the text content under each entry condition for duplicates.

3. The storage method according to claim 2, characterized in that the entry conditions are the scientific and technological achievement title, author, institution, research start and end dates, and body text.

4. The storage method according to claim 1, characterized in that it comprises a step of processing the groups of mutually duplicated text data:

5. The storage method according to claim 4, characterized in that the conditions under which the duplicate check judges text data to be mutually duplicated comprise:

and/or,

the authors are all identical, or at least one author is identical;

and/or,

the institutions are all identical, or at least one institution is identical;

and/or,

the research start and end dates are identical or overlap;

and/or,

6. The storage method according to claim 1, characterized in that the multimedia data comprises picture data, audio data and/or video data.

7. The storage method according to claim 1, characterized in that it also comprises a data search step, comprising:

otherwise, displaying no search results.

8. A batch storage system for data files, characterized in that it comprises:

9. The storage system according to claim 8, characterized in that it also comprises an entry condition storage module, for editing or storing the entry conditions of the text data.

10. The storage system according to claim 8, characterized in that it also comprises a duplicated text data processing module, for retaining the text data entered first in each group of mutually duplicated text data and deleting the remaining text data.