CN109254957A - A kind of archive management system based on big data - Google Patents
- Publication number: CN109254957A
- Application number: CN201811105206.5A
- Authority: CN (China)
- Prior art keywords: output end, picture, big data, archive management, storage
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: PHYSICS
- G06: COMPUTING; CALCULATING OR COUNTING
- G06K: GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K17/00: Methods or arrangements for effecting co-operative working between equipments covered by two or more of main groups G06K1/00 - G06K15/00, e.g. automatic card files incorporating conveying and reading operations
- G06K17/0022: Arrangements or provisions for transferring data to distant stations, e.g. from a sensing device
- G06K17/0025: The arrangement consisting of a wireless interrogation device in combination with a device for optically marking the record carrier
Abstract
The invention discloses an archive management system based on big data. The system comprises a scanning terminal whose output end is connected to a picture processing terminal; the output end of the picture processing terminal is connected to a distributed storage server, and the output end of the distributed storage server is connected to a file management server. The output end of a picture compression system is connected to a two-dimensional code generation system, whose output end is connected to an automatic classification and sorting system, whose output end is in turn connected to a picture uploading system. The output end of a picture storage system is connected to a picture recognition system. The invention relates to the technical field of archive management. Compared with the traditional FTP storage mode, the system is more efficient, more stable, and more secure; it greatly improves digitization efficiency, avoids the risk of omissions, greatly saves storage space, and enables fast retrieval by any content keyword.
Description
Technical field
The present invention relates to the technical field of archive management, and specifically to an archive management system based on big data.
Background art
With rapid social and economic development, the forms and quantity of archives in enterprises and institutions have grown explosively. Traditional paper-based management can no longer meet the needs of routine work, and the digitization and standardized management of archives is an inevitable trend.
An archive management system establishes unified standards, standardizes the entire archive management process, and builds a complete platform for sharing archive resource information. It supports information processing across the whole archive management workflow, including collection, transfer and reception, filing, storage management, and consultation and use. At the same time, business management is gradually shifting to a service-oriented management mode, in which the service model is the basis of business management and business flows and data flows are built on a system platform modeled around services.
An archive management system provides a complete solution for modern archive management in enterprises and institutions. It can operate as a self-contained system that provides users with complete archive management and remote query functions, or it can be combined with the organization's own systems, such as OA or MIS, to form a more complete modern information management network.
However, existing archive management systems generally have the following disadvantages: 1. they use an FTP server for centralized storage, which carries single-point risk; 2. storing and accessing a file requires setting or obtaining its specific storage path, which is cumbersome; 3. the means of collecting and uploading files are primitive and inefficient, and omissions occur easily; 4. archive pictures are kept at their original size and occupy a large amount of storage space; 5. files cannot be retrieved by keywords in their content.
Summary of the invention
(1) Technical problem to be solved
In view of the deficiencies of the prior art, the present invention provides an archive management system based on big data. It solves the problems that centralized storage on an FTP server carries single-point risk; that storing and accessing a file requires setting or obtaining its specific storage path, which is cumbersome; that the means of collecting and uploading files are primitive, inefficient, and prone to omissions; that archive pictures are kept at their original size and occupy a large amount of storage space; and that files cannot be retrieved by keywords in their content.
(2) Technical solution
To achieve the above purpose, the present invention is realized through the following technical solution: an archive management system based on big data, comprising a scanning terminal. The output end of the scanning terminal is connected to a picture processing terminal, the output end of the picture processing terminal is connected to a distributed storage server, and the output end of the distributed storage server is connected to a file management server.
Preferably, the picture processing terminal comprises a picture compression system; the output end of the picture compression system is connected to a two-dimensional code generation system, the output end of the two-dimensional code generation system is connected to an automatic classification and sorting system, and the output end of the automatic classification and sorting system is connected to a picture uploading system.
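As an illustration only, the following Python sketch shows how two-dimensional code information could drive classification, sorting, and omission checking. The embodiment states that each picture's code carries volume and page information; the JSON payload layout, the field names `volume` and `page`, the function names, and the assumption that pages are numbered from 1 are all hypothetical. Rendering and scanning the code image itself is left out.

```python
import json
from collections import defaultdict


def make_payload(volume_id, page):
    """Build the text a picture's two-dimensional code would carry (assumed JSON layout)."""
    return json.dumps({"volume": volume_id, "page": page}, sort_keys=True)


def classify_and_sort(payloads):
    """Group decoded payloads by volume and sort each volume's pages."""
    volumes = defaultdict(list)
    for text in payloads:
        record = json.loads(text)
        volumes[record["volume"]].append(record["page"])
    return {volume: sorted(pages) for volume, pages in volumes.items()}


def missing_pages(pages):
    """Return page numbers absent from 1..max(pages); assumes pages start at 1."""
    if not pages:
        return []
    present = set(pages)
    return [p for p in range(1, max(pages) + 1) if p not in present]


# Pictures arrive in scan order, not in archive order.
scans = [
    make_payload("V-001", 3),
    make_payload("V-002", 1),
    make_payload("V-001", 1),
    make_payload("V-001", 4),
]
grouped = classify_and_sort(scans)
gaps = {volume: missing_pages(pages) for volume, pages in grouped.items()}
print(grouped)
print(gaps)
```

In this example page 2 of volume `V-001` was never scanned, so it appears in `gaps`, which is one way the omission check mentioned in the embodiment could be realized.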
Preferably, the distributed storage server comprises a picture storage system; the output end of the picture storage system is connected to a picture recognition system, and the output end of the picture recognition system is connected to a text storage system.
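The chain from picture storage through recognition to text storage can be pictured with a small Python sketch. It is a stand-in under stated assumptions, not the disclosed implementation: an in-memory dictionary replaces HDFS, `uuid4` is one possible way to produce the unique ID per picture that the embodiment describes, and `fake_ocr` is a hypothetical placeholder for a real OCR engine.

```python
import uuid


class ArchiveStore:
    """In-memory stand-in for the picture storage and text storage systems."""

    def __init__(self):
        self._pictures = {}  # picture ID -> raw picture bytes
        self._texts = {}     # picture ID -> recognized text

    def put_picture(self, data):
        """Store a picture and return its unique ID; no storage path is exposed."""
        picture_id = uuid.uuid4().hex
        self._pictures[picture_id] = data
        return picture_id

    def get_picture(self, picture_id):
        """Read a picture back by ID alone."""
        return self._pictures[picture_id]

    def recognize_all(self, ocr):
        """Run the supplied OCR callable on every picture that has no text yet."""
        for picture_id, data in self._pictures.items():
            if picture_id not in self._texts:
                self._texts[picture_id] = ocr(data)

    def get_text(self, picture_id):
        """Return the recognized text stored for a picture."""
        return self._texts[picture_id]


def fake_ocr(data):
    # Hypothetical placeholder: treats the bytes as already being UTF-8 text.
    return data.decode("utf-8")


store = ArchiveStore()
pid = store.put_picture(b"contract page one")
store.recognize_all(fake_ocr)
print(store.get_text(pid))
```

Keying both pictures and recognized text by the same ID keeps the text tied to its source picture without either side needing a file path.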
Preferably, the file management server comprises a full-text retrieval system, a permission control system, a storage space automatic early-warning system, and a storage space prediction system.
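For the permission control system, the embodiment mentions authorization by role, by department, by post, and by person. A minimal Python sketch of such a check follows; the class names, field names, and the rule that a match on any one dimension grants access are assumptions made for illustration, and hierarchical (graded) authorization is not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    role: str
    department: str
    post: str


@dataclass
class Grant:
    """Who may read one archive volume, expressed along four dimensions."""
    roles: set = field(default_factory=set)
    departments: set = field(default_factory=set)
    posts: set = field(default_factory=set)
    persons: set = field(default_factory=set)


def can_access(user, grant):
    """Assumed rule: a match on any single dimension is enough."""
    return (
        user.name in grant.persons
        or user.role in grant.roles
        or user.department in grant.departments
        or user.post in grant.posts
    )


grants = {"V-001": Grant(departments={"finance"}, persons={"li"})}
alice = User("alice", "clerk", "finance", "accountant")
bob = User("bob", "clerk", "sales", "representative")
print(can_access(alice, grants["V-001"]), can_access(bob, grants["V-001"]))
```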
Preferably, the output end of the scanning terminal is connected to the input end of the picture compression system.
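The batch lossless compression applied to scanner output can be illustrated with Python's standard `zlib` module. The source does not name a compression algorithm, so DEFLATE via `zlib` is only a stand-in, and the file names and synthetic picture bytes below are hypothetical. The point of the sketch is the lossless property: every picture decompresses to exactly the original bytes.

```python
import zlib


def compress_batch(pictures):
    """Losslessly compress every picture in a {name: bytes} mapping."""
    return {name: zlib.compress(data, 9) for name, data in pictures.items()}


def restore(compressed):
    """Recover the original bytes of one compressed picture."""
    return zlib.decompress(compressed)


# Synthetic, highly repetitive "scans" stand in for real picture files.
scans = {
    "page-001.bmp": bytes(50_000),
    "page-002.bmp": b"\x10\x20" * 25_000,
}
packed = compress_batch(scans)

for name, original in scans.items():
    assert restore(packed[name]) == original  # lossless round trip
    print(name, len(original), "->", len(packed[name]))
```

Real scanned pages compress far less than this synthetic data, so the actual space saving depends on picture content and on the format chosen.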
(3) Beneficial effects
The present invention provides an archive management system based on big data. It has the following beneficial effects:
(1) By using HDFS, a distributed file system for big data storage, with its advanced technical architecture and sound backup mechanism, the system is more efficient, more stable, and more secure than the traditional FTP storage mode.
(2) By using mature OCR technology, the system converts archives into text, a major step forward from simply storing pictures, and it also recognizes archive page numbers automatically, which greatly improves digitization efficiency and avoids the risk of omissions.
(3) By using lossless batch picture compression, the system greatly saves storage space compared with storing the original pictures directly.
(4) By building a full-text search engine on the text of the archives, the system enables fast retrieval by any content keyword.
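Full-text retrieval of this kind typically rests on an inverted index from keywords to documents. The Python sketch below is a minimal illustration, assuming whitespace-delimited text, a simple regular-expression tokenizer, and AND semantics across query words; Chinese archive text would need a word segmenter instead, and the document IDs and contents are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


documents = {
    "p1": "Contract for equipment purchase",
    "p2": "Personnel contract renewal",
    "p3": "Equipment maintenance log",
}
index = build_index(documents)
print(sorted(search(index, "contract")))
print(sorted(search(index, "equipment contract")))
```

A lookup touches only the posting sets of the query words rather than scanning every document, which is what makes keyword retrieval fast as the archive grows.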
Brief description of the drawings
Fig. 1 is the overall structural block diagram of the invention.
In the figure: 1 scanning terminal, 2 picture processing terminal, 21 picture compression system, 22 two-dimensional code generation system, 23 automatic classification and sorting system, 24 picture uploading system, 3 distributed storage server, 31 picture storage system, 32 picture recognition system, 33 text storage system, 4 file management server, 41 full-text retrieval system, 42 permission control system, 43 storage space automatic early-warning system, 44 storage space prediction system.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: an archive management system based on big data, comprising a scanning terminal 1. The output end of the scanning terminal 1 is connected to a picture processing terminal 2, the output end of the picture processing terminal 2 is connected to a distributed storage server 3, and the output end of the distributed storage server 3 is connected to a file management server 4. The picture processing terminal 2 comprises a picture compression system 21; the output end of the picture compression system 21 is connected to a two-dimensional code generation system 22, the output end of the two-dimensional code generation system 22 is connected to an automatic classification and sorting system 23, and the output end of the automatic classification and sorting system 23 is connected to a picture uploading system 24. The distributed storage server 3 comprises a picture storage system 31; the output end of the picture storage system 31 is connected to a picture recognition system 32, and the output end of the picture recognition system 32 is connected to a text storage system 33. The file management server 4 comprises a full-text retrieval system 41, a permission control system 42, a storage space automatic early-warning system 43, and a storage space prediction system 44. The output end of the scanning terminal 1 is connected to the input end of the picture compression system 21.
In use, the paper archives are first digitized: in the archive office, the paper documents are scanned one by one with a high-speed scanner, and the resulting picture files are stored on the picture processing terminal. Client software provides batch lossless picture compression and compresses the scanned pictures in batches.
The client software then generates two-dimensional codes automatically. A two-dimensional code identifier is generated for each folder (volume) from its related business attribute information, and an individual two-dimensional code is generated for every picture in the folder. Each picture's code contains the volume information and page number information to which the picture belongs; the page number is recognized automatically by the system through OCR and does not need to be entered manually.
Automatic classification: based on the two-dimensional code information generated above, the system automatically classifies the folders and automatically classifies and sorts the archive pictures, which also helps avoid omissions.
The local picture files are uploaded in batches to the big data distributed file system HDFS. Object storage technology is used in the back end, and each picture file corresponds to a unique ID. A file can be read by that ID alone, without knowing its specific storage path.
Archive text conversion: the system back end automatically performs OCR on all archive pictures and, after recognition, stores the results as automatically classified text files. On the basis of the archive text, an inverted index from keywords to documents is built to construct a full-text search engine, which enables fast retrieval by any keyword.
Permission control is applied to the corresponding archive files. For all archive files, flexible, hierarchical authorization can be granted by role, by department, by post, by person, and so on.
To guarantee system reliability, a threshold is set for disk capacity: if the storage space utilization of any node in the distributed storage system exceeds the threshold, an alarm is raised so that staff can intervene and expand the disk capacity. This automatic early-warning function is a passive fault-tolerance mechanism; if the system administrator does not respond in time, the system is prone to failure. Actively predicting disk capacity avoids such situations. Because changes in storage space are strongly correlated with time, disk capacity is forecast with time series analysis: a time series model is built from historical data, and disk capacity is then predicted from that model.
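The threshold warning and the capacity prediction can be sketched together in Python. The source does not specify which time series model is used, so the least-squares linear trend below is only the simplest possible stand-in; the node names, the 0.8 threshold, and the utilization history are hypothetical values chosen for the example.

```python
def nodes_over_threshold(usage_by_node, threshold=0.8):
    """Passive warning: list nodes whose utilization exceeds the threshold."""
    return sorted(node for node, used in usage_by_node.items() if used > threshold)


def fit_linear_trend(history):
    """Least-squares line through the points (0, history[0]), (1, history[1]), ..."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(history) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(history))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(history, steps_ahead):
    """Active prediction: extrapolate utilization steps_ahead periods past the last one."""
    slope, intercept = fit_linear_trend(history)
    return intercept + slope * (len(history) - 1 + steps_ahead)


usage = {"node-a": 0.62, "node-b": 0.91, "node-c": 0.85}
alarms = nodes_over_threshold(usage, threshold=0.8)

daily_utilization = [0.50, 0.55, 0.60, 0.65]  # one reading per day for one node
predicted = forecast(daily_utilization, steps_ahead=2)
print(alarms, round(predicted, 2))
```

The warning only reacts once a node is already over the limit, whereas the forecast lets an administrator see that a node growing by 0.05 per day will cross a 0.8 threshold within a few days and expand capacity beforehand.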
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, replacements, and variations can be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the appended claims.
Claims (5)
1. An archive management system based on big data, comprising a scanning terminal (1), characterized in that: the output end of the scanning terminal (1) is connected to a picture processing terminal (2), the output end of the picture processing terminal (2) is connected to a distributed storage server (3), and the output end of the distributed storage server (3) is connected to a file management server (4).
2. The archive management system based on big data according to claim 1, characterized in that: the picture processing terminal (2) comprises a picture compression system (21), the output end of the picture compression system (21) is connected to a two-dimensional code generation system (22), the output end of the two-dimensional code generation system (22) is connected to an automatic classification and sorting system (23), and the output end of the automatic classification and sorting system (23) is connected to a picture uploading system (24).
3. The archive management system based on big data according to claim 1, characterized in that: the distributed storage server (3) comprises a picture storage system (31), the output end of the picture storage system (31) is connected to a picture recognition system (32), and the output end of the picture recognition system (32) is connected to a text storage system (33).
4. The archive management system based on big data according to claim 1, characterized in that: the file management server (4) comprises a full-text retrieval system (41), a permission control system (42), a storage space automatic early-warning system (43), and a storage space prediction system (44).
5. The archive management system based on big data according to claim 2, characterized in that: the output end of the scanning terminal (1) is connected to the input end of the picture compression system (21).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811105206.5A CN109254957A (en) | 2018-09-21 | 2018-09-21 | A kind of archive management system based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109254957A true CN109254957A (en) | 2019-01-22 |
Family
ID=65047603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811105206.5A Pending CN109254957A (en) | 2018-09-21 | 2018-09-21 | A kind of archive management system based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109254957A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414246A (en) * | 2019-06-19 | 2019-11-05 | 平安科技(深圳)有限公司 | Shared file method for managing security, device, terminal and storage medium |
CN112102117A (en) * | 2020-08-17 | 2020-12-18 | 武汉市润普网络科技有限公司 | Electronic file quality inspection and supervision system |
CN112506865A (en) * | 2020-12-21 | 2021-03-16 | 广东天亿马信息产业股份有限公司 | File digital management system and method thereof |
CN112527947A (en) * | 2019-09-19 | 2021-03-19 | 北京国双科技有限公司 | Method and device for filing electronic documents |
CN114201447A (en) * | 2021-12-08 | 2022-03-18 | 广州明动软件股份有限公司 | Archives classification total library based on cloud archives integration platform is realized |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090164790A1 (en) * | 2007-12-20 | 2009-06-25 | Andrey Pogodin | Method and system for storage of unstructured data for electronic discovery in external data stores |
CN105303281A (en) * | 2014-06-30 | 2016-02-03 | 国网上海市电力公司 | Electronic electricity customer archive system applying centralized-distributed architecture |
CN105630786A (en) * | 2014-10-27 | 2016-06-01 | 航天信息股份有限公司 | Car purchase tax electronic archive uploading, storing and querying system and method |
CN107749894A (en) * | 2017-11-09 | 2018-03-02 | 吴章义 | A kind of safety, simple, intelligence Internet of things system |
CN108491495A (en) * | 2018-03-19 | 2018-09-04 | 合肥泓泉档案信息科技有限公司 | A kind of archive digitization management system |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414246A (en) * | 2019-06-19 | 2019-11-05 | 平安科技(深圳)有限公司 | Shared file method for managing security, device, terminal and storage medium |
CN110414246B (en) * | 2019-06-19 | 2023-05-30 | 平安科技(深圳)有限公司 | Shared file security management method, device, terminal and storage medium |
CN112527947A (en) * | 2019-09-19 | 2021-03-19 | 北京国双科技有限公司 | Method and device for filing electronic documents |
CN112102117A (en) * | 2020-08-17 | 2020-12-18 | 武汉市润普网络科技有限公司 | Electronic file quality inspection and supervision system |
CN112506865A (en) * | 2020-12-21 | 2021-03-16 | 广东天亿马信息产业股份有限公司 | File digital management system and method thereof |
CN114201447A (en) * | 2021-12-08 | 2022-03-18 | 广州明动软件股份有限公司 | Archives classification total library based on cloud archives integration platform is realized |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109254957A (en) | A kind of archive management system based on big data | |
CN102769781B (en) | Method and device for recommending television program | |
KR101435789B1 (en) | System and Method for Big Data Processing of DLP System | |
CN102662988B (en) | Method for filtering redundant data of RFID middleware | |
CN103678491A (en) | Method based on Hadoop small file optimization and reverse index establishment | |
CN105160039A (en) | Query method based on big data | |
CN101196900A (en) | Information searching method based on metadata | |
CN104239377A (en) | Platform-crossing data retrieval method and device | |
CN103218402A (en) | General database data structure, data migratory system and method thereof | |
CN104462185A (en) | Digital library cloud storage system based on mixed structure | |
CN111241056B (en) | Power energy data storage optimization method based on decision tree model | |
CN110147470B (en) | Cross-machine-room data comparison system and method | |
CN104615734A (en) | Community management service big data processing system and processing method thereof | |
CN111739613A (en) | Medical image cloud filing platform based on distributed computing technology | |
CN112084190A (en) | Big data based acquired data real-time storage and management system and method | |
CN102591878A (en) | Digital processing method of technical standard | |
CN111431821A (en) | Method for rapidly detecting and identifying specific information in network large flow | |
CN115509693A (en) | Data optimization method based on cluster Pod scheduling combined with data lake | |
CN105912621A (en) | Area building energy consumption platform data storing and query method | |
CN112306421B (en) | Method and system for storing MDF file in analysis and measurement data format | |
CN111079394A (en) | Internet-based government affair data form filling system and method | |
CN114003774A (en) | A big data information collection system of electric power for wisdom city | |
Wang et al. | The Construction Techniques of Artificial Intelligence Hierarchical Dataset in Power Industry | |
CN201570029U (en) | Information resources collection and management system based on business rule repository | |
CN111104416A (en) | Distributed electric power data management system |
Legal Events
Code | Title | Description
---|---|---
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190122