CN109254957A - A kind of archive management system based on big data - Google Patents

A kind of archive management system based on big data

Info

Publication number
CN109254957A
Authority
CN
China
Prior art keywords
output end
picture
big data
archive management
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811105206.5A
Other languages
Chinese (zh)
Inventor
胡军
武杨
黄奇
周慰
刘磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANHUI HEXIN TECHNOLOGY DEVELOPMENT Co Ltd
Original Assignee
ANHUI HEXIN TECHNOLOGY DEVELOPMENT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ANHUI HEXIN TECHNOLOGY DEVELOPMENT Co Ltd
Priority to CN201811105206.5A
Publication of CN109254957A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K17/00 Methods or arrangements for effecting co-operative working between equipments covered by two or more of main groups G06K1/00 - G06K15/00, e.g. automatic card files incorporating conveying and reading operations
    • G06K17/0022 Methods or arrangements for effecting co-operative working between equipments covered by two or more of main groups G06K1/00 - G06K15/00, e.g. automatic card files incorporating conveying and reading operations; arrangements or provisions for transferring data to distant stations, e.g. from a sensing device
    • G06K17/0025 Methods or arrangements for effecting co-operative working between equipments covered by two or more of main groups G06K1/00 - G06K15/00, e.g. automatic card files incorporating conveying and reading operations; arrangements or provisions for transferring data to distant stations, e.g. from a sensing device; the arrangement consisting of a wireless interrogation device in combination with a device for optically marking the record carrier

Abstract

The invention discloses an archive management system based on big data, comprising a scanning terminal. The output end of the scanning terminal is connected to a picture processing terminal, the output end of the picture processing terminal is connected to a distributed storage server, and the output end of the distributed storage server is connected to a file management server. The output end of a picture compression system is connected to a two-dimensional code generation system, the output end of the two-dimensional code generation system is connected to an automatic classification and sorting system, and the output end of the automatic classification and sorting system is connected to a picture uploading system. The output end of a picture storage system is connected to a picture recognition system. The invention relates to the technical field of archive management. Compared with the traditional FTP storage mode, the system is more efficient, more stable and more secure; it greatly improves digitization efficiency, avoids the risk of omission, greatly saves storage space, and enables fast retrieval by any content keyword.

Description

An archive management system based on big data
Technical field
The present invention relates to the technical field of archive management, and in particular to an archive management system based on big data.
Background art
With the rapid development of society and the economy, the types and quantity of archives held by enterprises and public institutions are growing explosively. Traditional paper-based management can no longer meet the needs of routine work, and the digitization and standardized management of archives is an inevitable trend.
An archive management system establishes unified standards, standardizes the whole of file management, and builds a complete platform for sharing archive resource information. It supports informatization of the whole archive management process, including collection, transfer and reception, filing, storage management, and consultation and use. At the same time, the business management mode is gradually shifting to a service-oriented management mode: the service model serves as the basis of business management, and business flows and data flows are built on a system platform that takes services as its model.
An archive management system provides a complete solution for the modern archive management of enterprises and public institutions. It can stand alone as a self-contained system that gives users complete archive management and remote query functions, or it can be combined with the organization's OA or MIS systems to form a more complete modern information management network.
However, existing archive management systems commonly have the following disadvantages: 1. Centralized storage on an FTP server creates a single point of failure. 2. Storing and accessing a file requires setting or obtaining its specific storage path, which is cumbersome. 3. The means of collecting and uploading files are primitive and inefficient, and files are easily omitted. 4. Archived pictures are kept at their original size and occupy a large amount of storage space. 5. Files cannot be retrieved by keywords in their content.
Summary of the invention
(1) Technical problem to be solved
In view of the deficiencies of the prior art, the present invention provides an archive management system based on big data. It addresses the following problems: centralized storage on an FTP server creates a single point of failure; storing and accessing a file requires setting or obtaining its specific storage path, which is cumbersome; the means of collecting and uploading files are primitive and inefficient, and files are easily omitted; archived pictures are kept at their original size and occupy a large amount of storage space; and files cannot be retrieved by keywords in their content.
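As a contrast with path-based FTP access, the embodiment later states that each picture file corresponds to a unique ID and is read by that ID alone. The following is a minimal sketch of that idea, not the patented implementation: the class name `IdAddressedStore`, the use of a SHA-256 digest as the ID, and the local directory standing in for HDFS are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class IdAddressedStore:
    """Toy object store: callers hold an ID and never see a storage path."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # Assumption: the unique ID is the SHA-256 digest of the content.
        # The source only says that every picture file has a unique ID.
        object_id = hashlib.sha256(data).hexdigest()
        # The on-disk layout is an internal detail hidden from callers.
        target = self._root / object_id[:2] / object_id
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return object_id

    def get(self, object_id: str) -> bytes:
        # Reading needs only the ID, not the path chosen by put().
        return (self._root / object_id[:2] / object_id).read_bytes()


if __name__ == "__main__":
    store = IdAddressedStore(Path(tempfile.mkdtemp()))
    picture_id = store.put(b"scanned page bytes")
    print(picture_id, len(store.get(picture_id)))
```

Because the ID is derived from the content in this sketch, storing the same bytes twice yields the same ID; a real deployment might instead issue IDs from a metadata service.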
(2) Technical solution
To achieve the above object, the present invention is realized by the following technical solution: an archive management system based on big data, comprising a scanning terminal, wherein the output end of the scanning terminal is connected to a picture processing terminal, the output end of the picture processing terminal is connected to a distributed storage server, and the output end of the distributed storage server is connected to a file management server.
Preferably, the picture processing terminal comprises a picture compression system; the output end of the picture compression system is connected to a two-dimensional code generation system, the output end of the two-dimensional code generation system is connected to an automatic classification and sorting system, and the output end of the automatic classification and sorting system is connected to a picture uploading system.
Preferably, the distributed storage server comprises a picture storage system; the output end of the picture storage system is connected to a picture recognition system, and the output end of the picture recognition system is connected to a text storage system.
Preferably, the file management server comprises a full-text retrieval system, an authority control system, a storage space automatic early-warning system and a storage space prediction system.
Preferably, the output end of the scanning terminal is connected to the input end of the picture compression system.
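The two-dimensional code generation and automatic classification and sorting stages can be sketched as follows. This is an illustration under stated assumptions: the embodiment says each picture's code carries its volume and page number, but the JSON payload format, the names `PageTag`, `encode_tag`, `classify_and_sort` and `missing_pages`, and the rule that pages are numbered consecutively from 1 are hypothetical. Rendering the payload as an actual two-dimensional code image is left out.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageTag:
    volume: str  # volume (folder) the picture belongs to
    page: int    # page number; the source says OCR recognizes it


def encode_tag(tag: PageTag) -> str:
    """Serialize the tag; a real system would render this string as a code image."""
    return json.dumps({"volume": tag.volume, "page": tag.page}, sort_keys=True)


def decode_tag(payload: str) -> PageTag:
    fields = json.loads(payload)
    return PageTag(volume=fields["volume"], page=int(fields["page"]))


def classify_and_sort(payloads: list[str]) -> dict[str, list[int]]:
    """Group decoded tags by volume and sort the pages inside each volume."""
    volumes: dict[str, list[int]] = defaultdict(list)
    for payload in payloads:
        tag = decode_tag(payload)
        volumes[tag.volume].append(tag.page)
    return {volume: sorted(pages) for volume, pages in volumes.items()}


def missing_pages(pages: list[int]) -> list[int]:
    """Report gaps, assuming pages are numbered consecutively from 1."""
    if not pages:
        return []
    present = set(pages)
    return [n for n in range(1, max(pages) + 1) if n not in present]
```

Grouping by volume and checking each volume for gaps is one simple way to surface the omissions the text says the system avoids; a missing final page cannot be detected this way without knowing the expected page count.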
(3) Beneficial effects
The present invention provides an archive management system based on big data, with the following beneficial effects:
(1) By using HDFS, a distributed file system for big data storage, with an advanced technical architecture and a sound backup mechanism, the system is more efficient, more stable and more secure than the traditional FTP storage mode.
(2) By using mature OCR technology, the system converts archives to text, a major step beyond merely storing pictures; it also recognizes archive page numbers automatically, which greatly improves digitization efficiency and avoids the risk of omission.
(3) By using lossless batch picture compression, the system greatly saves storage space compared with storing the original pictures directly.
(4) By building a full-text search engine on the textualized archives, the system enables fast retrieval by any content keyword.
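The full-text search engine in effect (4) rests on an index from keywords to documents. A minimal sketch of such an inverted index is given below; the class name `InvertedIndex`, the whitespace-style tokenizer and the AND semantics for multi-word queries are assumptions, and Chinese text would need a proper word segmenter in place of the regular expression.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each keyword to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Assumption: lowercase word splitting is enough for this example.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id: str, text: str) -> None:
        for token in self._tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return the documents that contain every keyword in the query."""
        tokens = self._tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("doc-1", "Contract for land transfer")
    index.add("doc-2", "Personnel transfer record")
    print(index.search("land transfer"))
```

Lookups touch only the posting sets of the query keywords, which is why retrieval stays fast as the number of OCR text files grows.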
Brief description of the drawings
Fig. 1 is an overall structural block diagram of the invention.
In the figure: 1 scanning terminal, 2 picture processing terminal, 21 picture compression system, 22 two-dimensional code generation system, 23 automatic classification and sorting system, 24 picture uploading system, 3 distributed storage server, 31 picture storage system, 32 picture recognition system, 33 text storage system, 4 file management server, 41 full-text retrieval system, 42 authority control system, 43 storage space automatic early-warning system, 44 storage space prediction system.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: an archive management system based on big data, comprising a scanning terminal 1. The output end of the scanning terminal 1 is connected to a picture processing terminal 2, the output end of the picture processing terminal 2 is connected to a distributed storage server 3, and the output end of the distributed storage server 3 is connected to a file management server 4. The picture processing terminal 2 comprises a picture compression system 21; the output end of the picture compression system 21 is connected to a two-dimensional code generation system 22, the output end of the two-dimensional code generation system 22 is connected to an automatic classification and sorting system 23, and the output end of the automatic classification and sorting system 23 is connected to a picture uploading system 24. The distributed storage server 3 comprises a picture storage system 31; the output end of the picture storage system 31 is connected to a picture recognition system 32, and the output end of the picture recognition system 32 is connected to a text storage system 33. The file management server 4 comprises a full-text retrieval system 41, an authority control system 42, a storage space automatic early-warning system 43 and a storage space prediction system 44. The output end of the scanning terminal 1 is connected to the input end of the picture compression system 21.
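The connections in this paragraph can be modelled as a small directed graph, which makes the data path easy to check. The sketch below is illustrative only: the `Component` class and the `build_system` and `trace` helpers are hypothetical names, and two links are assumptions rather than statements from the text, namely that the picture uploading system 24 feeds the picture storage system 31 and that the text storage system 33 feeds the file management server 4. The source states these connections only at the level of terminal 2, server 3 and server 4.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    number: str  # reference numeral used in Fig. 1
    name: str
    outputs: list["Component"] = field(default_factory=list)

    def connect(self, downstream: "Component") -> "Component":
        """Connect this component's output end to a downstream input."""
        self.outputs.append(downstream)
        return downstream


def build_system() -> Component:
    scanning = Component("1", "scanning terminal")
    compression = Component("21", "picture compression system")
    code_generation = Component("22", "two-dimensional code generation system")
    sorting = Component("23", "automatic classification and sorting system")
    uploading = Component("24", "picture uploading system")
    picture_storage = Component("31", "picture storage system")
    recognition = Component("32", "picture recognition system")
    text_storage = Component("33", "text storage system")
    management = Component("4", "file management server")

    # Links stated in the description.
    scanning.connect(compression).connect(code_generation)
    code_generation.connect(sorting).connect(uploading)
    picture_storage.connect(recognition).connect(text_storage)
    # Assumed links (see the lead-in): terminal 2 -> server 3 -> server 4.
    uploading.connect(picture_storage)
    text_storage.connect(management)
    return scanning


def trace(start: Component) -> list[str]:
    """Follow the first output of each component until the chain ends."""
    path: list[str] = []
    node: Optional[Component] = start
    while node is not None:
        path.append(node.number)
        node = node.outputs[0] if node.outputs else None
    return path


if __name__ == "__main__":
    print(" -> ".join(trace(build_system())))
```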
In use, the archive office first digitizes the paper archives: paper documents are scanned one by one with a high-speed scanner, and the resulting picture files are stored on the picture processing terminal. Client software provides batch lossless picture compression and compresses the scanned pictures in batches.
The client software also generates two-dimensional codes for the archive files automatically. Each folder (volume) receives a two-dimensional code label generated from its business attribute information, and every picture in the folder receives its own two-dimensional code containing the volume it belongs to and its page number. The page number is recognized automatically by the system's OCR function and does not need to be entered manually. Automatic classification: based on the two-dimensional code information generated above, folders are classified automatically and archive pictures are classified and sorted automatically, which also avoids omissions.
Local picture files are uploaded in batches to the big data distributed file system HDFS. The back end uses object storage technology, so each picture file corresponds to a unique ID; a file is read by that ID, without knowing its specific storage path.
Archive textualization: the system back end automatically performs OCR on all archive pictures and, after recognition, stores the text as text files, classified automatically. Full-text retrieval: on the basis of the textualized archives, an inverted index from keywords to documents is built to form a full-text search engine, so that archives can be retrieved quickly by any keyword.
Permission control: access to all archive files is controlled, with flexible, hierarchical authorization by role, department, post, individual and so on.
To guarantee system reliability, a threshold is set for disk capacity: if the storage utilization of any node in the distributed storage system exceeds the threshold, an alarm is raised so that an operator can intervene and expand the disk capacity. This automatic early-warning function is a passive fault-tolerance mechanism: if the system administrator does not respond in time, the system is prone to failure. Actively predicting disk capacity avoids such situations. Because storage usage is strongly correlated with time, disk capacity is forecast using time series analysis: a time series model is built from historical data, and disk capacity is then predicted from that model.
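The difference between the passive threshold alarm and the active prediction can be sketched as follows. The source does not name a specific time series model, so the least-squares linear trend used here is a stand-in chosen for brevity; the function names, the percentage units and the sample figures are likewise assumptions.

```python
import math
from typing import Optional


def usage_alerts(node_usage: dict[str, float], threshold: float) -> list[str]:
    """Passive check: nodes whose current usage exceeds the threshold."""
    return sorted(node for node, used in node_usage.items() if used > threshold)


def fit_linear_trend(history: list[float]) -> tuple[float, float]:
    """Least-squares fit of usage = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(history) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(history))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return mean_y - slope * mean_t, slope


def periods_until_threshold(history: list[float], threshold: float) -> Optional[int]:
    """Active check: periods after the last sample until the trend hits the threshold."""
    intercept, slope = fit_linear_trend(history)
    if slope <= 0:
        return None  # usage is flat or shrinking, so no crossing is predicted
    last_t = len(history) - 1
    crossing_t = (threshold - intercept) / slope
    return max(0, math.ceil(crossing_t - last_t))


if __name__ == "__main__":
    # Hypothetical usage percentages sampled at equal intervals.
    print(usage_alerts({"node-a": 70.0, "node-b": 91.5}, threshold=80.0))
    print(periods_until_threshold([50.0, 55.0, 60.0, 65.0], threshold=82.0))
```

The alarm fires only after a node has crossed the threshold, whereas the forecast gives the administrator lead time to add capacity. A production system would likely use a richer model with seasonality, such as ARIMA or exponential smoothing.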
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims.

Claims (5)

1. An archive management system based on big data, comprising a scanning terminal (1), characterized in that: the output end of the scanning terminal (1) is connected to a picture processing terminal (2), the output end of the picture processing terminal (2) is connected to a distributed storage server (3), and the output end of the distributed storage server (3) is connected to a file management server (4).
2. The archive management system based on big data according to claim 1, characterized in that: the picture processing terminal (2) comprises a picture compression system (21), the output end of the picture compression system (21) is connected to a two-dimensional code generation system (22), the output end of the two-dimensional code generation system (22) is connected to an automatic classification and sorting system (23), and the output end of the automatic classification and sorting system (23) is connected to a picture uploading system (24).
3. The archive management system based on big data according to claim 1, characterized in that: the distributed storage server (3) comprises a picture storage system (31), the output end of the picture storage system (31) is connected to a picture recognition system (32), and the output end of the picture recognition system (32) is connected to a text storage system (33).
4. The archive management system based on big data according to claim 1, characterized in that: the file management server (4) comprises a full-text retrieval system (41), an authority control system (42), a storage space automatic early-warning system (43) and a storage space prediction system (44).
5. The archive management system based on big data according to claim 2, characterized in that: the output end of the scanning terminal (1) is connected to the input end of the picture compression system (21).
CN201811105206.5A 2018-09-21 2018-09-21 A kind of archive management system based on big data Pending CN109254957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811105206.5A CN109254957A (en) 2018-09-21 2018-09-21 A kind of archive management system based on big data

Publications (1)

Publication Number Publication Date
CN109254957A true CN109254957A (en) 2019-01-22

Family

ID=65047603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811105206.5A Pending CN109254957A (en) 2018-09-21 2018-09-21 A kind of archive management system based on big data

Country Status (1)

Country Link
CN (1) CN109254957A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090164790A1 (en) * 2007-12-20 2009-06-25 Andrey Pogodin Method and system for storage of unstructured data for electronic discovery in external data stores
CN105303281A (en) * 2014-06-30 2016-02-03 国网上海市电力公司 Electronic electricity customer archive system applying centralized-distributed architecture
CN105630786A (en) * 2014-10-27 2016-06-01 航天信息股份有限公司 Car purchase tax electronic archive uploading, storing and querying system and method
CN107749894A (en) * 2017-11-09 2018-03-02 吴章义 A kind of safety, simple, intelligence Internet of things system
CN108491495A (en) * 2018-03-19 2018-09-04 合肥泓泉档案信息科技有限公司 A kind of archive digitization management system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414246A (en) * 2019-06-19 2019-11-05 平安科技(深圳)有限公司 Shared file method for managing security, device, terminal and storage medium
CN110414246B (en) * 2019-06-19 2023-05-30 平安科技(深圳)有限公司 Shared file security management method, device, terminal and storage medium
CN112527947A (en) * 2019-09-19 2021-03-19 北京国双科技有限公司 Method and device for filing electronic documents
CN112102117A (en) * 2020-08-17 2020-12-18 武汉市润普网络科技有限公司 Electronic file quality inspection and supervision system
CN112506865A (en) * 2020-12-21 2021-03-16 广东天亿马信息产业股份有限公司 File digital management system and method thereof
CN114201447A (en) * 2021-12-08 2022-03-18 广州明动软件股份有限公司 Archives classification total library based on cloud archives integration platform is realized

Similar Documents

Publication Publication Date Title
CN109254957A (en) A kind of archive management system based on big data
CN102769781B (en) Method and device for recommending television program
KR101435789B1 (en) System and Method for Big Data Processing of DLP System
CN102662988B (en) Method for filtering redundant data of RFID middleware
CN103678491A (en) Method based on Hadoop small file optimization and reverse index establishment
CN105160039A (en) Query method based on big data
CN101196900A (en) Information searching method based on metadata
CN104239377A (en) Platform-crossing data retrieval method and device
CN103218402A (en) General database data structure, data migratory system and method thereof
CN104462185A (en) Digital library cloud storage system based on mixed structure
CN111241056B (en) Power energy data storage optimization method based on decision tree model
CN110147470B (en) Cross-machine-room data comparison system and method
CN104615734A (en) Community management service big data processing system and processing method thereof
CN111739613A (en) Medical image cloud filing platform based on distributed computing technology
CN112084190A (en) Big data based acquired data real-time storage and management system and method
CN102591878A (en) Digital processing method of technical standard
CN111431821A (en) Method for rapidly detecting and identifying specific information in network large flow
CN115509693A (en) Data optimization method based on cluster Pod scheduling combined with data lake
CN105912621A (en) Area building energy consumption platform data storing and query method
CN112306421B (en) Method and system for storing MDF file in analysis and measurement data format
CN111079394A (en) Internet-based government affair data form filling system and method
CN114003774A (en) A big data information collection system of electric power for wisdom city
Wang et al. The Construction Techniques of Artificial Intelligence Hierarchical Dataset in Power Industry
CN201570029U (en) Information resources collection and management system based on business rule repository
CN111104416A (en) Distributed electric power data management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190122
