CN113239207A - Online document induction and storage system based on document data analysis - Google Patents

Online document induction and storage system based on document data analysis

Info

Publication number
CN113239207A
Authority
CN
China
Prior art keywords
document
online
documents
initial
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110782604.6A
Other languages
Chinese (zh)
Other versions
CN113239207B (en)
Inventor
楚龙兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Spacetime Shenzhen Intelligent Technology Co ltd
Original Assignee
Shenzhen Zhiku Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhiku Information Technology Co ltd
Priority to CN202110782604.6A
Publication of CN113239207A
Application granted
Publication of CN113239207B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Abstract

The invention discloses an online document induction and storage system based on document data analysis, belonging to the field of documents. It addresses the technical problem that online documents are not reasonably summarized and stored and that the document operation authorities of visitors are not differentiated. The system comprises a document identification module, a classification storage module, a popularity (heat) calculation module and an authority distribution module, wherein the document identification module performs document identification on initial documents to distinguish repeated documents from latest documents, the classification storage module classifies and stores the initial documents according to the document identifications in the initial document information, the popularity calculation module calculates the popularity of the online documents in the server, and the authority distribution module distributes the document operation authorities of user terminals.

Description

Online document induction and storage system based on document data analysis
Technical Field
The invention belongs to the field of documents, relates to an inductive storage technology, and particularly relates to an online document inductive and storage system based on document data analysis.
Background
Literature refers to books, periodicals and chapters with historical significance or research value. Researchers need to read a large amount of literature in their daily research work; when reading an influential document, a researcher needs to understand it more deeply through its research motivation or line of thinking, and when a new researcher wants to choose a research direction, the researcher needs to consult the relevant documents and their references during the preparation stage;
in the prior art, document data is not analyzed effectively, so that online documents are not summarized and stored reasonably, the summarized storage is disordered, and the authorities of visitors are not differentiated according to document access data.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide an online document induction and storage system based on document data analysis.
The technical problem to be solved by the invention is as follows:
(1) how to analyze document data effectively so that online documents are reasonably summarized and stored;
(2) how to differentiate the document operation authorities of visitors according to document access data.
The purpose of the invention can be realized by the following technical scheme:
an online document induction and storage system based on document data analysis comprises a data acquisition module, a document identification module, a classification storage module, a heat calculation module and a server;
the server stores a plurality of online documents and an IP address library, the server is in communication connection with a user terminal, and a user uploads initial documents or performs document operation on the online documents in the server through a document uploading unit in the user terminal;
the server receives an initial document uploaded by a user terminal and sends it to the document identification module; the document identification module performs document identification on the initial document, marks the initial document as either a repeated document or a latest document, and returns the marked initial document to the server; the server generates an uploading failure signal for a repeated document and an uploading success signal for a latest document, and feeds the corresponding signal back to the user terminal; the document identification module is also used for sending the initial document marked as the latest document to the classification storage module;
the classification storage module receives the initial documents sent by the document identification module, and performs classification storage on the initial documents according to document identifications in the initial document information; the heat calculation module is used for calculating the heat of the online documents in the server to obtain an active value HYu of the online documents;
the heat calculation module sends the active value of the online document to the server and the classification storage module, and the classification storage module assigns the online document a corresponding storage grade according to the active value, specifically as follows:
step SS1: if HYu ≥ Y2, the online document is judged to be an active document and is moved to the system active layer;
step SS2: if Y2 > HYu ≥ Y1, the online document is judged to be a common document and is moved to the system common layer;
step SS3: if Y1 > HYu, the online document is judged to be a cold (unpopular) document and is moved to the system cold layer.
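The three-way grading in steps SS1 to SS3 can be sketched as follows. This is a minimal illustration only: the names `StorageTier` and `assign_tier` are hypothetical, and the thresholds Y1 and Y2 are passed in as parameters because the text does not give concrete values.

```python
from enum import Enum


class StorageTier(Enum):
    ACTIVE = "system active layer"   # SS1: HYu >= Y2
    COMMON = "system common layer"   # SS2: Y2 > HYu >= Y1
    COLD = "system cold layer"       # SS3: Y1 > HYu


def assign_tier(hy: float, y1: float, y2: float) -> StorageTier:
    """Map an active value HYu to a storage tier using thresholds Y1 < Y2."""
    if not y1 < y2:
        raise ValueError("thresholds must satisfy Y1 < Y2")
    if hy >= y2:
        return StorageTier.ACTIVE
    if hy >= y1:
        return StorageTier.COMMON
    return StorageTier.COLD
```

Both boundary values fall into the higher tier, matching the "greater than or equal to" wording of SS1 and SS2.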
Further, the document identification step of the document identification module is specifically as follows:
step one: acquiring initial document information of an initial document, and obtaining six initial document characteristics of the initial document: document name, document responsible party, document publisher, document publication date, document page number and document identification;
step two: combining the document name, document responsible party, document publisher, document publication date, document page number and document identification of the initial document into an initial document characteristic group;
step three: similarly, acquiring online document information of the online documents in the server, obtaining the online document characteristics of the online documents, and establishing an online document characteristic group for each online document;
step four: comparing the initial document characteristic group with the plurality of online document characteristic groups, by extracting any initial document characteristic from the initial document characteristic group and comparing it with the online document characteristic of the same type in the online document characteristic group;
if the initial document is judged to be a repeated document, uploading of the initial document is refused, and if the initial document is judged to be the latest document, uploading of the initial document is approved.
Further, the heat degree calculation process of the heat degree calculation module is specifically as follows:
step S1: obtaining document uploading time of the online documents through a time recording unit, and obtaining document storage duration WCTu of the online documents by subtracting the document uploading time from current time;
step S2: acquiring the number of likes DZu and the number of dislikes DCu of the online document within the document storage duration; calculating the like rate DZLu of the online document within the document storage duration by the formula DZLu = DZu/(DZu + DCu);
step S3: acquiring the browsing times LLu and downloading times XZu of the online document within the document storage duration, and substituting the browsing times LLu, the downloading times XZu and the like rate DZLu into a calculation formula
[formula image not reproduced]
to obtain the use heat value SRu of the online document within the document storage duration;
step S4: dividing the document storage duration of the online document into a plurality of equal-duration time periods T_i; randomly selecting two adjacent time periods T_i and T_{i+1}, with T_i < T_{i+1}; calculating, according to steps S2 to S3, the use heat value SRuT_i of the online document in time period T_i and the use heat value SRuT_{i+1} of the online document in time period T_{i+1};
step S5: if SRuT_i < SRuT_{i+1}, using the formula
[formula image not reproduced]
to calculate the heat growth rate RZu of the stored document from time period T_i to time period T_{i+1}, and recording the number of heat growth rates as CRZu;
if SRuT_i > SRuT_{i+1}, using the formula
[formula image not reproduced]
to calculate the heat reduction rate RJu of the stored document from time period T_i to time period T_{i+1}, and recording the number of heat reduction rates as CRJu;
step S6: counting the heat growth rates and heat reduction rates of the online document between the time periods; summing the heat growth rates and averaging to obtain the average heat growth rate RZJu; summing the heat reduction rates and averaging to obtain the average heat reduction rate RJJu; and calculating the heat coefficient RXu of the online document within the document storage duration by the formula RXu = (RZJu × CRZu)/(RJJu × CRJu);
step S7: combining the heat coefficient RXu of the online document within the document storage duration with the use heat value SRu, and obtaining the active value HYu of the online document by the formula HYu = RXu × SRu.
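The parts of steps S2, S6 and S7 whose formulas appear in the text can be sketched as below. This is a rough illustration under stated assumptions: the use heat value SRu and the individual growth and reduction rates are taken as inputs, because their formulas are given only as images; the function names are hypothetical; and the handling of the no-vote and no-reduction cases is an assumption, since the text does not define them.

```python
from statistics import mean
from typing import Sequence


def like_rate(likes: int, dislikes: int) -> float:
    """Step S2: DZLu = DZu / (DZu + DCu)."""
    total = likes + dislikes
    # Assumption: a document with no votes gets a like rate of 0.0.
    return likes / total if total else 0.0


def heat_coefficient(growth_rates: Sequence[float],
                     reduction_rates: Sequence[float]) -> float:
    """Step S6: RXu = (RZJu * CRZu) / (RJJu * CRJu)."""
    crz, crj = len(growth_rates), len(reduction_rates)
    rzj = mean(growth_rates) if crz else 0.0     # average growth rate RZJu
    rjj = mean(reduction_rates) if crj else 0.0  # average reduction rate RJJu
    denominator = rjj * crj
    if denominator == 0:
        # Assumption: the source does not say what happens when the heat
        # never decreased, so this sketch refuses to guess.
        raise ValueError("heat coefficient undefined without a reduction rate")
    return (rzj * crz) / denominator


def active_value(rx: float, sr: float) -> float:
    """Step S7: HYu = RXu * SRu, with SRu supplied by the caller."""
    return rx * sr
```

Because an average multiplied by its count is simply the sum, RXu as written amounts to the ratio of total growth to total reduction across the sampled period pairs.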
Further, the system active layer is as follows: the online documents are displayed on the system home page as hot searches and need not be retrieved by query; the system common layer is as follows: the online documents are filed under the category items of the system and can be retrieved by opening the category item; the system cold layer is as follows: the online documents are filed in a folder mapped and connected to the system, and the user must input search keywords to retrieve them.
Further, the system further comprises an authority distribution module, wherein the authority distribution module is used for distributing the document operation authority of the user terminal, and the distribution process specifically comprises the following steps:
step P1: acquiring a terminal IP of a user terminal, if the terminal IP is matched with the terminal IP in the IP address library, judging the terminal IP as a re-access, and marking the user terminal as a re-access user;
if the terminal IP is not matched with the terminal IP in the IP address base, judging the terminal IP as the first access, and marking the user terminal as a first access user;
step P2: carrying out identity recognition on the re-access user according to the terminal IP, if the user terminal is a management user, marking the user terminal as a background management user, and if the user terminal is a common user, marking the user terminal as a common access user;
step P3: marking the first access user, the background management user and the common access user as q, w and e respectively; acquiring the number of document accesses of the common access user, and marking it as FCe;
step P4: obtaining the visit dwell time of the ordinary visit user each time, and obtaining the visit average time FTe of the ordinary visit user by adding and summing the visit dwell time of each time and dividing the sum by the visit times; acquiring the document click times of the ordinary access user in each access, and adding and summing the document click times of each time and dividing the sum by the access times to obtain the click average times DJe of the ordinary access user;
step P5: calculating the document access value FWe of the ordinary access user by the formula FWe = FCe × b1 + FTe × b2 + DJe × b3;
step P6: comparing the document access value FWe of the ordinary access user with the access threshold, and dividing ordinary access users into active access users, medium access users and cold access users;
step P7: distributing document operation authorities to the cold access user, the first access user, the active access user, the medium access user and the background management user respectively;
the authority distribution module records the authority level of the cold access user and the first access user as the primary access level, the authority level of the medium access user as the secondary access level, the authority level of the active access user as the tertiary access level, and the authority level of the background management user as the quaternary access level.
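Steps P1 to P7 can be sketched as one function that returns an access level from 1 (primary) to 4 (quaternary). Everything named here is hypothetical: `VisitLog`, `admin_ips`, and the two cut-offs `low` and `high` are illustrative assumptions, since the text says only that FWe is compared with an access threshold and that identity is recognised from the terminal IP. The sketch also assumes that a larger FWe means a more active user.

```python
from dataclasses import dataclass
from typing import Sequence, Set, Tuple


@dataclass
class VisitLog:
    dwell_times: Sequence[float]  # dwell time of each visit
    click_counts: Sequence[int]   # document clicks in each visit


def access_value(log: VisitLog, b1: float, b2: float, b3: float) -> float:
    """Step P5: FWe = FCe*b1 + FTe*b2 + DJe*b3."""
    fc = len(log.dwell_times)          # access times FCe
    if fc == 0:
        return 0.0                     # assumption: no visits, no score
    ft = sum(log.dwell_times) / fc     # average access time FTe
    dj = sum(log.click_counts) / fc    # average click times DJe
    return fc * b1 + ft * b2 + dj * b3


def access_level(ip: str, known_ips: Set[str], admin_ips: Set[str],
                 log: VisitLog, weights: Tuple[float, float, float],
                 low: float, high: float) -> int:
    """Return 1..4 for the primary..quaternary access level."""
    if ip not in known_ips:
        return 1                       # P1: first access user
    if ip in admin_ips:
        return 4                       # P2: background management user
    fw = access_value(log, *weights)
    if fw >= high:
        return 3                       # active access user
    if fw >= low:
        return 2                       # medium access user
    return 1                           # cold access user
```

For example, two visits with dwell times 10 and 20 and click counts 2 and 4, weighted equally at 1, give FWe = 2 + 15 + 3 = 20.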
Further, the user terminal comprises a registration login unit, an information acquisition unit and a document uploading unit, wherein the registration login unit is used by the user to input personal information for registration and login and sends the personal information to the server, the document uploading unit is used by the user to upload initial documents through the user terminal and sends the initial documents to the server, and the information acquisition unit is used for acquiring the information of the initial documents uploaded by the user terminal, tagging the acquired initial document information with the terminal number, and sending it to the server;
the personal information comprises the name, the mobile phone number, the terminal number and the terminal IP of a user; the initial document information includes the document responsible party, document name, reference type, document publisher, document publication date, document page number and document identification.
Furthermore, the data acquisition module comprises a time recording unit, a frequency acquisition unit and an information acquisition unit, wherein the time recording unit is used for recording the time information of the online documents in the server and sending the time information to the server, the frequency acquisition unit is used for acquiring the frequency information of the online documents in the server and sending the frequency information to the server, and the information acquisition unit is used for acquiring the online document information of the online documents in the server; the data acquisition module is used for acquiring time information, frequency information and online document information of online documents in the server and sending the time information, the frequency information and the online document information to the server;
the time information comprises the start viewing time, the stop viewing time, the viewing duration, the document uploading time, the document downloading time, the document moving time, the document deleting time and the document storage duration of the online documents; the online document information includes the document responsible party, document name, reference type, document publisher, document publication date, document page number and document identification; the frequency information comprises the browsing times, moving times, number of likes, number of dislikes and downloading times of the online documents;
the document operations include in particular online browsing, downloading, deleting, moving, replacing and unloading.
Further, the primary access level specifically permits: browsing online documents;
the secondary access level specifically permits: browsing and downloading online documents;
the tertiary access level specifically permits: browsing, downloading and uploading online documents;
the quaternary access level specifically permits: browsing, downloading, uploading, deleting, unloading and moving online documents.
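The four access levels listed above amount to a lookup from level to permitted operations. A minimal sketch, in which the operation strings and the names `PERMISSIONS` and `is_allowed` are illustrative labels rather than identifiers from the source:

```python
from typing import Dict, FrozenSet

# Level 1 = primary, 2 = secondary, 3 = tertiary, 4 = quaternary.
PERMISSIONS: Dict[int, FrozenSet[str]] = {
    1: frozenset({"browse"}),
    2: frozenset({"browse", "download"}),
    3: frozenset({"browse", "download", "upload"}),
    4: frozenset({"browse", "download", "upload",
                  "delete", "unload", "move"}),
}


def is_allowed(level: int, operation: str) -> bool:
    """Check whether an access level permits a document operation."""
    return operation in PERMISSIONS.get(level, frozenset())
```

Each level is a superset of the one below it, so a check such as `is_allowed(2, "upload")` is false while `is_allowed(3, "upload")` is true.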
Compared with the prior art, the invention has the beneficial effects that:
1. the document identification module identifies the initial document, which prevents user terminals from uploading repeated documents; successfully uploaded documents are sent to the classification storage module, which classifies and stores them according to the document identification in the initial document information; the heat calculation module calculates the heat of the online documents in the server to obtain the active value of each online document within its document storage duration, and the classification storage module assigns each online document a corresponding storage grade according to its active value;
2. the authority distribution module distributes the document operation authorities of user terminals: it judges from the terminal IP whether a visitor is a re-access user or a first access user, identifies the identity of the re-access user, obtains the document access value of an ordinary access user from that user's access times, average access time and average click times, compares the document access value with the access threshold, and divides ordinary access users into active access users, medium access users and cold access users, so that document operation authorities are distributed to cold access users, first access users, active access users, medium access users and background management users.
Drawings
In order to facilitate understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings;
FIG. 1 is an overall system block diagram of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an online document induction and storage system based on document data analysis includes a data acquisition module, an authority distribution module, a document identification module, a classification storage module, a heat calculation module, and a server;
the server is in communication connection with a user terminal, the user terminal comprises a registration login unit, an information acquisition unit and a document uploading unit, the registration login unit is used by the user to input personal information for registration and login and sends the personal information to the server, the document uploading unit is used by the user to upload initial documents through the user terminal and sends the initial documents to the server, and the information acquisition unit is used for acquiring the information of the initial documents uploaded by the user terminal, tagging the acquired initial document information with the terminal number, and sending it to the server;
the personal information comprises the name, the mobile phone number, the terminal IP and the like of a user; the initial document information includes the document responsible party, document name, reference type, document publisher, document publication date, document page number, document identification, and the like;
specifically, a user can upload an initial document through a document uploading unit in the user terminal, and can also perform document operation on an online document in the server through the user terminal, wherein the document operation specifically comprises online browsing, downloading, deleting, moving, replacing, unloading and the like;
the server receives an initial document uploaded by the user terminal, the server sends the initial document to the document identification module, the document identification module receives the initial document sent by the server and is used for carrying out document identification on the initial document, and the document identification steps are as follows:
step one: acquiring initial document information of an initial document, and obtaining six initial document characteristics of the initial document: document name, document responsible party, document publisher, document publication date, document page number and document identification;
step two: combining the document name, document responsible party, document publisher, document publication date, document page number and document identification of the initial document into an initial document characteristic group;
step three: similarly, acquiring online document information of the online documents in the server, obtaining the online document characteristics of the online documents, and establishing an online document characteristic group for each online document;
step four: comparing the initial document characteristic group with the plurality of online document characteristic groups, by extracting any initial document characteristic from the initial document characteristic group and comparing it with the online document characteristic of the same type in the online document characteristic group;
specifically: the online documents are denoted u, u = 1, 2, ..., z, where z is a positive integer; the document name in the online document characteristic group is denoted MCu and the document name of the initial document is denoted MC; the document name similarity X1u of the online document is calculated by the formula X1u = MCu/MC, and by analogy the document responsible party similarity X2u, document publisher similarity X3u, document publication date similarity X4u, document page number similarity X5u and document identification similarity X6u of the online document are calculated in turn;
the six similarities are compared with their corresponding similarity thresholds; if the similarity of any online document characteristic exceeds its similarity threshold, the next step is entered, and if none exceeds its similarity threshold, the initial document is judged to be the latest document and its uploading is approved;
the six similarities are summed and averaged to obtain the similarity mean XJu of the online document; if the similarity mean XJu is greater than or equal to a set threshold, the initial document is judged to be a repeated document and its uploading is refused, and if the similarity mean XJu is smaller than the set threshold, the initial document is judged to be the latest document and its uploading is approved;
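The two-stage check described above (per-characteristic thresholds first, then the similarity mean XJu) can be sketched as follows. Note the substitution: the text defines each similarity as a ratio such as X1u = MCu/MC without saying how text values are turned into numbers, so this sketch swaps in the string-matching ratio from Python's standard `difflib` as a stand-in. The field names, the single shared `field_threshold` (the text allows one threshold per characteristic) and both default values are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Mapping

FIELDS = ("name", "responsible_party", "publisher",
          "publication_date", "pages", "identifier")


def field_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the source's MCu/MC ratio."""
    return SequenceMatcher(None, a, b).ratio()


def is_duplicate(initial: Mapping[str, str],
                 online_docs: Iterable[Mapping[str, str]],
                 field_threshold: float = 0.9,
                 mean_threshold: float = 0.8) -> bool:
    """Return True if the initial document repeats any online document."""
    for online in online_docs:
        sims = [field_similarity(str(initial[f]), str(online[f]))
                for f in FIELDS]
        # Stage 1: continue only if some characteristic exceeds its threshold.
        if not any(s > field_threshold for s in sims):
            continue
        # Stage 2: compare the similarity mean XJu with the set threshold.
        if sum(sims) / len(sims) >= mean_threshold:
            return True
    return False
```

A document that matches an online document on only one characteristic passes stage 1 but is normally rejected as a duplicate candidate in stage 2, because the other five similarities pull the mean down.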
the document identification module marks the initial document as either a repeated document or a latest document and sends the marked initial document to the server; the server generates an uploading failure signal for a repeated document and an uploading success signal for a latest document, and feeds the corresponding signal back to the user terminal; the document identification module is also used for sending the initial document marked as the latest document to the classification storage module;
the data acquisition module comprises a time recording unit, a frequency acquisition unit and an information acquisition unit, wherein the time recording unit is used for recording the time information of the online documents in the server and sending the time information to the server, the frequency acquisition unit is used for acquiring the frequency information of the online documents in the server and sending the frequency information to the server, and the information acquisition unit is used for acquiring the online document information of the online documents in the server; the data acquisition module is used for acquiring time information, frequency information and online document information of online documents in the server and sending the time information, the frequency information and the online document information to the server;
in specific implementation, the time information comprises the start viewing time, stop viewing time, viewing duration, document uploading time, document downloading time, document moving time, document deleting time, document storage duration and the like of the online documents; the online document information includes the document responsible party, document name, reference type, document publisher, document publication date, document page number, document identification, and the like; the frequency information comprises the browsing times, moving times, number of likes, number of dislikes, downloading times and the like of the online documents;
the classification storage module receives the initial documents sent by the document identification module, and performs classification storage on the initial documents according to document identifications in the initial document information;
meanwhile, the server stores a plurality of online documents and an IP address library, the IP address library stores a plurality of terminal IPs, the heat calculation module is used for performing heat calculation on the online documents in the server, and the heat calculation process specifically comprises the following steps:
step S1: obtaining document uploading time of the online documents through a time recording unit, and obtaining document storage duration WCTu of the online documents by subtracting the document uploading time from current time;
step S2: acquiring the praise times DZu and the raffle times DCu in the document storage time length; calculating the approval rate DZLu of the online document in the document storage time length by using a formula DZLu = DZu/(DZu + DCu);
step S3: acquiring browsing times LLu and downloading times XZu of online documents in the document storage duration, substituting the browsing times LLu, the downloading times XZu and the like into a calculation formula to obtain a heat of use value SRu of the online documents in the document storage duration, wherein the calculation formula is specifically as follows:
Figure 337916DEST_PATH_IMAGE004
(ii) a In the formula, a1 and a2 are both fixed values of proportionality coefficients, and values of a1 and a2 are both greater than zero, in specific implementation, a1 may be 0.1321, and a2 may be 1.0245511, as long as a1 and a2 are guaranteed to be fixed values greater than zero, which is not limited herein;
step S4: dividing the document storage duration of the online document into a plurality of equal-duration time periods Ti, i =1, 2, … …, x, i represents the number of the time periods, x is a positive integer, and the time periods Ti are sorted by time into T1 < T2 < … … < Tx; randomly selecting two adjacent time periods Ti and Ti+1And Ti < Ti+1(ii) a Calculating the use heat value SRuTi of the online document in the time period Ti and the time period T according to the steps S2 to S3i+1Heat value of use SRuT of mesoline literaturei+1
Step S5: if SRuTi < SRuTi+1Using the formula
Figure 333554DEST_PATH_IMAGE005
Calculating to obtain a time period from Ti to Ti+1Storing the heat growth rate RZu of the document, and recording the number of the heat growth rate as CRZu;
if SRuTi > SRuTi+1Using the formula
Figure 736853DEST_PATH_IMAGE006
Calculating to obtain a time period from Ti to Ti+1Storing the heat reduction rate RJu of the document, and recording the number CRJu of the heat reduction rate;
step S6: collecting the heat growth rates and heat decrease rates of the online document between the time periods; summing the heat growth rates and averaging to obtain the average heat growth rate RZJu; summing the heat decrease rates and averaging to obtain the average heat decrease rate RJJu; and calculating the heat coefficient RXu of the online document within the document storage duration by the formula RXu = (RZJu × CRZu)/(RJJu × CRJu);
it should be noted that the overall heat growth rate or heat decrease rate of the online document over the whole document storage duration is not considered herein;
step S7: combining the heat coefficient RXu of the online document within the document storage duration with the use heat value SRu, and obtaining the active value HYu of the online document by the formula HYu = RXu × SRu;
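Steps S6 and S7 can be sketched in a few lines. The following is a minimal illustration, not the patented implementation: it assumes the per-period heat growth rates and heat decrease rates have already been computed (their formulas are only available as images), that decrease rates are expressed as positive magnitudes, and that both lists are non-empty, since the source does not say how an empty list is handled. The function names are hypothetical.

```python
from statistics import mean


def heat_coefficient(growth_rates, decrease_rates):
    """RXu = (RZJu * CRZu) / (RJJu * CRJu), from precomputed per-period rates."""
    if not growth_rates or not decrease_rates:
        # Assumption: the source does not define RXu when either list is empty.
        raise ValueError("need at least one growth rate and one decrease rate")
    crzu = len(growth_rates)      # CRZu: number of heat growth rates
    crju = len(decrease_rates)    # CRJu: number of heat decrease rates
    rzju = mean(growth_rates)     # RZJu: average heat growth rate
    rjju = mean(decrease_rates)   # RJJu: average heat decrease rate
    # Mean times count equals the sum, so this is sum(growth) / sum(decrease).
    return (rzju * crzu) / (rjju * crju)


def active_value(rxu, sru):
    """HYu = RXu * SRu."""
    return rxu * sru


rxu = heat_coefficient([0.5, 0.25], [0.25])
print(rxu, active_value(rxu, 2.0))
```

Because each average multiplied by its count gives back the sum, the heat coefficient is simply the ratio of the total heat growth to the total heat decrease over the sampled periods.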
the heat calculation module sends the active value of the online document to the server and the classification storage module, and the classification storage module assigns a corresponding storage grade to the online document according to the active value, specifically as follows:
step SS1: if HYu ≥ Y2, the online document is judged to be an active document, and the corresponding online document is moved to the system active layer, wherein in the system active layer online documents are displayed on the system home page as hot searches, without retrieval or query;
step SS2: if Y2 > HYu ≥ Y1, the online document is judged to be an ordinary document, and the corresponding online document is moved to the system common layer, wherein in the system common layer online documents are grouped under the category entries of the system and can be retrieved and queried by opening a category entry;
step SS3: if Y1 > HYu, the online document is judged to be a cold document, and the corresponding online document is moved to the system cold layer, wherein in the system cold layer online documents are gathered into a folder mapped to the system, and users must enter search keywords to retrieve and query them; Y1 and Y2 are both activity thresholds, and Y1 < Y2;
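The three-way comparison in steps SS1 to SS3 amounts to a threshold lookup. A minimal sketch follows; the function name and the tier labels are hypothetical, and the threshold values in the usage line are placeholders because the source gives no numbers for Y1 and Y2.

```python
def storage_tier(hyu, y1, y2):
    """Map an active value HYu to a storage tier using thresholds Y1 < Y2."""
    if not y1 < y2:
        raise ValueError("thresholds must satisfy Y1 < Y2")
    if hyu >= y2:
        return "active"      # step SS1: shown on the home page
    if hyu >= y1:
        return "ordinary"    # step SS2: filed under category entries
    return "cold"            # step SS3: reachable by keyword search only


print(storage_tier(3.0, y1=2.0, y2=5.0))
```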
the system also comprises an authority distribution module, wherein the authority distribution module is used for distributing the document operation authority of the user terminal, and the distribution process specifically comprises the following steps:
step P1: acquiring a terminal IP of a user terminal, if the terminal IP is matched with the terminal IP in the IP address library, judging the terminal IP as a re-access, and marking the user terminal as a re-access user;
if the terminal IP is not matched with the terminal IP in the IP address base, judging the terminal IP as the first access, and marking the user terminal as a first access user;
step P2: carrying out identity recognition on the re-access user according to the terminal IP, if the user terminal is a management user, marking the user terminal as a background management user, and if the user terminal is a common user, marking the user terminal as a common access user;
step P3: marking first access users, background management users and common access users as q, w and e respectively, wherein q = 1, 2, ..., v, w = 1, 2, ..., n, e = 1, 2, ..., m, and v, n and m are positive integers; acquiring the document access times of the common access user, and marking the access times as FCe;
step P4: obtaining the visit dwell time of the ordinary visit user each time, and obtaining the visit average time FTe of the ordinary visit user by adding and summing the visit dwell time of each time and dividing the sum by the visit times; acquiring the document click times of the ordinary access user in each access, and adding and summing the document click times of each time and dividing the sum by the access times to obtain the click average times DJe of the ordinary access user;
step P5: the document access value FWe of the common access user is calculated by the formula FWe = FCe × b1 + FTe × b2 + DJe × b3; in the formula, b1, b2 and b3 are all weight coefficients, b1 + b2 + b3 = 1, and the values of b1, b2 and b3 are all greater than zero; in specific implementation, b1 may be 0.2, b2 may be 0.28, and b3 may be 0.52;
step P6: if the document access value FWe of the ordinary access user is greater than or equal to K2, the ordinary access user is marked as an active access user;
if the document access value FWe of the ordinary access user is greater than or equal to K1 and less than K2, marking the ordinary access user as a medium access user;
if the document access value FWe of the ordinary access user is less than K1 and greater than zero, the ordinary access user is marked as a cold access user; wherein K1 and K2 are access thresholds, and K1 < K2;
step P7: allocating document operation permissions respectively to cold access users, first access users, active access users, medium access users and background management users;
the authority distribution module records the authority level of cold access users and first access users as the primary access level, the authority level of medium access users as the secondary access level, the authority level of active access users as the tertiary access level, and the authority level of background management users as the quaternary access level;
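Steps P5 to P7 can be illustrated as follows. This is a sketch under stated assumptions: the default weights are the example values from step P5, the thresholds K1 and K2 in the usage line are placeholders (the source gives none), the unit of FTe is not specified in the source, and a value of exactly zero, which the source leaves undefined, is treated here as cold. All function and dictionary names are hypothetical.

```python
def document_access_value(fce, fte, dje, b1=0.2, b2=0.28, b3=0.52):
    """FWe = FCe*b1 + FTe*b2 + DJe*b3 with positive weights summing to 1."""
    if min(b1, b2, b3) <= 0 or abs(b1 + b2 + b3 - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return fce * b1 + fte * b2 + dje * b3


def classify_user(fwe, k1, k2):
    """Split ordinary access users by the access thresholds K1 < K2."""
    if not k1 < k2:
        raise ValueError("thresholds must satisfy K1 < K2")
    if fwe >= k2:
        return "active"
    if fwe >= k1:
        return "medium"
    # Assumption: the source defines cold as 0 < FWe < K1; zero is treated the same here.
    return "cold"


# Authority levels from step P7: 1 = primary ... 4 = quaternary.
ACCESS_LEVEL = {"cold": 1, "first": 1, "medium": 2, "active": 3, "admin": 4}

fwe = document_access_value(fce=10, fte=5, dje=2)
print(fwe, classify_user(fwe, k1=3, k2=6), ACCESS_LEVEL[classify_user(fwe, 3, 6)])
```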
An online document induction and storage system based on document data analysis works as follows. A user uploads an initial document through the document uploading unit in the user terminal, or performs document operations on the online documents in the server through the user terminal. The server sends the initial document uploaded by the user terminal to the document identification module, which performs document identification on it: the initial document information is acquired, the six initial document characteristics of the initial document are obtained, and these are combined into an initial document characteristic group. Similarly, the online document information of the online documents in the server is acquired, the online document characteristics are obtained, and an online document characteristic group is established for each online document. The initial document characteristic group is compared with the plurality of online document characteristic groups: each initial document characteristic is extracted and compared with the online document characteristic of the same type, the similarity between each online document characteristic and the initial document characteristic is calculated, and each similarity is compared with its corresponding similarity threshold. If no online document characteristic has a similarity exceeding its similarity threshold, the initial document is judged to be a latest document and its upload is approved. If any online document characteristic has a similarity exceeding its similarity threshold, the similarities are summed and averaged to obtain the similarity mean of the online document, the similarity mean is compared with a set threshold, and the initial document is judged to be a repeated document or a latest document accordingly. The document identification module sends the initial documents marked as repeated documents and as latest documents to the server, and the server generates an upload failure signal and an upload success signal respectively. At the same time, the initial documents marked as latest documents are sent to the classification storage module, which receives them and stores them by classification according to the document identifications in the initial document information;
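The identification flow described above can be sketched as follows. The source does not name a similarity measure, so this example substitutes the ratio from Python's `difflib.SequenceMatcher`; it also uses a single hypothetical per-characteristic threshold and a hypothetical mean threshold, and assumes that a similarity mean above the set threshold means "repeated". The dictionary keys are illustrative names for the six characteristics.

```python
from difflib import SequenceMatcher

# Hypothetical keys for the six document characteristics.
FEATURES = ("name", "author", "publisher", "publication_date", "pages", "identifier")


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, str(a), str(b)).ratio()


def identify(initial, online_docs, feature_threshold=0.9, mean_threshold=0.8):
    """Return 'repeated' or 'latest' for an initial document."""
    for online in online_docs:
        sims = [similarity(initial[f], online[f]) for f in FEATURES]
        # Only when some characteristic exceeds its threshold is the mean checked.
        if any(s > feature_threshold for s in sims):
            if sum(sims) / len(sims) > mean_threshold:
                return "repeated"
    return "latest"


doc = {"name": "Graph theory basics", "author": "A. Smith", "publisher": "X Press",
       "publication_date": "2020-01-01", "pages": 120, "identifier": "ID-001"}
print(identify(doc, [dict(doc)]))
```

A production system would likely use a different comparison per characteristic (exact match for identifiers, fuzzy match for names), which is why the thresholds are kept as parameters.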
Heat calculation is performed on the online documents in the server by the heat calculation module. The document uploading time of the online document is obtained through the time recording unit, and the document storage duration WCTu of the online document is obtained by subtracting the document uploading time from the current time. The like rate DZLu, the browsing times LLu and the downloading times XZu within the document storage duration are substituted into the calculation formula [formula image not reproduced] to obtain the use heat value SRu of the online document within the document storage duration. The document storage duration of the online document is divided into a plurality of equal-duration time periods T_i, two adjacent time periods T_i and T_{i+1} are randomly selected, and the use heat value SRuT_i of the online document in the time period T_i and the use heat value SRuT_{i+1} in the time period T_{i+1} are calculated. If SRuT_i < SRuT_{i+1}, the heat growth rate RZu of the online document from the time period T_i to the time period T_{i+1} is calculated by a formula [formula image not reproduced], and the number of heat growth rates is recorded as CRZu; if SRuT_i > SRuT_{i+1}, the heat decrease rate RJu of the online document from the time period T_i to the time period T_{i+1} is calculated by a formula [formula image not reproduced], and the number of heat decrease rates is recorded as CRJu. The heat growth rates and heat decrease rates of the online document between the time periods are collected; the heat growth rates are summed and averaged to obtain the average heat growth rate RZJu, and the heat decrease rates are summed and averaged to obtain the average heat decrease rate RJJu. The heat coefficient RXu of the online document within the document storage duration is calculated by the formula RXu = (RZJu × CRZu)/(RJJu × CRJu), and the active value HYu of the online document is calculated from the heat coefficient RXu and the use heat value SRu by the formula HYu = RXu × SRu. The heat calculation module sends the active value of the online document to the server and the classification storage module;
the classification storage module assigns a corresponding storage grade to the online document according to the active value: if HYu ≥ Y2, the online document is judged to be an active document and is moved to the system active layer; if Y2 > HYu ≥ Y1, the online document is judged to be an ordinary document and is moved to the system common layer; if Y1 > HYu, the online document is judged to be a cold document and is moved to the system cold layer;
the application also allocates the document operation authority of user terminals through the authority distribution module: the terminal IP of the user terminal is acquired and matched against the terminal IPs in the IP address library, so that the user terminal is judged to be a re-access user or a first access user; identity recognition is performed on the re-access user to obtain a background management user or a common access user; the document access times FCe, the average access duration FTe and the average click times DJe of the common access user are acquired, and the document access value FWe of the common access user is calculated by the formula FWe = FCe × b1 + FTe × b2 + DJe × b3; the document access value FWe is compared with the access thresholds to divide common access users into active access users, medium access users and cold access users; finally, document operation permissions are allocated respectively to cold access users, first access users, active access users, medium access users and background management users.
The above formulas are all calculated on dimensionless numerical values; each formula was obtained by collecting a large amount of data and performing software simulation so as to approximate the real situation as closely as possible, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (8)

1. An online document induction and storage system based on document data analysis, characterized by comprising a data acquisition module, a document identification module, a classification storage module, a heat calculation module and a server;
the server stores a plurality of online documents and an IP address library, the server is in communication connection with a user terminal, and a user uploads initial documents or performs document operation on the online documents in the server through a document uploading unit in the user terminal;
the server receives the initial document uploaded by the user terminal and sends it to the document identification module; the document identification module receives the initial document sent by the server and performs document identification on it, after which the initial document is marked as either a repeated document or a latest document; the document identification module sends the initial documents marked as repeated documents and as latest documents to the server; the server generates an upload failure signal for a repeated document and an upload success signal for a latest document, and feeds the upload failure signal or the upload success signal back to the user terminal; the document identification module is also used for sending the initial document marked as the latest document to the classification storage module;
the classification storage module receives the initial documents sent by the document identification module, and performs classification storage on the initial documents according to document identifications in the initial document information; the heat calculation module is used for calculating the heat of the online documents in the server to obtain an active value HYu of the online documents;
the heat calculation module sends the active value of the online document to the server and the classification storage module, and the classification storage module assigns a corresponding storage grade to the online document according to the active value, specifically as follows:
step SS1: if HYu ≥ Y2, the online document is judged to be an active document, and the corresponding online document is moved to the system active layer;
step SS2: if Y2 > HYu ≥ Y1, the online document is judged to be an ordinary document, and the corresponding online document is moved to the system common layer;
step SS3: if Y1 > HYu, the online document is judged to be a cold document, and the corresponding online document is moved to the system cold layer.
2. A system for online summarization and storage of documents based on document data analysis according to claim 1, wherein the document identification module performs the following steps:
step one: acquiring initial document information of an initial document, and obtaining six initial document characteristics of the initial document: document name, document author, document publisher, document publication date, document page number and document identification;
step two: combining the document name, document author, document publisher, document publication date, document page number and document identification of the initial document into an initial document characteristic group;
step three: similarly, online document information of online documents in the server is obtained, online document characteristics of the online documents are obtained, and an online document characteristic group of the online documents is established;
step four: comparing the initial document characteristic group with a plurality of online document characteristic groups, extracting any initial document characteristic in the initial document characteristic groups, and comparing the online document characteristics of the same type in the online document characteristic groups;
and if the initial document is judged to be a repeated document, the initial document is not approved to be uploaded, and if the initial document is judged to be the latest document, the initial document is approved to be uploaded.
3. The system of claim 1, wherein the heat calculation module performs the following steps:
step S1: obtaining document uploading time of the online documents through a time recording unit, and obtaining document storage duration WCTu of the online documents by subtracting the document uploading time from current time;
step S2: acquiring the like times DZu and the dislike times DCu of the online document within the document storage duration; calculating the like rate DZLu of the online document within the document storage duration by the formula DZLu = DZu/(DZu + DCu);
step S3: acquiring the browsing times LLu and the downloading times XZu of the online document within the document storage duration, and substituting the browsing times LLu, the downloading times XZu and the like rate DZLu into a calculation formula [formula image not reproduced] to obtain the use heat value SRu of the online document within the document storage duration;
step S4: dividing the document storage duration of the online document into a plurality of equal-duration time periods T_i; randomly selecting two adjacent time periods T_i and T_{i+1}, with T_i < T_{i+1}; calculating, according to steps S2 to S3, the use heat value SRuT_i of the online document in the time period T_i and the use heat value SRuT_{i+1} of the online document in the time period T_{i+1};
step S5: if SRuT_i < SRuT_{i+1}, using a formula [formula image not reproduced] to calculate the heat growth rate RZu of the online document from the time period T_i to the time period T_{i+1}, and recording the number of heat growth rates as CRZu;
if SRuT_i > SRuT_{i+1}, using a formula [formula image not reproduced] to calculate the heat decrease rate RJu of the online document from the time period T_i to the time period T_{i+1}, and recording the number of heat decrease rates as CRJu;
step S6: collecting the heat growth rates and heat decrease rates of the online document between the time periods; summing the heat growth rates and averaging to obtain the average heat growth rate RZJu; summing the heat decrease rates and averaging to obtain the average heat decrease rate RJJu; and calculating the heat coefficient RXu of the online document within the document storage duration by the formula RXu = (RZJu × CRZu)/(RJJu × CRJu);
step S7: combining the heat coefficient RXu of the online document within the document storage duration with the use heat value SRu, and obtaining the active value HYu of the online document by the formula HYu = RXu × SRu.
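The inputs prepared in steps S1, S2 and S4 are simple to compute. The sketch below is illustrative only: the function names are hypothetical, DZu and DCu are the two counts defined in step S2, and returning 0.0 when both counts are zero is an assumption, since the claim does not cover that case.

```python
from datetime import datetime


def storage_duration(upload_time, now):
    """WCTu: current time minus document uploading time (step S1)."""
    return now - upload_time


def like_rate(dzu, dcu):
    """DZLu = DZu / (DZu + DCu) (step S2)."""
    total = dzu + dcu
    if total == 0:
        return 0.0  # Assumption: undefined in the source when there are no votes.
    return dzu / total


def equal_periods(upload_time, now, x):
    """Split the storage duration into x equal-duration periods T_1..T_x (step S4)."""
    step = (now - upload_time) / x
    return [(upload_time + i * step, upload_time + (i + 1) * step) for i in range(x)]


uploaded = datetime(2021, 7, 12)
current = datetime(2021, 8, 9)
print(storage_duration(uploaded, current).days, like_rate(3, 1))
print(len(equal_periods(uploaded, current, 4)))
```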
4. A system for online document summarization and storage based on document data analysis according to claim 1, wherein the system active layer is: online documents are displayed on the system home page as hot searches, without retrieval or query; the system common layer is: online documents are grouped under the category entries of the system and can be retrieved and queried by opening a category entry; the system cold layer is: online documents are gathered into a folder mapped to the system, and a user must enter search keywords to retrieve and query them.
5. The system for online summarizing and storing documents based on document data analysis according to claim 1, wherein the system further comprises an authority distribution module, the authority distribution module is used for distributing document operation authority of the user terminal, and the distribution process is as follows:
step P1: acquiring a terminal IP of a user terminal, if the terminal IP is matched with the terminal IP in the IP address library, judging the terminal IP as a re-access, and marking the user terminal as a re-access user;
if the terminal IP is not matched with the terminal IP in the IP address base, judging the terminal IP as the first access, and marking the user terminal as a first access user;
step P2: carrying out identity recognition on the re-access user according to the terminal IP, if the user terminal is a management user, marking the user terminal as a background management user, and if the user terminal is a common user, marking the user terminal as a common access user;
step P3: marking first access users, background management users and common access users as q, w and e respectively; acquiring the document access times of the common access user, and marking the access times as FCe;
step P4: obtaining the visit dwell time of the ordinary visit user each time, and obtaining the visit average time FTe of the ordinary visit user by adding and summing the visit dwell time of each time and dividing the sum by the visit times; acquiring the document click times of the ordinary access user in each access, and adding and summing the document click times of each time and dividing the sum by the access times to obtain the click average times DJe of the ordinary access user;
step P5: the document access value FWe of the common access user is calculated by the formula FWe = FCe × b1 + FTe × b2 + DJe × b3;
step P6: comparing the document access value FWe of the ordinary access user with the access thresholds, and dividing ordinary access users into active access users, medium access users and cold access users;
step P7: allocating document operation permissions respectively to cold access users, first access users, active access users, medium access users and background management users;
the authority distribution module records the authority level of cold access users and first access users as the primary access level, the authority level of medium access users as the secondary access level, the authority level of active access users as the tertiary access level, and the authority level of background management users as the quaternary access level.
6. The system of claim 1, wherein the user terminal comprises a registration login unit, an information collection unit and a document uploading unit; the registration login unit is used by a user to enter personal information for registration and login and sends the personal information to the server; the document uploading unit is used by the user to upload initial documents through the user terminal and send the initial documents to the server; the information collection unit is used for collecting the initial document information of the initial documents uploaded by the user terminal, and the collected initial document information is marked and sent to the server together with the terminal number;
the personal information comprises the name, the mobile phone number, the terminal number and the terminal IP of a user; the initial document information includes document authors, document names, reference types, document publishers, document publication dates, document page numbers, and document identifications.
7. A system for online documentation induction and storage based on document data analysis according to claim 1 wherein the data collection module includes a time recording unit for recording time information of online documentation in the server and sending the time information to the server, a times collection unit for collecting times information of online documentation in the server and sending the times information to the server, and an information collection unit for collecting online documentation information of online documentation in the server; the data acquisition module is used for acquiring time information, frequency information and online document information of online documents in the server and sending the time information, the frequency information and the online document information to the server;
the time information comprises the start viewing time, the stop viewing time, the viewing duration, the document uploading time, the document downloading time, the document moving time, the document deleting time and the document storage duration of the online documents; the online document information includes document authors, document names, reference types, document publishers, document publication dates, document page numbers, and document identifications; the frequency information comprises the browsing times, moving times, like times, dislike times and downloading times of the online documents;
the document operations include in particular online browsing, downloading, deleting, moving, replacing and unloading.
8. A system for online summarization and storage of documents based on document data analysis according to claim 5, wherein the primary access level is specifically: browsing online documents;
the secondary access level is specifically: browsing and downloading online documents;
the tertiary access level is specifically: browsing, downloading and uploading online documents;
the quaternary access level is specifically: browsing, downloading, uploading, deleting, unloading and moving online documents.
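The access levels of claim 8 map naturally onto a permission table. A minimal sketch, with hypothetical names; the operation labels follow the wording of the claim, and level numbers 1 to 4 stand for the primary to quaternary access levels.

```python
# Operations allowed at each access level, as listed in claim 8.
OPERATIONS = {
    1: {"browse"},
    2: {"browse", "download"},
    3: {"browse", "download", "upload"},
    4: {"browse", "download", "upload", "delete", "unload", "move"},
}


def is_allowed(level, operation):
    """True if the given access level permits the document operation."""
    return operation in OPERATIONS.get(level, set())


print(is_allowed(2, "download"), is_allowed(2, "upload"))
```

Each level is a superset of the one below it, so a check against a single table is enough and no inheritance logic is needed.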
CN202110782604.6A 2021-07-12 2021-07-12 Online document induction and storage system based on document data analysis Active CN113239207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110782604.6A CN113239207B (en) 2021-07-12 2021-07-12 Online document induction and storage system based on document data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110782604.6A CN113239207B (en) 2021-07-12 2021-07-12 Online document induction and storage system based on document data analysis

Publications (2)

Publication Number Publication Date
CN113239207A true CN113239207A (en) 2021-08-10
CN113239207B CN113239207B (en) 2021-09-24

Family

ID=77135274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110782604.6A Active CN113239207B (en) 2021-07-12 2021-07-12 Online document induction and storage system based on document data analysis

Country Status (1)

Country Link
CN (1) CN113239207B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5983246A (en) * 1997-02-14 1999-11-09 Nec Corporation Distributed document classifying system and machine readable storage medium recording a program for document classifying
CN101819601A (en) * 2010-05-11 2010-09-01 同方知网(北京)技术有限公司 Method for automatically classifying academic documents
CN103530388A (en) * 2013-10-22 2014-01-22 浪潮电子信息产业股份有限公司 Performance improving data processing method in cloud storage system
CN109918481A (en) * 2019-02-28 2019-06-21 深圳市海恒智能科技有限公司 The method and system of automatic stereowarehouse storage books
CN109977076A (en) * 2019-03-25 2019-07-05 段崇楷 A kind of historical document classification storage method based on big data analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIAN LIPING ET.AL: "A Study on IT-security vocabulary for domain document classification", 《2011 SEVENTH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY》 *
李娟娟: "若干热门主题文献归类探析", 《福建师范大学学报(哲学社会科学版)》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114035752A (en) * 2021-12-01 2022-02-11 特斯联科技集团有限公司 Urban carbon neutralization data processing system
CN114417099A (en) * 2022-01-21 2022-04-29 黑龙江中医药大学 Archive management system based on RFID (radio frequency identification) label
CN114417099B (en) * 2022-01-21 2022-09-09 黑龙江中医药大学 Archive management system based on RFID (radio frequency identification) label
CN114915453A (en) * 2022-04-14 2022-08-16 浙江网商银行股份有限公司 Access response method and device
CN115357551A (en) * 2022-08-24 2022-11-18 福州年科信息科技有限公司 Big data-based data management system for enterprise management consultation

Also Published As

Publication number Publication date
CN113239207B (en) 2021-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230925

Address after: A2701, Nanshan Software Park, No. 10128 Shennan Avenue, Liancheng Community, Nantou Street, Nanshan District, Shenzhen, Guangdong Province, 518000

Patentee after: New Spacetime (Shenzhen) Intelligent Technology Co.,Ltd.

Address before: 518000 f6-021-c, Hedong building, Haoyunlai Plaza, Hedong community, Xixiang street, Bao'an District, Shenzhen City, Guangdong Province

Patentee before: Shenzhen Zhiku Information Technology Co.,Ltd.

CP03 Change of name, title or address

Address after: 513000 503, Building D, Haifu Ecological Building, 9 Happy Harbor, Haibin Community, Xin'an Street, Bao'an District, Shenzhen, Guangdong

Patentee after: New Spacetime (Shenzhen) Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: A2701, Nanshan Software Park, No. 10128 Shennan Avenue, Liancheng Community, Nantou Street, Nanshan District, Shenzhen, Guangdong Province, 518000

Patentee before: New Spacetime (Shenzhen) Intelligent Technology Co.,Ltd.

Country or region before: China