CN116663549B

CN116663549B - Digitized management method, system and storage medium based on enterprise files

Info

Publication number: CN116663549B
Application number: CN202310567168.XA
Authority: CN
Inventors: 陈四娣; 潘灵; 胡敏; 袁虎将; 李慢慢
Original assignee: Hainan University of Science and Technology
Current assignee: Hainan University of Science and Technology
Priority date: 2023-05-18
Filing date: 2023-05-18
Publication date: 2024-03-19
Anticipated expiration: 2043-05-18
Also published as: CN116663549A

Abstract

The invention discloses an enterprise-file-based digital management method, system and storage medium, and relates to the technical field of digital file management. The method comprises the following steps: acquiring enterprise files requiring digital management, wherein the enterprise files comprise text files, picture files, audio recording files and video recording files; digitizing the enterprise files to obtain digitized files, wherein the digitizing comprises acquiring the text file, image preprocessing and the like; extracting keywords from the digitized files and classifying them, wherein the keyword extraction comprises performing word segmentation on the text file to obtain a plurality of classification labels; and archiving and sorting the files according to the classification labels. The method obtains more detailed labels for classifying documents, and solves the problem that prior-art enterprise file digital management methods have difficulty classifying enterprise files further.

Description

Digitized management method, system and storage medium based on enterprise files

Technical Field

The invention relates to the technical field of digital file management, and in particular to an enterprise-file-based digital management method, system and storage medium.

Background

Enterprise archive management has always been an important part of enterprise management. In the past, enterprise files were stored in handwritten or printed form; management was cumbersome, and files were at risk of being lost or destroyed. With the development of information technology, digital file management has been widely adopted by enterprises, enabling comprehensive, unified, efficient and accurate file management. Digital file management converts paper files into electronic data and uses computer technology to store, retrieve and manage them quickly, thereby greatly improving the reliability and security of file management.

Existing enterprise archive digital management methods generally either comprise a scanning device, a text reading device, a data uploading device, a data storage device and a data correction device, or perform intelligent and accurate archiving based on semantic understanding of the text of electronically scanned documents.

For example, the invention patent with publication No. CN112633042A discloses a digital file management system and method. The system comprises a scanning device, a text reading device, a data uploading device, a data storage device and a data correction device. The scanning device is in signal connection with the text reading device and sends the scanned archive text to the text reading device; the text reading device is in signal connection with the data uploading device, reads the scanned text content, converts it into digital content and transmits the digital content to the data uploading device; the data uploading device is in signal connection with the data storage device and uploads the digital content to the data storage device; the data storage device is in signal connection with the data correction device and stores the uploaded digital content of the file; and the data correction device detects erroneous content in the digital content of the file and corrects it. The method offers a high degree of automation, an archive text correction function and high management efficiency.

For example, the invention patent with publication No. CN115827939A discloses a digitized archive management system which uses a context encoder comprising an embedding layer to extract global high-dimensional semantic features of each word in the text description of an electronically scanned document; uses a text convolutional neural network with one-dimensional convolution kernels of different scales to extract multi-scale semantic understanding associated features of the text description at different word-feature scales; classifies the topic label of the text description according to those features; and then automatically archives the electronically scanned document. In this way, intelligent and accurate archiving is performed based on semantic understanding of the text of the electronically scanned document, enabling digitized archive management.

However, in the course of implementing the technical solution of the embodiments of the present application, the inventors found that the above technologies have at least the following technical problems:

Existing enterprise archive digital management methods that provide a scanning device, a text reading device, a data uploading device, a data storage device and a data correction device can detect and correct erroneous content in the digital content of archives, but cannot classify the digital content further. Methods that extract global high-dimensional semantic features of each word in the text description of an electronically scanned document through a context encoder comprising an embedding layer can archive intelligently and accurately based on text semantic understanding, but cannot classify the text content in finer detail or with greater accuracy. In summary, prior-art enterprise file digital management methods have difficulty classifying enterprise files further.

Disclosure of Invention

By providing an enterprise-file-based digital management method, system and storage medium, the embodiments of the present application solve the problem that prior-art enterprise file digital management methods have difficulty classifying enterprise files further, and achieve further classified management of enterprise file content.

The embodiments of the present application provide an enterprise-file-based digital management method comprising the following steps: acquiring enterprise files requiring digital management, wherein the enterprise files comprise text files, picture files, audio recording files and video recording files; digitizing the enterprise files to obtain digitized files; extracting keywords from the digitized files and classifying them, wherein the keyword extraction comprises performing word segmentation on the text file to obtain a plurality of classification labels; and archiving and sorting the files according to the classification labels.

Further, the word segmentation is performed by a word segmentation model, and the main steps are as follows: feature selection: converting the Chinese text in the text file into a text sequence, taking each character in the text sequence as a state, and extracting character features, wherein the character features comprise the current character, the previous character and the next character; model training: training the word segmentation model on a set training corpus to obtain parameters including the transition probabilities between states and the conditional probabilities between states and features; word segmentation prediction: predicting a new text sequence with the trained word segmentation model to obtain a word segmentation sequence that serves as the word segmentation labels.

The transition probability between states is calculated by a formula [formula image not reproduced in this text], wherein P(y_i | y_{i-1}) denotes the conditional probability of the current state y_i given the previous state y_{i-1}; f_k(y_{i-1}, y_i) denotes the value of the k-th feature function at y_{i-1} and y_i; λ_k denotes the weight of the k-th feature function; i denotes the i-th state, i = 1, 2, 3, ..., j; and k denotes the k-th feature function, k = 1, 2, 3, ..., n.

The conditional probability between states and features is calculated by a formula [formula image not reproduced in this text], wherein P(y_i | y_{i-1}, x) denotes the conditional probability between states and features; x denotes the input sequence; P(y_i | y_{i-1}, x, i) denotes the conditional probability that the labeling state of the current Chinese character is y_i, given the input sequence x and the labeling state y_{i-1} of the previous Chinese character; and the summation term [symbol not reproduced in this text] is the sum of the probabilities of the current Chinese character state y_i given the state y_{i-1} of the previous Chinese character and the input sequence x.
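
The feature selection step above can be illustrated with a minimal sketch. It assumes that the first and last characters are padded with hypothetical boundary markers `<BOS>` and `<EOS>`; the function name and the dictionary keys are illustrative and do not come from the patent.

```python
def char_features(text: str) -> list[dict[str, str]]:
    """Return previous/current/next character features for each character."""
    # "<BOS>" and "<EOS>" are assumed padding markers for the sequence edges.
    padded = ["<BOS>"] + list(text) + ["<EOS>"]
    return [
        {"prev": padded[i - 1], "cur": padded[i], "next": padded[i + 1]}
        for i in range(1, len(padded) - 1)
    ]


features = char_features("企业档案")
for feature in features:
    print(feature)
```

Each character becomes one position (state) in the sequence, and its feature dictionary is what a sequence-labeling model would consume during training and prediction.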

Further, digitizing the text file comprises the following steps: acquiring the text file, namely obtaining image data of the text file by photographing or scanning; image preprocessing, namely preprocessing the image data, wherein the preprocessing comprises sharpness adjustment, denoising, adaptive binarization and text direction detection; character segmentation, namely dividing the text in the preprocessed image into single characters or word blocks; character recognition, namely recognizing the single characters or word blocks using OCR technology and converting them into text form; and text post-processing, namely performing data processing on the recognized text result.

Further, extracting keywords from the picture files and classifying them comprises: acquiring a picture file; picture classification, namely classifying objects in the picture file using a deep learning model that is trained to recognize different objects in the picture, including vehicles, buildings and people, and generating keywords; object detection, namely detecting regions in the picture file using an object detection algorithm, identifying the positions of a plurality of objects and generating keywords; and keyword extraction, namely performing word segmentation, part-of-speech tagging and syntactic analysis on the text produced by the picture classification and object detection results using the text analysis functions of natural language processing technology, extracting keywords, and ranking and screening the keywords according to their frequency of occurrence and weight.

Further, extracting keywords from the audio recording files and classifying them comprises: acquiring an audio recording file; recording preprocessing, namely preprocessing the recording file, wherein the preprocessing comprises audio format conversion, noise reduction and volume normalization; speech recognition, namely applying automatic speech recognition technology to the preprocessed recording file and converting the audio signal in the recording into text form; and keyword extraction, namely performing word segmentation, part-of-speech tagging and syntactic analysis on the speech recognition result using natural language processing technology, extracting keywords, and ranking and screening the keywords according to their frequency of occurrence and weight.
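
The patent does not specify how volume normalization is performed. As one possible reading, the sketch below applies simple peak normalization to a list of floating-point samples; the function name `peak_normalize` and the parameter `target_peak` are assumptions made for illustration.

```python
def peak_normalize(samples: list[float], target_peak: float = 1.0) -> list[float]:
    """Scale samples so the largest absolute amplitude equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        # Silent or empty input: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


normalized = peak_normalize([0.1, -0.5, 0.25], target_peak=1.0)
print(normalized)
```

Peak normalization keeps the relative dynamics of the recording intact; a loudness-based scheme would be an equally plausible choice for this step.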

Further, extracting keywords from the video recording files and classifying them comprises: acquiring a video recording file; preprocessing, namely converting the video recording file into an image sequence and preprocessing the image sequence, wherein the preprocessing comprises cropping, scaling and denoising; video analysis, namely analyzing the processed video recording file using a deep learning model to classify and detect scenes and objects in the video; and keyword extraction, namely performing word segmentation, part-of-speech tagging and syntactic analysis on the text produced by the video analysis results using the text analysis functions of natural language processing technology, extracting keywords and phrases, and ranking and screening the keywords according to their frequency of occurrence and weight.

Further, the enterprise-file-based digital management method includes a file verification method for retrieving digitized files after archiving and sorting, with the following specific steps: receiving the digitized file to be stored and acquiring management object information: receiving the digitized file to be stored, and acquiring management object information representing first authority information of the digitized file; the digitized file to be stored further includes management authority information corresponding to the management object information, wherein the management authority information represents the authority level of the digitized file, corresponds to the user access rights of the objects using the storage environment, and restricts which digitized files different using objects can access and retrieve; judging the encryption state and executing the encryption program: judging the encryption state of the digitized file to be stored and, if the file is not encrypted, executing an encryption program, wherein the encryption program obtains a corresponding object key and a device verification key based on the management object information, and encodes and encrypts the digitized file and other related information using the device verification key to obtain an encrypted stored file, the device verification key representing hardware verification information; responding to a digitized file acquisition request: when a digitized file is to be accessed, a digitized file acquisition request is sent; the request is received, and the hardware verification information and management object information of the requesting object are acquired; the requesting object of the digitized file acquisition request is compared with the management object information to generate a judgment result; when the judgment result shows that the two differ, the authority level of the requested digitized file and the management authority information of the requesting object are acquired; if the management authority information corresponds to the authority level, a file transmission request is generated and sent to the management object corresponding to the digitized file, feedback information is obtained, and the digitized file is decrypted based on the feedback information and transmitted to the requesting object; decrypting and verifying the digitized file: decrypting the obtained encrypted stored file based on the management object information and the hardware verification information, calculating a comparison group of hardware verification information, comparing, and generating a verification result; outputting or executing request feedback: outputting the decrypted digitized file if the verification passes, and otherwise executing a request feedback program so that the management object responds to the digitized file acquisition request.
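
The request-handling branch above can be sketched as a small decision function. This is a simplified reading, not the patented procedure: it assumes that authority levels are integers, that "corresponds to the authority level" means "is at least the file's level", and that the owner's feedback is a yes/no callback. All names (`StoredFile`, `handle_request`, `owner_approves`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredFile:
    owner: str            # management object of the digitized file
    authority_level: int  # assumed integer level required for access


def handle_request(
    file: StoredFile,
    requester: str,
    requester_level: int,
    owner_approves: Callable[[StoredFile, str], bool],
) -> str:
    """Return the next action for a digitized file acquisition request."""
    if requester == file.owner:
        # Requester is the management object: go straight to decrypt-and-verify.
        return "decrypt_and_verify"
    if requester_level < file.authority_level:
        return "rejected"
    # Requester differs from the owner but has sufficient authority:
    # forward a transfer request to the owner and act on the feedback.
    return "transfer" if owner_approves(file, requester) else "rejected"


contract = StoredFile(owner="alice", authority_level=2)
print(handle_request(contract, "bob", 3, lambda f, r: True))
```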

Further, a key acquisition request is generated according to the management object information and sent to the management object, and the biometric identification and device hardware information of the management object are acquired, wherein the device hardware information comprises a device motherboard number and a device hardware address; an object key is generated from the preset biometric identification information, and the device hardware information is encrypted with the object key to generate a device verification key; the data to be stored and the device verification key are combined and encrypted with the device verification key to generate an encrypted stored file. The step of responding to the digitized file acquisition request specifically comprises: receiving and responding to the request, acquiring the hardware verification information and management object information of the requesting object, generating a corresponding object key based on the management object information, and encrypting the hardware verification information with the object key to generate a device verification key; and decrypting the encrypted stored file with the device verification key, wherein a decryption failure indicates that the requesting object or the requesting device is incorrect; if decryption succeeds, the device verification key and the stored data are recovered, the device verification key is decrypted with the object key to obtain the hardware verification information, and the hardware verification information is compared; if the verification passes, the required digitized file is provided to the requesting object, and if it does not pass, the request is refused.
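
The key relationship described above (biometric information yields an object key, and the object key together with the hardware information yields a device verification key) can be sketched with the Python standard library. Note the substitution: the patent describes reversible encryption of the hardware information, whereas this sketch uses SHA-256 hashing and HMAC-SHA256 and verifies by recomputation, because the standard library offers no symmetric cipher. File encryption itself is omitted. All function names and sample values are hypothetical.

```python
import hashlib
import hmac


def object_key(biometric_id: bytes) -> bytes:
    """Derive an object key from (already digitized) biometric information."""
    return hashlib.sha256(biometric_id).digest()


def device_verification_key(obj_key: bytes, board_no: str, hw_addr: str) -> bytes:
    """Bind the motherboard number and hardware address to the object key."""
    hardware_info = f"{board_no}|{hw_addr}".encode()
    return hmac.new(obj_key, hardware_info, hashlib.sha256).digest()


def verify_device(stored_key: bytes, biometric_id: bytes,
                  board_no: str, hw_addr: str) -> bool:
    """Recompute the device verification key and compare in constant time."""
    candidate = device_verification_key(object_key(biometric_id), board_no, hw_addr)
    return hmac.compare_digest(stored_key, candidate)


stored = device_verification_key(object_key(b"fingerprint-template"),
                                 "MB-0001", "00:11:22:33:44:55")
print(verify_device(stored, b"fingerprint-template", "MB-0001", "00:11:22:33:44:55"))
```

A request from a different device or a different person produces a different key, which corresponds to the "decryption fails" branch in the text.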

Further, the enterprise-file-based digital management system comprises an acquisition module, a processing module, a classification module and an archiving and sorting module; the acquisition module is used for acquiring enterprise files requiring digital management, wherein the enterprise files comprise text files, picture files, audio recording files and video recording files; the processing module is used for digitizing the enterprise files to obtain digitized files; the classification module is used for extracting keywords from the digitized files and classifying them, including performing word segmentation on the text file to obtain a plurality of classification labels; and the archiving and sorting module is used for archiving and sorting according to the classification labels.
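
How the four modules hand data to one another can be sketched as a chain of functions. The bodies are placeholders: the extension table, the use of the file name as "text" and the use of the file type as the only label are assumptions made so that the sketch runs, not details from the patent.

```python
import os
from collections import defaultdict

# Assumed mapping from file extension to the four file types named in the text.
KINDS = {".txt": "text", ".jpg": "picture", ".wav": "audio", ".mp4": "video"}


def acquire(paths):
    """Acquisition module: pair each path with its inferred file type."""
    return [(p, KINDS.get(os.path.splitext(p)[1].lower(), "unknown")) for p in paths]


def process(item):
    """Processing module: placeholder for OCR, speech recognition or video analysis."""
    path, kind = item
    return {"path": path, "kind": kind, "text": os.path.basename(path)}


def classify(record):
    """Classification module: placeholder that uses the file type as the only label."""
    return [record["kind"]]


def archive(records):
    """Archiving and sorting module: group file paths under each label."""
    index = defaultdict(list)
    for record in records:
        for label in classify(record):
            index[label].append(record["path"])
    return dict(index)


archive_index = archive([process(i) for i in acquire(["a.txt", "b.jpg", "c.txt"])])
print(archive_index)
```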

Further, embodiments of the present application provide a computer-readable storage medium storing a program which, when executed by a processor, implements the enterprise-file-based digital management method.

One or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:

1. Existing enterprise archive digital management methods detect and correct erroneous content in the digital content of archives, but cannot classify the digital content further. The present invention acquires enterprise files requiring digital management, digitizes them to obtain digitized files, extracts keywords from the digitized files and classifies them, and archives and sorts the files according to the classification labels, thereby effectively solving the problem that prior-art enterprise file digital management methods have difficulty classifying enterprise files further.

2. Word segmentation is performed on the text file to obtain a plurality of classification labels. The segmentation is carried out by a word segmentation model: the Chinese text is converted into a sequence, character features are extracted, and the model is trained on a set training corpus to obtain the transition probabilities between states and the conditional probabilities between states and features. The trained model can then predict a new text sequence to obtain a word segmentation sequence that serves as the word segmentation labels. The method can identify the core information and important topics in a document, and effectively improves the efficiency and accuracy of document management.

3. Keywords are extracted from text files, picture files, audio recording files and video recording files, and each type of file is classified accordingly, thereby achieving digital management of enterprise files and effectively solving the problem that picture, audio recording and video recording files could not be classified on the basis of keywords.

Drawings

FIG. 1 is a flowchart of the enterprise-file-based digital management method according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of digitizing a text file in the enterprise-file-based digital management method according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of the enterprise-file-based digital management system according to an embodiment of the present application.

Detailed Description

To further illustrate the embodiments, the present invention provides accompanying drawings. The drawings form part of the disclosure of the present invention; they mainly illustrate the embodiments and, together with the description, serve to explain the principles of the embodiments. With reference to them, one skilled in the art will recognize other possible implementations and advantages of the present invention. Elements in the drawings are not drawn to scale, and like reference numerals generally designate like elements.

The technical solution in the embodiments of the present application aims to solve the problem that prior-art enterprise file digital management methods have difficulty classifying enterprise files further. The general idea is as follows:

Acquiring enterprise files requiring digital management, wherein the enterprise files comprise text files, picture files, audio recording files and video recording files; digitizing the enterprise files to obtain digitized files; extracting keywords from the digitized files and classifying them; and archiving and sorting the files according to the classification labels.

Word segmentation is performed on the text file to obtain a plurality of classification labels. The word segmentation is performed by a word segmentation model, and the main steps are as follows: feature selection: converting the Chinese text in the text file into a text sequence, taking each character in the text sequence as a state, and extracting character features, wherein the character features comprise the current character, the previous character and the next character; model training: training the word segmentation model on a set training corpus to obtain parameters including the transition probabilities between states and the conditional probabilities between states and features; word segmentation prediction: predicting a new text sequence with the trained word segmentation model to obtain a word segmentation sequence that serves as the word segmentation labels.

Digitizing the text file comprises the following steps: acquiring the text file, namely obtaining image data of the text file by photographing or scanning; image preprocessing, namely preprocessing the image data, wherein the preprocessing comprises sharpness adjustment, denoising, adaptive binarization and text direction detection; character segmentation, namely dividing the text in the preprocessed image into single characters or word blocks; character recognition, namely recognizing the single characters or word blocks using OCR technology and converting them into text form; and text post-processing, namely post-processing the recognized text result, wherein the post-processing comprises formatting, correction and normalization, so as to improve the accuracy and precision of text recognition.

Extracting keywords from the picture files and classifying them comprises: acquiring a picture file; picture classification, namely classifying objects in the picture file using a deep learning model that is trained to recognize different objects in the picture, including vehicles, buildings and people, and generating keywords; object detection, namely detecting regions in the picture file using an object detection algorithm, identifying the positions of a plurality of objects and generating keywords; and keyword extraction, namely performing word segmentation, part-of-speech tagging and syntactic analysis on the text produced by the picture classification and object detection results using the text analysis functions of natural language processing technology, extracting keywords, and ranking and screening the keywords according to their frequency of occurrence and weight.

Extracting keywords from the audio recording files and classifying them comprises: acquiring an audio recording file; recording preprocessing, namely preprocessing the recording file, wherein the preprocessing comprises audio format conversion, noise reduction and volume normalization; speech recognition, namely applying automatic speech recognition technology to the preprocessed recording file and converting the audio signal in the recording into text form; and keyword extraction, namely performing word segmentation, part-of-speech tagging and syntactic analysis on the speech recognition result using natural language processing technology, extracting keywords, and ranking and screening the keywords according to their frequency of occurrence and weight.

Extracting keywords from the video recording files and classifying them comprises: acquiring a video recording file; preprocessing, namely converting the video recording file into an image sequence and preprocessing the image sequence, wherein the preprocessing comprises cropping, scaling and denoising; video analysis, namely analyzing the processed video recording file using a deep learning model to classify and detect scenes and objects in the video; and keyword extraction, namely performing word segmentation, part-of-speech tagging and syntactic analysis on the text produced by the video analysis results using the text analysis functions of natural language processing technology, extracting keywords and phrases, and ranking and screening the keywords according to their frequency of occurrence and weight.

In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.

FIG. 1 shows a flowchart of the enterprise-file-based digital management method according to an embodiment of the present application. The method comprises the following steps: acquiring enterprise files requiring digital management, wherein the enterprise files comprise text files, picture files, audio recording files and video recording files; digitizing the enterprise files to obtain digitized files; extracting keywords from the digitized files and classifying them, wherein the keyword extraction comprises performing word segmentation on the text file to obtain a plurality of classification labels; and archiving and sorting the files according to the classification labels.

Further, the word segmentation is performed by a word segmentation model, and the main steps are as follows: feature selection: converting the Chinese text in the text file into a text sequence, taking each character in the text sequence as a state, and extracting character features, wherein the character features comprise the current character, the previous character and the next character; model training: training the word segmentation model on a set training corpus to obtain parameters including the transition probabilities between states and the conditional probabilities between states and features; word segmentation prediction: predicting a new text sequence with the trained word segmentation model to obtain a word segmentation sequence that serves as the word segmentation labels.

The transition probability between states is calculated by a formula [formula image not reproduced in this text], wherein P(y_i | y_{i-1}) denotes the conditional probability of the current state y_i given the previous state y_{i-1}; f_k(y_{i-1}, y_i) denotes the value of the k-th feature function at y_{i-1} and y_i; λ_k denotes the weight of the k-th feature function; i denotes the i-th state, i = 1, 2, 3, ..., j; and k denotes the k-th feature function, k = 1, 2, 3, ..., n.

The conditional probability between states and features is calculated by a formula [formula image not reproduced in this text], wherein P(y_i | y_{i-1}, x) denotes the conditional probability between states and features; x denotes the input sequence; P(y_i | y_{i-1}, x, i) denotes the conditional probability that the labeling state of the current Chinese character is y_i, given the input sequence x and the labeling state y_{i-1} of the previous Chinese character; and the summation term [symbol not reproduced in this text] is the sum of the probabilities of the current Chinese character state y_i given the state y_{i-1} of the previous Chinese character and the input sequence x. From P(y_i | y_{i-1}, x), all labeling states of the current Chinese character and their corresponding probabilities are obtained, and the model selects the state with the highest probability as the labeling state of the current Chinese character.
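
The two formulas are not legible in this text. Purely as an assumption, a standard log-linear form that is consistent with the symbol definitions above would read as follows; the normalization over candidate states y' and the placement of the summation term as the denominator are reconstructions, not text confirmed by the source.

```latex
% Assumed transition probability: normalized exponential of weighted feature functions
P(y_i \mid y_{i-1}) =
  \frac{\exp\left(\sum_{k=1}^{n} \lambda_k f_k(y_{i-1}, y_i)\right)}
       {\sum_{y'} \exp\left(\sum_{k=1}^{n} \lambda_k f_k(y_{i-1}, y')\right)}

% Assumed state-feature conditional probability: position-wise score
% divided by the sum over all candidate states of the current character
P(y_i \mid y_{i-1}, x) =
  \frac{P(y_i \mid y_{i-1}, x, i)}
       {\sum_{y'} P(y' \mid y_{i-1}, x, i)}

% Selection of the labeling state with the highest probability
\hat{y}_i = \arg\max_{y} P(y \mid y_{i-1}, x)
```

Under this reading, the last line expresses the rule stated in the text: the state with the maximum probability is chosen as the labeling state of the current Chinese character.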

In this embodiment, feature selection represents features as states, for example whether a character is at the beginning or the end of a word. Model training requires an optimization algorithm such as gradient descent so that the model's predictions on the training data match, or are highly similar to, the labeled results. CRF word segmentation decodes the entire sequence with the Viterbi algorithm, taking context information into account to output the optimal segmentation. In actual training, CRF word segmentation learns its parameters by methods such as maximum likelihood estimation or regularized maximum likelihood estimation, so as to maximize the model's prediction accuracy on the training data.
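
The Viterbi decoding mentioned above can be sketched as follows. The B/M/E/S tag set (begin, middle, end, single-character word) is a common convention for Chinese word segmentation and is an assumption here, as are all scores in the usage example; a trained CRF would supply the emission and transition scores.

```python
import math

TAGS = ["B", "M", "E", "S"]  # assumed tag set: begin, middle, end, single


def viterbi(emissions, transitions, start):
    """Return the highest-scoring tag sequence under additive log-scores.

    emissions: one {tag: score} dict per character
    transitions: {(prev_tag, tag): score}; missing pairs are forbidden
    start: {tag: score} for the first character
    """
    neg_inf = -math.inf
    score = {t: start.get(t, neg_inf) + emissions[0].get(t, neg_inf) for t in TAGS}
    back = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: score[p] + transitions.get((p, t), neg_inf))
            new_score[t] = (score[best] + transitions.get((best, t), neg_inf)
                            + emit.get(t, neg_inf))
            pointers[t] = best
        score = new_score
        back.append(pointers)
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def tags_to_words(text, tags):
    """Cut the text after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hypothetical scores for illustration only.
allowed = [("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")]
transitions = {pair: 0.0 for pair in allowed}
start = {"B": 0.0, "S": 0.0}
emissions = [{"B": -0.1, "S": -2.0}, {"E": -0.1, "M": -2.0},
             {"B": -0.1, "S": -2.0}, {"E": -0.1, "M": -2.0}]
tags = viterbi(emissions, transitions, start)
words = tags_to_words("企业档案", tags)
print(tags, words)
```

Because the whole sequence is scored jointly, a locally attractive tag is rejected when it would force a forbidden transition later, which is the context sensitivity the text attributes to CRF decoding.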

Further, FIG. 2 shows a schematic flowchart of digitizing a text file in the enterprise-file-based digital management method according to an embodiment of the present application, comprising the following steps: acquiring the text file, namely obtaining image data of the text file by photographing or scanning; image preprocessing, namely preprocessing the image data, wherein the preprocessing comprises sharpness adjustment, denoising, adaptive binarization and text direction detection; character segmentation, namely dividing the text in the preprocessed image into single characters or word blocks; character recognition, namely recognizing the single characters or word blocks using OCR technology and converting them into text form; and text post-processing, namely post-processing the recognized text result, wherein the post-processing comprises formatting, correction and normalization, so as to improve the accuracy and precision of text recognition.

In this embodiment, text segmentation is implemented using vertical, horizontal and diagonal scanning techniques, in combination with binarization, and text recognition is implemented using template matching or neural network-based learning algorithms.
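The vertical scan combined with binarization can be sketched as a projection profile: count the ink pixels in each column and cut wherever a column is empty. The sketch below assumes a grayscale image given as rows of 0-255 integers and uses a fixed threshold in place of adaptive binarization; both function names are hypothetical, and horizontal or diagonal scans are not shown.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image to 1 = ink, 0 = background (fixed threshold, an assumption)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]

def split_columns(binary):
    """Vertical scan: return (start, end) column ranges that contain ink."""
    width = len(binary[0])
    profile = [sum(row[col] for row in binary) for col in range(width)]
    spans, start = [], None
    for col, ink in enumerate(profile):
        if ink and start is None:
            start = col                  # a character block begins
        elif not ink and start is not None:
            spans.append((start, col))   # an empty column closes the block
            start = None
    if start is not None:
        spans.append((start, width))
    return spans

gray = [
    [255, 0, 0, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 0],
]
spans = split_columns(binarize(gray))
```

Each returned span can then be cropped and passed to the recognizer as a single character or word block.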

Further, extracting keywords from the picture files and classifying the picture files comprises the following steps: acquiring a picture file; picture classification, namely classifying objects in the picture file by using a deep learning model, wherein the trained model identifies different objects in the picture, including vehicles, buildings and people, and generates keywords; object detection, namely detecting regions in the picture file by using an object detection algorithm, identifying the positions of a plurality of objects and generating keywords; and keyword extraction, namely, based on the picture recognition result from picture classification and the object detection result, using the text analysis function of natural language processing technology to perform word segmentation, part-of-speech tagging and syntactic analysis on the text, extracting keywords, and ranking and screening the keywords according to their occurrence frequency and weight characteristics.
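The final ranking and screening by occurrence frequency and weight can be sketched as below. The scoring rule (frequency multiplied by a per-keyword weight) and the names `rank_keywords`, `top_n` and `min_score` are assumptions for illustration; the patent does not state how frequency and weight are combined.

```python
from collections import Counter

def rank_keywords(labels, weights=None, top_n=5, min_score=0.0):
    """Rank candidate keywords by frequency * weight and keep the best ones."""
    counts = Counter(labels)
    weights = weights or {}
    scored = {word: n * weights.get(word, 1.0) for word, n in counts.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scored.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, score in ranked if score >= min_score][:top_n]

# Labels as they might come back from picture classification and object detection.
detected = ["vehicle", "building", "person", "vehicle", "person", "vehicle"]
keywords = rank_keywords(detected, weights={"building": 4.0}, top_n=2)
```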

In this embodiment, the deep learning model used for picture classification is a convolutional neural network (CNN), and the object detection algorithm is a deep-learning-based target detection method, which may be based on various models such as YOLO, SSD and Faster R-CNN.

Further, extracting keywords from the audio recording files and classifying the audio recording files comprises the following steps: acquiring an audio recording file; recording preprocessing, namely preprocessing the audio recording file, wherein the preprocessing includes audio format conversion, noise reduction and volume normalization; voice recognition, namely performing recognition on the preprocessed audio recording file by using an automatic speech recognition technology and converting the audio signal in the recording into text form; and keyword extraction, namely, based on the voice recognition result, using natural language processing technology to perform word segmentation, part-of-speech tagging and syntactic analysis on the text, extracting keywords, and ranking and screening the keywords according to their occurrence frequency and weight characteristics.
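Two of the preprocessing operations named above, noise reduction and volume normalization, can be illustrated on a list of samples in the range -1.0 to 1.0. The sketch assumes a simple noise gate as a stand-in for noise reduction and peak normalization as the volume rule; both choices, and the function names, are assumptions rather than the patent's method.

```python
def noise_gate(samples, floor=0.02):
    """Zero out samples whose magnitude is below `floor` (crude noise reduction)."""
    return [s if abs(s) >= floor else 0.0 for s in samples]

def normalize_volume(samples, target_peak=1.0):
    """Scale the signal so that its largest magnitude equals `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence or empty input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

recording = [0.25, 0.01, -0.5, 0.125]
prepared = normalize_volume(noise_gate(recording))
```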

In this embodiment, the automatic speech recognition technology is generally built from deep learning models and speech feature extraction algorithms, such as convolutional neural networks (CNN), long short-term memory networks (LSTM) and Mel-frequency cepstral coefficients (MFCC); by training the model, keywords and voice instructions in the speech are recognized.

Further, extracting keywords from the video recording files and classifying the video recording files comprises the following steps: acquiring a video recording file; preprocessing, namely converting the video recording file into an image sequence and preprocessing the image sequence, wherein the preprocessing includes cropping, scaling and denoising operations; video analysis, namely performing video analysis on the processed video recording file by using a deep learning model, and classifying and detecting scenes and objects in the video; and keyword extraction, namely, based on the video analysis result, using the text analysis function of natural language processing technology to perform word segmentation, part-of-speech tagging and syntactic analysis on the text, extracting keywords and phrases, and ranking and screening the keywords according to their occurrence frequency and weight characteristics.

In this embodiment, the video is converted into an image sequence, that is, a continuous video stream is converted into a set of consecutive still images, by using the OpenCV library for Python or the FFmpeg command-line tool. Video analysis relies on a video recognition model based on a convolutional neural network (CNN) and also uses deep-learning-based object detection algorithms such as YOLO, SSD and Mask R-CNN.
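As a sketch of the FFmpeg route, the helper below only builds the command line that samples frames at a fixed rate and optionally scales them; it does not run FFmpeg itself. The helper name, the output file pattern and the default sampling rate are assumptions for illustration, while `-i`, `-vf`, `fps=` and `scale=` are standard FFmpeg options.

```python
import shlex

def ffmpeg_frame_command(video_path, out_dir, fps=1, width=None):
    """Build an FFmpeg command that writes sampled frames as numbered PNG files."""
    filters = [f"fps={fps}"]                 # frames kept per second of video
    if width is not None:
        filters.append(f"scale={width}:-1")  # resize, keeping the aspect ratio
    return [
        "ffmpeg", "-i", video_path,
        "-vf", ",".join(filters),
        f"{out_dir}/frame_%06d.png",
    ]

command = ffmpeg_frame_command("meeting.mp4", "frames", fps=2, width=640)
printable = shlex.join(command)  # for logging; pass the list itself to subprocess.run
```

Where FFmpeg is installed, the list can be handed to `subprocess.run(command, check=True)`; the resulting images would then go through the cropping and denoising steps before analysis.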

Furthermore, the enterprise-file-based digital management method uses an extracted-file verification mode when a digitized file is extracted after archiving and sorting, and comprises the following specific steps:

receiving the digitized file to be stored and acquiring management object information: receiving the digitized file to be stored, and acquiring the management object information used for representing first authority information of the digitized file;

the digitized file to be stored further includes management authority information corresponding to the management object information, wherein the management authority information is used for representing the authority level of the digitized file, corresponds to the user access authority of the objects using the storage environment, and is used for limiting the access to and acquisition of different digitized files by different using objects;

judging the encryption state and executing the encryption program: judging the encryption state of the digitized file to be stored; if the file is not encrypted, executing an encryption program, wherein the encryption program acquires a corresponding object key and a device verification key based on the management object information, and encodes and encrypts the digitized file and other related information by using the device verification key to obtain an encrypted storage document, the device verification key being used for representing hardware verification information;

responding to the digitized file acquisition request: when the digitized file is accessed and a digitized file acquisition request is sent, receiving the request and acquiring the hardware verification information and management object information of the request object;

comparing and judging according to the request object of the digitized file acquisition request and the management object information, and generating a judging result; when the judging result shows that the request object differs from the management object, acquiring the authority level of the digitized file corresponding to the digitized file acquisition request and the management authority information of the request object; if the management authority information corresponds to the authority level, generating a file transmission request, transmitting the file transmission request to the management object corresponding to the digitized file, acquiring feedback information, decrypting the digitized file based on the feedback information and transmitting the digitized file to the request object;

decrypting the digitized file and verifying: decrypting the obtained encrypted storage document based on the management object information and the hardware verification information, calculating a comparison group of hardware verification information, comparing and judging, and generating a verification result;

outputting or executing request feedback: outputting the decrypted digitized file if the verification passes, and executing a request feedback program if the verification does not pass, whereby the digitized file acquisition request is answered through the management object.
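The key derivation and hardware check in this verification mode can be sketched with the Python standard library. The sketch follows the derivation outlined in the claims (an object key from preset biometric information, then a device verification key from the mainboard number and hardware address), but it is only an assumption-laden illustration: SHA-256 and HMAC are stand-ins chosen here because the patent names no algorithm, the function names are hypothetical, and the encryption of the file contents itself is omitted.

```python
import hashlib
import hmac

def object_key(biometric_id: bytes) -> bytes:
    """Derive an object key from preset biometric information (SHA-256 stand-in)."""
    return hashlib.sha256(biometric_id).digest()

def device_verification_key(obj_key: bytes, board_no: str, hw_addr: str) -> bytes:
    """Bind the device hardware information to the object key (HMAC stand-in)."""
    hardware_info = f"{board_no}|{hw_addr}".encode()
    return hmac.new(obj_key, hardware_info, hashlib.sha256).digest()

def verify_request(stored_key: bytes, biometric_id: bytes,
                   board_no: str, hw_addr: str) -> bool:
    """Recompute the key from the requester's data and compare in constant time."""
    candidate = device_verification_key(object_key(biometric_id), board_no, hw_addr)
    return hmac.compare_digest(stored_key, candidate)

# At storage time the key is derived from the management object's own device.
stored = device_verification_key(object_key(b"owner-biometric"),
                                 "MB-001", "00:11:22:33:44:55")

same_device = verify_request(stored, b"owner-biometric", "MB-001", "00:11:22:33:44:55")
other_device = verify_request(stored, b"owner-biometric", "MB-002", "00:11:22:33:44:55")
```

A request that fails this check would fall through to the request feedback program, in which the management object decides whether to release the file.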

In this embodiment, the management object information refers to key information used for authenticating the validity and authority of the digitized file, including the source, attribution, file format, creation time, modification time and access rules of the digitized file. The management object information used for representing the attribution of the digitized file generally includes identification information such as the name, identification card number or organization code of the person or enterprise to which the digitized file belongs.

In this embodiment, by verifying the management object information and encrypting the device hardware information, the system ensures that only authorized users access the digitized file, thereby improving the security and confidentiality of the digitized file, and at the same time, the system also prevents malicious users from tampering or damaging the digitized file, and protects the integrity and reliability of the file.

Further, fig. 3 shows a schematic structural diagram of the enterprise-file-based digital management system provided in the embodiment of the present application. The system includes an acquisition module, a processing module, a classification module and an archiving and sorting module. The acquisition module is used for acquiring the enterprise files to be digitally managed, wherein the enterprise files comprise text files, picture files, audio recording files and video recording files; the processing module is used for performing digital processing on the enterprise files to obtain digitized files of the enterprise files; the classification module is used for extracting keywords from the digitized files and classifying the digitized files, including performing word segmentation on the text files to obtain a plurality of classification labels; and the archiving and sorting module is used for archiving and sorting according to the classification labels.
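The role of the archiving and sorting module can be illustrated by grouping digitized files under their classification labels. The function name and the in-memory dictionary used as the archive are assumptions for illustration; the patent does not describe the storage layout.

```python
from collections import defaultdict

def archive_by_labels(files):
    """Group file names under each of their classification labels, sorted for lookup."""
    archive = defaultdict(list)
    for name, labels in files:
        for label in labels:
            archive[label].append(name)
    return {label: sorted(names) for label, names in sorted(archive.items())}

classified = [
    ("contract.pdf", ["finance", "legal"]),
    ("audit.mp3", ["finance"]),
]
archive = archive_by_labels(classified)
```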

Further, an embodiment of the present application provides a computer readable storage medium storing a program, where the program when executed by a processor implements a method for digitally managing enterprise files.

In summary, this embodiment acquires the enterprise files to be digitally managed, performs digital processing on them to obtain digitized files, extracts keywords from the digitized files and classifies them, and archives and sorts them according to the classification labels, thereby effectively solving the problem that prior-art digital management methods for enterprise files have difficulty further classifying and processing enterprise files.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A digitalized management method based on enterprise files is characterized by comprising the following steps:

acquiring enterprise files needing digital management, wherein the enterprise files comprise text files, picture files, audio recording files and video recording files;

performing digital processing on the enterprise file to obtain a digital file of the enterprise file;

extracting keywords from the digitized file and classifying the digitized file, wherein the keyword extraction comprises word segmentation processing on the text file to obtain a plurality of classification labels;

according to the classification labels, archiving and sorting are carried out;

the word segmentation process carries out word segmentation through a word segmentation model, and comprises the following steps:

feature selection: converting Chinese text in a text file into a text sequence, taking each character in the text sequence as a state, and extracting character features, wherein the character features comprise a current character, a previous character and a next character;

model training: training a word segmentation model according to the set training corpus to obtain parameters including transition probabilities among states and conditional probabilities among states and features;

word segmentation prediction: predicting a new text sequence by using the trained word segmentation model to obtain a word segmentation sequence serving as a word segmentation label;

the transition probability between states is calculated by the following formula:

wherein $P(y_i \mid y_{i-1})$ represents the conditional probability of the current state $y_i$ given the previous state $y_{i-1}$; $f_k(y_{i-1}, y_i)$ represents the value of the $k$-th feature function at $y_{i-1}$ and $y_i$; $\lambda_k$ represents the weight of the $k$-th feature function; $i$ denotes the $i$-th state, $i = 1, 2, 3, \ldots, j$; and $k$ denotes the $k$-th feature function, $k = 1, 2, 3, \ldots, n$;

the conditional probability between the state and the feature is calculated by the following formula:

wherein $P(y_i \mid y_{i-1}, x)$ represents the conditional probability between the state and the feature; $x$ represents the input sequence; $P(y_i \mid y_{i-1}, x, i)$ represents the conditional probability that the current labeling state of the Chinese character is $y_i$, given the input sequence $x$ and the labeling state $y_{i-1}$ of the previous Chinese character; and $\sum_{y_i} P(y_i \mid y_{i-1}, x, i)$ is the sum of the probabilities of the current Chinese character states $y_i$, given the state $y_{i-1}$ of the previous Chinese character and the input sequence $x$;

the text file is digitized, which comprises the following steps:

text file acquisition, namely acquiring image data of the text file by photographing or scanning;

image preprocessing, namely preprocessing the image data, wherein the preprocessing comprises sharpness adjustment, denoising, adaptive binarization and text direction detection;

character segmentation, namely dividing the text in the preprocessed image into single characters or word blocks for processing;

character recognition, namely recognizing the single characters or word blocks by using an OCR technology, and converting the single characters or word blocks into text form;

text post-processing, namely performing data processing on the recognized text result.

2. The method for digitally managing an enterprise archive of claim 1, wherein extracting keywords from the picture files and classifying the picture files comprises:

acquiring a picture file;

classifying pictures, namely classifying objects in the picture files by using a deep learning model, identifying different objects in the pictures, including vehicles, buildings and people by training the model, and generating keywords;

object detection, namely detecting the region in the picture file by using an object detection algorithm, identifying the positions of a plurality of objects and generating keywords;

keyword extraction, namely, based on the picture recognition result from picture classification and the object detection result, using the text analysis function of natural language processing technology to perform word segmentation, part-of-speech tagging and syntactic analysis on the text, and extracting keywords, wherein the keywords are ranked and screened according to their occurrence frequency and weight characteristics.

3. The method for digitally managing an enterprise archive of claim 1, wherein extracting keywords from the audio recording files and classifying the audio recording files comprises:

acquiring a recording file;

preprocessing the recording, namely preprocessing the recording file, wherein the preprocessing comprises audio format conversion, noise reduction and volume normalization;

voice recognition, which is to use an automatic voice recognition technology to perform recording recognition on the preprocessed recording file and convert an audio signal in the recording into a text form;

keyword extraction, namely performing word segmentation, part-of-speech tagging and grammar analysis on a text by using a natural language processing technology according to a voice recognition result, extracting keywords, and sequencing and screening the keywords according to the occurrence frequency and weight characteristics of the keywords.

4. The method for digitally managing an enterprise archive of claim 1, wherein extracting keywords from the video recording files and classifying the video recording files comprises:

acquiring a video recording file;

preprocessing, namely converting the video record file into an image sequence, and preprocessing the image sequence, wherein the preprocessing comprises cutting, scaling and denoising operations;

video analysis, which is to use a deep learning model to perform video analysis on the processed video files, and to classify and detect scenes and objects in the video;

keyword extraction, namely performing word segmentation, part-of-speech tagging and grammar analysis on a text according to a video analysis result and a text analysis function in a natural language processing technology, and extracting keywords and phrases, wherein the ranking and screening of the keywords are obtained according to the occurrence frequency and weight characteristics of the keywords.

5. The method for digitally managing an enterprise archive of claim 1, wherein: an extracted-file verification mode is used when a digitized file is extracted after archiving and sorting, comprising the following specific steps:

receiving the digitized file to be stored and acquiring management object information: receiving the digitized file to be stored, and acquiring management object information used for representing first authority information of the digitized file;

the digital files to be stored also comprise management authority information corresponding to the management object information, wherein the management authority information is used for representing the authority level of the digital files and corresponds to the user access authority of the storage environment using objects and is used for limiting the access and acquisition of different using objects to different digital files;

judging the encryption state and executing the encryption program: judging the encryption state of the digitized file to be stored; if the file is not encrypted, executing an encryption program, wherein the encryption program acquires a corresponding object key and a device verification key based on the management object information, and encodes and encrypts the digitized file and other related information by using the device verification key to obtain an encrypted storage document, the device verification key being used for representing hardware verification information;

responding to the digitized file acquisition request: accessing the digitized file, sending a digitized file acquisition request, receiving the request and acquiring hardware verification information and management object information of a request object;

comparing and judging according to the request object of the digitized file acquisition request and the management object information, and generating a judging result; when the judging result shows that the request object differs from the management object, acquiring the authority level of the digitized file corresponding to the digitized file acquisition request and the management authority information of the request object; if the management authority information corresponds to the authority level, generating a file transmission request, transmitting the file transmission request to the management object corresponding to the digitized file, acquiring feedback information, decrypting the digitized file based on the feedback information and transmitting the digitized file to the request object;

decrypting the digitized file and verifying: decrypting the obtained encrypted storage document based on the management object information and the hardware verification information, calculating a comparison group of hardware verification information, comparing and judging, and generating a verification result;

outputting or executing request feedback: and outputting the decrypted digital file if the verification result is passed, executing a request feedback program if the verification result is not passed, and responding to the digital file acquisition request through the management object.

6. The enterprise archive based digital management method of claim 5, wherein: a key acquisition request is generated according to the management object information and sent to the management object to acquire the biometric identification and the device hardware information of the management object, the device hardware information comprising a device mainboard number and a device hardware address;

generating an object key by using preset biometric identification information, encrypting the device hardware information by using the object key, and generating a device verification key;

combining the data to be stored with the device verification key, encrypting by using the device verification key, and generating an encrypted storage document;

the step of responding to the digitized file acquisition request specifically comprises the following steps: receiving the request and responding, acquiring the hardware verification information and management object information of the request object, generating a corresponding object key based on the management object information, and encrypting the hardware verification information with the object key so as to generate a device verification key;

decrypting the encrypted storage document by using the device verification key, wherein if the decryption fails, this indicates that the request object or the requesting device is incorrect; if the decryption succeeds, the device verification key and the data to be stored are obtained, the device verification key is decrypted again by using the object key to obtain the hardware verification information, and the hardware verification information is compared and judged; if the verification passes, the required digitized file is provided to the request object, and if the verification does not pass, the request is refused.

7. A system for applying the enterprise archive based digital management method of any one of claims 1-6, comprising an acquisition module, a processing module, a classification module, and an archiving and sorting module;

the acquisition module is used for acquiring the enterprise files to be digitally managed, wherein the enterprise files comprise text files, picture files, audio recording files and video recording files;

the processing module is used for carrying out digital processing on the enterprise file to obtain a digital file of the enterprise file;

the classification module is used for extracting keywords from the digitized file and classifying the digitized file, and comprises the steps of performing word segmentation on the text file to obtain a plurality of classification labels;

and the archiving and sorting module is used for archiving and sorting according to the classification labels.

8. A computer readable storage medium having stored thereon a program which, when executed by a processor, implements the steps of the enterprise archive based digital management method of any one of claims 1 to 6.