CN111611211A - File importing and archiving method, electronic equipment and storage medium
- Publication number: CN111611211A
- Application number: CN202010346888.XA
- Authority: CN (China)
- Prior art keywords: file, files, compressed package, keyword, classification
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/168: Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
- G06F16/1744: Redundancy elimination performed by the file system using compression, e.g. sparse files
- G06F16/35: Clustering; Classification (information retrieval of unstructured textual data)
- G06F40/216: Parsing using statistical methods
- H04L67/06: Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
Abstract
The invention relates to intelligent decision-making in artificial intelligence. It provides a file importing and archiving method comprising the following steps:

- transmitting files in a compressed package containing one or more files;
- reading all content items of the transmitted compressed package and identifying valid parameters, where the content items include the name and format of the compressed package, the names and types of the files in it, and the source identifier of the package, and the valid parameters include the file name and/or the file type;
- determining from the identified valid parameters whether each file name conforms to the naming rule;
- displaying, for each file that does not conform to the naming rule, that its import failed and why;
- archiving each conforming file to its corresponding classification item, where a classification item is a classification label assigned according to the file name.

The invention also provides an electronic device and a storage medium. It archives files automatically. The invention further relates to blockchain technology: the compressed package is stored in a blockchain.
Description
Technical Field
The present invention relates to the field of intelligent decision-making in artificial intelligence. More particularly, it relates to a file importing and archiving method, an electronic device, and a storage medium.
Background
In the prior art, a file archiving and upload interface generally requires the user to transmit files to an asset-backed securities (ABS) system one at a time, over many separate uploads. This interaction greatly increases the user's workload and wastes operating time.

On the system side, the archived data cannot be parsed uniformly and completely. Repeated transmissions may also lead to omitted or incompletely read data.

The user may also be frustrated by being unable to transmit all data in one go. For example, an ABS system must archive user-uploaded data into different directories, but only one file can be archived into its directory at a time. Archiving multiple files therefore requires repeated reading and archiving, and the user is left performing repetitive, mechanical operations. In addition, the user cannot tell how many files have been uploaded or whether the upload succeeded. In other words, the user cannot quickly see the progress of the current operation or the system's feedback.

As a result, in the prior art:

- files cannot be archived uniformly in one pass;
- the reasons why a file cannot be archived are not displayed;
- manual customer service is needed;
- the user experience is poor.

This is especially true when archiving files during the research and development of complex products. There, functional iterations are easily missed, which increases product development cost and reduces benefit.
Disclosure of Invention
In view of the foregoing problems, an object of the present invention is to provide a file importing and archiving method, an electronic device, and a storage medium that can archive files automatically.
In order to achieve the above object, the present invention provides an electronic device, including a memory and a processor, wherein the memory stores a file import archive program, and the file import archive program implements the following steps when executed by the processor:
transmitting files in a compressed package, the compressed package containing one or more files;
reading all content items of the transmitted compressed package and identifying valid parameters, wherein the content items include the name of the compressed package, the format of the compressed package, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters include the file name and/or the file type;
determining, according to the identified valid parameters, whether each file name conforms to the naming rule;
displaying, for each file that does not conform to the naming rule, the import failure and its reason;
and archiving the files that conform to the naming rule to the corresponding classification items, wherein a classification item is a classification label assigned according to file names.
In addition, in order to achieve the above object, the present invention further provides a file importing and archiving method, including:
transmitting files in a compressed package, the compressed package containing one or more files;
reading all content items of the transmitted compressed package and identifying valid parameters, wherein the content items include the name of the compressed package, the format of the compressed package, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters include the file name and/or the file type;
determining, according to the identified valid parameters, whether each file name conforms to the naming rule;
displaying, for each file that does not conform to the naming rule, the import failure and its reason;
and archiving the files that conform to the naming rule to the corresponding classification items, wherein a classification item is a classification label assigned according to file names.
In one embodiment, the step of displaying the import failures and reasons for files that do not conform to the naming rule includes showing, in a popup window, the names of all files that failed to import together with the reasons.
Preferably, there are three popup windows:

- the first displays the import failures and reasons on the client that uploaded the files;
- the second offers that client the option of ignoring the errors and importing the files;
- the third offers the client the option of re-uploading.

Further, preferably, the step of displaying the import failures and reasons also includes choosing between the second and third popup windows by failure count. The second popup window pops up when the number of files that failed to import is less than a set number. The third pops up when that number is not less than the set number.
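The popup selection rule above can be expressed as a small function. This is an illustrative Python sketch only. The integer popup identifiers, the name `popups_to_show`, and the choice to show the first popup before the follow-up one are assumptions. So is showing no popup when nothing failed. None of these are details stated in the patent.

```python
from typing import List

# Hypothetical identifiers for the three popup windows.
FIRST_POPUP = 1   # lists every failed file and its reason
SECOND_POPUP = 2  # offers "ignore and submit"
THIRD_POPUP = 3   # offers "re-upload"


def popups_to_show(failed_count: int, set_number: int) -> List[int]:
    """Return the popups to display, in order, for one import."""
    if failed_count == 0:
        return []  # assumption: nothing failed, so no error popups are shown
    # Fewer failures than the set number: let the user ignore them and continue.
    # Otherwise: ask the user to fix the files and re-upload.
    follow_up = SECOND_POPUP if failed_count < set_number else THIRD_POPUP
    return [FIRST_POPUP, follow_up]
```

For example, with a set number of 3, two failures lead to the "ignore and submit" popup. Three or more failures lead to the "re-upload" popup.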
In one embodiment, the compressed package is stored in a blockchain, and the step of archiving the files that conform to the naming rule to the corresponding classification items includes:
extracting the keywords of each file in the compressed package according to the naming rule, and obtaining feature values of those keywords through a vector space model, the feature values of all keywords of a file forming that file's feature vector;
constructing a classification model based on the feature vectors;
and inputting the feature vectors of the files in the compressed package into the classification model to pre-classify them, and archiving the files into the corresponding classification items.
Preferably, constructing the feature-vector-based classification model includes:
screening out, for each file, a set number of archived files that match it according to the number of shared keywords, as the training set of that file;
forming the feature matrix of the training set from the feature vectors of its archived files, and forming the classification item matrix of the training set from the classification items of those archived files;
obtaining, from the feature matrix and the classification item matrix of the file's training set, the keyword filing probability that each keyword of the file is archived in each classification item;
selecting the classification item with the highest keyword filing probability as the optimal classification item of that keyword;
and obtaining the file filing probability that the file belongs to each classification item from the optimal classification item of each of its keywords and the corresponding keyword filing probability, thereby constructing the classification model.
The step of archiving the files in the compressed package to the corresponding classification items then includes:
displaying the classification items in descending order of file filing probability for the client to choose from, or archiving the file directly to the classification item with the highest file filing probability.
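The classification model can be sketched as below, ranking classification items for one new file. The patent does not say exactly how the keyword filing probability is derived from the feature matrix and the classification item matrix. This sketch therefore assumes one simple choice: a keyword's feature weights are summed per classification item (the keyword's row of the product of the transposed feature matrix and the one-hot item matrix) and normalised across items. It also assumes that the file filing probability is the normalised sum of each keyword's probability for its optimal item. The name `rank_items`, the parameter `k` (standing in for the "set number" of training files), and the data layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One archived file: (feature vector keyed by keyword, its classification item).
Archived = Tuple[Dict[str, float], str]


def rank_items(new_keywords: List[str], archived: List[Archived], k: int = 3) -> List[Tuple[str, float]]:
    """Return (classification item, file filing probability) pairs, highest first."""
    keywords = set(new_keywords)
    # Training set: the k archived files sharing the most keywords with the new file.
    training = sorted(archived, key=lambda a: len(keywords.intersection(a[0])), reverse=True)[:k]

    # Keyword-by-item weights for the new file's keywords (assumed aggregation).
    weights: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for vector, item in training:
        for kw, value in vector.items():
            if kw in keywords:
                weights[kw][item] += value

    scores: Dict[str, float] = defaultdict(float)
    for per_item in weights.values():
        total = sum(per_item.values())
        if total <= 0:
            continue
        # Optimal item for this keyword and its keyword filing probability.
        best_item, best_weight = max(per_item.items(), key=lambda p: p[1])
        scores[best_item] += best_weight / total

    # File filing probability: accumulated scores normalised to sum to 1.
    norm = sum(scores.values())
    if norm == 0:
        return []
    ranked = [(item, score / norm) for item, score in scores.items()]
    return sorted(ranked, key=lambda p: p[1], reverse=True)
```

The returned list supports both options in the text. It can be shown to the client in order, or its first entry can be used for direct archiving.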
In one embodiment, the content items further include an authorization identifier of the compressed package, or of each file in it. The authorization identifier is a unique identifier of the client.
Further, preferably, the feature vector of each archived file in a file's training set is constructed as follows.

The term frequency of each keyword of the archived files in the training set is obtained by

$$\mathrm{TF}(W'_m, d'_n) = \frac{\mathrm{count}(W'_m, d'_n)}{\mathrm{count}(d'_n)}$$

where:

- $\mathrm{TF}(W'_m, d'_n)$ is the term frequency of the $m$-th keyword $W'_m$ in archived file $d'_n$;
- $\mathrm{count}(W'_m, d'_n)$ is the number of occurrences of $W'_m$ in $d'_n$;
- $\mathrm{count}(d'_n)$ is the total number of occurrences of all keywords in $d'_n$.

The inverse document frequency of each keyword of the archived files in the training set is obtained by

$$\mathrm{IDF}(W'_m) = \log\frac{N}{\mathrm{count}(d_m)}$$

where:

- $d_m$ denotes the archived files in which $W'_m$ appears;
- $\mathrm{count}(d_m)$ is the number of such archived files;
- $N$ is the number of archived files in the screened training set.

The TF-IDF of each keyword is taken as its feature value. The feature values of all keywords of an archived file form that file's feature vector:

$$\mathrm{TFIDF}(W'_m, d'_n) = \mathrm{TF}(W'_m, d'_n) \cdot \mathrm{IDF}(W'_m)$$

where $\mathrm{TFIDF}(W'_m, d'_n)$ is the TF-IDF of keyword $W'_m$ in archived file $d'_n$.
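The term frequency, inverse document frequency and TF-IDF definitions above can be computed with a short Python function. This sketch assumes a natural logarithm and no smoothing in the IDF term. The function name `tfidf_vectors` and its input format (one keyword list per archived file) are illustrative and not taken from the patent.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(archived: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {keyword: TF-IDF} feature vector per archived file."""
    n = len(archived)  # N: number of archived files in the training set
    # count(d_m): number of archived files in which each keyword appears at least once
    doc_freq = Counter(kw for keywords in archived for kw in set(keywords))
    vectors = []
    for keywords in archived:
        counts = Counter(keywords)      # count(W'_m, d'_n) for each keyword
        total = sum(counts.values())    # count(d'_n): all keyword occurrences in the file
        vector = {}
        for kw, c in counts.items():
            tf = c / total
            idf = math.log(n / doc_freq[kw])  # assumed: unsmoothed, natural log
            vector[kw] = tf * idf
        vectors.append(vector)
    return vectors
```

A keyword that appears in every archived file gets an IDF of zero, so it does not help distinguish classification items.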
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium. The medium contains a file import archive program which, when executed by a processor, implements the steps of the file importing and archiving method described above.
The file importing and archiving method, electronic device, and storage medium work as follows:

- they transmit files as a single compressed package;
- they read all content items of the transmitted package and identify the valid parameters;
- they place files of different classes into their corresponding classification items in one pass, matching and distributing the files in the package automatically;
- they display the import failures and reasons for files that do not conform to the naming rule, so files that could not be read are reported back to the client promptly.

Artificial-intelligence matching and related techniques are applied to each business scenario. This reduces repetitive, single-file, multi-step usage.
Drawings
FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a file import archiving method according to the present invention;
FIG. 2 is a block diagram of a preferred embodiment of the file import archive program of FIG. 1;
FIG. 3 is a flowchart illustrating a file import archiving method according to a preferred embodiment of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The invention provides a file importing and archiving method applied to an electronic device. Referring to FIG. 1, a schematic diagram of an application environment of a preferred embodiment of the file importing and archiving method of the present invention is shown.
In this embodiment, the electronic device 1 may be a terminal with computing capability, such as a server, a mobile phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 1 comprises a memory 11, a processor 12, a network interface 13 and a communication bus 14.
The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory, and the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1.
In this embodiment, the readable storage medium of the memory 11 is generally used for storing the file import archive program 10 and the like installed in the electronic device 1. The memory 11 may also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip. It executes program code stored in the memory 11 or processes data, for example by executing the file import archive program 10.
The network interface 13 may optionally include a standard wired interface and a wireless interface (e.g. a Wi-Fi interface). It is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
The communication bus 14 is used to enable connection communication between these components.
Fig. 1 only shows the electronic device 1 with components 11-14, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may alternatively be implemented.
Optionally, the electronic device 1 may further include a user interface, which may include:

- an input unit such as a keyboard;
- a voice input device such as a microphone or another device with speech recognition;
- a voice output device such as a speaker or a headset.

The user interface may also include a standard wired interface or a wireless interface.
Optionally, the electronic device 1 may further comprise a display, which may also be referred to as a display screen or a display unit.
In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display shows information processed in the electronic device 1 and presents a visual user interface.
Optionally, the electronic device 1 further comprises a touch sensor. The area provided by the touch sensor for the user to perform touch operation is called a touch area. Further, the touch sensor described herein may be a resistive touch sensor, a capacitive touch sensor, or the like. The touch sensor may include not only a contact type touch sensor but also a proximity type touch sensor. Further, the touch sensor may be a single sensor, or may be a plurality of sensors arranged in an array, for example.
Optionally, the electronic device 1 may further include logic gates, sensors, audio circuits, and the like, which are not described herein.
In the apparatus embodiment shown in fig. 1, a memory 11 as a kind of computer storage medium may include an operating system and a file import archive program 10 therein; the processor 12 executes the file import archive program 10 stored in the memory 11 to implement the following steps:
transmitting files in a compressed package, the compressed package containing one or more files;
reading all content items of the transmitted compressed package and identifying valid parameters, wherein the content items include the name of the compressed package, the format of the compressed package, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters include the file name and/or the file type;
determining, according to the identified valid parameters, whether each file name conforms to the naming rule;
displaying, for each file that does not conform to the naming rule, the import failure and its reason;
and archiving the files that conform to the naming rule to the corresponding classification items, wherein a classification item is a classification label assigned according to file names. To further ensure the privacy and security of the compressed package, it may also be stored in a node of a blockchain.
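The patent does not describe how the compressed package is written to a blockchain. As a loose illustration of the integrity property a blockchain provides, the sketch below keeps an append-only chain on a single node. Each record holds the SHA-256 digest of one compressed package plus the hash of the previous record, so changing any earlier record is detectable. This is a deliberate simplification rather than a real blockchain: it has no network, no consensus, and does not store the package itself. `PackageLedger` and `Block` are hypothetical names.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first record


@dataclass
class Block:
    index: int
    package_digest: str  # SHA-256 of the compressed package bytes
    prev_hash: str
    timestamp: float
    hash: str = ""

    def compute_hash(self) -> str:
        payload = json.dumps([self.index, self.package_digest, self.prev_hash, self.timestamp])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PackageLedger:
    """Append-only, hash-linked record of uploaded compressed packages (single node)."""

    def __init__(self) -> None:
        self.chain: List[Block] = []

    def append(self, package: bytes) -> Block:
        prev = self.chain[-1].hash if self.chain else GENESIS_PREV
        block = Block(len(self.chain), hashlib.sha256(package).hexdigest(), prev, time.time())
        block.hash = block.compute_hash()
        self.chain.append(block)
        return block

    def verify(self) -> bool:
        """Check that every record's hash is intact and links to its predecessor."""
        for i, block in enumerate(self.chain):
            expected_prev = self.chain[i - 1].hash if i else GENESIS_PREV
            if block.prev_hash != expected_prev or block.hash != block.compute_hash():
                return False
        return True
```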
In other embodiments, the file import archive program 10 may be divided into one or more modules, which are stored in the memory 11 and executed by the processor 12 to implement the present invention. A module here means a series of computer program instruction segments capable of performing a specified function. Referring to FIG. 2, a functional block diagram of a preferred embodiment of the file import archive program 10 of FIG. 1 is shown. The file import archive program 10 may be divided into a transmission module 110, a parameter reading module 120, a determination module 130, a display module 140, and an archiving module 150, wherein:
the transmission module 110 transmits files in a compressed package containing one or more files;
the parameter reading module 120 reads all content items of the transmitted compressed package and identifies valid parameters, wherein the content items include the name of the compressed package, the format of the compressed package, the names and types of the files in the compressed package, and the source identifier of the compressed package, and the valid parameters include the file name and/or the file type;
the determination module 130 determines, according to the identified valid parameters, whether each file name conforms to the naming rule, and sends a signal to the display module 140 if it does not, or to the archiving module 150 if it does;
the display module 140 displays the import failures and reasons for files that do not conform to the naming rule;
the archiving module 150 archives the files that conform to the naming rule to the corresponding classification items, wherein a classification item is a classification label assigned according to file names.
In an alternative embodiment, the display module 140 displays, in a popup window, the names of all files that failed to import together with the reasons.
Preferably, the display module 140 includes:
a first popup setting unit for setting a first popup window, which displays the import failures and reasons on the client that uploaded the files;
a second popup setting unit for setting a second popup window, which offers the client that uploaded the files the option of ignoring the errors and importing the files;
and a third popup setting unit for setting a third popup window, which offers the client the option of re-uploading.
Further, preferably, the display module 140 also includes:
a popup selection unit that chooses the follow-up popup by failure count. When the number of files that failed to import is less than the set number, it signals the second popup setting unit, and the second popup window pops up on the client. When that number is not less than the set number, it signals the third popup setting unit, and the third popup window pops up on the client.
In an alternative embodiment, the parameter reading module 120 includes:
a reading unit, which reads the name and format of the compressed package;
a storage unit, which stores the read compressed package as a file;
a format conversion unit, which converts the stored compressed package into a set compression format;
and a loop unit, which reads the format-converted compressed package in a loop to obtain each file in it.
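The reading, storage and loop units can be sketched with Python's standard `zipfile` module. This sketch makes three assumptions. The uploaded package arrives as bytes and is "stored as a file" in an in-memory buffer. Only zip is handled directly. The standard library cannot read 7z, so the format-conversion step is represented by rejecting non-zip packages rather than converting them. `read_package` is a hypothetical name.

```python
import io
import os
import zipfile
from typing import List, Tuple


def read_package(package_name: str, data: bytes) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Return (package name without extension, package format, [(file name, file type), ...])."""
    stem, ext = os.path.splitext(package_name)
    fmt = ext.lstrip(".").lower()
    if fmt != "zip":
        # Stand-in for the format conversion unit: a real system would convert 7z etc. first.
        raise ValueError(f"unsupported package format: {fmt}")
    entries = []
    # Keep the uploaded bytes in a file-like buffer, then loop over every member.
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no file to archive
            base = info.filename.rsplit("/", 1)[-1]  # drop any folder path inside the package
            name, file_ext = os.path.splitext(base)
            entries.append((name, file_ext.lstrip(".").lower()))
    return stem, fmt, entries
```

The (file name, file type) pairs are the valid parameters that the naming-rule check consumes.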
In one embodiment, the archive module 150 includes:
a feature vector obtaining unit, which extracts the keywords of each file according to the naming rule and obtains the feature values of the keywords of the files in the compressed package through a vector space model, the feature values of all keywords of a file forming that file's feature vector;
a model construction unit, which constructs a classification model based on the feature vectors;
and an archiving unit, which pre-classifies the files in the compressed package through the classification model and archives them into the corresponding classification items.
Preferably, the model construction unit comprises:
a training set construction subunit, which screens out, for each file, a set number of archived files that match it according to the number of shared keywords, as the training set of that file;
a matrix construction subunit, which forms the feature matrix of the training set from the feature vectors of its archived files, and the classification item matrix of the training set from the classification items of those archived files;
a keyword filing probability obtaining subunit, which obtains, from the feature matrix and the classification item matrix of the file's training set, the keyword filing probability that each keyword of the file is archived in each classification item;
a screening subunit, which selects the classification item with the highest keyword filing probability as the optimal classification item of that keyword;
and a file filing probability obtaining subunit, which obtains the file filing probability that the file belongs to each classification item from the optimal classification item of each of its keywords and the corresponding keyword filing probability, thereby constructing the classification model.
The archiving unit displays the classification items in descending order of file filing probability so that the client can select the item into which the file is archived, or archives the file directly to the classification item with the highest file filing probability.
The electronic device gives timely visual feedback on user interactions. Each step interlocks with the next, from the technical layer through the interface layer to the emotional layer. This gives the user a complete, guided feedback flow across the whole process and a closed-loop experience that serves both the technical and the emotional sides of use.
In addition, the invention provides a file importing and archiving method. Referring to FIG. 3, a flowchart of a preferred embodiment of the file importing and archiving method of the present invention is shown. The method may be performed by an apparatus implemented in software and/or hardware.
In this embodiment, the file import archiving method includes:
step S1, transmitting files in the form of a compressed package containing one or more files;
step S2, reading all content items of the transmitted compressed package and identifying valid parameters, wherein the content items include the name of the compressed package, its format, the names and types of the files in it, and its source identifier (such as an IP address), and the valid parameters include the file name and/or the file type;
step S3, determining, according to the identified valid parameters, whether each file name conforms to the naming rule. For example, the supported package formats include zip and 7z, and the package contains one or more doc or docx files named according to the naming specification. The naming rule is: product name + document name + date (year, month, day), e.g. "A product plan Specification 20190101";
step S4, displaying the import failures and reasons for files that do not conform to the naming rule. Preferably, the names of all files that failed to import are displayed with their reasons in a popup window. The reasons include duplicate file names, non-standard file names, unsupported file formats, and the like;
step S5, archiving the files that conform to the naming rule to the corresponding classification items, a classification item being a classification label assigned according to the file name. For example, document names include plan specification, standard terms, hosting agreement, and so on. A file named according to the example naming rule above can be read successfully and uploaded to the corresponding file type directory (classification item) on the page. For instance, an asset-backed securitization (ABS) system has five classification items: project initiation report, declaration materials, audit materials, registration, and listing. The files in the compressed package are archived to the corresponding items among these five in one pass.
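Steps S3 and S4 can be sketched as a check that sorts files into accepted ones and failures with reasons. Several details here are assumptions drawn from the examples in steps S3 to S5 rather than a complete specification. The three parts of the name are assumed to be separated by spaces. The date is assumed to be an 8-digit `YYYYMMDD` value. The list of document names is hypothetical. Matching is case-insensitive. The function and constant names are illustrative.

```python
import re
from datetime import datetime
from typing import List, Tuple

ALLOWED_TYPES = {"doc", "docx"}
# Hypothetical document names, taken from the examples in step S5.
DOC_NAMES = ["plan specification", "standard terms", "hosting agreement"]
# Assumed rule: "<product name> <document name> <YYYYMMDD>", separated by spaces.
NAME_RE = re.compile(
    r"^(?P<product>.+?)\s+(?P<doc>" + "|".join(map(re.escape, DOC_NAMES)) + r")\s+(?P<date>\d{8})$",
    re.IGNORECASE,
)


def _is_valid_date(text: str) -> bool:
    try:
        datetime.strptime(text, "%Y%m%d")
        return True
    except ValueError:
        return False


def check_files(files: List[Tuple[str, str]]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (name, type) pairs into accepted (name, document name) pairs and (file, reason) failures."""
    accepted, failures, seen = [], [], set()
    for name, ftype in files:
        full = f"{name}.{ftype}"
        if ftype.lower() not in ALLOWED_TYPES:
            failures.append((full, "unsupported file format"))
            continue
        match = NAME_RE.match(name)
        if match is None or not _is_valid_date(match.group("date")):
            failures.append((full, "non-standard file naming"))
            continue
        if name.lower() in seen:
            failures.append((full, "duplicate file naming"))
            continue
        seen.add(name.lower())
        accepted.append((name, match.group("doc").lower()))
    return accepted, failures
```

The failure list is what the first popup would display. Its length can drive the choice between the "ignore and submit" and "re-upload" popups.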
The file importing and archiving method is improved from multiple times to one time in a unified manner in the man-machine interaction behavior, and not only is the use efficiency of a user improved in a leap-forward manner, but also the innovation of the whole technical department is an indispensable effective productivity.
Preferably, the pop-up windows in step S4 include a first pop-up window, a second pop-up window and a third pop-up window: the first pop-up window displays the import failures and their reasons on the client that uploaded the files, the second pop-up window offers that client the option "ignore and submit", and the third pop-up window offers the client the option "re-upload". Further preferably, the second pop-up window is shown when the number of files that failed to import is less than a set number, and the third pop-up window is shown when that number is not less than the set number. When the client chooses the third pop-up window (re-upload), the successfully read files are not imported either; the client can correct all the files offline and then submit them together in one batch.
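The choice between the second and third pop-up windows reduces to one comparison with the set number. In the sketch below, `build_feedback`, `ImportFeedback` and the `set_number` argument are hypothetical names; the text does not give the value of the set number, and returning `None` when nothing failed is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ImportFeedback:
    """Content of the pop-ups shown after an import attempt."""
    failures: dict = field(default_factory=dict)  # first pop-up: file -> reason
    follow_up: str = ""  # option offered by the second or third pop-up


def build_feedback(failures, set_number):
    """Pick the follow-up pop-up by comparing the failure count with set_number."""
    if not failures:
        return None  # assumed: nothing failed, so no pop-up is needed
    if len(failures) < set_number:
        option = "ignore and submit"  # second pop-up window
    else:
        option = "re-upload"  # third pop-up window
    return ImportFeedback(failures=dict(failures), follow_up=option)
```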
In one embodiment, step S2 includes:
reading the name and format of the compressed package;
storing the read compressed package as a file;
converting the stored compressed package into a set compression format;
and reading the format-converted compressed package in a loop to obtain each file in it, extracting the file name and/or file type of each file, and identifying the valid parameters of the compressed package.
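A minimal Python sketch of this loop, assuming the package has already been converted to zip (the standard-library `zipfile` module cannot read 7z) and stored at `package_path`; the function name and the returned dictionary layout are illustrative, not taken from the patent.

```python
import zipfile
from pathlib import Path, PurePosixPath


def read_package(package_path):
    """Read package-level content items and the name/type of every file inside."""
    package = Path(package_path)
    files = []
    with zipfile.ZipFile(package) as archive:
        for info in archive.infolist():  # loop over every member of the package
            if info.is_dir():
                continue  # directory entries carry no file name or type
            member = PurePosixPath(info.filename)  # zip member paths use "/"
            files.append((member.name, member.suffix.lstrip(".").lower()))
    return {
        "package_name": package.stem,
        "package_format": package.suffix.lstrip(".").lower(),
        "files": files,
    }
```

Member names written without the UTF-8 flag are decoded by `zipfile` as CP437, so Chinese file names produced by some archivers may need re-decoding before the naming-rule check.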
In one embodiment, step S5 includes:
extracting the keywords of the files in the compressed package according to the naming rule, and obtaining the feature values of those keywords through a vector space model, the feature values of all keywords of a file forming the feature vector of that file;
constructing a classification model based on the feature vectors;
and inputting the feature vectors of the files in the compressed package into the classification model to predict their classification, and archiving the files into the corresponding classification items.
Preferably, the step of constructing the classification model includes:
screening out, according to the number of shared keywords, a set number of archived files matching each file to serve as the training set of that file;
forming the feature matrix F of the training set from the feature vectors of the archived files in the training set of each file, and forming the classification item matrix C of the training set from the classification items of those archived files:
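$$F=\begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1m}\\ f_{21} & f_{22} & \cdots & f_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ f_{n1} & f_{n2} & \cdots & f_{nm}\end{bmatrix},\qquad C=[C_1, C_2, \ldots, C_a]$$
where F is the feature matrix of the training set of file $d_i$, C is the classification item matrix of the training set of file $d_i$, m is the total number of keyword feature values, n is the set number of screened archived files, $f_{nm}$ is the feature value of keyword $W'_m$ in archived file $d'_n$, and $C_a$ is the a-th classification item;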
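Screening the training set by shared-keyword count can be sketched as below. The data layout (a dictionary from archived-file id to its keyword set and classification item) and the function names are assumptions for illustration; ties are broken by input order, which the text does not specify.

```python
def select_training_set(file_keywords, archived, n):
    """Return ids of the n archived files sharing the most keywords with a file.

    archived maps archived-file id -> (set of keywords, classification item).
    """
    target = set(file_keywords)
    ranked = sorted(
        archived,
        key=lambda doc_id: len(target & archived[doc_id][0]),
        reverse=True,  # most shared keywords first; ties keep input order
    )
    return ranked[:n]


def training_items(training_ids, archived):
    """Classification items of the training-set files, in training-set order."""
    return [archived[doc_id][1] for doc_id in training_ids]
```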
obtaining, from the feature matrix and the classification item matrix of the training set of the file, the keyword archiving probability with which each keyword of the file is archived in each classification item, according to the following formula:
where $f_{ma}$ is the feature value of keyword $W'_m$ with respect to classification item $C_a$, $P(W'_m \mid C_a)$ is the keyword archiving probability that keyword $W'_m$ is archived in classification item $C_a$, and keyword $W'_m$ is also a keyword of file $d_i$;
selecting the classification item with the highest keyword archiving probability as the best classification item of that keyword;
obtaining, from the best classification item of each keyword of the file and the corresponding keyword archiving probability, the file archiving probability that the file belongs to each classification item:
where $n_a$ is the total number of keywords archived to classification item $C_a$ as their best classification item, m is the total number of keywords, M is the total number of keywords of file $d_i$, $W_j$ is the j-th keyword of file $d_i$, and $P_{d_i C_a}$ is the file archiving probability that file $d_i$ is archived to classification item $C_a$;
and displaying all classification items in descending order of file archiving probability for the client to choose from, or archiving the file directly to the classification item with the highest file archiving probability.
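Because the formulas for $P(W'_m \mid C_a)$ and $P_{d_i C_a}$ are not reproduced in this text, the sketch below takes those probabilities as precomputed inputs and shows only the two selection steps: choosing each keyword's best classification item, and ranking classification items by file archiving probability. Function names and the dictionary layout are hypothetical.

```python
def best_items(keyword_probs):
    """Map each keyword to (best classification item, its keyword archiving probability).

    keyword_probs[w][c] is the precomputed probability that keyword w is
    archived in classification item c.
    """
    return {
        w: max(probs.items(), key=lambda item: item[1])
        for w, probs in keyword_probs.items()
    }


def rank_items(file_probs):
    """Classification items sorted by file archiving probability, highest first."""
    return sorted(file_probs, key=file_probs.get, reverse=True)


def archive_file(file_probs, auto=True):
    """Return the top item (automatic archiving) or the full ranking (client choice)."""
    ranking = rank_items(file_probs)
    return ranking[0] if auto else ranking
```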
Further preferably, the step of constructing the feature vectors of the archived files in the training set of each file includes:
obtaining the term frequency of each keyword of the archived files in the training set of each file through the following formula:
$$TF(W'_m, d'_n)=\frac{count(W'_m, d'_n)}{count(d'_n)}$$
where $TF(W'_m, d'_n)$ is the term frequency of the m-th keyword $W'_m$ with respect to archived file $d'_n$, $count(W'_m, d'_n)$ is the number of occurrences of keyword $W'_m$ in archived file $d'_n$, and $count(d'_n)$ is the total number of occurrences of all keywords in archived file $d'_n$;
obtaining the inverse document frequency of each keyword of the archived files in the training set of each file through the following formula:
where $IDF(W'_m)$ is the inverse document frequency of keyword $W'_m$, $d_m$ denotes an archived file in which keyword $W'_m$ appears, $count(d_m)$ is the number of archived files in which keyword $W'_m$ appears, and N is the number of archived files in the screened training set;
obtaining the term frequency-inverse document frequency (TF-IDF) of each keyword of the archived files in the training set of each file as the feature value of that keyword, the feature values of all keywords of an archived file forming its feature vector:
$$TFIDF(W'_m, d'_n)=TF(W'_m, d'_n)\times IDF(W'_m)$$
where $TFIDF(W'_m, d'_n)$ is the TF-IDF of keyword $W'_m$ in archived file $d'_n$.
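The TF-IDF feature values can be sketched in Python as follows. The term frequency follows the definition given above. The inverse document frequency formula is not reproduced in the text, so the sketch assumes the common form $\log(N / count(d_m))$ with a natural logarithm, where N is the number of archived files in the training set; the patent's exact form (for example any smoothing term) may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs maps archived-file id -> list of keyword occurrences.

    Returns archived-file id -> {keyword: TF * IDF}.
    """
    n_docs = len(docs)  # N: archived files in the training set
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))  # count(d_m): each file counted once per keyword
    vectors = {}
    for doc_id, words in docs.items():
        counts = Counter(words)
        total = sum(counts.values())  # count(d'_n): occurrences of all keywords
        vectors[doc_id] = {
            # TF = count(W, d) / count(d); IDF assumed to be ln(N / count(d_m))
            w: (c / total) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return vectors
```

For two files with keywords ["plan", "plan", "terms"] and ["terms", "agreement"], "terms" occurs in both files and gets weight 0, while "plan" in the first file gets (2/3)·ln 2.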
In addition, preferably, the method further comprises:
dividing the correctly archived historical files into two parts, using one part as the training set of the classification model to train the model parameters and the other part as the test set for testing the performance of the classification model;
and evaluating the performance of the classification model on the test set, specifically:
evaluating the precision P of file archiving by the classification model according to the following formula,
evaluating the recall R of file archiving by the classification model according to the following formula,
evaluating the F-measure of file archiving by the classification model according to the following formula,
where β is a weighting factor that trades off precision P against recall R.
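The precision, recall and F formulas are not reproduced in the text. The sketch below assumes the standard per-item definitions: P is the share of files the model archived to an item that truly belong there, R is the share of files truly belonging to the item that the model archived there, and $F_\beta = (1+\beta^2)PR/(\beta^2 P + R)$. The function name and arguments are hypothetical.

```python
def evaluate(true_items, predicted_items, item, beta=1.0):
    """Precision, recall and F-beta of archiving into one classification item."""
    pairs = list(zip(true_items, predicted_items))
    tp = sum(1 for t, p in pairs if t == item and p == item)  # correctly archived
    predicted = sum(1 for _, p in pairs if p == item)  # archived there by the model
    actual = sum(1 for t, _ in pairs if t == item)  # truly belonging there
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta
```

With β > 1 the F-measure weights recall more heavily; with β < 1 it favors precision.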
In an optional embodiment, instead of archiving each file in the compressed package separately, the compressed package is archived as a whole: the keywords of the compressed package are ranked by number of occurrences and a set number of them are selected; a set number of archived files matching the compressed package are screened out according to the number of shared keywords to serve as the training set of the compressed package; and the classification item for archiving the compressed package is obtained by the steps of the previous embodiment.
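Selecting the package-level keywords by occurrence count can be sketched as below; `package_keywords` and the input layout (one keyword list per file in the package) are assumptions, and ties keep first-seen order. The resulting keywords are then matched against archived files by shared-keyword count, as in the per-file embodiment.

```python
from collections import Counter


def package_keywords(file_keyword_lists, k):
    """Top-k keywords of a whole compressed package, ranked by occurrence count."""
    counts = Counter()
    for words in file_keyword_lists:
        counts.update(words)  # accumulate occurrences across all files
    return [word for word, _ in counts.most_common(k)]
```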
In an optional example, the content items of the compressed package further include an authorization identifier of the compressed package or of each file in it. The authorization identifier is a unique identifier of the client, for example an employee's staff number or identity card number, and when the client queries archived files, the files carrying that client's authorization identifier are displayed to the client.
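The authorization-based query can be sketched as a simple filter. `ArchivedFile`, `visible_files` and the employee numbers used in the usage example are hypothetical, and the sketch assumes each archived file already carries a single resolved authorization identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedFile:
    name: str
    item: str  # classification item the file was archived to
    auth_id: str  # authorization identifier, e.g. an employee's staff number


def visible_files(archive, client_id):
    """Return the archived files that carry the querying client's identifier."""
    return [f for f in archive if f.auth_id == client_id]
```

For example, with files owned by "E001" and "E002", a query from "E001" returns only the first.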
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a file import archive program, and the file import archive program, when executed by a processor, implements the following steps:
transmitting files in a compressed package, the compressed package comprising one or more files;
reading all content items of the transmitted compressed package and identifying the valid parameters, wherein the content items comprise the name of the compressed package, the format of the compressed package, the name and type of each file in the compressed package and the source identifier of the compressed package, and the valid parameters comprise the file name and/or the file type;
determining, according to the identified valid parameters, whether the file name meets the naming rule;
displaying the import failure and its reason for each file that does not conform to the naming rule;
and archiving the files that conform to the naming rule to the corresponding classification items, wherein the classification items are classification labels assigned according to file names. It should be emphasized that the compressed package may also be stored in a node of a blockchain to further ensure its privacy and security.
The embodiments of the computer-readable storage medium of the present invention are substantially the same as those of the file importing and archiving method and the electronic device, and are not described again here.
A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing the information of a batch of network transactions, used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer and the like.
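The cryptographic linking of blocks can be illustrated with a toy chain that records the SHA-256 digest of each stored compressed package. This sketch shows only the hash linking; it has no distribution, peer-to-peer transmission or consensus, and the block layout and function names are illustrative assumptions rather than any particular blockchain platform.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def _digest(prev_hash, records):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps({"prev_hash": prev_hash, "records": records}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(prev_hash, records):
    """Create a block whose hash covers its records and its predecessor's hash."""
    return {"prev_hash": prev_hash, "records": records, "hash": _digest(prev_hash, records)}


def verify_chain(chain):
    """Recompute each hash and check that every block points at its predecessor."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != prev:
            return False  # broken link or reordered blocks
        if block["hash"] != _digest(block["prev_hash"], block["records"]):
            return False  # block contents were altered after creation
        prev = block["hash"]
    return True
```

Altering any record changes that block's recomputed hash, and reordering blocks breaks the `prev_hash` links, so `verify_chain` rejects both.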
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A file import archiving method, characterized by comprising the following steps:
transmitting files in a compressed package, the compressed package comprising one or more files;
reading all content items of the transmitted compressed package and identifying the valid parameters, wherein the content items comprise the name of the compressed package, the format of the compressed package, the name and type of each file in the compressed package and the source identifier of the compressed package, and the valid parameters comprise the file name and/or the file type;
determining, according to the identified valid parameters, whether the file name meets the naming rule;
displaying the import failure and its reason for each file that does not conform to the naming rule;
and archiving the files that conform to the naming rule to the corresponding classification items, wherein the classification items are classification labels assigned according to file names.
2. The file import archiving method according to claim 1, wherein the step of displaying the import failure and its reason for each file that does not conform to the naming rule comprises: displaying the names of all files that failed to import, together with the reasons, in a pop-up window.
3. The file import archiving method according to claim 2, wherein the pop-up window comprises a first pop-up window, a second pop-up window and a third pop-up window, the first pop-up window being used to display the import failures and reasons on the client that uploaded the files, the second pop-up window being used to offer that client the option of ignoring the errors and importing the files, and the third pop-up window being used to offer the client the option of re-uploading.
4. The file import archiving method according to claim 3, wherein the step of displaying the import failure and its reason for each file that does not conform to the naming rule further comprises: popping up the second pop-up window when the number of files that failed to import is less than a set number, and popping up the third pop-up window when the number of files that failed to import is not less than the set number.
5. The file import archiving method according to claim 1, wherein the compressed package is stored in a blockchain, and the step of archiving the files that conform to the naming rule to the corresponding classification items comprises:
extracting the keywords of the files in the compressed package according to the naming rule, and obtaining the feature values of those keywords through a vector space model, the feature values of all keywords of a file forming the feature vector of that file;
constructing a classification model based on the feature vectors;
and inputting the feature vectors of the files in the compressed package into the classification model to predict their classification, and archiving the files into the corresponding classification items.
6. The file import archiving method according to claim 5, wherein the step of constructing the classification model based on the feature vectors comprises:
screening out, according to the number of shared keywords, a set number of archived files matching each file to serve as the training set of that file;
forming the feature matrix of the training set from the feature vectors of the archived files in the training set of each file, and forming the classification item matrix of the training set from the classification items of those archived files;
obtaining, from the feature matrix and the classification item matrix of the training set of the file, the keyword archiving probability with which each keyword of the file is archived in each classification item;
selecting the classification item with the highest keyword archiving probability as the best classification item of that keyword;
obtaining, from the best classification item of each keyword of the file and the keyword archiving probability corresponding to that best classification item, the file archiving probability that the file belongs to each classification item, thereby constructing the classification model,
wherein the step of archiving the files in the compressed package to the corresponding classification items comprises:
displaying the classification items in descending order of file archiving probability for the client to choose from, or archiving the file directly to the classification item with the highest file archiving probability.
7. The file import archiving method according to claim 6, wherein the step of constructing the feature vectors of the archived files in the training set of each file comprises:
obtaining the term frequency of each keyword of the archived files in the training set of each file through the following formula:
$$TF(W'_m, d'_n)=\frac{count(W'_m, d'_n)}{count(d'_n)}$$
where $TF(W'_m, d'_n)$ is the term frequency of the m-th keyword $W'_m$ with respect to archived file $d'_n$, $count(W'_m, d'_n)$ is the number of occurrences of keyword $W'_m$ in archived file $d'_n$, and $count(d'_n)$ is the total number of occurrences of all keywords in archived file $d'_n$;
obtaining the inverse document frequency of each keyword of the archived files in the training set of each file through the following formula:
where $IDF(W'_m)$ is the inverse document frequency of keyword $W'_m$, $d_m$ denotes an archived file in which keyword $W'_m$ appears, $count(d_m)$ is the number of archived files in which keyword $W'_m$ appears, and N is the number of archived files in the screened training set;
obtaining the term frequency-inverse document frequency (TF-IDF) of each keyword of the archived files in the training set of each file as the feature value of that keyword, the feature values of all keywords of an archived file forming its feature vector:
$$TFIDF(W'_m, d'_n)=TF(W'_m, d'_n)\times IDF(W'_m)$$
where $TFIDF(W'_m, d'_n)$ is the TF-IDF of keyword $W'_m$ in archived file $d'_n$.
8. The file import archiving method according to claim 1, wherein the content item further includes an authorization identifier of the compressed package or an authorization identifier of each file in the compressed package, and the authorization identifier is a unique identifier of the client.
9. An electronic device comprising a memory and a processor, wherein the memory stores a file import archiving program, and the file import archiving program, when executed by the processor, implements the following steps:
transmitting files in a compressed package, the compressed package comprising one or more files;
reading all content items of the transmitted compressed package and identifying the valid parameters, wherein the content items comprise the name of the compressed package, the format of the compressed package, the name and type of each file in the compressed package and the source identifier of the compressed package, and the valid parameters comprise the file name and/or the file type;
determining, according to the identified valid parameters, whether the file name meets the naming rule;
displaying the import failure and its reason for each file that does not conform to the naming rule;
and archiving the files that conform to the naming rule to the corresponding classification items, wherein the classification items are classification labels assigned according to file names.
10. A computer-readable storage medium, wherein a file import archive program is included in the computer-readable storage medium, and when executed by a processor, the file import archive program implements the steps of the file import archive method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010346888.XA CN111611211A (en) | 2020-04-27 | 2020-04-27 | File importing and archiving method, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010346888.XA CN111611211A (en) | 2020-04-27 | 2020-04-27 | File importing and archiving method, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111611211A | 2020-09-01 |
Family ID: 72204445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010346888.XA Pending CN111611211A (en) | 2020-04-27 | 2020-04-27 | File importing and archiving method, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111611211A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005352770A (en) * | 2004-06-10 | 2005-12-22 | Fuji Xerox Co Ltd | Document storage device, document storage system, document storage method and program |
CN106250385A (en) * | 2015-06-10 | 2016-12-21 | 埃森哲环球服务有限公司 | The system and method for the abstract process of automated information for document |
CN106708926A (en) * | 2016-11-14 | 2017-05-24 | 北京赛思信安技术股份有限公司 | Realization method for analysis model supporting massive long text data classification |
CN107368526A (en) * | 2017-06-09 | 2017-11-21 | 北京因果树网络科技有限公司 | A kind of data processing method and device |
CN110535890A (en) * | 2018-05-23 | 2019-12-03 | 杭州海康威视系统技术有限公司 | The method and apparatus that file uploads |
CN110716895A (en) * | 2019-09-17 | 2020-01-21 | 平安科技(深圳)有限公司 | Target data archiving method and device, computer equipment and medium |
CN110781303A (en) * | 2019-10-28 | 2020-02-11 | 佰聆数据股份有限公司 | Short text classification method and system |
- 2020-04-27: application CN202010346888.XA filed in CN (published as CN111611211A), status: active, pending
Non-Patent Citations (1)
Title |
---|
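潘亮光 (Pan Liangguang); 曾太 (Zeng Tai): "Legal consultation text classification method based on naive Bayes", 电脑编程技巧与维护 (Computer Programming Skills & Maintenance), no. 08, 31 August 2018, pages 2-3 * |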
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112615885A (en) * | 2020-12-29 | 2021-04-06 | 格美安(北京)信息技术有限公司 | Cross-border transmission method based on directory dynamic control and storage device |
CN112615885B (en) * | 2020-12-29 | 2023-04-18 | 格美安(北京)信息技术有限公司 | Cross-border transmission method based on directory dynamic control and storage device |
CN113220635A (en) * | 2021-05-11 | 2021-08-06 | 深圳市星火数控技术有限公司 | File archiving method, device, equipment and computer readable storage medium |
CN113609069A (en) * | 2021-07-06 | 2021-11-05 | 厦门国际银行股份有限公司 | Document management method, system, terminal device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230325396A1 (en) | Real-time content analysis and ranking | |
US20240040074A1 (en) | Systems and methods for presenting image classification results | |
US10372791B2 (en) | Content customization | |
US8392472B1 (en) | Auto-classification of PDF forms by dynamically defining a taxonomy and vocabulary from PDF form fields | |
CA2846176C (en) | System and method for generating an informational packet for the purpose of marketing a vehicle to prospective customers | |
US11748070B2 (en) | Systems and methods for generating graphical user interfaces | |
US20080059447A1 (en) | System, method and computer program product for ranking profiles | |
CN110352427B (en) | System and method for collecting data associated with fraudulent content in a networked environment | |
CN111611211A (en) | File importing and archiving method, electronic equipment and storage medium | |
US20150378975A1 (en) | Attribute fill using text extraction | |
CN109274843B (en) | Key prediction method, device and computer readable storage medium | |
US20090222485A1 (en) | Product information system for aggregating and classifying information from multiple sources with update ability | |
CN112085567A (en) | Commodity recommendation method and device, electronic equipment and readable medium | |
US20180349974A1 (en) | System and method for presenting product-specific content on a client device based on a scanned barcode | |
CN113868498A (en) | Data storage method, electronic device, device and readable storage medium | |
US20080071553A1 (en) | Generation of Commercial Presentations | |
KR102322212B1 (en) | Apparatus and method for recommending learning contents | |
CN110244934B (en) | Method and device for generating product demand document and test information | |
US8249735B2 (en) | Method and system for automatically identifying an existing workflow to manufacture a given product type | |
CN113781235B (en) | Data processing method, device, computer equipment and storage medium | |
CN112925550B (en) | App interface updating method, system, device and storage medium | |
US20240070319A1 (en) | Dynamically updating classifier priority of a classifier model in digital data discovery | |
CA2907123A1 (en) | Content customization | |
Precht et al. | Transparency Disclosure for End Consumers in Private Food Supply Chains-A Systematic Literature Review | |
JP2015201059A (en) | Parameter setting support system and parameter setting method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |