CN114398315A - Data storage method, system, storage medium and electronic equipment - Google Patents

Data storage method, system, storage medium and electronic equipment

Info

Publication number
CN114398315A
Authority
CN
China
Prior art keywords
data
preset
text information
data type
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111673172.1A
Other languages
Chinese (zh)
Inventor
刘永召
周德营
何玲燕
周海东
张继梅
王少培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Supcon Technology Co Ltd
Original Assignee
Zhejiang Supcon Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Supcon Technology Co Ltd
Priority to CN202111673172.1A
Publication of CN114398315A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 - File systems; File servers
    • G06F16/13 - File access structures, e.g. distributed indices
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G06F16/2228 - Indexing structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/353 - Clustering; Classification into predefined classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data storage method, a data storage system, a storage medium and an electronic device. The method comprises: parsing a first file to be stored to obtain first data text information; screening out the vocabulary under each preset data type from the first data text information based on the preset vocabulary set of that preset data type to obtain a first vocabulary set; calculating the frequency with which the vocabulary in the first vocabulary set occurs in the first data text information to obtain a first word frequency, and determining the association degree between the first data text information and the preset data type based on the association coefficient between the first word frequency and each preset word frequency; and comparing the association degrees between the first data text information and each preset data type, determining the preset data type corresponding to the maximum association degree as the data type of the first data text information, and storing the first file to be stored based on the data type of the first data text information. The invention can improve the degree of integration and the completeness of the data.

Description

Data storage method, system, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data storage method, a data storage system, a storage medium, and an electronic device.
Background
As industrial software covers an ever wider range of applications, the demand for storing industrial software research and development data is growing. Because the research and development process involves many data types and the data are difficult to integrate, the degree of integration and the completeness of research and development data are currently low, stored data are hard to search, and data storage is disordered, which easily leads to a large waste of data resources and a risk of data loss. How to store data so as to improve its degree of integration and completeness is therefore an urgent problem to be solved.
Disclosure of Invention
Embodiments of the present invention provide a data storage method, a data storage system, a storage medium, and an electronic device, which can improve the degree of integration and the completeness of data. The specific technical scheme is as follows:
the invention provides a data storage method, which comprises the following steps:
analyzing the first file to be stored to obtain first data text information;
screening out words under the preset data types from the first data text information based on a preset word set of each preset data type to obtain a first word set; the vocabulary in the preset vocabulary set at least comprises the vocabulary in the first vocabulary set, and the vocabulary in the preset vocabulary set has a weighted value under the preset data type;
calculating the occurrence frequency of the vocabularies in the first vocabulary set in the first data text information to obtain a first word frequency, and determining the association degree of the first data text information and the preset data type based on the association degree coefficient of the first word frequency and each preset word frequency; the preset word frequency is the product of the word frequency of the vocabulary in the preset vocabulary set and the weight value of the corresponding vocabulary;
and comparing the association degree of the first data text information with each preset data type, determining the preset data type corresponding to the maximum association degree as the data type of the first data text information, and storing the first file to be stored based on the data type of the first data text information.
Optionally, after determining a preset data type corresponding to the maximum association degree as the data type of the first data text information, the method further includes:
judging whether the preset data type corresponding to the maximum association degree has a sub-data type; the sub-data type belongs to the preset data type;
if so, screening out the vocabulary under each sub-data type from the first data text information based on the sub-vocabulary set of that sub-data type to obtain a second vocabulary set; the vocabulary in the sub-vocabulary set at least comprises the vocabulary in the second vocabulary set, and the vocabulary in the sub-vocabulary set has a weight value under the sub-data type;
calculating the occurrence frequency of the vocabulary in the second vocabulary set in the first data text information to obtain a second word frequency, and determining the association degree of the first data text information and the sub-data type based on the association coefficient of the second word frequency and each sub word frequency;
and comparing the association degree of the first data text information with each sub-data type, determining the sub-data type corresponding to the maximum association degree as the sub-data type of the first data text information, and storing the first file to be stored based on the data type and the sub-data type of the first data text information.
Optionally, the method further comprises: analyzing the first file to be stored to obtain first data source information and first storage time information;
the method for storing the first file to be stored specifically comprises the following steps:
determining a data type item based on the first data source information and the first data text information;
mapping the data type item, the identifier of the data type of the first data text information and the name of the first file to be stored to generate a data mapping relation;
and storing the first file to be stored based on the data mapping relation and the first storage time information.
Optionally, the method further comprises:
analyzing the second file to be stored to obtain second data source information and second storage time information;
judging whether a file with the same name as the second file to be stored has already been stored;
if so, searching a data mapping relation corresponding to the name of the second file to be stored to obtain a target data mapping relation;
searching a data type item matched with the target data mapping relation to obtain a target data type item;
modifying the target data type item based on the second data text information and the second data source information to obtain a modified data type item;
updating the data mapping relation based on the modified data type item to obtain an updated data mapping relation;
and storing the second file to be stored based on the updated data mapping relation and the second storage time information.
Optionally, the method further comprises:
receiving a file deleting operation request, and analyzing the file deleting operation request to obtain a deleted file name;
judging whether the data request set has storage request information with the same name as the deleted file name;
and if so, not deleting the file whose name is the same as the deleted file name.
Optionally, the determining the association degree between the first data text information and the preset data type based on the association degree coefficient between the first word frequency and each preset word frequency specifically includes:
determining an association coefficient between the first word frequency and each preset word frequency based on a grey relational analysis method;
and taking the average value of the multiple association coefficients as the association degree of the first data text information and the preset data type.
Optionally, the weighted value under the preset data type is obtained based on an analytic hierarchy process; the method for analyzing the first file to be stored is an optical character recognition method.
The present invention also provides a data storage system comprising:
the analysis module is used for analyzing the first file to be stored to obtain first data text information;
the screening module is used for screening out vocabularies under the preset data types from the first data text information based on the preset vocabulary sets of each preset data type to obtain a first vocabulary set; the vocabulary in the preset vocabulary set at least comprises the vocabulary in the first vocabulary set, and the vocabulary in the preset vocabulary set has a weighted value under the preset data type;
the relevancy calculation module is used for calculating the occurrence frequency of vocabularies in the first vocabulary set in the first data text information to obtain a first word frequency, and determining the relevancy of the first data text information and the preset data type based on the relevancy coefficient of the first word frequency and each preset word frequency; the preset word frequency is the product of the word frequency of the vocabulary in the preset vocabulary set and the weight value of the corresponding vocabulary;
and the storage module is used for comparing the association degree of the first data text information with each preset data type, determining the preset data type corresponding to the maximum association degree as the data type of the first data text information, and storing the first file to be stored based on the data type of the first data text information.
The present invention also provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the data storage method described above.
The present invention also provides an electronic device comprising:
at least one processor, and at least one memory and a bus connected with the processor;
the processor and the memory complete mutual communication through the bus; the processor is used for calling the program instructions in the memory so as to execute the data storage method.
According to the data storage method, the data storage system, the storage medium and the electronic device, a first file to be stored is parsed to obtain first data text information; the vocabulary under each preset data type is screened out from the first data text information based on the preset vocabulary set of that preset data type to obtain a first vocabulary set; the frequency of the vocabulary in the first vocabulary set in the first data text information is calculated to obtain a first word frequency, and the association degree between the first data text information and the preset data type is determined based on the association coefficient between the first word frequency and each preset word frequency; and the association degrees between the first data text information and each preset data type are compared, the preset data type corresponding to the maximum association degree is determined as the data type of the first data text information, and the first file to be stored is stored based on the data type of the first data text information. Because the files to be stored are classified, the stored data is easy to search, the degree of integration and the completeness of the data can be improved, and the large waste of data resources and the risk of data loss are avoided.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a data storage method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data storage system according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a data storage method, as shown in fig. 1, the method includes:
Step 101: analyzing the first file to be stored to obtain first data text information.
The first file to be stored may contain picture information, in which case it may be analyzed by an optical character recognition method to obtain the first data text information. The first file to be stored may also contain sound information, in which case the sound information may be converted into text information to obtain the first data text information.
When an optical character recognition method is used, the analysis proceeds as follows: the picture information in the first file to be stored is converted to grayscale; the grayscale image is binarized to separate the characters from the background; the image is denoised by determining the average pixel value of the connected regions in the image and processing the noise points; the inclination of the characters in the picture is corrected based on the Hough transform, by dilating the picture so that discontinuous characters are connected into straight lines, detecting the straight lines, determining the inclination angle and rotating the characters accordingly; the inclination-corrected picture is segmented by scanning it transversely to project the characters onto the Y axis and determine a histogram on the Y axis, and then scanning it longitudinally to obtain the cutting result; and character recognition is performed on the cut characters by matching their feature vectors against the features in a feature template library to obtain the first data text information.
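The projection-based cutting step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the image is a plain list of pixel rows, dark pixels are treated as ink, the threshold of 128 is arbitrary, and the function names `binarize`, `runs` and `segment_characters` are hypothetical. Denoising and Hough-based inclination correction are left out.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of grayscale pixel values (0 = black, 255 = white)


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Mark dark pixels (assumed to be ink) as 1 and background as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) spans, end exclusive, where the profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_characters(binary: Image) -> List[Tuple[int, int, int, int]]:
    """Cut text lines via the row histogram, then characters via the column histogram.

    Returns (top, bottom, left, right) boxes with bottom and right exclusive.
    """
    boxes = []
    row_profile = [sum(row) for row in binary]  # projection onto the Y axis
    for top, bottom in runs(row_profile):
        band = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]  # projection onto the X axis
        for left, right in runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes
```

Each returned box would then be passed to the feature-matching stage; touching characters are not split by this simple scheme.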
Step 102: screening out words under the preset data types from the first data text information based on the preset word set of each preset data type to obtain a first word set; the words in the preset word set at least comprise words in the first word set, and the words in the preset word set have weighted values under a preset data type.
The preset data types can be divided into major classes, middle classes and minor classes, and there may be multiple major classes, multiple middle classes and multiple minor classes; further sub-classes may also be arranged below a minor class. The preset data types are determined according to the requirements of the application scenario.
In this embodiment, vocabulary is screened from the first data text information in order from major class to middle class to minor class, starting with the major-class vocabulary. Optionally, the vocabulary under a preset data type is vocabulary whose word frequency is greater than a preset word frequency, which can be obtained through statistics, and the weight values under the preset data type can be obtained based on an analytic hierarchy process.
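The text states only that the weight values are obtained by an analytic hierarchy process. As a hedged illustration, the sketch below derives weights from a pairwise comparison matrix using the row geometric mean approximation, which is one common way to carry out that step. The matrix values and the function name `ahp_weights` are assumptions made for the example.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights by normalizing row geometric means.

    pairwise[i][j] states how much more important word i is than word j;
    the matrix is assumed to be square, positive and reciprocal.
    """
    n = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical judgement for three words of one preset data type.
example_matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
example_weights = ahp_weights(example_matrix)
```

A full analytic hierarchy process would also check the consistency ratio of the matrix before accepting the weights; that check is omitted here.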
When the vocabulary under a preset data type is screened out from the first data text information, the first data text information is searched for the vocabulary of the preset vocabulary set; each word that is found is recorded in the first vocabulary set, and a mapping relation is established between that word in the first vocabulary set and its weight value under the corresponding preset data type.
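A minimal sketch of this screening step follows, assuming the preset vocabulary sets are held as a dictionary of word-to-weight maps and that the text can be tokenized on letters and digits. The function name `screen_vocabulary`, the tokenizer and the sample vocabularies are illustrative assumptions; Chinese text would need a word segmenter instead of the regular expression.

```python
import re
from collections import Counter
from typing import Dict

# preset_vocab[data_type][word] = weight of the word under that data type
PresetVocab = Dict[str, Dict[str, float]]


def screen_vocabulary(text: str, preset_vocab: PresetVocab) -> Dict[str, Dict[str, dict]]:
    """For each preset data type, keep the preset words that occur in the text.

    Each kept word is mapped to its occurrence count and its weight, which
    mirrors the word-to-weight mapping relation described above.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    screened = {}
    for data_type, vocab in preset_vocab.items():
        screened[data_type] = {
            word: {"count": counts[word], "weight": weight}
            for word, weight in vocab.items()
            if counts[word] > 0
        }
    return screened
```

The counts collected here are the raw material for the first word frequency used in the next step.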
Step 103: calculating the frequency of the vocabularies in the first vocabulary set in the first data text information to obtain a first word frequency, and determining the association degree of the first data text information and the preset data type based on the association degree coefficient of the first word frequency and each preset word frequency; the preset word frequency is the product of the word frequency of the vocabulary in the preset vocabulary set and the weight value of the corresponding vocabulary.
Optionally, determining the association degree between the first data text information and the preset data type based on the association coefficient between the first word frequency and each preset word frequency specifically includes: determining an association coefficient between the first word frequency and each preset word frequency based on a grey relational analysis method; and taking the average value of the multiple association coefficients as the association degree of the first data text information and the preset data type.
When the association coefficient between the first word frequency and each preset word frequency is determined based on the grey relational analysis method, the association coefficient is obtained by the following formula:
[Formula image not reproduced]
where ξ_i(k) is the association coefficient between the k-th first word frequency and the i-th preset word frequency, x_0(k) is the k-th first word frequency, x_i(k) is the product of the i-th preset word frequency and the corresponding weight value, and σ is the resolution coefficient, with 0 < σ ≤ 1.
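The formula itself survives only as an image reference in this text. For orientation, the standard grey relational coefficient, written with the symbols defined above, is given here; that the patent's figure has exactly this form is an assumption. The symbols r_i (association degree) and n (number of word frequencies) are introduced for illustration, and the second expression restates the averaging described in the text.

```latex
\xi_i(k) =
  \frac{\min_i \min_k \lvert x_0(k) - x_i(k) \rvert
        + \sigma \max_i \max_k \lvert x_0(k) - x_i(k) \rvert}
       {\lvert x_0(k) - x_i(k) \rvert
        + \sigma \max_i \max_k \lvert x_0(k) - x_i(k) \rvert},
\qquad
r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k)
```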
Step 104: comparing the association degrees of the first data text information and each preset data type, determining the preset data type corresponding to the maximum association degree as the data type of the first data text information, and storing the first file to be stored based on the data type of the first data text information.
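Steps 103 and 104 can be sketched together as follows. The sketch assumes the standard grey relational coefficient, assumes that for each preset data type the observed word frequencies and the weighted preset word frequencies are aligned sequences of equal length, and uses σ = 0.5 as an arbitrary default. The function names and the sample numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Sequences = Dict[str, List[float]]  # data type -> sequence of frequencies


def grey_relational_degrees(observed: Sequences, preset: Sequences,
                            sigma: float = 0.5) -> Dict[str, float]:
    """Average grey relational coefficient per data type.

    observed[t][k] plays the role of x_0(k); preset[t][k] is the preset word
    frequency already multiplied by its weight, i.e. x_i(k).
    """
    deltas = {
        t: [abs(o - p) for o, p in zip(observed[t], preset[t])]
        for t in preset
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    if not all_deltas:
        return {t: 0.0 for t in deltas}
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every sequence matches its reference exactly
        return {t: 1.0 if ds else 0.0 for t, ds in deltas.items()}
    degrees = {}
    for t, ds in deltas.items():
        coeffs = [(d_min + sigma * d_max) / (d + sigma * d_max) for d in ds]
        degrees[t] = sum(coeffs) / len(coeffs) if coeffs else 0.0
    return degrees


def classify(observed: Sequences, preset: Sequences,
             sigma: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Pick the data type with the maximum association degree."""
    degrees = grey_relational_degrees(observed, preset, sigma)
    return max(degrees, key=degrees.get), degrees
```

In practice the sequences would usually be normalized to a common scale before the differences are taken, since raw counts and weighted preset frequencies need not be comparable.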
After the preset data type corresponding to the maximum association degree is determined as the data type of the first data text information, the first file to be stored is encoded, for example as KNWE. Table 1 is a category matching list.
TABLE 1 Category matching list
[Table 1 image not reproduced]
In Table 1, R&D is a major class identifier denoting the research and development management class; SP is a middle class identifier denoting the specification class; M is a minor class identifier denoting the manual class; and PF is a middle class identifier denoting the program file class. R&D-SP-001 and R&D-SP-002 are serial-number names under the R&D-SP class and represent different files; R&D-SP-M-001 is a serial-number name under the R&D-SP-M class; and R&D-PF-001 is a serial-number name under the R&D-PF class.
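The identifier scheme described for Table 1 can be illustrated with a small helper that splits an identifier into its class path and serial number and allocates the next serial number under a category. The three-digit zero-padded serial is inferred from the examples above, and the function names `parse_identifier` and `next_identifier` are hypothetical.

```python
from typing import List, Tuple


def parse_identifier(identifier: str) -> Tuple[List[str], int]:
    """Split e.g. 'R&D-SP-M-001' into (['R&D', 'SP', 'M'], 1)."""
    *classes, serial = identifier.split("-")
    return classes, int(serial)


def next_identifier(existing: List[str], category: str) -> str:
    """Return the next free serial-number name directly under the category."""
    prefix = category.split("-")
    serials = [
        serial
        for classes, serial in map(parse_identifier, existing)
        if classes == prefix  # exact match, so R&D-SP-M files do not count for R&D-SP
    ]
    return f"{category}-{max(serials, default=0) + 1:03d}"
```

With the four identifiers named above, a new file classified under R&D-SP would receive R&D-SP-003.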
When the first file to be stored is stored based on the data type of the first data text information, the storage information includes the knowledge category identifier, the identifier of the file to be stored, the text content of the file to be stored, the data collection source, the data creation time and the like, and is associated with the corresponding data category table, as shown in Table 2.
TABLE 2 Data category table
Serial number | Category name | Identification | Collection source | Creation time | Association category | Remarks
1 | Specification class | R&D-SP-001 | 001 | 2021-07-10 | R&D-SP |
2 | Specification class | R&D-SP-002 | 001 | 2021-07-10 | R&D-SP |
3 | Manual class | R&D-SP-M-001 | 001-01 | 2021-07-11 | R&D-SP-M |
4 | Program file class | R&D-PF-001 | 002 | 2021-07-13 | R&D-PF |
In Table 2, the collection source is represented as a character string that corresponds to a data collector or data acquirer.
As an optional implementation manner, after "determining the preset data type corresponding to the maximum association degree as the data type of the first data text information" in step 104, the method further includes: judging whether the preset data type corresponding to the maximum association degree has a sub-data type; the sub-data type belongs to the preset data type; if so, screening out the vocabulary under each sub-data type from the first data text information based on the sub-vocabulary set of that sub-data type to obtain a second vocabulary set; the vocabulary in the sub-vocabulary set at least comprises the vocabulary in the second vocabulary set, and the vocabulary in the sub-vocabulary set has a weight value under the sub-data type; calculating the occurrence frequency of the vocabulary in the second vocabulary set in the first data text information to obtain a second word frequency, and determining the association degree of the first data text information and the sub-data type based on the association coefficient of the second word frequency and each sub word frequency; and comparing the association degree of the first data text information with each sub-data type, determining the sub-data type corresponding to the maximum association degree as the sub-data type of the first data text information, and storing the first file to be stored based on the data type and the sub-data type of the first data text information.
Optionally, if the data type of the first data text information only has a major class and a middle class, and no minor class, the first file to be stored is stored directly based on the major class and the middle class of the first data text information.
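The level-by-level descent from major class to sub-data types can be sketched as a walk over a type tree. In this illustration the scoring function is a simple weighted word count standing in for the grey relational association degree, and the `TypeNode` structure, the function names and the sample vocabularies are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TypeNode:
    identifier: str                       # e.g. "R&D", "SP", "M"
    vocab: Dict[str, float]               # word -> weight under this type
    children: List["TypeNode"] = field(default_factory=list)


def weighted_count(text: str, node: TypeNode) -> float:
    """Stand-in score: weighted number of occurrences of the node's words."""
    words = text.lower().split()
    return sum(words.count(word) * weight for word, weight in node.vocab.items())


def classify_path(text: str, candidates: List[TypeNode],
                  score: Callable[[str, TypeNode], float] = weighted_count) -> List[str]:
    """Choose the best type at each level and descend while sub-types exist."""
    path = []
    while candidates:
        best = max(candidates, key=lambda node: score(text, node))
        path.append(best.identifier)
        candidates = best.children  # empty list ends the descent
    return path
```

Joining the returned path with hyphens yields a category identifier such as R&D-SP-M; when the chosen middle class has no minor classes, the path simply ends at the middle class.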
As an optional implementation manner, the first file to be stored is parsed to obtain the first data source information and the first storage time information. The method for storing the first file to be stored specifically comprises the following steps: determining a data type item based on the first data source information and the first data text information; mapping the data type item, the identification of the data type of the first data text information and the name of the first file to be stored to generate a data mapping relation; and storing the first file to be stored based on the data mapping relation and the first storage time information.
The data type item determined based on the first data source information and the first data text information is illustrated in Table 3 and Table 4, where Table 3 is a specification class data table and Table 4 is the associated specification class sub-item data table.
TABLE 3 Specification class data table
[Table 3 image not reproduced]
In Table 3, the type name is Specification Class, and the corresponding data type item is Specification Class Item.
TABLE 4 Specification class sub-item data table
[Table 4 image not reproduced]
The contents of Table 3 and Table 4 are the storage data tables obtained by storing the first file to be stored based on the data mapping relation and the first storage time information.
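The mapping step described above ties together a data type item, the data type identifier and the file name, and the file is then stored with its storage time. A minimal sketch, assuming an in-memory dictionary as the store and a hypothetical layout for the data type item (the fields `source` and `summary` are not taken from the patent's tables):

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DataMapping:
    file_name: str
    type_identifier: str          # e.g. "R&D-SP"
    type_item: Dict[str, str]     # hypothetical fields derived from source and text
    storage_time: str


def build_mapping(file_name: str, type_identifier: str, source_info: str,
                  text_info: str, storage_time: str) -> DataMapping:
    """Derive a data type item and map it to the type identifier and file name."""
    type_item = {"source": source_info, "summary": text_info[:40]}
    return DataMapping(file_name, type_identifier, type_item, storage_time)


def store(database: Dict[str, DataMapping], mapping: DataMapping) -> None:
    """Store the record keyed by file name (assumed to be the lookup key)."""
    database[mapping.file_name] = mapping
```

Keying the store by file name is a design choice made here because the later update and delete steps look files up by name.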
As an optional implementation manner, the data storage method of the present invention further includes: analyzing the second file to be stored to obtain second data source information and second storage time information; judging whether files with the same name are stored or not; if so, searching a data mapping relation corresponding to the name of the second file to be stored to obtain a target data mapping relation; searching a data type item matched with the target data mapping relation to obtain a target data type item; modifying the target data type item based on the second data text information and the second data source information to obtain a modified data type item; updating the data mapping relation based on the modified data type item to obtain an updated data mapping relation; and storing the second file to be stored based on the updated data mapping relation and the second storage time information.
Before the second file to be stored is stored, it is necessary to judge whether the database already holds a stored file with the same name as the second file to be stored. If no such file exists, no file with that name has been stored before and the file is newly added; if such a file exists, the content of the file with that name is updated. If the file is newly added, the process of steps 101 to 104 is executed; if the file is updated, the data type item matching the target data mapping relation is searched for to obtain the target data type item, and the content of Table 4 is modified based on the second data text information and the second data source information to obtain the modified data type item. The department personnel and position information, and the corresponding content and content description, in Table 4 can be obtained from the second data source information; the standard information in Table 4 can be obtained from the target data type item; the other items of information in Table 4 may be obtained from the second data source information.
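The branch between adding a new file and updating an existing one can be sketched as follows. The store is an in-memory dictionary keyed by file name, the record fields are hypothetical, and `classify` stands in for the whole of steps 101 to 104; none of these names come from the patent.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def save_file(database: Dict[str, Record], name: str, text: str, source: str,
              stored_at: str, classify: Callable[[str], str]) -> str:
    """Add the file if its name is unknown, otherwise update the stored record.

    Returns "new" or "update" so the caller can record the file status.
    """
    record = database.get(name)
    if record is None:
        # Newly added file: run the classification flow and create the mapping.
        database[name] = {
            "type_id": classify(text),
            "type_item": {"source": source, "text": text},
            "stored_at": stored_at,
        }
        return "new"
    # Existing file: keep the data type, modify the data type item in place.
    record["type_item"] = {**record["type_item"], "source": source, "text": text}
    record["stored_at"] = stored_at
    return "update"
```

Note that the update branch does not reclassify the file; whether a changed file should be reclassified is not stated in the text and is left open here.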
As an optional implementation manner, the data storage method of the present invention further includes: receiving a file deleting operation request, and analyzing the file deleting operation request to obtain a deleted file name; judging whether the data request set has storage request information with the same name as the deleted file name; if yes, the file with the same name as the deleted file is not deleted.
In this embodiment, the request operations are sorted by request time to generate a request execution sequence set. If the set contains both an update operation and a delete operation on a file with the same file name, a processing exception may result. It is therefore necessary to judge whether the data request set contains storage request information whose name is the same as the deleted file name; if so, the file with that name is not deleted. The file status data table is shown in Table 5.
TABLE 5 File status data table
Serial number | Data type identification | Data content | Status | Knowledge category
1 | R&D-SP-011 | "Industrial software usability Specification V2.0" | Updating | R&D-SP
2 | R&D-SP-012 | "Industrial software XXX front end code Specification V2.0" | Updating | R&D-SP
3 | R&D-SP-013 | "Industrial software XX language code specification V1.0" | Adding new | R&D-SP
4 | R&D-SP-014 | "Industrial software interface test Specification V1.0" | Adding new | R&D-SP
5 | R&D-SP-015 | "Industrial software Performance test Specification V0.1" | Updating | R&D-SP
6 | R&D-SP-016 | "other Specifications for Industrial software V0.1" | Others | R&D-SP
In Table 5, the status "Others" may include deleting the file. In addition, information such as the file processing priority, the file content security level and the operation execution risk level may be recorded in Table 5.
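The delete guard described above can be sketched as follows, assuming each request carries a request time, an operation name and a file name. The `Request` structure, the operation labels "store" and "delete", and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    time: float
    operation: str   # "store" (add or update) or "delete"
    file_name: str


def build_execution_sequence(requests: List[Request]) -> List[Request]:
    """Order the request operations by request time."""
    return sorted(requests, key=lambda request: request.time)


def handle_delete(delete_name: str, request_set: List[Request],
                  files: Dict[str, str]) -> bool:
    """Delete the file unless a storage request for the same name is in the set.

    Returns True if the file was deleted and False if the deletion was refused.
    """
    conflict = any(
        request.operation == "store" and request.file_name == delete_name
        for request in request_set
    )
    if conflict:
        return False
    files.pop(delete_name, None)
    return True
```

Refusing the delete rather than reordering the requests follows the text; a caller could record the refused request with the "Others" status of Table 5.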
When various knowledge routes are formulated from the stored files, for example development skill knowledge routes, test skill knowledge routes, management skill knowledge routes and product knowledge routes, each route needs to be built from multiple stored files, and storing the data by class makes it easier to find the required files.
The present invention also provides a data storage system, as shown in fig. 2, the system comprising:
the parsing module 201 is configured to parse the first file to be stored to obtain first data text information. The method for analyzing the first file to be stored is an optical character recognition method.
The screening module 202 is configured to screen out vocabularies under the preset data types from the first data text information based on the preset vocabulary sets of each preset data type to obtain a first vocabulary set; the words in the preset word set at least comprise words in the first word set, and the words in the preset word set have weighted values under a preset data type. Wherein the weighted value under the preset data type is obtained based on an analytic hierarchy process.
The association degree calculation module 203 is configured to calculate the occurrence frequency of the vocabulary in the first vocabulary set in the first data text information to obtain a first word frequency, and to determine the association degree between the first data text information and the preset data type based on the association degree coefficients between the first word frequency and each preset word frequency. The preset word frequency is the product of the word frequency of a word in the preset vocabulary set and the weight value of that word.
The association degree calculation module 203 specifically includes:
an association degree calculation unit, configured to determine the association degree coefficients between the first word frequency and each preset word frequency based on grey relational analysis, and to take the average of the multiple association degree coefficients as the association degree between the first data text information and the preset data type.
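The calculation performed by the association degree calculation unit may be sketched as follows. The sketch uses the commonly cited form of the grey relational coefficient with a distinguishing coefficient `rho` of 0.5, taking the minimum and maximum differences over all preset data types. The text above does not give the formula, so this form, the function names, and the data layout are assumptions.

```python
from collections import Counter
from statistics import mean


def word_frequencies(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def grey_relational_degrees(first_freq, preset_freq, rho=0.5):
    """Return {data_type: association degree}.

    first_freq / preset_freq map data type -> non-empty list of values
    aligned on that type's vocabulary. preset_freq values are assumed to
    be word frequency multiplied by weight, as described above.
    """
    deltas = {
        t: [abs(f - p) for f, p in zip(first_freq[t], preset_freq[t])]
        for t in preset_freq
    }
    all_deltas = [d for seq in deltas.values() for d in seq]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:  # every sequence matches its reference exactly
        return {t: 1.0 for t in deltas}
    return {
        # Association degree = mean of the per-word coefficients.
        t: mean((d_min + rho * d_max) / (d + rho * d_max) for d in seq)
        for t, seq in deltas.items()
    }


# Hypothetical values for two preset data types.
first_freq = {"design": [3.0, 1.0], "test": [0.0, 0.0]}
preset_freq = {"design": [3.0, 1.0], "test": [2.0, 4.0]}
degrees = grey_relational_degrees(first_freq, preset_freq)
```

In this example the "design" sequence coincides with its preset sequence and therefore receives the maximum association degree of 1.0.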
The storage module 204 is configured to compare the association degrees between the first data text information and each preset data type, determine the preset data type with the maximum association degree as the data type of the first data text information, and store the first file to be stored based on that data type.
The storage module 204 specifically includes:
a storage unit, configured to: after the first file to be stored is parsed to obtain first data source information and first storage time information, determine a data type item based on the first data source information and the first data text information; map the data type item, the identifier of the data type of the first data text information, and the name of the first file to be stored to generate a data mapping relation; and store the first file to be stored based on the data mapping relation and the first storage time information.
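One possible shape of the data mapping relation is sketched below. The class `DataMapping`, the helper names, the in-memory `index` dictionary, and the way the data type item is composed from a source label and a type code are hypothetical; the text above states only that the item is determined from the data source information and the data text information.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataMapping:
    """Ties together the three mapped elements named in the text."""
    data_type_item: str
    data_type_id: str
    file_name: str


def build_mapping(source, type_code, data_type_id, file_name):
    # Assumption: the item joins a source label and a type code.
    return DataMapping(f"{source}-{type_code}", data_type_id, file_name)


def store_file(index, mapping, stored_at):
    """Record the mapping and storage time under the file name."""
    index[mapping.file_name] = (mapping, stored_at)


index = {}
mapping = build_mapping("R&D", "SP", "Others", "spec_v0.1.docx")
store_file(index, mapping, datetime(2021, 12, 31))
```

Keying the index by file name is a design choice of the sketch; it makes a later lookup by name straightforward.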
As an optional implementation, the data storage system further includes:
The processing module is configured to: judge whether the preset data type corresponding to the maximum association degree has sub-data types, a sub-data type being a type that belongs to the preset data type; if so, screen out the vocabulary under each sub-data type from the first data text information based on the sub-vocabulary set of that sub-data type, to obtain a second vocabulary set, where the vocabulary in the sub-vocabulary set includes at least the vocabulary in the second vocabulary set and each word in the sub-vocabulary set has a weight value under the sub-data type; calculate the occurrence frequency of the vocabulary in the second vocabulary set in the first data text information to obtain a second word frequency, and determine the association degree between the first data text information and the sub-data type based on the association degree coefficients between the second word frequency and each sub-word frequency; and compare the association degrees between the first data text information and each sub-data type, determine the sub-data type with the maximum association degree as the sub-data type of the first data text information, and store the first file to be stored based on the data type and the sub-data type of the first data text information.
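The two-level selection performed by the processing module may be sketched as follows. To keep the example short, a simple weighted-overlap score stands in for the grey relational association degree; that substitution, the `taxonomy` layout, and all names are assumptions of the sketch.

```python
def overlap_score(tokens, vocab):
    """Stand-in score: sum of weights of vocabulary words in the tokens.

    This is NOT grey relational analysis; it only lets the selection
    logic below be exercised on its own.
    """
    return sum(vocab.get(tok, 0.0) for tok in tokens)


def classify(tokens, taxonomy, score=overlap_score):
    """Pick the best data type, then the best sub-data type if any exist."""
    best_type = max(
        taxonomy, key=lambda t: score(tokens, taxonomy[t]["vocab"])
    )
    subtypes = taxonomy[best_type].get("subtypes")
    if not subtypes:  # the chosen type has no sub-data types
        return best_type, None
    best_sub = max(subtypes, key=lambda s: score(tokens, subtypes[s]))
    return best_type, best_sub


# Hypothetical taxonomy: one type with sub-data types, one without.
taxonomy = {
    "spec": {
        "vocab": {"requirement": 1.0},
        "subtypes": {
            "design": {"architecture": 1.0},
            "test": {"coverage": 1.0},
        },
    },
    "report": {"vocab": {"summary": 1.0}},
}
result = classify(["requirement", "coverage", "coverage"], taxonomy)
```

The sub-data type is evaluated only after the data type has been fixed, so the sub-vocabulary sets of unrelated data types never influence the result.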
The file updating module is configured to: parse the second file to be stored to obtain second data source information and second storage time information; judge whether a file with the same name as the second file to be stored has already been stored; if so, search for the data mapping relation corresponding to the name of the second file to be stored to obtain a target data mapping relation; search for the data type item matching the target data mapping relation to obtain a target data type item; modify the target data type item based on the second data text information and the second data source information to obtain a modified data type item; update the data mapping relation based on the modified data type item to obtain an updated data mapping relation; and store the second file to be stored based on the updated data mapping relation and the second storage time information.
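The update flow for a file whose name is already stored may be sketched as follows. The dictionary-based `index`, its keys, and the function name `update_existing` are hypothetical, and the derivation of the modified data type item from the second data text information and second data source information is left to the caller.

```python
from datetime import datetime


def update_existing(index, file_name, modified_item, stored_at):
    """Update the mapping of an already stored file.

    Returns False when no file of that name is stored, so that the
    caller can fall back to the first-time storage flow.
    """
    target = index.get(file_name)  # target data mapping relation
    if target is None:
        return False
    # Replace only the data type item and storage time; keep the rest.
    index[file_name] = {
        **target,
        "data_type_item": modified_item,
        "stored_at": stored_at,
    }
    return True


index = {
    "spec.docx": {
        "data_type_item": "R&D-SP",
        "data_type_id": "Others",
        "stored_at": datetime(2021, 12, 1),
    }
}
updated = update_existing(index, "spec.docx", "QA-SP", datetime(2021, 12, 31))
```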
The deleting module is configured to: receive a file deletion operation request and parse it to obtain the name of the file to be deleted; judge whether the data request set contains storage request information with the same name as the file to be deleted; and, if so, not delete the file with that name.
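The check made by the deleting module may be sketched as follows. The request layout, the representation of the data request set as a set of file names, and the behaviour when no matching storage request exists (the file is deleted) are assumptions; the text above specifies only the case in which deletion is refused.

```python
def handle_delete(request, pending_store_names, stored_files):
    """Delete a stored file unless a storage request for it is pending.

    Returns True if the file was removed, False if deletion was refused.
    """
    name = request["file_name"]  # name parsed from the delete request
    if name in pending_store_names:
        return False  # a same-name storage request exists: keep the file
    stored_files.pop(name, None)  # assumed behaviour for the other case
    return True


stored_files = {"a.docx": b"...", "b.docx": b"..."}
pending_store_names = {"a.docx"}
refused = handle_delete({"file_name": "a.docx"}, pending_store_names, stored_files)
removed = handle_delete({"file_name": "b.docx"}, pending_store_names, stored_files)
```

Refusing the deletion in this way avoids removing a file that a concurrent storage request is about to write or update.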
An embodiment of the present invention provides a computer-readable storage medium on which a program is stored, the program implementing the above-described data storage method when executed by a processor.
An embodiment of the present invention provides an electronic device. As shown in Fig. 3, the electronic device 30 includes at least one processor 301, and at least one memory 302 and a bus 303 connected to the processor 301. The processor 301 and the memory 302 communicate with each other through the bus 303, and the processor 301 is configured to call program instructions in the memory 302 to perform the data storage method described above. The electronic device may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of the above data storage method.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, systems and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described only briefly, and the relevant points may be found in the corresponding description of the method embodiment.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method of storing data, comprising:
analyzing the first file to be stored to obtain first data text information;
screening out vocabulary under the preset data types from the first data text information based on a preset vocabulary set of each preset data type to obtain a first vocabulary set; the vocabulary in the preset vocabulary set at least comprises the vocabulary in the first vocabulary set, and the vocabulary in the preset vocabulary set has a weight value under the preset data type;
calculating the occurrence frequency of the vocabularies in the first vocabulary set in the first data text information to obtain a first word frequency, and determining the association degree of the first data text information and the preset data type based on the association degree coefficient of the first word frequency and each preset word frequency; the preset word frequency is the product of the word frequency of the vocabulary in the preset vocabulary set and the weight value of the corresponding vocabulary;
and comparing the association degree of the first data text information with each preset data type, determining the preset data type corresponding to the maximum association degree as the data type of the first data text information, and storing the first file to be stored based on the data type of the first data text information.
2. The data storage method according to claim 1, wherein after determining a preset data type corresponding to a maximum association degree as the data type of the first data text information, the method further comprises:
judging whether the preset data type corresponding to the maximum association degree has a subdata type or not; the subdata type belongs to the preset data type;
if so, screening out the vocabulary under each sub-data type from the first data text information based on the sub-vocabulary set of that sub-data type to obtain a second vocabulary set; the vocabulary in the sub-vocabulary set at least comprises the vocabulary in the second vocabulary set, and the vocabulary in the sub-vocabulary set has a weight value under the sub-data type;
calculating the occurrence frequency of the vocabulary in the second vocabulary set in the first data text information to obtain a second word frequency, and determining the association degree of the first data text information and the sub-data type based on the association degree coefficient of the second word frequency and each sub-word frequency;
and comparing the association degree of the first data text information with each sub-data type, determining the sub-data type corresponding to the maximum association degree as the sub-data type of the first data text information, and storing the first file to be stored based on the data type and the sub-data type of the first data text information.
3. The data storage method of claim 1, further comprising: analyzing the first file to be stored to obtain first data source information and first storage time information;
the method for storing the first file to be stored specifically comprises the following steps:
determining a data type item based on the first data source information and the first data text information;
mapping the data type item, the identifier of the data type of the first data text information and the name of the first file to be stored to generate a data mapping relation;
and storing the first file to be stored based on the data mapping relation and the first storage time information.
4. The data storage method of claim 3, further comprising:
analyzing the second file to be stored to obtain second data source information and second storage time information;
judging whether files with the same name are stored or not;
if so, searching a data mapping relation corresponding to the name of the second file to be stored to obtain a target data mapping relation;
searching a data type item matched with the target data mapping relation to obtain a target data type item;
modifying the target data type item based on the second data text information and the second data source information to obtain a modified data type item;
updating the data mapping relation based on the modified data type item to obtain an updated data mapping relation;
and storing the second file to be stored based on the updated data mapping relation and the second storage time information.
5. The data storage method of claim 1, further comprising:
receiving a file deleting operation request, and analyzing the file deleting operation request to obtain a deleted file name;
judging whether the data request set has storage request information with the same name as the deleted file name;
and if so, not deleting the file with the same name as the deleted file.
6. The data storage method according to claim 1, wherein the determining the association degree between the first data text information and the preset data type based on the association degree coefficient between the first word frequency and each preset word frequency specifically comprises:
determining an association degree coefficient between the first word frequency and each preset word frequency based on grey relational analysis;
and taking the average value of the multiple association coefficients as the association degree of the first data text information and the preset data type.
7. The data storage method according to claim 1, wherein the weight value under the preset data type is obtained based on an analytic hierarchy process; the method for analyzing the first file to be stored is an optical character recognition method.
8. A data storage system, comprising:
the analysis module is used for analyzing the first file to be stored to obtain first data text information;
the screening module is used for screening out vocabularies under the preset data types from the first data text information based on the preset vocabulary sets of each preset data type to obtain a first vocabulary set; the vocabulary in the preset vocabulary set at least comprises the vocabulary in the first vocabulary set, and the vocabulary in the preset vocabulary set has a weighted value under the preset data type;
the relevancy calculation module is used for calculating the occurrence frequency of vocabularies in the first vocabulary set in the first data text information to obtain a first word frequency, and determining the relevancy of the first data text information and the preset data type based on the relevancy coefficient of the first word frequency and each preset word frequency; the preset word frequency is the product of the word frequency of the vocabulary in the preset vocabulary set and the weight value of the corresponding vocabulary;
and the storage module is used for comparing the association degree of the first data text information with each preset data type, determining the preset data type corresponding to the maximum association degree as the data type of the first data text information, and storing the first file to be stored based on the data type of the first data text information.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a program which, when executed by a processor, implements the data storage method of any one of claims 1 to 7.
10. An electronic device, comprising:
at least one processor, and at least one memory and a bus connected to the processor;
the processor and the memory complete mutual communication through the bus; the processor is configured to call program instructions in the memory to perform the data storage method of any of claims 1-7.
CN202111673172.1A 2021-12-31 2021-12-31 Data storage method, system, storage medium and electronic equipment Pending CN114398315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111673172.1A CN114398315A (en) 2021-12-31 2021-12-31 Data storage method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111673172.1A CN114398315A (en) 2021-12-31 2021-12-31 Data storage method, system, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114398315A true CN114398315A (en) 2022-04-26

Family

ID=81229471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111673172.1A Pending CN114398315A (en) 2021-12-31 2021-12-31 Data storage method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114398315A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910166A (en) * 2023-09-12 2023-10-20 湖南尚医康医疗科技有限公司 Hospital information acquisition method and system of Internet of things, electronic equipment and storage medium
CN116910166B (en) * 2023-09-12 2023-11-24 湖南尚医康医疗科技有限公司 Hospital information acquisition method and system of Internet of things, electronic equipment and storage medium
CN117170590A (en) * 2023-11-03 2023-12-05 沈阳卓志创芯科技有限公司 Computer data storage method and system based on cloud computing
CN117170590B (en) * 2023-11-03 2024-01-26 沈阳卓志创芯科技有限公司 Computer data storage method and system based on cloud computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination