CN117851340A - File forming method, system, terminal and storage medium based on keywords

Publication number
CN117851340A
Authority
CN
China
Prior art keywords
archived
file
keyword
files
keywords
Prior art date
Legal status
Pending
Application number
CN202410263005.7A
Other languages
Chinese (zh)
Inventor
肖斌
罗华山
雷鸣
曹俪娟
陈雪婷
Current Assignee
Hunan Cloud Archive Information Technology Co ltd
Original Assignee
Hunan Cloud Archive Information Technology Co ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to the field of archive management technologies, and in particular to a keyword-based archive forming method, system, terminal and storage medium. The method includes: acquiring file characteristics of a file to be archived; extracting keywords from the file to be archived based on the file characteristics; establishing a mapping relation between the keywords and the file to be archived; classifying the files to be archived based on the keywords and the mapping relation to generate a classification result; and archiving the files to be archived based on the classification result. The application helps to improve archiving efficiency.

Description

File forming method, system, terminal and storage medium based on keywords
Technical Field
The present disclosure relates to the field of archive management technologies, and in particular, to a method, a system, a terminal, and a storage medium for archive formation based on keywords.
Background
How to archive electronic files effectively is a pain point in file collection. Each business department performs the first operation of the file life cycle, namely appraising the archival value of electronic files, according to the archiving scope and retention periods provided by the archive office. This appraisal is mostly carried out within the business departments and should be performed by professional staff in the relevant field. However, because the workload is large and professional staff are in short supply, some departments assign non-professional staff to the work. This easily leads to unclear archival value of the electronic files, incomplete archive materials and similar problems, and can directly cause whole items or whole volumes of files to lose their value for long-term preservation.
In the prior art, after electronic files are collected, identification, classification, writing and the like must be carried out manually. However, because the parties responsible for file production, file utilization, and file collection and management have different purposes, files are frequently omitted, duplicated, invalid or processed in a non-compliant way during archiving, and the resulting rework severely restricts the efficiency of the whole archiving work.
Disclosure of Invention
To help improve archiving efficiency, the present application provides a keyword-based archive forming method, system, terminal and storage medium.
In a first aspect, the present application provides a keyword-based archive forming method, which adopts the following technical scheme:
a file forming method based on keywords comprises the following steps:
acquiring a file to be archived and file characteristics of the file to be archived;
extracting keywords from the file to be archived based on the file characteristics, and generating keywords;
establishing a mapping relation between the keywords and the files to be archived;
classifying the files to be archived based on the keywords and the mapping relation and generating classification results;
and archiving the files to be archived based on the classification result.
By adopting the technical scheme, the files to be archived and the file characteristics thereof are obtained, keyword extraction is carried out on the files to be archived according to the file characteristics, the mapping relation between the keywords and the files to be archived is established, the files to be archived are classified according to the keywords and the mapping relation, classification results are generated, and finally the files to be archived are archived according to the classification results; and extracting corresponding keywords according to file characteristics, and archiving the files to be archived according to the keywords and the mapping relation, so that the problems of omission, repetition, invalid files, non-compliance in the process and the like of the files to be archived in an archiving link caused by different archiving purposes and the like are reduced, and the archiving efficiency is improved.
Optionally, the specific step of obtaining the file to be archived and the file characteristics of the file to be archived includes:
acquiring a document format and document contents of the file to be archived;
identifying and analyzing the document content to acquire the theme ideas of the files to be archived;
based on the theme ideas, acquiring the file types of the files to be archived;
and taking the file format and the file type as the file characteristics.
By adopting the technical scheme, the document format and the document content of the file to be archived are obtained, the content is identified and analyzed, so that the theme idea of the file to be archived is obtained, the file type of the file to be archived is obtained according to the theme idea, finally the file format and the file type are used as file characteristics, the file format and the file type are combined to obtain the file characteristics, the file characteristics are better attached to the file to be archived, and the archiving accuracy of the file to be archived is improved.
Optionally, the specific steps of extracting the keywords from the file to be archived based on the file characteristics and generating the keywords include:
the file to be archived is subjected to word segmentation, and word segmentation words are obtained;
acquiring a first association degree of the word segmentation words and the theme ideas;
judging whether the first association degree meets a preset association requirement or not;
and if the first association degree meets the preset association requirement, taking the word segmentation word as the keyword.
By adopting the above technical scheme, the file to be archived is segmented into a number of word segmentation words, the first association degree between each word segmentation word and the theme idea is obtained, and it is judged whether the first association degree meets the preset association requirement. If so, the word segmentation word is closely associated with the theme idea of the file to be archived and is therefore taken as a keyword. Judging whether each word segmentation word meets the preset association requirement determines whether it fits the theme idea of the file to be archived, that is, whether it can serve as a keyword representing the file. Keywords that fit the theme idea more closely improve archiving accuracy, reduce re-archiving caused by inaccurate archiving, and improve archiving efficiency.
Optionally, the specific step of determining whether the first association degree meets a preset association requirement includes:
acquiring target positions of different word segmentation words in the file to be archived;
judging whether the target position is a designated position or not;
if the target position is the designated position, judging that the first association degree meets a preset association requirement;
if the target position is not the designated position, acquiring the use times of the word segmentation words in the file to be archived;
judging whether the using times exceeds a preset quantity threshold value or not;
if the number of times of use does not exceed the preset number threshold, judging that the first association degree does not meet a preset association requirement;
and if the using times exceeds the preset quantity threshold, judging that the first association degree meets a preset association requirement.
By adopting the above technical scheme, the target positions of the different word segmentation words in the file to be archived are obtained, and it is judged whether each target position is a designated position. If so, the word segmentation word is located at an important position in the file to be archived and therefore has high value in the file: it can represent the file to be archived or a certain theme within it, so the first association degree is judged to meet the preset association requirement.
If not, the word segmentation word is not located at an important position in the file to be archived, and the number of times it is used in the file must be obtained to further judge whether the first association degree meets the preset association requirement. It is then judged whether the number of uses exceeds the preset number threshold. If it does not, the word segmentation word occurs infrequently in the file to be archived and, judged by the number of uses alone, its first association degree does not meet the preset association requirement. If it does, the word segmentation word occurs frequently in the file to be archived and is therefore closely associated with it, so the first association degree is judged to meet the preset association requirement.
By combining the target position and the number of uses, the first association degree between a word segmentation word and the theme idea is judged from multiple aspects, so that whether it meets the preset association requirement can be judged more accurately.
Optionally, the method further comprises:
judging whether a file retrieval instruction is detected;
if the file retrieval instruction is detected, obtaining a search term;
matching the search term with the keywords, and obtaining the matching degree;
sorting the matching degrees to generate a degree sorting list;
and generating retrieval content based on the degree sorting list and the mapping relation.
By adopting the above technical scheme, detection of a file retrieval instruction indicates that the user needs to find related files. The search term input by the user is matched against the keywords to obtain the matching degree between the search term and each keyword, the matching degrees are sorted by magnitude to generate the degree sorting list, and retrieval content relevant to the user's needs is finally generated according to the degree sorting list and the mapping relation. Obtaining the keywords that best match the search term and generating the corresponding retrieval content from their mapping relations helps users find files that meet their requirements more quickly and accurately.
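The retrieval flow described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `retrieve`, the `min_degree` cutoff and the use of `difflib.SequenceMatcher` as a stand-in for the matching degree are all assumptions, since the text does not specify how the matching degree is computed at this point.

```python
from difflib import SequenceMatcher


def retrieve(search_term, keyword_to_files, min_degree=0.5):
    """Return file IDs ordered by how well their keywords match search_term.

    keyword_to_files is the keyword -> [file IDs] mapping relation.
    The string-similarity ratio is only a placeholder for the matching degree.
    """
    degrees = []
    for keyword in keyword_to_files:
        degree = SequenceMatcher(None, search_term, keyword).ratio()
        if degree >= min_degree:  # hypothetical cutoff, not from the source
            degrees.append((degree, keyword))

    # Degree sorting list: highest matching degree first, ties by keyword.
    degrees.sort(key=lambda item: (-item[0], item[1]))

    # Retrieval content: follow the mapping relation, skipping duplicates.
    results, seen = [], set()
    for _, keyword in degrees:
        for file_id in keyword_to_files[keyword]:
            if file_id not in seen:
                seen.add(file_id)
                results.append(file_id)
    return results
```

Files attached to the best-matching keyword come first, and a file reachable through several keywords is listed only once.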
Optionally, the specific step of matching the search term with the keyword and obtaining the matching degree includes:
matching the search term with the keyword;
judging whether the keywords corresponding to the search terms exist or not;
if the keyword corresponding to the search term exists, the keyword is used as a target keyword;
acquiring a second association degree of the search term and the target keyword;
acquiring a keyword label of the target keyword;
and acquiring the matching degree based on the second association degree and the keyword label.
By adopting the above technical scheme, it is judged whether a keyword corresponding to the search term exists. If so, a keyword associated with the search term input by the user exists, which also means that a file associated with the search term exists in the archive; the keyword is taken as the target keyword, the second association degree between the search term and the target keyword and the keyword label of the target keyword are obtained, and the matching degree is finally obtained by combining the two. The second association degree shows whether the search term is associated with an existing keyword and how strong that association is, while the keyword label shows on what basis the keyword associated with the search term was selected to represent the corresponding file. Combining the two makes the matching degree more accurate, and using it as the basis for ranking all retrieved files helps the user obtain files that meet their needs quickly and accurately.
Optionally, the specific step of obtaining the matching degree based on the second association degree and the keyword label includes:
acquiring an association score based on the second association degree and a preset association rule;
acquiring the label number corresponding to the keyword labels;
acquiring label scores based on the label number and preset label basic scores;
acquiring a matching score based on the association score, the tag score and a preset score weight;
and acquiring the matching degree based on the matching score and a preset matching rule.
By adopting the above technical scheme, the association score is calculated from the second association degree and the preset association rule; the number of labels is obtained and the label score is calculated from the number of labels and the preset label base score; the matching score is calculated from the association score, the label score and the preset score weights corresponding to each; and the matching degree is finally obtained by applying the preset matching rule. Because the matching degree is derived from several calculation factors and their preset score weights, it is better founded and more accurate, which helps users find files that meet their requirements quickly and accurately.
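One way to read the scoring steps above is as a weighted sum followed by a threshold lookup. The sketch below is an assumption-laden illustration: the association rule table, the label base score of 10, the weights (0.7, 0.3) and the degree thresholds are hypothetical values chosen for the example, because the text only says that they are preset.

```python
def association_score(second_association, rule=((0.8, 100), (0.5, 60), (0.0, 20))):
    """Map the second association degree (0..1) to a score via a preset rule.

    rule is a hypothetical table of (lower bound, score) pairs, highest first.
    """
    for lower_bound, score in rule:
        if second_association >= lower_bound:
            return score
    return 0


def label_score(label_count, base_score=10):
    """Label score = number of keyword labels x preset label base score."""
    return label_count * base_score


def matching_score(second_association, label_count, weights=(0.7, 0.3)):
    """Weighted combination of the association score and the label score."""
    w_assoc, w_label = weights
    return (w_assoc * association_score(second_association)
            + w_label * label_score(label_count))


def matching_degree(score, rule=((70, "high"), (40, "medium"))):
    """Turn a matching score into a matching degree via a preset rule."""
    for lower_bound, degree in rule:
        if score >= lower_bound:
            return degree
    return "low"
```

For instance, a second association degree of 0.9 with two labels gives 0.7 x 100 + 0.3 x 20 = 76 under these example values, which the example rule maps to "high".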
In a second aspect, the present application further discloses a keyword-based archive forming system, which adopts the following technical scheme:
a keyword-based archive forming system comprising:
the file characteristic acquiring module is used for acquiring the file to be archived and the file characteristic of the file to be archived;
the generating module is used for extracting keywords from the file to be archived based on the file characteristics and generating keywords;
the relation establishing module is used for establishing a mapping relation between the keywords and the files to be archived;
the classification module is used for classifying the files to be archived based on the keywords and the mapping relation and generating classification results;
and the archiving module is used for archiving the files to be archived based on the classification result.
By adopting the technical scheme, the files to be archived and the file characteristics thereof are obtained, keyword extraction is carried out on the files to be archived according to the file characteristics, the mapping relation between the keywords and the files to be archived is established, the files to be archived are classified according to the keywords and the mapping relation, classification results are generated, and finally the files to be archived are archived according to the classification results; and extracting corresponding keywords according to file characteristics, and archiving the files to be archived according to the keywords and the mapping relation, so that the problems of omission, repetition, invalid files, non-compliance in the process and the like of the files to be archived in an archiving link caused by different archiving purposes and the like are reduced, and the archiving efficiency is improved.
In a third aspect, the present application provides a computer apparatus, which adopts the following technical scheme:
an intelligent terminal comprising a memory, a processor, wherein the memory is configured to store a computer program capable of running on the processor, and the processor, when loaded with the computer program, performs the method of the first aspect.
By adopting the above technical scheme, a computer program is generated based on the method of the first aspect and stored in the memory to be loaded and executed by the processor, so that an intelligent terminal built from the memory and the processor is convenient for users to use.
In a fourth aspect, the present application provides a computer readable storage medium, which adopts the following technical scheme:
a computer readable storage medium having stored therein a computer program which, when loaded by a processor, performs the method of the first aspect.
By adopting the above technical scheme, a computer program is generated based on the method of the first aspect and stored in the computer readable storage medium to be loaded and executed by the processor; the computer readable storage medium makes the computer program convenient to read and store.
In summary, the present application includes the following beneficial technical effects:
acquiring files to be archived and file characteristics thereof, extracting keywords from the files to be archived according to the file characteristics, establishing a mapping relation between the keywords and the files to be archived, classifying the files to be archived according to the keywords and the mapping relation, generating classification results, and archiving the files to be archived according to the classification results; and extracting corresponding keywords according to file characteristics, and archiving the files to be archived according to the keywords and the mapping relation, so that the problems of omission, repetition, invalid files, non-compliance in the process and the like of the files to be archived in an archiving link caused by different archiving purposes and the like are reduced, and the archiving efficiency is improved.
Drawings
FIG. 1 is a main flow chart of a keyword-based archive forming method according to an embodiment of the present application;
FIG. 2 is a flowchart of steps S201 to S204;
FIG. 3 is a flowchart of steps S301 to S304;
FIG. 4 is a flowchart of steps S401 to S407;
FIG. 5 is a flowchart of steps S501 to S505;
FIG. 6 is a flowchart of steps S601 to S606;
FIG. 7 is a flowchart of steps S701 to S705;
FIG. 8 is a block diagram of a keyword-based archive forming system in accordance with an embodiment of the present application.
Reference numerals illustrate:
1. an acquisition module; 2. a generating module; 3. a relation establishing module; 4. a classification module; 5. an archiving module.
Detailed Description
In a first aspect, the present application discloses a keyword-based archive formation method.
Referring to fig. 1, a keyword-based archive forming method includes steps S101 to S105:
Step S101: acquiring file characteristics of the file to be archived.
Specifically, the file to be archived is a file that needs to be archived and may be of various kinds, such as a text file, an image or a video; in this embodiment, the file to be archived refers to a text-type file. The file characteristics are characteristics of the file to be archived and include the file type, the file content and the like.
Step S102: extracting keywords from the file to be archived based on the file characteristics, and generating keywords.
Specifically, a keyword is a word that is strongly associated with the file to be archived and can be used to represent it. In this embodiment, a suitable extraction algorithm may be selected to extract key terms such as important entities, topics or phrases from the text. For example, for a plain-text electronic file (such as an incoming or outgoing official document), the TF-IDF (Term Frequency-Inverse Document Frequency) keyword extraction technique can be used to extract keywords from the document content; a system keyword library is generated at the same time, manual management of the library is supported, and the library serves as a base lexicon for subsequent keyword extraction. For electronic files with specific keyword requirements or domain-specific text processing requirements, such as student records, keywords can be extracted using a custom keyword library, a stop-word list, a domain-specific dictionary and the like.
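As an illustration of the TF-IDF option mentioned above, the following minimal sketch ranks the terms of one document against a small collection. It assumes the documents have already been segmented into tokens; the function name `tfidf_keywords`, the unsmoothed IDF formula and the `stop_words` parameter are choices made for this example rather than details given in the text.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3, stop_words=frozenset()):
    """Return the top_k terms of docs[doc_index] ranked by TF-IDF.

    docs is a list of already-segmented documents (lists of tokens).
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # Drop stop words (a stand-in for the stop-word list mentioned above).
    tokens = [t for t in docs[doc_index] if t not in stop_words]
    tf = Counter(tokens)

    scores = {}
    for term, count in tf.items():
        idf = math.log(n_docs / df[term])  # plain, unsmoothed IDF
        scores[term] = (count / len(tokens)) * idf

    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]
```

A term that appears in every document gets an IDF of zero and therefore sinks to the bottom of the ranking, which is the property that makes TF-IDF useful for picking words that distinguish one file from the rest.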
Step S103: establishing a mapping relation between the keywords and the files to be archived.
Specifically, in this embodiment, the keyword ID, the electronic file ID, the positions and frequency of the keyword within the file, and the number of times the keyword appears in the whole file set are stored in a data structure chosen by the system design, such as an inverted index, a hash table or a tree structure, and the extracted keywords are written into the metadata of the electronic file.
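Of the data structures listed above, the inverted index is perhaps the easiest to picture. The sketch below is a minimal in-memory version; the function names `build_inverted_index` and `collection_count` and the shape of the input dictionary are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(file_keywords):
    """Build keyword -> {file ID: [positions]} from per-file keyword lists.

    file_keywords maps a file ID to its keywords in order of appearance.
    """
    index = defaultdict(dict)
    for file_id, tokens in file_keywords.items():
        for position, keyword in enumerate(tokens):
            index[keyword].setdefault(file_id, []).append(position)
    return index


def collection_count(index, keyword):
    """Number of times keyword appears across the whole file set."""
    return sum(len(positions) for positions in index.get(keyword, {}).values())
```

The per-file frequency is simply the length of a position list, so positions, frequency and collection-wide count all come from the same structure.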
Step S104: classifying the files to be archived based on the keywords and the mapping relation, and generating a classification result.
Specifically, the classification result is the result obtained after the files to be archived are classified according to the keywords and the mapping relation. In this embodiment, the classification approach can be chosen according to the precision and effectiveness of keyword extraction. For example, when the data volume is large, the keywords can be converted into feature vectors and the files to be archived can be classified with methods such as a naive Bayes classifier, a support vector machine (SVM), a decision tree or a random forest. When the data volume is small, a keyword search box can be provided in the collection library so that files of the same category or subject can be quickly searched and filtered manually by combining keywords, and manual classification labels are then applied on that basis.
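To make the naive Bayes option concrete, here is a small pure-Python multinomial naive Bayes classifier over keyword lists. It is a teaching sketch under stated assumptions: the class name `KeywordNaiveBayes`, the add-one (Laplace) smoothing and the toy labels are not taken from the text, and a production system would more likely rely on an established machine-learning library.

```python
import math
from collections import Counter, defaultdict


class KeywordNaiveBayes:
    """Multinomial naive Bayes over keyword lists with add-one smoothing."""

    def fit(self, samples):
        """samples: list of (keywords, label) pairs used as training data."""
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for keywords, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(keywords)
            self.vocab.update(keywords)
        return self

    def predict(self, keywords):
        """Return the label with the highest log-posterior for keywords."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior of the class.
            score = math.log(self.class_counts[label] / total)
            # Smoothed log likelihood of each keyword under the class.
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for kw in keywords:
                score += math.log((self.word_counts[label][kw] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numerical underflow when a file has many keywords, and the smoothing keeps an unseen keyword from zeroing out a class.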
Step S105: archiving the files to be archived based on the classification result.
According to the file forming method based on the keywords, the files to be archived and the file characteristics of the files are obtained, keyword extraction is carried out on the files to be archived according to the file characteristics, mapping relations between the keywords and the files to be archived are established, the files to be archived are classified according to the keywords and the mapping relations, classification results are generated, and finally the files to be archived are archived according to the classification results; and extracting corresponding keywords according to file characteristics, and archiving the files to be archived according to the keywords and the mapping relation, so that the problems of omission, repetition, invalid files, non-compliance in the process and the like of the files to be archived in an archiving link caused by different archiving purposes and the like are reduced, and the archiving efficiency is improved.
Referring to fig. 2, in one implementation manner of the present embodiment, the specific step of obtaining the file to be archived and the file characteristics of the file to be archived in step S101 includes steps S201 to S204:
Step S201: acquiring the document format and the document content of the file to be archived.
Specifically, in this embodiment, the document format is the format of the file to be archived, for example a text-document format, a picture format or a video format; text-document formats include txt, doc, xml, pdf and the like. The document content is the content of the file to be archived.
Step S202: identifying and analyzing the document content to obtain the theme idea of the file to be archived.
Specifically, in this embodiment, the theme idea is the main idea that the file to be archived is intended to express, such as praise or criticism of a person or thing, or an emotion the author wants to convey.
Step S203: based on the theme idea, the file type of the file to be archived is obtained.
Specifically, in this embodiment, the file type is the type of the file to be archived, including the types such as emotion, martial arts, science and technology, history, and the like.
Step S204: taking the document format and the file type as the file characteristics.
According to the keyword-based archive forming method provided by this embodiment, the document format and document content of the file to be archived are obtained, and the content is identified and analyzed to obtain the theme idea of the file to be archived. The file type is then obtained according to the theme idea, and the document format and the file type together are taken as the file characteristics. Combining the two makes the file characteristics fit the file to be archived more closely, which improves archiving accuracy.
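Steps S201 to S204 can be pictured with the rough sketch below. The text does not say how the theme idea is recognised, so a simple cue-word count stands in for that analysis here; the `TYPE_CUES` table, the `FileCharacteristics` class and the function name are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical cue words per file type. A real system would replace this
# lookup with whatever content analysis produces the theme idea.
TYPE_CUES = {
    "science and technology": {"algorithm", "computer", "experiment"},
    "history": {"dynasty", "century", "archive"},
}


@dataclass(frozen=True)
class FileCharacteristics:
    document_format: str
    file_type: str


def get_file_characteristics(filename, tokens):
    """Combine the document format and an inferred file type."""
    # S201: the document format, taken here from the file extension.
    document_format = PurePath(filename).suffix.lstrip(".").lower()

    # S202-S203: count cue words per type as a stand-in for theme analysis.
    counts = {
        file_type: sum(tok in cues for tok in tokens)
        for file_type, cues in TYPE_CUES.items()
    }
    best = max(counts, key=counts.get)
    file_type = best if counts[best] > 0 else "unknown"

    # S204: format and type together form the file characteristics.
    return FileCharacteristics(document_format, file_type)
```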
Referring to fig. 3, in one implementation manner of the present embodiment, step S102 performs keyword extraction on a file to be archived based on file characteristics, and the specific steps of generating keywords include steps S301 to S304:
Step S301: segmenting the file to be archived, and acquiring word segmentation words.
Specifically, in this embodiment, the word segmentation words are the many words produced by segmenting the file to be archived; segmentation may be performed according to part of speech, according to context, and so on.
Step S302: acquiring a first association degree between the word segmentation words and the theme idea.
Specifically, the first association degree is the degree of association between a word segmentation word and the theme idea. In this embodiment, a rule for judging the first association degree may be preset. For example, if the theme idea is an introduction to a certain computer and the word segmentation word is "computer", the first association degree is extremely high.
Step S303: judging whether the first association degree meets the preset association requirement.
Specifically, in this embodiment, the preset association requirement is a criterion for determining whether a word segmentation word can be used as a keyword.
Step S304: if the first association degree meets the preset association requirement, taking the word segmentation word as a keyword.
According to the keyword-based archive forming method provided by this embodiment, the file to be archived is segmented into a number of word segmentation words, the first association degree between each word segmentation word and the theme idea is obtained, and it is judged whether the first association degree meets the preset association requirement. If so, the word segmentation word is closely associated with the theme idea of the file to be archived and is therefore taken as a keyword. Judging whether each word segmentation word meets the preset association requirement determines whether it fits the theme idea of the file to be archived, that is, whether it can serve as a keyword representing the file. Keywords that fit the theme idea more closely improve archiving accuracy, reduce re-archiving caused by inaccurate archiving, and improve archiving efficiency.
Referring to fig. 4, in one implementation manner of the present embodiment, the specific step of determining, in step S303, whether the first association degree meets the preset association requirement includes steps S401 to S407:
Step S401: acquiring the target positions of the different word segmentation words in the file to be archived.
Specifically, in this embodiment, the target position is the position of the word segmentation word in the file to be archived, for example in the main title, in a subtitle or in the body.
Step S402: judging whether the target position is a designated position.
Specifically, in this embodiment, the designated position is a position specified in advance, and may be the main title, a subtitle or the document summary.
Step S403: if the target position is the designated position, the first association degree is judged to meet the preset association requirement.
Step S404: if the target position is not the designated position, acquiring the number of times the word segmentation word is used in the file to be archived.
Specifically, in this embodiment, the number of times of use is the total number of occurrences of the word segmentation word in the file to be archived.
Step S405: judging whether the number of times of use exceeds a preset number threshold.
Specifically, in this embodiment, the preset number threshold is a preset criterion for judging whether the number of times of use is high enough for the word segmentation word to qualify as a keyword.
Step S406: if the number of times of use does not exceed the preset number threshold, determining that the first association degree does not meet the preset association requirement.
Step S407: if the number of times of use exceeds a preset number threshold, determining that the first association degree meets a preset association requirement.
According to the keyword-based archive forming method provided by this embodiment, the target positions of the different word segmentation words in the file to be archived are obtained, and whether each target position is a designated position is judged. If so, the word segmentation word is located at an important position in the file to be archived; it therefore carries high value in that file and can represent the file, or a theme within it, so the first association degree is judged to meet the preset association requirement.
If not, the word segmentation word is not located at an important position in the file to be archived, and the number of times it is used in the file must be obtained to further judge whether the first association degree meets the preset association requirement. Whether the number of times of use exceeds the preset number threshold is then judged. If it does not, the word segmentation word occurs infrequently in the file to be archived and, judged by the number of times of use alone, its first association degree does not meet the preset association requirement; if it does, the word segmentation word occurs frequently in the file to be archived and is strongly associated with it, so the first association degree is judged to meet the preset association requirement.
By combining the target position and the number of times of use, the first association degree between the word segmentation word and the theme idea is judged from more than one aspect, so that whether it meets the preset association requirement can be judged more accurately.
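The position-then-frequency judgment of steps S402 to S407 can be illustrated with a minimal Python sketch. The position labels, the token representation (a list of word and position pairs), the function name, and the default threshold of 3 are assumptions made for illustration only; the embodiment does not fix any of them.

```python
# Hypothetical labels for the designated positions named in the embodiment.
DESIGNATED_POSITIONS = {"title", "subtitle", "summary"}


def meets_association_requirement(word, tokens, count_threshold=3):
    """Judge whether a word segmentation word qualifies as a keyword.

    tokens is a list of (word, position) pairs for the file to be archived;
    count_threshold stands in for the preset number threshold (assumed value).
    """
    occurrences = [pos for w, pos in tokens if w == word]
    # S402/S403: any occurrence at a designated position is sufficient.
    if any(pos in DESIGNATED_POSITIONS for pos in occurrences):
        return True
    # S404 to S407: otherwise fall back to the number of times of use.
    return len(occurrences) > count_threshold
```

With this sketch, a word that appears once in the title qualifies immediately, while a word that appears only in the body qualifies only when its count is strictly greater than the threshold.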
Referring to fig. 5, in one implementation manner of the present embodiment, the method further includes steps S501 to S505:
Step S501: judging whether a file retrieval instruction is detected.
Specifically, in this embodiment, the file retrieval instruction is an instruction for retrieving files according to the user's requirement through a retrieval algorithm.
Step S502: if the file retrieval instruction is detected, obtaining a search term.
Specifically, in this embodiment, the search term is the word input by the user for retrieval.
Step S503: matching the search term with the keywords and obtaining the matching degree.
Specifically, the matching degree is the degree of similarity and association between a keyword in the keyword library and the search term.
Step S504: sorting the matching degrees to generate a degree ranking list.
Specifically, the degree ranking list is a list formed by sorting the matching degrees according to a given rule; in this embodiment, the matching degrees are sorted from largest to smallest.
Step S505: generating retrieval content based on the degree ranking list and the mapping relation.
Specifically, after the user inputs the search term, the retrieval system pushes content according to the matching degree between the search term and the keywords and the mapping relation between the keywords and the archived files; the pushed content is the retrieval content.
According to the keyword-based archive forming method provided by this embodiment, detecting a file retrieval instruction indicates that the user needs to find related files. The search term input by the user is matched with the keywords to obtain the matching degree between the search term and each keyword, the matching degrees are sorted by size to generate a ranking list, and retrieval content relevant to the user's requirement is finally generated according to the ranking list and the mapping relation. Obtaining the keywords that best match the search term and generating the corresponding retrieval content from their mapping relations helps the user find files that meet the requirement more quickly and accurately.
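The retrieval flow of steps S501 to S505 might be sketched as follows. The embodiment does not say how the matching degree is computed at this stage, so the sketch substitutes the string similarity ratio from Python's standard `difflib` module as a stand-in; the function name `retrieve` and the dictionary form of the mapping relation are likewise assumptions.

```python
from difflib import SequenceMatcher


def retrieve(search_term, keyword_to_files):
    """Return archived files ordered by how well their keywords match.

    keyword_to_files is an assumed representation of the mapping relation:
    a dict from keyword to the list of files mapped to that keyword.
    """
    # S501/S502: an empty term is treated here as "no retrieval instruction".
    if not search_term:
        return []
    # S503: stand-in matching degree (string similarity in [0, 1]).
    degrees = [
        (SequenceMatcher(None, search_term, kw).ratio(), kw)
        for kw in keyword_to_files
    ]
    # S504: degree ranking list, largest matching degree first.
    degrees.sort(reverse=True)
    # S505: follow the mapping relation, skipping files already listed.
    results = []
    for _, kw in degrees:
        for name in keyword_to_files[kw]:
            if name not in results:
                results.append(name)
    return results
```

A file mapped to several keywords appears once, at the rank of its best-matching keyword.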
Referring to fig. 6, in one implementation manner of the present embodiment, step S503 includes steps S601 to S606; that is, the specific steps of matching the search term with the keywords and obtaining the matching degree include:
Step S601: matching the search term with the keywords.
Step S602: judging whether a keyword corresponding to the search term exists.
Specifically, the search words are compared with the keywords in the keyword library, the association degrees between the search words and different keywords are obtained, and whether the keywords associated with the search words exist is judged by judging whether the association degrees exceed the corresponding preset threshold.
Step S603: if the keyword corresponding to the search term exists, the keyword is taken as the target keyword.
Specifically, in this embodiment, the target keyword is a keyword corresponding to the search term.
Step S604: obtaining a second association degree between the search term and the target keyword.
Specifically, the second association degree is the association degree between the search term and the target keyword.
Step S605: obtaining the keyword label of the target keyword.
Specifically, the keyword label is determined by the factor that led to the word segmentation word being selected as a keyword. In this embodiment, keyword labels are divided into a first type and a second type: the first type corresponds to a keyword selected because the word segmentation word is at the designated position, and the second type corresponds to a keyword selected because the number of times of use of the word segmentation word exceeds the preset number threshold. For example, if a keyword was selected because the word segmentation word is at the designated position, its keyword label is set to the first type; if it was selected because the number of times of use exceeds the preset number threshold, its keyword label is set to the second type; and if the word segmentation word is at the designated position and its number of times of use also exceeds the preset number threshold, the keyword carries both the first-type and the second-type labels.
Step S606: acquiring the matching degree based on the second association degree and the keyword label.
Specifically, in this embodiment, the matching degree is obtained by combining the second association degree and the keyword label.
According to the keyword-based archive forming method provided by this embodiment, whether a keyword corresponding to the search term exists is judged. If so, a keyword associated with the search term input by the user exists, which also means that a file associated with the search term exists in the archive; that keyword is taken as the target keyword, the second association degree between the search term and the target keyword and the keyword label of the target keyword are obtained, and the matching degree is finally obtained by combining the two. The second association degree shows whether the search term is associated with an existing keyword and how strong that association is, while the keyword label shows which factors led to the associated keyword being chosen to represent the corresponding file. Combining the two makes the matching degree more accurate, and using the matching degree as the basis for sorting all retrieved files helps the user obtain files that meet the requirement quickly and accurately.
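The label assignment described for step S605 can be shown with a short Python sketch. The label names "first" and "second", the function name, and the default threshold are assumptions for illustration; the sketch also evaluates both conditions independently, which is one reading of how a keyword can carry both labels.

```python
def keyword_labels(at_designated_position, use_count, count_threshold=3):
    """Return the set of label types for a keyword.

    "first"  -> the word segmentation word is at a designated position.
    "second" -> its number of times of use exceeds the preset number
                threshold (count_threshold is an assumed value).
    """
    labels = set()
    if at_designated_position:
        labels.add("first")
    if use_count > count_threshold:
        labels.add("second")
    return labels
```

The size of the returned set is the number of labels used later when the label score is calculated.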
Referring to fig. 7, in one implementation manner of the present embodiment, step S606 includes steps S701 to S705; that is, the specific steps of obtaining the matching degree based on the second association degree and the keyword label include:
Step S701: acquiring an association score based on the second association degree and a preset association rule.
Specifically, the second association degrees can be classified into different grades, such as a first grade, a second grade and a third grade, and according to a preset association rule, the second association degrees of the different grades correspond to different scores, and the scores are association scores; in this embodiment, the preset association rule is a rule for obtaining the corresponding association score according to the association level.
Step S702: obtaining the number of labels corresponding to the keyword labels.
Specifically, in this embodiment, the number of labels is the number of keyword labels corresponding to each keyword.
Step S703: acquiring the label score based on the number of labels and a preset label base score.
Specifically, the label score is a score calculated from the number of labels and the preset label base score; in this embodiment, label score = preset label base score × number of labels, where the preset label base score is a positive number greater than 1.
Step S704: obtaining a matching score based on the association score, the label score, and the preset score weights.
Specifically, the preset score weights are the preset weights corresponding to the association score and the label score. In this embodiment, the preset score weight of the association score may be set to 50% and that of the label score to 50%, or to other values, provided that the two weights sum to 100%; in this embodiment, matching score = 50% × association score + 50% × label score.
Step S705: obtaining the matching degree based on the matching score and a preset matching rule.
Specifically, the preset matching rule is a preset rule for obtaining the matching degree from the matching score. In this embodiment, the matching degree may be divided into a first level, a second level, and a third level, each corresponding to a different matching score interval: for example, the interval [a, b] corresponds to the first level, the interval [b, c] corresponds to the second level, and the interval [c, d] corresponds to the third level. Among the levels, third level > second level > first level, and the matching scores satisfy d > c > b > a.
According to the keyword-based archive forming method provided by this embodiment, the association score is calculated from the second association degree and the preset association rule; the number of labels is obtained, and the label score is calculated from the number of labels and the preset label base score; the matching score is calculated from the association score, the label score, and their corresponding preset score weights; and the matching degree is finally obtained by applying the preset matching rule. Because the matching degree is derived from several calculation factors and their corresponding preset score weights, it is more convincing and accurate, which helps the user find files that meet the requirement quickly and accurately.
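The score calculation of steps S701 to S705 can be worked through in a minimal Python sketch. All numeric values below (the score for each association level, the label base score of 50, and the interval bounds b = 40 and c = 70) are hypothetical; the embodiment only requires a base score greater than 1, weights that sum to 100%, and ordered bounds. Because the closed intervals in the text share their endpoints, the sketch assigns a boundary score to the higher level, which is also an assumption.

```python
# Assumed preset association rule: association level -> association score.
ASSOCIATION_SCORES = {1: 20.0, 2: 60.0, 3: 100.0}
# Assumed preset label base score (must be a positive number greater than 1).
LABEL_BASE_SCORE = 50.0
# Assumed bounds b and c separating the first, second, and third levels.
BOUND_B, BOUND_C = 40.0, 70.0


def matching_degree(association_level, label_count, w_assoc=0.5, w_label=0.5):
    """Return (matching score, matching degree level) for one target keyword."""
    if abs(w_assoc + w_label - 1.0) > 1e-9:
        raise ValueError("the two preset score weights must sum to 100%")
    association_score = ASSOCIATION_SCORES[association_level]      # S701
    label_score = LABEL_BASE_SCORE * label_count                   # S702/S703
    score = w_assoc * association_score + w_label * label_score    # S704
    # S705: map the matching score to a level via the interval bounds.
    if score >= BOUND_C:
        level = 3
    elif score >= BOUND_B:
        level = 2
    else:
        level = 1
    return score, level
```

For instance, under these assumed values a second-level association with one label gives 0.5 × 60 + 0.5 × 50 = 55, which falls in the second-level interval.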
In a second aspect, the present application also discloses a keyword-based archive forming system.
Referring to fig. 8, a keyword-based archive forming system includes:
the acquisition module 1 is used for acquiring files to be archived and file characteristics of the files to be archived;
the generating module 2 is used for extracting keywords of files to be archived based on file characteristics and generating keywords;
the relation establishing module 3 is used for establishing a mapping relation between the keywords and the files to be archived;
the classification module 4 is used for classifying files to be archived based on the keywords and the mapping relation and generating classification results;
and the archiving module 5 is used for archiving the files to be archived based on the classification result.
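How the five modules could cooperate is sketched below in Python. This is only an illustrative arrangement: the function name `build_archive`, the three callables passed in, and the dictionary representations of the mapping relation and of the archive are all assumptions, since the system description names the modules but not their interfaces.

```python
from collections import defaultdict


def build_archive(files, extract_features, extract_keywords, classify):
    """Run the five modules over files, a dict of file name -> content.

    extract_features, extract_keywords, and classify are hypothetical
    callables standing in for modules 1, 2, and 4 respectively.
    """
    mapping = defaultdict(list)   # keyword -> file names (module 3)
    archive = defaultdict(list)   # category -> file names (module 5)
    for name, content in files.items():
        features = extract_features(content)            # acquisition module 1
        keywords = extract_keywords(content, features)  # generating module 2
        for kw in keywords:                             # relation establishing module 3
            mapping[kw].append(name)
        category = classify(keywords, mapping)          # classification module 4
        archive[category].append(name)                  # archiving module 5
    return dict(mapping), dict(archive)
```

Passing the modules in as callables keeps the sketch independent of any particular feature extraction or classification technique.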
In a third aspect, an embodiment of the present application discloses an intelligent terminal including a memory and a processor, where the memory is configured to store a computer program capable of running on the processor, and the processor, when loading the computer program, executes the keyword-based archive forming method of the foregoing embodiments.
In a fourth aspect, an embodiment of the present application discloses a computer readable storage medium storing a computer program, where the computer program, when loaded by a processor, performs the keyword-based archive forming method of the foregoing embodiments.
The foregoing are all preferred embodiments of the present application and are not intended to limit its scope of protection in any way; therefore, all equivalent changes made according to the structure, shape, and principle of the present application shall fall within its scope of protection.

Claims (10)

1. A keyword-based archive forming method, comprising:
acquiring a file to be archived and file characteristics of the file to be archived;
extracting keywords from the file to be archived based on the file characteristics, and generating keywords;
establishing a mapping relation between the keywords and the files to be archived;
classifying the files to be archived based on the keywords and the mapping relation and generating classification results;
and archiving the files to be archived based on the classification result.
2. A keyword-based archive forming method as claimed in claim 1, wherein the specific steps of obtaining the files to be archived and the file characteristics of the files to be archived include:
acquiring a document format and document contents of the file to be archived;
identifying and analyzing the document content to acquire the theme ideas of the files to be archived;
based on the theme ideas, acquiring the file types of the files to be archived;
and taking the file format and the file type as the file characteristics.
3. The keyword-based archive forming method of claim 1, wherein the specific steps of extracting keywords from the file to be archived based on the file characteristics, and generating keywords include:
the file to be archived is subjected to word segmentation, and word segmentation words are obtained;
acquiring a first association degree of the word segmentation words and the theme ideas;
judging whether the first association degree meets a preset association requirement or not;
and if the first association degree meets the preset association requirement, taking the word segmentation word as the keyword.
4. A keyword-based archive forming method as claimed in claim 3, wherein the specific step of determining whether the first association degree meets a preset association requirement comprises:
acquiring target positions of different word segmentation words in the file to be archived;
judging whether the target position is a designated position or not;
if the target position is the designated position, judging that the first association degree meets a preset association requirement;
if the target position is not the designated position, acquiring the use times of the word segmentation words in the file to be archived;
judging whether the using times exceeds a preset quantity threshold value or not;
if the number of times of use does not exceed the preset number threshold, judging that the first association degree does not meet a preset association requirement;
and if the using times exceeds the preset quantity threshold, judging that the first association degree meets a preset association requirement.
5. A keyword-based archive formation method as claimed in claim 1, further comprising:
judging whether an archive retrieval instruction is detected;
if the file retrieval instruction is detected, obtaining a retrieval word;
matching the search term with the keyword, and obtaining the matching degree;
sorting the matching degrees to generate a degree sorting list;
and generating retrieval content based on the degree order list and the mapping relation.
6. A keyword-based archive forming method as claimed in claim 5, wherein the specific steps of matching the search term with the keyword and obtaining the matching degree include:
matching the search term with the keyword;
judging whether the keywords corresponding to the search terms exist or not;
if the keyword corresponding to the search term exists, the keyword is used as a target keyword;
acquiring a second association degree of the search term and the target keyword;
acquiring a keyword label of the target keyword;
and acquiring the matching degree based on the second association degree and the keyword label.
7. A keyword-based archive forming method as claimed in claim 6, wherein the specific step of obtaining the matching degree based on the second association degree and the keyword label includes:
acquiring an association score based on the second association degree and a preset association rule;
acquiring the label number corresponding to the keyword labels;
acquiring label scores based on the label number and preset label basic scores;
acquiring a matching score based on the association score, the tag score and a preset score weight;
and acquiring the matching degree based on the matching score and a preset matching rule.
8. A keyword-based archive forming system comprising:
the system comprises an acquisition module (1) for acquiring file characteristics of a file to be archived;
the generating module (2) is used for extracting keywords from the file to be archived based on the file characteristics and generating keywords;
the relation establishing module (3) is used for establishing a mapping relation between the keywords and the files to be archived;
the classification module (4) is used for classifying the files to be archived based on the keywords and the mapping relation and generating classification results;
and the archiving module (5) is used for archiving the files to be archived based on the classification result.
9. A smart terminal comprising a memory, a processor, wherein the memory is adapted to store a computer program capable of running on the processor, and wherein the processor, when loaded with the computer program, performs the method of any of claims 1 to 7.
10. A computer readable storage medium having a computer program stored therein, characterized in that the computer program, when loaded by a processor, performs the method of any of claims 1 to 7.
CN202410263005.7A 2024-03-08 2024-03-08 File forming method, system, terminal and storage medium based on keywords Pending CN117851340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410263005.7A CN117851340A (en) 2024-03-08 2024-03-08 File forming method, system, terminal and storage medium based on keywords

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410263005.7A CN117851340A (en) 2024-03-08 2024-03-08 File forming method, system, terminal and storage medium based on keywords

Publications (1)

Publication Number Publication Date
CN117851340A true CN117851340A (en) 2024-04-09

Family

ID=90540485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410263005.7A Pending CN117851340A (en) 2024-03-08 2024-03-08 File forming method, system, terminal and storage medium based on keywords

Country Status (1)

Country Link
CN (1) CN117851340A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5278980A (en) * 1991-08-16 1994-01-11 Xerox Corporation Iterative technique for phrase query formation and an information retrieval system employing same
US10176207B1 (en) * 2015-06-09 2019-01-08 Skyhigh Networks, Llc Wildcard search in encrypted text
CN112507068A (en) * 2020-11-30 2021-03-16 北京百度网讯科技有限公司 Document query method and device, electronic equipment and storage medium
US20220058214A1 (en) * 2018-12-28 2022-02-24 Shenzhen Sekorm Component Network Co., Ltd Document information extraction method, storage medium and terminal
CN117194322A (en) * 2023-09-01 2023-12-08 统信软件技术有限公司 File classification management method, system and computing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘天宇;: "一种基于Lucene的近义词关键字检索系统设计", 中国科技信息, no. 05, 27 February 2018 (2018-02-27) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination