CN117251587A - Intelligent information mining method for digital archives - Google Patents



Publication number
CN117251587A
Authority
CN
China
Prior art keywords
digital
file
information
files
digital file
Legal status
Pending
Application number
CN202311534225.0A
Other languages
Chinese (zh)
Inventor
李燕强
齐少华
马国伟
张泽宇
Current Assignee
Beijing Yinduo Shuzhi Archives Technology Industry Development Co ltd
Original Assignee
Beijing Yinduo Shuzhi Archives Technology Industry Development Co ltd
Priority date
2023-11-17
Filing date
2023-11-17


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 Clustering; Classification
    • G06F16/41 Indexing; Data structures therefor; Storage structures
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an intelligent information mining method for digital archives, which relates to the technical field of digital archive mining and comprises the following steps: step one, data preprocessing; step two, archive classification; step three, archive information extraction; step four, archive tagging; step five, archive summarization; step six, archive analysis; and step seven, archive recording. The advantages of the invention are as follows: when personnel enter a digital archive into the database, the archive is flagged either as one that can be retrieved during archive analysis or as one that cannot. Archive analysis retrieves the information of digital archives of the same type from the database, and analyzing several digital archives against the existing archive makes it easier to find the hidden information in the archive. A search engine is then used to look up that hidden information, so that personnel can quickly consult and understand the hidden information in the digital archive.

Description

Intelligent information mining method for digital archives
Technical Field
The invention relates to the technical field of digital archive mining, in particular to an intelligent information mining method for digital archives.
Background
Digital archive systems digitize their holdings, organize and transmit information over networks, extend the range of services, share information resources and make information easier to retrieve. A digital archive system is an information space for storing and using archival information resources, made up of multiple archival resource groups, an archival information processing center and archive user groups.
A digital archive system combines a content management system, an integration system and a long-term digital preservation system. Its main managed objects are unstructured data such as electronic files, archival records and other information resources. It serves not only as a data center but also as a platform for publishing and use, and it provides orderly processing and integrated management. Orderly processing covers the whole records life cycle: collection, creation, validation, conversion, archiving, management, publishing and use. Integration means consolidating and fusing these stages into a single whole. Applied to a digital archive system, integrated management theory governs the entire life cycle of archival information resources: integration guides the management philosophy, an integrated mechanism is the core of management practice, the boundaries between business processes are broken down, and all archival information elements are managed and optimized as a whole. This improves the organization of the various archival information elements, improves the authenticity and integrity of the information resources, and meets users' demand for integrated services.
However, existing digital archive mining approaches make it inconvenient for people to consult digital archives: the hidden information in archive content is hard to review promptly and hard to understand, so people may miss it.
Disclosure of Invention
The invention aims to provide an intelligent information mining method for digital archives.
To solve the problems set forth in the background, the invention provides the following technical solution: an intelligent information mining method for digital archives, comprising the following steps:
step one, data preprocessing: preprocessing the data in a digital archive by reducing noise in its audio and video, performing morphological reduction (lemmatization) on the text of documents and pictures, and extracting the text data from the digital archive;
step two, archive classification: classifying the text data of the digital archives into predefined categories;
step three, archive information extraction: extracting key information and attributes from the digital archive;
step four, archive tagging: tagging the specific meanings identified in the digital archive;
step five, archive summarization: extracting summaries from the text, pictures, audio and video in the digital archive, the core content of the archive being extracted based on a statistical method and a graph model;
step six, archive analysis: analyzing the correlations and patterns in the text data of a plurality of digital archives, identifying information that follows different patterns within archives of the same category classified in step two, searching for that information through a search engine, sorting the search results, and appending the sorted results, in simplified form, after the text exhibiting the corresponding pattern;
step seven, archive recording: personnel consult the archive information and upload it to the database while classifying the corresponding archives, so that information can readily be retrieved from the database during the archive analysis of step six, and the pattern information of different archives in the database is continually expanded and refined.
As a further aspect of the invention: in step one, irrelevant symbols, icons and words are removed from the text extracted from the audio, video, documents and pictures in the digital archive.
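The removal of irrelevant symbols and words in step one can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the text has already been extracted from the audio, video, documents and pictures, treats every character other than letters, digits, whitespace and basic punctuation as noise, and uses a small hypothetical stop-word list (`STOP_WORDS`). Audio/video noise reduction and lemmatization are not shown.

```python
import re

# Hypothetical stop-word list for illustration only; the patent does not say
# which "irrelevant words" are removed, so a real system would supply its own.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "or", "to", "in", "um", "uh"})

# Anything that is not a word character, whitespace or basic punctuation
# (e.g. icons, emoji, decorative symbols) is treated as noise.
NOISE = re.compile(r"[^\w\s.,;:!?-]")


def clean_text(raw: str, stop_words: frozenset[str] = STOP_WORDS) -> str:
    """Remove symbol noise and stop words from text extracted from an archive."""
    without_symbols = NOISE.sub(" ", raw)
    kept = [
        token
        for token in without_symbols.split()
        # Compare without trailing punctuation so that "the," is also dropped.
        if token.lower().strip(".,;:!?") not in stop_words
    ]
    return " ".join(kept)


print(clean_text("★ Minutes of the 1998 board meeting ☎ um approved."))
# prints: Minutes 1998 board meeting approved.
```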
As a further aspect of the invention: in step two, personnel can classify the text of different types of digital archives using a machine learning algorithm.
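The patent does not name the machine learning algorithm used in step two. As one possible choice, the sketch below uses a multinomial naive Bayes classifier with Laplace smoothing, written with the standard library only; the class name, the category labels and the training texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes with Laplace smoothing over whitespace tokens."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.doc_counts: Counter[str] = Counter()  # documents per category
        self.word_counts: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        best_label, best_score = "", -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + self.alpha) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    "salary contract employee hired",
    "employee leave salary",
    "land deed property boundary",
    "property transfer deed",
]
train_labels = ["personnel", "personnel", "property", "property"]
clf = NaiveBayesClassifier().fit(train_texts, train_labels)
print(clf.predict("employee salary review"))  # prints: personnel
```

Naive Bayes is used here only because it is short and needs no external library; any trained text classifier could fill the same role.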
As a further aspect of the invention: in step three, pattern information and expression information are extracted from the digital archive.
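The patent does not define "pattern information" or "expression information" any further. Assuming they include structured attributes that follow predictable textual patterns, a regular-expression extractor such as the sketch below could pull them out. The field names and patterns in `ATTRIBUTE_PATTERNS` are hypothetical.

```python
import re

# Hypothetical field patterns; each captures the value that follows a known label.
# Both the ASCII ":" and the full-width "：" common in Chinese records are accepted.
ATTRIBUTE_PATTERNS = {
    "archive_no": re.compile(r"Archive\s+No\.?\s*[:：]\s*(\S+)", re.IGNORECASE),
    "department": re.compile(r"Department\s*[:：]\s*([^\n;]+)", re.IGNORECASE),
    "retention_years": re.compile(r"Retention\s*[:：]\s*(\d+)\s*years?", re.IGNORECASE),
}


def extract_attributes(text: str) -> dict[str, str]:
    """Return the first value found for each attribute pattern."""
    found: dict[str, str] = {}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1).strip()
    return found


sample = "Archive No.: A-2021-045\nDepartment: Finance Office; Retention: 30 years"
print(extract_attributes(sample))
# prints: {'archive_no': 'A-2021-045', 'department': 'Finance Office', 'retention_years': '30'}
```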
As a further aspect of the invention: in step four, personal names, place names and time information in the digital archive are tagged, so that personnel can quickly find the key information when consulting the archive.
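Tagging personal names, place names and time information in step four is a form of named entity recognition. The sketch below is a deliberately simple stand-in that matches hypothetical gazetteer lists (`PERSON_NAMES`, `PLACE_NAMES`) and ISO-style dates and then inlines the tags. A practical system would more likely use a trained NER model.

```python
import re

# Hypothetical gazetteers for illustration; a production system would use
# a trained named-entity recognition model instead of fixed lists.
PERSON_NAMES = ("Zhang Wei", "Li Na")
PLACE_NAMES = ("Beijing", "Tianjin")
# ISO-style dates only (YYYY-MM-DD); other time expressions are not covered.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

Span = tuple[int, int, str, str]  # (start, end, label, surface text)


def tag_entities(text: str) -> list[Span]:
    """Find person, place and time mentions; spans are sorted by position."""
    spans: list[Span] = []
    for label, names in (("PERSON", PERSON_NAMES), ("PLACE", PLACE_NAMES)):
        for name in names:
            for m in re.finditer(re.escape(name), text):
                spans.append((m.start(), m.end(), label, m.group()))
    for m in DATE_PATTERN.finditer(text):
        spans.append((m.start(), m.end(), "TIME", m.group()))
    return sorted(spans)


def render_tags(text: str, spans: list[Span]) -> str:
    """Inline each tag as [surface/LABEL]; assumes the spans do not overlap."""
    pieces, pos = [], 0
    for start, end, label, surface in spans:
        pieces.append(text[pos:start])
        pieces.append(f"[{surface}/{label}]")
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)


doc = "Zhang Wei moved from Beijing to Tianjin on 1998-06-01."
print(render_tags(doc, tag_entities(doc)))
# prints: [Zhang Wei/PERSON] moved from [Beijing/PLACE] to [Tianjin/PLACE] on [1998-06-01/TIME].
```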
As a further aspect of the invention: in step five, natural language processing and machine learning algorithms such as TextRank, BERT and GPT are used to identify the important content in the text and remove irrelevant details, improving the accuracy of summary extraction from the digital archive; the accuracy depends on the quality of the algorithms and of the training data.
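TextRank, one of the tools named for step five, ranks sentences by running weighted PageRank over a sentence-similarity graph. The sketch below shows only that graph-based extractive idea, using the word-overlap similarity from the original TextRank paper and a naive sentence splitter. It does not cover BERT or GPT, and the damping factor `d = 0.85` and the iteration count are conventional defaults assumed here.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_similarity(a: str, b: str) -> float:
    """Shared words divided by log(|A|) + log(|B|), as in the TextRank paper."""
    wa, wb = set(re.findall(r"\w+", a.lower())), set(re.findall(r"\w+", b.lower()))
    denom = math.log(len(wa)) + math.log(len(wb)) if wa and wb else 0.0
    return len(wa & wb) / denom if denom > 0 else 0.0


def textrank_summary(text: str, k: int = 2, d: float = 0.85, iters: int = 50) -> list[str]:
    """Return the k highest-ranked sentences in their original order."""
    sents = split_sentences(text)
    n = len(sents)
    if n <= k:
        return sents
    # Symmetric weighted graph: w[i][j] is the similarity of sentences i and j.
    w = [[overlap_similarity(sents[i], sents[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        # Weighted PageRank update with damping factor d.
        scores = [
            (1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                              for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sents[i] for i in sorted(top)]


text = ("The archive stores personnel records. "
        "Personnel records include salary and contract data. "
        "Weather was sunny. "
        "Salary data in personnel records is confidential.")
for sentence in textrank_summary(text, k=2):
    print(sentence)
```

The off-topic sentence shares no words with the others, so it receives only the baseline score `1 - d` and is never selected.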
As a further aspect of the invention: during archive analysis, the digital archives classified in step two are retrieved, the related information across them is analyzed, and the archives under analysis are distinguished from the archives under classification.
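The patent leaves open how related information across same-category archives is analyzed. One simple reading is frequent co-occurrence: term pairs that appear together in at least `min_support` archives of the same category. The sketch below assumes each archive has already been reduced to a category and a list of terms; all names and data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_term_pairs(docs: list[set[str]], min_support: int = 2) -> dict[tuple[str, str], int]:
    """Count term pairs that co-occur in at least min_support documents."""
    counts: Counter[tuple[str, str]] = Counter()
    for terms in docs:
        # Sorting gives each unordered pair a single canonical key.
        counts.update(combinations(sorted(terms), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def mine_by_category(
    archives: list[tuple[str, list[str]]], min_support: int = 2
) -> dict[str, dict[tuple[str, str], int]]:
    """Group (category, terms) records by category, then mine each group separately."""
    groups: defaultdict[str, list[set[str]]] = defaultdict(list)
    for category, terms in archives:
        groups[category].append(set(terms))
    return {cat: frequent_term_pairs(docs, min_support) for cat, docs in groups.items()}


records = [
    ("personnel", ["zhang", "salary", "2019"]),
    ("personnel", ["zhang", "salary", "promotion"]),
    ("personnel", ["li", "leave"]),
    ("property", ["deed", "beijing"]),
]
print(mine_by_category(records))
# prints: {'personnel': {('salary', 'zhang'): 2}, 'property': {}}
```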
As a further aspect of the invention: after the information following different patterns has been analyzed, a search engine is used to search for the information with different meanings in the digital archive, and the search results are simplified and appended after the corresponding information, so that people browsing the archive can quickly preview the hidden information in it.
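Appending simplified search results after the corresponding text might look like the sketch below. The search engine is abstracted as any function from a query string to a list of result snippets (here a hypothetical in-memory `INDEX`), and "simplifying" is assumed to mean keeping the first sentence of the top result, truncated to a length limit.

```python
import re
from typing import Callable

SearchFn = Callable[[str], list[str]]


def simplify(snippet: str, max_chars: int = 60) -> str:
    """Keep the first sentence of a search result, truncated to max_chars."""
    first = re.split(r"(?<=[.!?])\s", snippet.strip(), maxsplit=1)[0]
    if len(first) <= max_chars:
        return first
    return first[: max_chars - 3].rstrip() + "..."


def annotate(text: str, terms: list[str], search: SearchFn) -> str:
    """Append a simplified top search result after the first mention of each term."""
    for term in terms:
        results = search(term)
        if results and term in text:
            # Only the first occurrence is annotated. A note that happens to contain
            # a later term could be matched instead; a real system should guard this.
            text = text.replace(term, f"{term} [{simplify(results[0])}]", 1)
    return text


# Stand-in for a real search engine: a hypothetical in-memory index.
INDEX = {"Form 7": ["Form 7 was a staff transfer form. It was retired later."]}
print(annotate("Transferred via Form 7 in 1988.", ["Form 7", "Form 9"],
               lambda term: INDEX.get(term, [])))
# prints: Transferred via Form 7 [Form 7 was a staff transfer form.] in 1988.
```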
As a further aspect of the invention: in step seven, the digital archives in the database are classified as retrievable or non-retrievable, so that retrievable archives can readily be retrieved during the archive analysis of step six, refining the database and improving the accuracy of digital archive analysis.
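Step seven's split between archives that may and may not be retrieved during the step-six analysis can be represented as a flag on each stored record. The sketch below assumes an SQLite table with a hypothetical `retrievable` column. Only flagged archives are returned to the analysis step.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory archive table with a retrievable/non-retrievable flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archives ("
        " id INTEGER PRIMARY KEY,"
        " category TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " retrievable INTEGER NOT NULL CHECK (retrievable IN (0, 1)))"
    )
    return conn


def record_archive(conn: sqlite3.Connection, category: str, body: str, retrievable: bool) -> int:
    """Step seven: store an archive together with the reviewer's flag."""
    cur = conn.execute(
        "INSERT INTO archives (category, body, retrievable) VALUES (?, ?, ?)",
        (category, body, int(retrievable)),
    )
    conn.commit()
    return cur.lastrowid


def retrievable_archives(conn: sqlite3.Connection, category: str) -> list[tuple[int, str]]:
    """Step six: only archives flagged as retrievable are handed to the analysis."""
    rows = conn.execute(
        "SELECT id, body FROM archives WHERE category = ? AND retrievable = 1 ORDER BY id",
        (category,),
    )
    return rows.fetchall()


store = open_store()
record_archive(store, "personnel", "Zhang Wei hired in 1998", True)
record_archive(store, "personnel", "Superseded draft", False)
record_archive(store, "property", "Deed for a Beijing plot", True)
print(retrievable_archives(store, "personnel"))
# prints: [(1, 'Zhang Wei hired in 1998')]
```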
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
when personnel enter a digital archive template into the database, the archive is flagged as one that can or cannot be retrieved during archive analysis. Archive analysis retrieves the information of digital archives of the same type from the database, and analyzing several digital archives against the existing archive brings out the hidden information in the archive, making it easier for personnel to find. A search engine is then used to look up the hidden information, and the search results are appended after the corresponding information, so that personnel can quickly find and understand the hidden information in the digital archive;
through steps two, three, four and five, the digital archive is converted into documents, and the pattern information and expression information in it are extracted and tagged, so that special information stands out when personnel review the archive. Personal names, place names and time information are tagged so that personnel can quickly jump to the corresponding positions, and summary extraction lets them preview the whole archive quickly, judge whether it is needed, and review archives faster;
after the archive analysis is complete, personnel consult the archive and mark whether its hidden information is needed. If the hidden information cannot serve as a reference, the archive is recorded in the database of non-retrievable information; otherwise it is recorded in the database of retrievable information. When archives are analyzed, the software compares the retrievable archives in the database with the existing archive data, which improves the accuracy of the analysis of hidden information in digital archives.
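How the software compares a new archive with the retrievable archives in the database is not specified further. As an assumed stand-in, the sketch below ranks stored archives by bag-of-words cosine similarity and skips any archive flagged as non-retrievable; the function names and sample data are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter[str]:
    """Bag-of-words term counts over lower-cased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_references(
    new_text: str, stored: list[tuple[int, str, bool]], top_k: int = 3
) -> list[tuple[int, float]]:
    """Compare a new archive only against stored archives flagged as retrievable."""
    query = vectorize(new_text)
    scored = [(cosine(query, vectorize(body)), archive_id)
              for archive_id, body, retrievable in stored if retrievable]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best match first, ties by id
    return [(archive_id, round(score, 3)) for score, archive_id in scored[:top_k]]


stored = [
    (1, "zhang wei salary record", True),
    (2, "zhang wei salary record", False),  # flagged non-retrievable: ignored
    (3, "property deed beijing", True),
]
print(rank_references("Salary record of Zhang Wei", stored))
# prints: [(1, 0.894), (3, 0.0)]
```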
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are further described below with reference to the drawings. The description of these embodiments is provided to assist understanding of the present invention and is not intended to limit it. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
Embodiment 1: referring to FIG. 1, the present invention provides a technical solution: an intelligent information mining method for digital archives, comprising the following steps:
step one, data preprocessing: preprocessing the data in a digital archive by reducing noise in its audio and video, performing morphological reduction (lemmatization) on the text of documents and pictures, and extracting the text data from the digital archive;
step two, archive classification: classifying the text data of the digital archives into predefined categories;
step three, archive information extraction: extracting key information and attributes from the digital archive;
step four, archive tagging: tagging the specific meanings identified in the digital archive;
step five, archive summarization: extracting summaries from the text, pictures, audio and video in the digital archive, the core content of the archive being extracted based on a statistical method and a graph model;
step six, archive analysis: analyzing the correlations and patterns in the text data of a plurality of digital archives, identifying information that follows different patterns within archives of the same category classified in step two, searching for that information through a search engine, sorting the search results, and appending the sorted results, in simplified form, after the text exhibiting the corresponding pattern;
step seven, archive recording: personnel consult the archive information and upload it to the database while classifying the corresponding archives, so that information can readily be retrieved from the database during the archive analysis of step six, and the pattern information of different archives in the database is continually expanded and refined.
Referring to FIG. 1, the present invention provides a technical solution: in step five, natural language processing and machine learning algorithms such as TextRank, BERT and GPT are used to identify the important content in the text and remove irrelevant details, improving the accuracy of summary extraction from the digital archive; the result depends on the quality of the algorithms and the training data. During archive analysis, the digital archives classified in step two are retrieved and the related information across multiple archives is analyzed, with the archives under analysis distinguished from the archives under classification. After the information following different patterns has been analyzed, a search engine is used to search for the information with different meanings in the digital archives, and the search results are simplified and appended after the corresponding information, so that people browsing the archives can quickly preview the hidden information in them;
in this embodiment, data preprocessing prepares the digital archive for the later classification, extraction and analysis stages. Personnel classify the archive according to the categories of information it contains, and archive information extraction then makes it easier for them to search the archive later. After the analysis of the digital archive is complete, personnel can mark the archive as retrievable or non-retrievable according to the analysis results.
In use, when personnel enter a digital archive template into the database, the archive is flagged as retrievable or non-retrievable for archive analysis. Archive analysis retrieves the information of digital archives of the same type from the database, and analyzing several digital archives against the existing archive brings out the hidden information in the archive so that personnel can consult it. A search engine is then used to look up the hidden information, and the search results are appended after the corresponding information.
Embodiment 2: referring to FIG. 1, an intelligent information mining method for digital archives comprises the following steps:
step one, data preprocessing: preprocessing the data in a digital archive by reducing noise in its audio and video, performing morphological reduction (lemmatization) on the text of documents and pictures, and extracting the text data from the digital archive;
step two, archive classification: classifying the text data of the digital archives into predefined categories;
step three, archive information extraction: extracting key information and attributes from the digital archive;
step four, archive tagging: tagging the specific meanings identified in the digital archive;
step five, archive summarization: extracting summaries from the text, pictures, audio and video in the digital archive, the core content of the archive being extracted based on a statistical method and a graph model;
step six, archive analysis: analyzing the correlations and patterns in the text data of a plurality of digital archives, identifying information that follows different patterns within archives of the same category classified in step two, searching for that information through a search engine, sorting the search results, and appending the sorted results, in simplified form, after the text exhibiting the corresponding pattern;
step seven, archive recording: personnel consult the archive information and upload it to the database while classifying the corresponding archives, so that information can readily be retrieved from the database during the archive analysis of step six, and the pattern information of different archives in the database is continually expanded and refined.
Referring to FIG. 1, in step one, irrelevant symbols, icons and words are removed from the text extracted from the digital archive; in step two, personnel can classify different types of digital archives using a machine learning algorithm; in step three, pattern information and expression information are extracted from the digital archive; in step four, personal names and place names in the digital archive are tagged, so that personnel can quickly find the key information when consulting the archive; in step five, natural language processing and machine learning algorithms such as TextRank, BERT and GPT are used to identify the important content in the text and remove irrelevant details, improving the accuracy of summary extraction, subject to the quality of the algorithms and the training data;
in this embodiment, the database stores a large number of retrievable archives, which are processed by software such as TextRank, BERT and GPT.
In use, steps two, three, four and five convert the digital archive into documents that can be read directly, and the pattern information and expression information in it are extracted and tagged, so that special information stands out when people review the archive. Personal names, place names and time information are tagged so that people can quickly jump to the corresponding positions, and summary extraction lets them preview the whole archive quickly and judge whether it is needed.
Embodiment 3: referring to FIG. 1, the present invention provides a technical solution: an intelligent information mining method for digital archives, comprising the following steps:
step one, data preprocessing: preprocessing the data in a digital archive by reducing noise in its audio and video, performing morphological reduction (lemmatization) on the text of documents and pictures, and extracting the text data from the digital archive;
step two, archive classification: classifying the text data of the digital archives into predefined categories;
step three, archive information extraction: extracting key information and attributes from the digital archive;
step four, archive tagging: tagging the specific meanings identified in the digital archive;
step five, archive summarization: extracting summaries from the text, pictures, audio and video in the digital archive, the core content of the archive being extracted based on a statistical method and a graph model;
step six, archive analysis: analyzing the correlations and patterns in the text data of a plurality of digital archives, identifying information that follows different patterns within archives of the same category classified in step two, searching for that information through a search engine, sorting the search results, and appending the sorted results, in simplified form, after the text exhibiting the corresponding pattern;
step seven, archive recording: personnel consult the archive information and upload it to the database while classifying the corresponding archives, so that information can readily be retrieved from the database during the archive analysis of step six, and the pattern information of different archives in the database is continually expanded and refined.
Referring to FIG. 1, in step seven, the digital archives in the database are classified as retrievable or non-retrievable, so that retrievable archives can readily be retrieved during the archive analysis of step six, refining the database and improving the accuracy of digital archive analysis.
In this embodiment, personnel who need to review a digital archive can query it in the database.
When a digital archive is analyzed, the software compares it with the existing archive data in the database, which improves the accuracy of the analysis of hidden information in the digital archive.
Working principle:
first, personnel pass the digital archive to data preprocessing, which removes irrelevant symbols, icons and words from the text extracted from the archive's audio, video, documents and pictures, preparing the archive for later classification and analysis;
second, the processed archives are classified and tagged: key information is identified and tagged according to the archive's category, and archive summarization extracts the content of the whole archive so that personnel can quickly grasp its overall information;
finally, the correspondingly classified archive information is retrieved from the database so that a large number of archives can be analyzed against the existing archive to uncover hidden information. The hidden information is looked up with a search engine, and the simplified search results are appended after the corresponding hidden information, so that people can understand it promptly when consulting the archive. Personnel then compare the archive with other archives and file it accordingly, which improves the accuracy of later comparisons between the archives in the database and new archives.
Front, rear, left, right, up and down are all defined with reference to FIG. 1 of the drawings of the specification, from the viewpoint of an observer: the side of the device facing the observer is the front, the observer's left is the left, and so on.
In the description of the present invention, it should be understood that the terms "center," "longitudinal," "lateral," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the scope of the present invention.
It should be noted that the device structure and drawings of the present invention mainly describe its principle. The power mechanism, power supply system, control system and similar arrangements are not described in full; a person skilled in the art who understands the principle of the invention will know their specific details. The control in this application is performed automatically by a controller, whose control circuit can be implemented by a person skilled in the art through simple programming;
the standard parts used in the method can be purchased on the market or custom-made according to the specification and drawings; the parts are connected by conventional, mature means in the prior art such as bolts, rivets and welding; the machines, parts and equipment are conventional models in the prior art, and their structures and principles are known to the skilled person from technical manuals or conventional experiments.
The embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and yet fall within the scope of the invention.

Claims (9)

1. An intelligent information mining method for digital archives, characterized by comprising the following steps:
step one, data preprocessing: preprocessing the data in a digital archive by reducing noise in its audio and video, performing morphological reduction (lemmatization) on the text of documents and pictures, and extracting the text data from the digital archive;
step two, archive classification: classifying the text data of the digital archives into predefined categories;
step three, archive information extraction: extracting key information and attributes from the digital archive;
step four, archive tagging: tagging the specific meanings identified in the digital archive;
step five, archive summarization: extracting summaries from the text, pictures, audio and video in the digital archive, the core content of the archive being extracted based on a statistical method and a graph model;
step six, archive analysis: analyzing the correlations and patterns in the text data of a plurality of digital archives, identifying information that follows different patterns within archives of the same category classified in step two, searching for that information through a search engine, sorting the search results, and appending the sorted results, in simplified form, after the text exhibiting the corresponding pattern;
step seven, archive recording: personnel consult the archive information and upload it to the database while classifying the corresponding archives, so that information can readily be retrieved from the database during the archive analysis of step six, and the pattern information of different archives in the database is continually expanded and refined.
2. The intelligent information mining method for digital archives according to claim 1, wherein: in step one, personnel remove irrelevant symbols, icons and irrelevant words from the text extracted from the audio, video, documents and pictures in the digital files.
3. The intelligent information mining method for digital archives according to claim 1, wherein: in step two, personnel can perform text classification on the different types of digital files using a machine learning algorithm.
4. The intelligent information mining method for digital archives according to claim 1, wherein: in step three, rule information and expression information are extracted from the digital files.
5. The intelligent information mining method for digital archives according to claim 1, wherein: in step four, the person-name information, place-name information and time information in the digital files are marked, so that personnel can quickly find the key information in a digital file when looking it up.
6. The intelligent information mining method for digital archives according to claim 1, wherein: in step five, natural language processing and machine learning algorithms such as TextRank, BERT and GPT are used to identify the important content in the text and remove irrelevant details, improving the accuracy of abstract extraction from the digital files; the result depends on the quality of the algorithm and of the training data.
7. The intelligent information mining method for digital archives according to claim 1, wherein: when the digital files are analyzed, the digital files classified in step two are retrieved and the related information in them is analyzed, and the digital files under analysis are kept distinct from the digital files under classification.
8. The intelligent information mining method for digital archives according to claim 7, wherein: after the different rule information has been analyzed, a search engine is used to search for information with different meanings in the digital file, and the retrieved information is condensed and marked after the corresponding information, so that people browsing the file can quickly preview the hidden information in the digital file.
9. The intelligent information mining method for digital archives according to claim 6, wherein: in step seven, the digital files in the database are classified as adjustable or non-adjustable digital files, so that the adjustable digital files can conveniently be adjusted during the file analysis of step six, refining the database and improving the accuracy of digital file analysis.
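Claim 3 does not specify which machine learning algorithm is used for text classification. As one possible choice, here is a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The class name, the lower-case whitespace tokenizer and any training examples are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # class -> word frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over known words of log P(word | class)
            score = math.log(count / n_docs)
            for w in doc.lower().split():
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                score += math.log(
                    (self.word_counts[label][w] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Log-probabilities are summed instead of multiplying raw probabilities, which avoids floating-point underflow on long documents.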
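Claim 5 marks person names, place names and time information so that key information can be found quickly. A production system would probably use a trained named-entity recognizer. The sketch below substitutes a simpler technique: dictionary (gazetteer) lookup for names and places, plus a regular expression for dates. The gazetteers `PERSON_NAMES` and `PLACE_NAMES` and the two date formats are hypothetical.

```python
import re

# Hypothetical gazetteers; in practice these would come from the archive's
# authority files or from a trained named-entity recognizer.
PERSON_NAMES = {"Zhang Wei", "Li Na"}
PLACE_NAMES = {"Beijing", "Hangzhou"}
# ISO-style dates (2023-11-17) or Chinese-style dates (2023年11月17日).
DATE_PATTERN = re.compile(r"\d{4}-\d{1,2}-\d{1,2}|\d{4}年\d{1,2}月\d{1,2}日")


def tag_entities(text):
    """Return (start, end, label, surface) tuples sorted by position."""
    spans = []
    for label, names in (("PERSON", PERSON_NAMES), ("PLACE", PLACE_NAMES)):
        for name in names:
            for m in re.finditer(re.escape(name), text):
                spans.append((m.start(), m.end(), label, m.group()))
    for m in DATE_PATTERN.finditer(text):
        spans.append((m.start(), m.end(), "TIME", m.group()))
    return sorted(spans)


def mark_up(text):
    """Wrap each tagged span as [LABEL: surface] for quick scanning."""
    out, pos = [], 0
    for start, end, label, surface in tag_entities(text):
        if start < pos:  # skip spans overlapping an earlier tag
            continue
        out.append(text[pos:start])
        out.append(f"[{label}: {surface}]")
        pos = end
    out.append(text[pos:])
    return "".join(out)
```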
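Claim 6 names TextRank as one algorithm for step five, which also refers to a graph model. Below is a sketch of TextRank-style extractive summarization. Sentences are graph nodes. Edge weights are the number of shared words divided by the sum of the logarithms of the two sentence lengths. A weighted PageRank iteration then ranks the sentences. The sentence splitter, the `\w+` tokenizer (suited to English; Chinese would need word segmentation), the damping factor `d=0.85` and the fixed iteration count are assumptions. BERT- or GPT-based summarization is not shown.

```python
import math
import re


def split_sentences(text):
    # Naive splitter on ., !, ? and the Chinese full stop (an assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?。])\s*", text) if s.strip()]


def similarity(wa, wb):
    # TextRank edge weight: shared words / (log|Si| + log|Sj|).
    overlap = len(set(wa) & set(wb))
    denom = math.log(len(wa)) + math.log(len(wb)) if wa and wb else 0.0
    return overlap / denom if denom > 0 else 0.0


def textrank(sentences, d=0.85, iterations=50):
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: node j passes its score to node i in
        # proportion to the weight of edge j-i relative to j's total weight.
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    sents = split_sentences(text)
    scores = textrank(sents)
    top = sorted(range(len(sents)), key=lambda i: scores[i], reverse=True)[:k]
    return [sents[i] for i in sorted(top)]  # keep original sentence order
```

A sentence that shares no words with any other sentence receives only the baseline score `1 - d`, so it is never preferred over connected sentences.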
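Step six analyzes the correlations between the text data of several digital files, but the patent does not say how correlation is measured. One common choice, used here as an assumption, is cosine similarity between term-frequency vectors. The function names and the default `threshold` are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tf_vector(text):
    # Term-frequency vector of lower-cased word tokens.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_pairs(files, threshold=0.3):
    """Return (name_a, name_b, similarity) for pairs at or above threshold, most similar first."""
    vecs = {name: tf_vector(text) for name, text in files.items()}
    pairs = []
    for x, y in combinations(sorted(vecs), 2):
        s = cosine(vecs[x], vecs[y])
        if s >= threshold:
            pairs.append((x, y, round(s, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```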
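Claim 8 condenses search-engine results and marks them after the corresponding information so that readers can preview hidden information. The search engine itself is out of scope here. The sketch assumes its results arrive as a hypothetical `findings` mapping from a term to a text snippet. It condenses each snippet by truncating it to a few words and inserts the result in parentheses after the first occurrence of the term.

```python
def condense(snippet, max_words=6):
    # Simplest possible "simplification": keep only the first few words.
    words = snippet.split()
    return " ".join(words[:max_words]) + ("..." if len(words) > max_words else "")


def annotate(text, findings, max_words=6):
    """Insert a condensed note after the first occurrence of each term."""
    inserts = []
    for term, snippet in findings.items():
        idx = text.find(term)
        if idx != -1:
            inserts.append((idx + len(term), f" ({condense(snippet, max_words)})"))
    # Apply from the end of the text so earlier offsets stay valid.
    for pos, note in sorted(inserts, reverse=True):
        text = text[:pos] + note + text[pos:]
    return text
```

Truncation is only the crudest form of condensing; an extractive or abstractive summarizer could replace `condense` without changing `annotate`.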
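Claim 9 divides the database into adjustable and non-adjustable digital files, so that only the adjustable ones are revised during the analysis of step six. Here is a minimal sketch using Python's built-in `sqlite3` module. The `digital_file` table, its columns and the helper functions are hypothetical.

```python
import sqlite3


def open_archive_db():
    # In-memory database for illustration; a real deployment would use a file.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE digital_file (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               category TEXT NOT NULL,
               adjustable INTEGER NOT NULL CHECK (adjustable IN (0, 1)))"""
    )
    return conn


def add_file(conn, title, category, adjustable):
    conn.execute(
        "INSERT INTO digital_file (title, category, adjustable) VALUES (?, ?, ?)",
        (title, category, int(adjustable)),
    )


def files_for_analysis(conn, category):
    # Only adjustable files of the given category are handed to step six.
    rows = conn.execute(
        "SELECT title FROM digital_file WHERE category = ? AND adjustable = 1 ORDER BY id",
        (category,),
    )
    return [r[0] for r in rows]
```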
CN202311534225.0A 2023-11-17 2023-11-17 Intelligent information mining method for digital archives Pending CN117251587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311534225.0A CN117251587A (en) 2023-11-17 2023-11-17 Intelligent information mining method for digital archives

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311534225.0A CN117251587A (en) 2023-11-17 2023-11-17 Intelligent information mining method for digital archives

Publications (1)

Publication Number Publication Date
CN117251587A 2023-12-19

Family

ID=89137271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311534225.0A Pending CN117251587A (en) 2023-11-17 2023-11-17 Intelligent information mining method for digital archives

Country Status (1)

Country Link
CN (1) CN117251587A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020065857A1 (en) * 2000-10-04 2002-05-30 Zbigniew Michalewicz System and method for analysis and clustering of documents for search engine
CN106682072A (en) * 2016-11-17 2017-05-17 安徽华博胜讯信息科技股份有限公司 Knowledge management based data mining method for digital archives
CN115062117A (en) * 2022-07-11 2022-09-16 北京四方智汇信息科技有限公司 Method for automatically generating and classifying documents based on natural language processing technology
CN116384889A (en) * 2022-11-24 2023-07-04 杭州半云科技有限公司 Intelligent analysis method for information big data based on natural language processing technology

Similar Documents

Publication Publication Date Title
CN109992645B (en) Data management system and method based on text data
CN111753099B (en) Method and system for enhancing relevance of archive entity based on knowledge graph
CN102959578B (en) Forensic system and forensic method, and forensic program
CN111460252B (en) Automatic search engine method and system based on network public opinion analysis
CN109783787A (en) A kind of generation method of structured document, device and storage medium
CN102834832A (en) Forensic system, forensic method, and forensic program
CN111967761A (en) Monitoring and early warning method and device based on knowledge graph and electronic equipment
JP3791877B2 (en) An apparatus for searching information using the reason for referring to a document
CN113190687A (en) Knowledge graph determining method and device, computer equipment and storage medium
CN110659310A (en) Intelligent search method for vehicle information
CN112052317A (en) Medical knowledge base intelligent retrieval system and method based on deep learning
Adetunji et al. Web Document Classification Using Naïve Bayes
Cui A preliminary study on the management strategy of university personnel files based on artificial intelligence technology
CN111353077B (en) Intelligent creation algorithm-based converged media collecting, editing and distributing system
CN102789466B (en) A kind of enquirement title quality judging method, enquirement bootstrap technique and device thereof
CN111159984A (en) Supplementary reading system with intelligence study note function
CN117251587A (en) Intelligent information mining method for digital archives
CN116595043A (en) Big data retrieval method and device
CN115526601A (en) File management method and device
JPH0744573A (en) Electronic filling device
CN113468377A (en) Video and literature association and integration method
CN112256912A (en) Intelligent marking analysis and playing method for trial video
CN111079394A (en) Internet-based government affair data form filling system and method
Abd Manaf et al. Review on statistical approaches for automatic image annotation
Xie The application of artificial intelligence technology in public library information retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination