CN116069730A - Knowledge management system and construction method thereof - Google Patents
- Publication number
- CN116069730A (application CN202310165996.0A)
- Authority
- CN
- China
- Prior art keywords
- file
- knowledge
- type
- original file
- management system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Library & Information Science (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a knowledge management system and a construction method thereof, wherein the method comprises the following steps: receiving an original file uploaded by a knowledge provider; analyzing the type of the original file, and extracting file content from the original file according to the type; and segmenting the file content, and constructing an inverted index of the knowledge management system based on the segmentation result. According to the embodiment of the application, the file content is extracted from the original file according to the type of the original file, and the inverted index is constructed, so that files with various formats in the knowledge management system can be searched, the document types supported by the knowledge management system are enriched, the knowledge management system can cover more scenes, the knowledge diversity is improved, the content of the knowledge management system is enriched, and the usability of the knowledge management system is improved.
Description
Technical Field
The application belongs to the technical field of computers, and particularly relates to a knowledge management system and a construction method thereof.
Background
The knowledge management system is a platform for enterprises to realize knowledge management, and the overall aim is to integrate various knowledge resources in the enterprises into a dynamic knowledge system so as to promote knowledge innovation, and drive the improvement of labor productivity through the continuous improvement of knowledge innovation capability, so that the core competitiveness of the enterprises is finally improved. Within a certain organization, the organization members may be knowledge or experience providers, or knowledge users. By establishing an effective mechanism, knowledge in an organization can be well managed, and sharing and searching are facilitated, so that convenience is brought to establishing a learning organization, and the problems of information asymmetry, difficulty in experience popularization and the like are solved.
However, the document formats supported by existing knowledge management systems are limited, resulting in limited knowledge that can be shared.
Summary of the Application
The embodiment of the application aims to provide a knowledge management system and a construction method thereof, so as to solve the defect that the knowledge shared by the existing knowledge management system is limited.
In order to solve the technical problems, the application is realized as follows:
In a first aspect, a method for constructing a knowledge management system is provided, including the following steps:
receiving an original file uploaded by a knowledge provider;
analyzing the type of the original file, and extracting file content from the original file according to the type;
and segmenting the file content, and constructing an inverted index of the knowledge management system based on the segmentation result.
In a second aspect, there is provided a knowledge management system comprising:
the receiving module is used for receiving the original file uploaded by the knowledge provider;
the analysis module is used for analyzing the type of the original file and extracting file content from the original file according to the type;
and the construction module is used for segmenting the file content and constructing an inverted index of the knowledge management system based on the segmentation result.
According to the embodiment of the application, the file content is extracted from the original file according to the type of the original file, and the inverted index is constructed, so that files with various formats in the knowledge management system can be searched, the document types supported by the knowledge management system are enriched, the knowledge management system can cover more scenes, the knowledge diversity is improved, the content of the knowledge management system is enriched, and the usability of the knowledge management system is improved.
Drawings
FIG. 1 is a flowchart of a method for constructing a knowledge management system according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a knowledge management system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of protection of the present application.
Existing knowledge management systems support only a limited set of document formats and cannot cover the full range of file formats. For example, a system oriented toward text files may support formats such as doc, txt and pdf, but cannot also support picture, video and audio files. Even when a pdf or a picture contains what is effectively text content, that content cannot be matched by a search, and the same problem exists for audio and video knowledge.
Take the following scenario as an example. An organization receives a document issued by a higher-level body; the document is stored as a scanned file and is intended only for internal circulation or archiving. Because the document is a scan, it must be transcribed manually into text, which is time-consuming and labor-intensive. If it is not converted to text, it cannot be queried by content, which reduces the availability and usability of the knowledge, so that it exists within the organization merely as an archived file. The problem is even more pronounced for audio and video files: with the growth of online office work and remote online meetings, a large amount of audio and video knowledge is generated. For such scenarios, a knowledge management system needs the ability to extract content from different types of files.
The method for constructing the knowledge management system provided in the embodiment of the present application is described in detail below by means of specific embodiments and application scenarios thereof with reference to the accompanying drawings.
As shown in FIG. 1, a flowchart of a method for constructing a knowledge management system according to an embodiment of the present application is provided, where the method includes the following steps:
Step 101, receiving an original file uploaded by a knowledge provider.
In this embodiment, after receiving the original file uploaded by the knowledge provider, knowledge may be formed according to the original file and the classification information and description information corresponding to the original file, and the knowledge may be saved to the knowledge management system.
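As a minimal sketch of how knowledge might be formed from an original file together with its classification and description information, the following Python example uses a simple record type and an in-memory store. The names `Knowledge` and `KnowledgeStore`, the field names, and the integer identifiers are all assumptions made for illustration; the application does not specify a data model or a storage layer.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Knowledge:
    """One knowledge entry: the original file plus provider-supplied metadata."""
    file_name: str       # name of the original file as uploaded
    raw_bytes: bytes     # unmodified content of the original file
    classification: str  # classification information supplied by the provider
    description: str     # description information supplied by the provider


class KnowledgeStore:
    """Hypothetical in-memory stand-in for the system's persistence layer."""

    def __init__(self) -> None:
        self._items: Dict[int, Knowledge] = {}
        self._next_id = 1

    def save(self, knowledge: Knowledge) -> int:
        """Save a knowledge entry and return the identifier assigned to it."""
        knowledge_id = self._next_id
        self._items[knowledge_id] = knowledge
        self._next_id += 1
        return knowledge_id

    def get(self, knowledge_id: int) -> Knowledge:
        """Return a previously saved entry; raises KeyError if it is unknown."""
        return self._items[knowledge_id]
```

A caller would build a `Knowledge` from the upload form and pass it to `save`; the returned identifier can then be used as the document key when the content is indexed.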
Further, after the knowledge is saved to the knowledge management system, online preview, play, print and download operations may also be performed according to the type of knowledge.
Step 102, analyzing the type of the original file, and extracting file content from the original file according to the type.
Specifically, file content is extracted according to the type of the original file as follows:
when the type of the original file is a text type, file content is extracted directly from the original file;
when the type of the original file is a picture type, file content is extracted from the original file through an optical character recognition (OCR) module;
when the type of the original file is an audio type, file content is extracted from the original file through a voice recognition module;
when the type of the original file is a video type, the original file is converted into an audio file through an audio extraction module, and file content is then extracted from the audio file through the voice recognition module.
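The type analysis and per-type extraction described above can be sketched in Python as a small dispatcher. This is an illustration under stated assumptions, not the implementation of the application: the extension table `EXTENSION_TYPES` is hypothetical, type detection by file extension is only one possible approach, and the OCR, voice recognition and audio extraction modules are passed in as plain callables (`ocr`, `asr`, `extract_audio`) because the application does not name concrete engines. Formats such as pdf and doc are left out because they need a format-specific parser.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical mapping from file extension to the four types named in the text.
EXTENSION_TYPES: Dict[str, str] = {
    ".txt": "text", ".md": "text",
    ".png": "picture", ".jpg": "picture", ".jpeg": "picture",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".avi": "video",
}


def analyze_type(file_name: str) -> str:
    """Return 'text', 'picture', 'audio' or 'video' based on the file extension."""
    suffix = Path(file_name).suffix.lower()
    try:
        return EXTENSION_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None


def extract_content(
    file_name: str,
    data: bytes,
    ocr: Callable[[bytes], str],
    asr: Callable[[bytes], str],
    extract_audio: Callable[[bytes], bytes],
) -> str:
    """Route the original file to the extraction path that matches its type."""
    kind = analyze_type(file_name)
    if kind == "text":
        # Text type: the content is read directly from the original file.
        return data.decode("utf-8", errors="replace")
    if kind == "picture":
        # Picture type: content comes from the OCR module.
        return ocr(data)
    if kind == "audio":
        # Audio type: content comes from the voice recognition module.
        return asr(data)
    # Video type: extract the audio track first, then run voice recognition.
    return asr(extract_audio(data))
```

Passing the modules as callables keeps the dispatcher independent of any particular OCR or speech recognition library, so each one can be replaced or stubbed in tests.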
Here, OCR (optical character recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and translates their shapes into computer text using a character recognition method; that is, the printed material is scanned, and the resulting image file is analyzed and processed to obtain the text and layout information.
The voice recognition module extracts file content from the audio file by means of ASR (automatic speech recognition), a technology that converts human speech into text. Speech recognition is an interdisciplinary field closely tied to acoustics, phonetics, linguistics, digital signal processing theory, information theory, computer science and other disciplines. Because speech signals are diverse and complex, a speech recognition system achieves satisfactory performance only under certain constraints, or can be used only in certain specific applications.
Step 103, segmenting the file content into words, and constructing an inverted index of the knowledge management system based on the word segmentation result.
In this embodiment, after the inverted index of the knowledge management system is constructed based on the word segmentation result, the keywords input by a knowledge user on the search page can be obtained, and a knowledge list is returned according to the degree of match between the file content in the knowledge management system and the keywords.
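To make the indexing and retrieval step concrete, the following Python sketch builds an inverted index from segmented content and ranks results for a keyword query. Several choices here are assumptions rather than details from the application: the `segment` function is a simple regular-expression tokenizer standing in for a real word segmenter (Chinese text would need a dedicated segmenter), and the "matching degree" is taken to be the total number of occurrences of the query terms in each entry.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def segment(text: str) -> List[str]:
    """Split text into lowercase word tokens (stand-in for a real word segmenter)."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each term to the knowledge entries that contain it."""

    def __init__(self) -> None:
        # term -> {knowledge_id -> number of occurrences in that entry}
        self._postings: Dict[str, Dict[int, int]] = defaultdict(dict)

    def add(self, knowledge_id: int, content: str) -> None:
        """Segment the extracted content and record each term's occurrences."""
        for term, count in Counter(segment(content)).items():
            self._postings[term][knowledge_id] = count

    def search(self, query: str) -> List[Tuple[int, int]]:
        """Return (knowledge_id, score) pairs, best match first."""
        scores: Counter = Counter()
        for term in segment(query):
            for knowledge_id, count in self._postings.get(term, {}).items():
                scores[knowledge_id] += count
        # Highest score first; ties are broken by ascending identifier.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = InvertedIndex()
index.add(1, "Safety notice: safety training on Friday")
index.add(2, "Meeting minutes about training budget")
print(index.search("safety training"))  # [(1, 3), (2, 1)]
```

Because the index only sees extracted text, content obtained through OCR or speech recognition is searchable in the same way as content read directly from a text file.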
According to the embodiment of the application, the file content is extracted from the original file according to the type of the original file, and the inverted index is constructed, so that files with various formats in the knowledge management system can be searched, the document types supported by the knowledge management system are enriched, the knowledge management system can cover more scenes, the knowledge diversity is improved, the content of the knowledge management system is enriched, and the usability of the knowledge management system is improved.
In the embodiment of the application, the process of constructing the knowledge management system comprises the following steps: file upload -> type analysis -> OCR/audio content extraction -> text analysis -> index storage -> search -> knowledge presentation.
In the file upload step, a knowledge provider uploads a document in some format and adds information such as classification and description to form knowledge. In the type analysis step, the system analyzes the type of the uploaded file and dispatches it to the corresponding content analysis: content of a text type is extracted directly, content of a picture type is extracted through the OCR module, content of an audio type is extracted through the voice recognition module, and a video type is first converted into audio through the audio extraction module and then has its content extracted through the voice recognition module.
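For the video branch, one possible form of the audio extraction module is a call to the ffmpeg command-line tool. The application does not name a tool, so the use of ffmpeg, the function name `build_audio_extraction_command`, and the choice of 16 kHz mono 16-bit PCM WAV output are all assumptions for this sketch. The function only builds the argument list; running it requires ffmpeg to be installed.

```python
from pathlib import Path
from typing import List


def build_audio_extraction_command(video_path: str, sample_rate: int = 16000) -> List[str]:
    """Return an ffmpeg argument list that writes the audio track as a mono WAV file."""
    # The output file sits next to the input, with the extension replaced by .wav.
    audio_path = str(Path(video_path).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video file
        "-vn",                    # discard the video stream
        "-acodec", "pcm_s16le",   # encode audio as 16-bit little-endian PCM
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # downmix to a single (mono) channel
        audio_path,
    ]
```

The list can be handed to `subprocess.run(command, check=True)`, after which the resulting WAV file would be passed to the voice recognition module.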
Further, in the index storage step, the content is segmented into words to construct the inverted index; in the search step, the knowledge user searches by keyword on the page, and the system returns a knowledge list according to the degree of match between the content and the keywords; in the knowledge presentation step, operations such as online preview, playback, printing and download are performed according to the knowledge type.
According to the embodiment of the application, file content is extracted through an OCR algorithm from files in a picture format, or from pdf or doc files formed from scanned documents, so that files in these formats in the knowledge management system can be searched. This enriches the types of knowledge, makes knowledge easier to acquire, diversifies the content that users can search, and covers more scenarios. In addition, ASR technology extracts the content of audio or video as text, so that audio and video files in the knowledge management system can be searched, which improves the diversity of knowledge, enriches the content of the knowledge management system, and improves the usability of the system.
Fig. 2 is a schematic structural diagram of a knowledge management system according to an embodiment of the present application, including:
The receiving module 210 is configured to receive an original file uploaded by a knowledge provider.
The analysis module 220 is configured to analyze the type of the original file and extract file content from the original file according to the type.
Specifically, the analysis module 220 is configured to analyze the type of the original file and: extract file content directly from the original file when the type of the original file is a text type; extract file content from the original file through an optical character recognition (OCR) module when the type of the original file is a picture type; extract file content from the original file through a voice recognition module when the type of the original file is an audio type; and, when the type of the original file is a video type, convert the original file into an audio file through an audio extraction module and then extract file content from the audio file through the voice recognition module.
The construction module 230 is configured to segment the content of the file, and construct an inverted index of the knowledge management system based on the segmentation result.
Further, the system further comprises:
The retrieval module is configured to acquire keywords input by a knowledge user on a retrieval page, and to return a knowledge list according to the degree of match between the file content in the knowledge management system and the keywords.
The storage module is configured to form knowledge from the original file and its corresponding classification information and description information, and to save the knowledge to the knowledge management system.
The processing module is configured to perform online preview, play, print and download operations according to the knowledge type.
According to the embodiment of the application, the file content is extracted from the original file according to the type of the original file, and the inverted index is constructed, so that files with various formats in the knowledge management system can be searched, the document types supported by the knowledge management system are enriched, the knowledge management system can cover more scenes, the knowledge diversity is improved, the content of the knowledge management system is enriched, and the usability of the knowledge management system is improved.
The embodiment of the application further provides a computer readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above embodiment of the method for constructing a knowledge management system and achieves the same technical effects; to avoid repetition, details are not described here again. The computer readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.
Claims (10)
1. A method for constructing a knowledge management system, characterized by comprising the following steps:
receiving an original file uploaded by a knowledge provider;
analyzing the type of the original file, and extracting file content from the original file according to the type;
and segmenting the file content, and constructing an inverted index of the knowledge management system based on the segmentation result.
2. The method according to claim 1, wherein said extracting file content from said original file according to said type comprises:
extracting file content directly from the original file under the condition that the type of the original file is a text type;
extracting file content from the original file through an optical character recognition OCR module under the condition that the type of the original file is a picture type;
extracting file content from the original file through a voice recognition module under the condition that the type of the original file is an audio type;
and under the condition that the type of the original file is a video type, converting the original file into an audio file through an audio extraction module, and extracting file contents from the audio file through a voice recognition module.
3. The method of claim 1, further comprising, after constructing the inverted index of the knowledge management system based on the word segmentation result:
acquiring keywords input by a knowledge user on a retrieval page;
and returning a knowledge list according to the matching degree of the file content of the knowledge management system and the keywords.
4. The method of claim 1, wherein after receiving the original file uploaded by the knowledge provider, further comprising:
forming knowledge according to the original file and the corresponding classification information and description information, and storing the knowledge to the knowledge management system.
5. The method of claim 4, wherein after the saving the knowledge to the knowledge management system, further comprising:
and performing online preview, play, print and download operations according to the type of knowledge.
6. A knowledge management system, comprising:
the receiving module is used for receiving the original file uploaded by the knowledge provider;
the analysis module is used for analyzing the type of the original file and extracting file content from the original file according to the type;
and the construction module is used for segmenting the file content and constructing an inverted index of the knowledge management system based on the segmentation result.
7. The system of claim 6, wherein
the analysis module is specifically configured to analyze a type of the original file, and directly extract file content from the original file when the type of the original file is a text type; extracting file content from the original file through an optical character recognition OCR module under the condition that the type of the original file is a picture type; extracting file content from the original file through a voice recognition module under the condition that the type of the original file is an audio type; and under the condition that the type of the original file is a video type, converting the original file into an audio file through an audio extraction module, and extracting file contents from the audio file through a voice recognition module.
8. The system of claim 6, further comprising:
the retrieval module is used for acquiring keywords input by a knowledge user on a retrieval page; and returning a knowledge list according to the matching degree of the file content of the knowledge management system and the keywords.
9. The system of claim 6, further comprising:
and the storage module is used for forming knowledge according to the original file and the corresponding classification information and description information thereof, and storing the knowledge to the knowledge management system.
10. The system of claim 9, further comprising:
and the processing module is used for executing online preview, play, print and download operations according to the knowledge type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310165996.0A CN116069730A (en) | 2023-02-15 | 2023-02-15 | Knowledge management system and construction method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310165996.0A CN116069730A (en) | 2023-02-15 | 2023-02-15 | Knowledge management system and construction method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116069730A (en) | 2023-05-05 |
Family
ID=86181974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310165996.0A Pending CN116069730A (en) | 2023-02-15 | 2023-02-15 | Knowledge management system and construction method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116069730A (en) |
-
2023
- 2023-02-15 CN CN202310165996.0A patent/CN116069730A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||