CN116069730A - Knowledge management system and construction method thereof

Publication number
CN116069730A
Authority
CN
China
Prior art keywords
file
knowledge
type
original file
management system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310165996.0A
Other languages
Chinese (zh)
Inventor
常宏伟
彭珂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing E Hualu Information Technology Co Ltd
Original Assignee
Beijing E Hualu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-02-15
Filing date
2023-02-15
Publication date
2023-05-05
Application filed by Beijing E Hualu Information Technology Co Ltd
Priority to CN202310165996.0A
Publication of CN116069730A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/14Details of searching files based on file metadata
    • G06F16/148File search processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/16File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Library & Information Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a knowledge management system and a construction method thereof, wherein the method comprises the following steps: receiving an original file uploaded by a knowledge provider; analyzing the type of the original file, and extracting file content from the original file according to the type; and segmenting the file content, and constructing an inverted index of the knowledge management system based on the segmentation result. According to the embodiment of the application, the file content is extracted from the original file according to the type of the original file, and the inverted index is constructed, so that files with various formats in the knowledge management system can be searched, the document types supported by the knowledge management system are enriched, the knowledge management system can cover more scenes, the knowledge diversity is improved, the content of the knowledge management system is enriched, and the usability of the knowledge management system is improved.

Description

Knowledge management system and construction method thereof
Technical Field
The application belongs to the technical field of computers, and particularly relates to a knowledge management system and a construction method thereof.
Background
The knowledge management system is a platform for enterprises to realize knowledge management, and the overall aim is to integrate various knowledge resources in the enterprises into a dynamic knowledge system so as to promote knowledge innovation, and drive the improvement of labor productivity through the continuous improvement of knowledge innovation capability, so that the core competitiveness of the enterprises is finally improved. Within a certain organization, the organization members may be knowledge or experience providers, or knowledge users. By establishing an effective mechanism, knowledge in an organization can be well managed, and sharing and searching are facilitated, so that convenience is brought to establishing a learning organization, and the problems of information asymmetry, difficulty in experience popularization and the like are solved.
However, the document formats supported by existing knowledge management systems are limited, resulting in limited knowledge that can be shared.
Summary of the Application
The embodiment of the application aims to provide a knowledge management system and a construction method thereof, so as to solve the defect that the knowledge shared by the existing knowledge management system is limited.
In order to solve the technical problems, the application is realized as follows:
In a first aspect, a method for constructing a knowledge management system is provided, including the following steps:
receiving an original file uploaded by a knowledge provider;
analyzing the type of the original file, and extracting file content from the original file according to the type;
and segmenting the file content, and constructing an inverted index of the knowledge management system based on the segmentation result.
In a second aspect, there is provided a knowledge management system comprising:
the receiving module is used for receiving the original file uploaded by the knowledge provider;
the analysis module is used for analyzing the type of the original file and extracting file content from the original file according to the type;
and the construction module is used for segmenting the file content and constructing an inverted index of the knowledge management system based on the segmentation result.
According to the embodiment of the application, the file content is extracted from the original file according to the type of the original file, and the inverted index is constructed, so that files with various formats in the knowledge management system can be searched, the document types supported by the knowledge management system are enriched, the knowledge management system can cover more scenes, the knowledge diversity is improved, the content of the knowledge management system is enriched, and the usability of the knowledge management system is improved.
Drawings
FIG. 1 is a flowchart of a method for constructing a knowledge management system according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a knowledge management system according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Existing knowledge management systems support only a limited set of document formats and cannot cover the full range of file formats. For example, a system oriented toward text files supports formats such as doc, txt, and pdf, but cannot support picture, video, and audio files at the same time. Even when a pdf or a picture contains text in some form, its content cannot be matched by a search, and the same problem exists for audio and video knowledge.
Take the following scenario as an example. An organization receives a document issued by a higher level; the document is stored as a scanned file and is intended only for internal circulation or archiving. Because the document is a scan, it must be transcribed into text by hand, which is time-consuming and labor-intensive. If it is not converted into text, it cannot be queried by content, so the availability and usability of the knowledge are reduced and the knowledge merely sits inside the organization as an archived file. The problem is even more pronounced for audio and video files: as online office work and remote online meetings increase, a large amount of audio and video knowledge is produced. For such scenarios, a knowledge management system needs the ability to extract content from different types of files.
The method for constructing the knowledge management system provided in the embodiment of the present application is described in detail below by means of specific embodiments and application scenarios thereof with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of a method for constructing a knowledge management system according to an embodiment of the present application is provided, where the method includes the following steps:
Step 101: receiving an original file uploaded by a knowledge provider.
In this embodiment, after receiving the original file uploaded by the knowledge provider, knowledge may be formed according to the original file and the classification information and description information corresponding to the original file, and the knowledge may be saved to the knowledge management system.
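As a minimal sketch of how an uploaded file might be combined with its classification and description to form a knowledge record, the following Python fragment may help. The names `Knowledge` and `KnowledgeStore`, the field layout, and the in-memory storage are assumptions made for illustration; the application does not specify a data model.

```python
from dataclasses import dataclass


@dataclass
class Knowledge:
    """Hypothetical knowledge record: an original file plus its metadata."""
    file_name: str
    classification: str
    description: str
    content: str = ""  # filled in later, once file content has been extracted


class KnowledgeStore:
    """Hypothetical in-memory stand-in for the knowledge management system's storage."""

    def __init__(self):
        self._items = {}

    def save(self, knowledge):
        # Assign sequential ids starting at 1 (an illustrative choice).
        knowledge_id = len(self._items) + 1
        self._items[knowledge_id] = knowledge
        return knowledge_id

    def get(self, knowledge_id):
        return self._items[knowledge_id]


store = KnowledgeStore()
notice_id = store.save(
    Knowledge("notice.pdf", classification="Notices", description="Scanned notice")
)
```

A real system would presumably persist the record in a database and keep the original file in file storage; the sketch only shows that the file, the classification, and the description travel together as one unit.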
Further, after the knowledge is saved to the knowledge management system, online preview, play, print and download operations may also be performed according to the type of knowledge.
Step 102: analyzing the type of the original file, and extracting file content from the original file according to the type.
Specifically, the file content is extracted as follows:
when the original file is of a text type, the file content is extracted directly from the original file;
when the original file is of a picture type, the file content is extracted from the original file by an optical character recognition (OCR) module;
when the original file is of an audio type, the file content is extracted from the original file by a speech recognition module;
when the original file is of a video type, the original file is first converted into an audio file by an audio extraction module, and the file content is then extracted from the audio file by the speech recognition module.
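The type-based dispatch described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the suffix table, the function names `analyze_type` and `extract_content`, and the four injected callables are hypothetical, and the OCR and speech recognition engines are replaced by stubs. Note that a suffix check alone cannot tell a scanned pdf from a text pdf; a scanned pdf would need the OCR route.

```python
from pathlib import Path

# Assumed suffix table; the source names only doc, txt, and pdf as text formats.
SUFFIX_TO_TYPE = {
    ".txt": "text", ".doc": "text", ".pdf": "text",
    ".png": "picture", ".jpg": "picture",
    ".mp3": "audio", ".wav": "audio",
    ".mp4": "video", ".avi": "video",
}


def analyze_type(filename):
    """Classify a file as 'text', 'picture', 'audio', or 'video' by its suffix."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUFFIX_TO_TYPE[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {filename}") from None


def extract_content(filename, read_text, run_ocr, run_asr, extract_audio):
    """Route the file to the extractor that matches its type."""
    file_type = analyze_type(filename)
    if file_type == "text":
        return read_text(filename)
    if file_type == "picture":
        return run_ocr(filename)
    if file_type == "audio":
        return run_asr(filename)
    # Video: convert to audio first, then run speech recognition on the audio.
    return run_asr(extract_audio(filename))


# Stub extractors that only record which route was taken (not real engines).
stubs = dict(
    read_text=lambda path: f"text({path})",
    run_ocr=lambda path: f"ocr({path})",
    run_asr=lambda path: f"asr({path})",
    extract_audio=lambda path: f"audio({path})",
)
routes = {
    name: extract_content(name, **stubs)
    for name in ["a.txt", "scan.PNG", "talk.wav", "meeting.mp4"]
}
```

Passing the extractors in as callables keeps the dispatch logic independent of whichever OCR or speech recognition engine is actually used.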
OCR (optical character recognition) refers to the process in which an electronic device (for example, a scanner or a digital camera) examines characters printed on paper and translates their shapes into computer text using a character recognition method; that is, the printed material is scanned, and the resulting image file is analyzed and processed to obtain the text and layout information.
The speech recognition module extracts the file content from the audio file by means of ASR (automatic speech recognition), a technology that converts human speech into text. Speech recognition is an interdisciplinary field closely tied to acoustics, phonetics, linguistics, digital signal processing, information theory, computer science, and other disciplines. Because speech signals are diverse and complex, speech recognition systems achieve satisfactory performance only under certain constraints or in certain specific applications.
Step 103: performing word segmentation on the file content, and constructing an inverted index of the knowledge management system based on the word segmentation result.
In this embodiment, after the inverted index of the knowledge management system is constructed based on the word segmentation result, keywords entered by a knowledge user on a search page can be obtained, and a knowledge list is returned according to the degree of matching between the file content in the knowledge management system and the keywords.
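To make the indexing and search step concrete, here is a small Python sketch of an inverted index with a simple ranking. Several parts are assumptions: the regex-based `segment` function is only a stand-in for a real word segmenter (it keeps lowercase Latin or digit runs and single CJK characters), and the "matching degree" is approximated by summed term frequency, since the application does not define how matching is scored.

```python
import re
from collections import defaultdict

_TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def segment(text):
    """Stand-in segmenter: lowercase Latin/digit runs plus single CJK characters."""
    return _TOKEN.findall(text.lower())


def build_inverted_index(documents):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, content in documents.items():
        for term in segment(content):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Rank doc ids by summed term frequency of the query terms (assumed score)."""
    scores = defaultdict(int)
    for term in segment(query):
        for doc_id, frequency in index.get(term, {}).items():
            scores[doc_id] += frequency
    # Highest score first; ties are broken alphabetically by doc id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Extracted contents keyed by file name (illustrative data, not from the source).
docs = {
    "notice.png": "Safety notice: fire drill on Friday",
    "meeting.mp4": "The fire drill schedule was discussed in the meeting",
    "plan.txt": "Quarterly budget plan",
}
index = build_inverted_index(docs)
results = search(index, "fire drill")
```

Because the index is built from extracted text rather than from the files themselves, a picture and a video can be matched by the same keyword query as a plain text file.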
According to the embodiment of the application, the file content is extracted from the original file according to the type of the original file, and the inverted index is constructed, so that files with various formats in the knowledge management system can be searched, the document types supported by the knowledge management system are enriched, the knowledge management system can cover more scenes, the knowledge diversity is improved, the content of the knowledge management system is enriched, and the usability of the knowledge management system is improved.
In the embodiment of the application, the implementation process for constructing the knowledge management system comprises the following steps: file upload -> type analysis -> OCR/audio content extraction -> text analysis -> index storage -> search -> knowledge presentation.
In the file upload step, a knowledge provider uploads a document in some format and adds information such as a classification and a description to form knowledge. In the type analysis step, the system analyzes the type of the uploaded file and dispatches it to the corresponding content extraction work: the content of a text-type file is extracted directly; the content of a picture-type file is extracted by the OCR module; the content of an audio-type file is extracted by the speech recognition module; and a video-type file is converted into audio by the audio extraction module, after which the content is extracted by the speech recognition module.
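The application does not name a tool for the audio extraction module. As one possible sketch, assuming the `ffmpeg` command-line tool is installed, the video-to-audio conversion could be wrapped as below; the function names, the mono channel setting, and the 16 kHz sample rate are illustrative assumptions rather than details from the source.

```python
import subprocess


def build_audio_extraction_command(video_path, audio_path, sample_rate=16000):
    """Build an ffmpeg command that drops the video stream and keeps mono audio."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # disable video in the output
        "-ac", "1",               # one audio channel (assumed preference)
        "-ar", str(sample_rate),  # audio sample rate in Hz (assumed value)
        str(audio_path),
    ]


def extract_audio(video_path, audio_path):
    """Run the conversion; requires ffmpeg on PATH and raises if it fails."""
    subprocess.run(build_audio_extraction_command(video_path, audio_path), check=True)
    return audio_path


command = build_audio_extraction_command("meeting.mp4", "meeting.wav")
```

The resulting audio file would then be handed to the speech recognition module in the same way as an uploaded audio file.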
Further, in the index storage step, the content is segmented into words to construct the inverted index; in the search step, the knowledge user searches for keywords on the page, and the system returns a knowledge list according to the degree of matching between the content and the keywords; in the knowledge presentation step, operations such as online preview, playback, printing, and downloading are performed according to the knowledge type.
According to the embodiment of the application, an OCR algorithm extracts file content from files in picture format and from pdf or doc files formed from scanned documents, so that files in these formats can be searched in the knowledge management system. This enriches the types of knowledge, makes knowledge easier to acquire, diversifies the content that users can search, and covers more scenarios. In addition, ASR technology extracts the content of audio or video as text, so that audio and video files in the knowledge management system can be searched, which improves the diversity of knowledge, enriches the content of the knowledge management system, and improves the usability of the system.
Fig. 2 is a schematic structural diagram of a knowledge management system according to an embodiment of the present application, including:
The receiving module 210 is configured to receive an original file uploaded by a knowledge provider.
The analysis module 220 is configured to analyze the type of the original file and extract file content from the original file according to the type.
Specifically, the analysis module 220 is configured to analyze the type of the original file and: when the original file is of a text type, extract the file content directly from the original file; when the original file is of a picture type, extract the file content from the original file by an optical character recognition (OCR) module; when the original file is of an audio type, extract the file content from the original file by a speech recognition module; and when the original file is of a video type, convert the original file into an audio file by an audio extraction module and then extract the file content from the audio file by the speech recognition module.
The construction module 230 is configured to segment the content of the file, and construct an inverted index of the knowledge management system based on the segmentation result.
Further, the system also comprises:
a retrieval module, configured to acquire keywords entered by a knowledge user on a search page, and to return a knowledge list according to the degree of matching between the file content in the knowledge management system and the keywords;
a storage module, configured to form knowledge from the original file and its corresponding classification information and description information, and to save the knowledge to the knowledge management system; and
a processing module, configured to perform online preview, play, print, and download operations according to the knowledge type.
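The application states that the available operations depend on the knowledge type but does not say which operations apply to which type. The Python sketch below therefore uses an assumed mapping (preview and print for text and pictures, play for audio and video, download for all); the table and function names are hypothetical.

```python
# Assumed mapping from knowledge type to permitted operations (not from the source).
OPERATIONS_BY_TYPE = {
    "text": ("preview", "print", "download"),
    "picture": ("preview", "print", "download"),
    "audio": ("play", "download"),
    "video": ("play", "download"),
}


def allowed_operations(knowledge_type):
    """Return the operations offered for a given knowledge type."""
    return OPERATIONS_BY_TYPE[knowledge_type]


def check_operation(knowledge_type, operation):
    """Raise ValueError if the operation is not offered for this knowledge type."""
    if operation not in allowed_operations(knowledge_type):
        raise ValueError(f"{operation!r} is not available for {knowledge_type} knowledge")
    return operation


video_operations = allowed_operations("video")
```

Keeping the mapping in a single table means the processing module can offer a new operation or file type by changing data rather than control flow.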
According to the embodiment of the application, the file content is extracted from the original file according to the type of the original file, and the inverted index is constructed, so that files with various formats in the knowledge management system can be searched, the document types supported by the knowledge management system are enriched, the knowledge management system can cover more scenes, the knowledge diversity is improved, the content of the knowledge management system is enriched, and the usability of the knowledge management system is improved.
The embodiment of the application further provides a computer readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the above embodiment of the method for constructing a knowledge management system is implemented and the same technical effects can be achieved; to avoid repetition, details are not described again here. The computer readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.

Claims (10)

1. The construction method of the knowledge management system is characterized by comprising the following steps:
receiving an original file uploaded by a knowledge provider;
analyzing the type of the original file, and extracting file content from the original file according to the type;
and segmenting the file content, and constructing an inverted index of the knowledge management system based on the segmentation result.
2. The method according to claim 1, wherein said extracting file content from said original file according to said type comprises:
extracting file content directly from the original file under the condition that the type of the original file is a text type;
extracting file content from the original file through an optical character recognition OCR module under the condition that the type of the original file is a picture type;
extracting file content from the original file through a voice recognition module under the condition that the type of the original file is an audio type;
and under the condition that the type of the original file is a video type, converting the original file into an audio file through an audio extraction module, and extracting file contents from the audio file through a voice recognition module.
3. The method of claim 1, further comprising, after constructing the inverted index of the knowledge management system based on the word segmentation result:
acquiring keywords input by a knowledge user on a retrieval page;
and returning a knowledge list according to the matching degree of the file content of the knowledge management system and the keywords.
4. The method of claim 1, wherein after receiving the original file uploaded by the knowledge provider, further comprising:
forming knowledge according to the original file and the corresponding classification information and description information, and storing the knowledge to the knowledge management system.
5. The method of claim 4, wherein after the saving the knowledge to the knowledge management system, further comprising:
and performing online preview, play, print and download operations according to the type of knowledge.
6. A knowledge management system, comprising:
the receiving module is used for receiving the original file uploaded by the knowledge provider;
the analysis module is used for analyzing the type of the original file and extracting file content from the original file according to the type;
and the construction module is used for segmenting the file content and constructing an inverted index of the knowledge management system based on the segmentation result.
7. The system of claim 6, wherein
the analysis module is specifically configured to analyze a type of the original file, and directly extract file content from the original file when the type of the original file is a text type; extracting file content from the original file through an optical character recognition OCR module under the condition that the type of the original file is a picture type; extracting file content from the original file through a voice recognition module under the condition that the type of the original file is an audio type; and under the condition that the type of the original file is a video type, converting the original file into an audio file through an audio extraction module, and extracting file contents from the audio file through a voice recognition module.
8. The system of claim 6, further comprising:
the retrieval module is used for acquiring keywords input by a knowledge user on a retrieval page; and returning a knowledge list according to the matching degree of the file content of the knowledge management system and the keywords.
9. The system of claim 6, further comprising:
and the storage module is used for forming knowledge according to the original file and the corresponding classification information and description information thereof, and storing the knowledge to the knowledge management system.
10. The system of claim 9, further comprising:
and the processing module is used for executing online preview, play, print and download operations according to the knowledge type.
CN202310165996.0A 2023-02-15 2023-02-15 Knowledge management system and construction method thereof Pending CN116069730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310165996.0A CN116069730A (en) 2023-02-15 2023-02-15 Knowledge management system and construction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310165996.0A CN116069730A (en) 2023-02-15 2023-02-15 Knowledge management system and construction method thereof

Publications (1)

Publication Number Publication Date
CN116069730A true CN116069730A (en) 2023-05-05

Family

ID=86181974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310165996.0A Pending CN116069730A (en) 2023-02-15 2023-02-15 Knowledge management system and construction method thereof

Country Status (1)

Country Link
CN (1) CN116069730A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination