WO2021137689A1 - Library material classification system and associated method - Google Patents

Library material classification system and associated method

Info

Publication number
WO2021137689A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
patternprint
domain
module
library
Prior art date
Application number
PCT/MY2020/050146
Other languages
English (en)
Inventor
May Fern KOH
Soon Hin LEE
Original Assignee
Mimos Berhad
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mimos Berhad
Publication of WO2021137689A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • This invention relates to data classification and more particularly to a system and method for classifying collection materials, preferably library materials.
  • A related prior-art document, US 5463773 A (‘773A), discloses a document classifying system that builds a document classification tree by recursive optimization of a keyword selection function.
  • Library materials in ‘773A are classified by using a document classification building system to build a classification decision tree for data classification.
  • ‘773A further uses a keyword storage apparatus and a sample data storage apparatus to store sample data formed from document data and to store keywords extracted from the sample data.
  • Input data in ‘773A is assumed to be correct. Hence, the input data is classified directly, without undergoing any data cleansing or data correction.
  • Another prior-art patent, US 6976207 B1 (‘207B), discloses a method and apparatus for document classification. Input data in ‘207B is also assumed to be correct and is classified directly using a vector. The classification of library materials is determined by an n-dimensional vector, with one or more hyperplanes defining the subspaces that represent each of the classification schemes. Furthermore, inconsistencies or errors in data classification are determined by checking the result of automatic classification against manual classification.
  • The present invention relates to a system for classifying collection materials.
  • The system comprises a material classification engine, wherein the material classification engine comprises a data retrieval module to retrieve data from at least one collection management system and to store the data in a database, and a data correction module to perform data cleansing by utilising fuzzy logic rules.
  • The system further comprises a data classification module to classify the cleansed data into domain-specified categories by cross-referencing with a domain knowledge, wherein the data classification module utilises a domain ontology to extract domain data to perform data classification, and a patternprint generator module to generate a patternprint for the classified data, wherein the patternprint represents a unique identification (ID) to identify and group the collection materials in the domain knowledge.
  • The collection materials are library materials.
  • The collection management system is a library management system (LMS).
  • The domain knowledge is a library domain knowledge.
  • The present invention also relates to a method of classifying collection materials, comprising the steps of: retrieving and extracting data from a collection management system by a data retrieval module; performing data cleansing on the retrieved data using fuzzy logic rules by a data correction module; classifying the cleansed data using a domain knowledge by a data classification module; and generating a patternprint for the classified data by a patternprint generator module, wherein the patternprint represents a unique identification (ID) to identify and group the collection materials in the domain knowledge.
  • The step of performing data cleansing on the retrieved data using fuzzy logic rules by the data correction module further comprises the steps of: extracting fuzzy logic rules from the domain knowledge; executing the fuzzy logic rules; and performing data cleansing on the retrieved data with the fuzzy logic rules.
  • The step of classifying the cleansed data using a domain knowledge by the data classification module further comprises the steps of: identifying the domain for each of the cleansed data; extracting domain data from the domain knowledge; and classifying the cleansed data of the collection material into domain-specified categories based on the domain data.
  • The step of generating a patternprint for the classified data by the patternprint generator module further comprises the steps of: retrieving and extracting the classified data; generating the patternprint, wherein the patternprint comprises material properties; and updating the classified data with the patternprint.
  • Figure 1 is a block diagram illustrating the system architecture for classifying library materials in accordance with the present invention.
  • Figure 2 is a diagram illustrating an exemplary embodiment of a domain knowledge in accordance with the present invention.
  • Figure 3 is a flow chart representing a method of classifying library materials in accordance with the present invention.
  • Figure 4 is a flow chart representing an exemplary embodiment of a step of retrieving and extracting data from a library management system in accordance with the present invention.
  • Figure 5 is a flow chart representing an exemplary embodiment of a step of performing data cleansing on the retrieved data using fuzzy logic rules in accordance with the present invention.
  • Figures 6(a) and 6(b) illustrate an example of data cleansing in accordance with the present invention.
  • Figure 7 is a flow chart representing an exemplary embodiment of a step of classifying the cleansed data using the domain knowledge in accordance with the present invention.
  • Figure 8 is a flow chart representing an exemplary embodiment of a step of generating a patternprint for the classified data in accordance with the present invention.
  • Figure 9 illustrates an example of the patternprint for ‘book_harrypotterjkrowling’ in accordance with the present invention.
  • Figure 10 further illustrates the example of the patternprint for ‘book_harrypotterjkrowling’ in accordance with the present invention.
  • The present invention relates to a system for classifying collection materials (100) from at least one collection management system (20), comprising a material classification engine (10), a database (30) and a domain knowledge (40).
  • The material classification engine (10) further comprises a data retrieval module (11), a data correction module (12), a data classification module (13) and a patternprint generator module (14).
  • The collection materials are library materials and the collection management system (20) refers to a library management system (LMS).
  • The database (30) is preferably a local storage or a relational database management system (RDBMS). In one embodiment, the database (30) is PostgreSQL.
  • The domain knowledge (40) is a library domain knowledge which comprises a plurality of major groupings, including but not limited to material, resources, creators, subjects and fuzzy rules.
  • The material grouping is preferably, but not limited to, a list of library materials such as book, electronic book, serial, proceeding paper, audio, video, digital and so on.
  • The resources grouping comprises work, expression, manifestation and item.
  • The work is the original handwritten by the main author, and the expression covers the numerous adaptations or translations of the original unfinished and unedited work.
  • The manifestation is a form of publication, such as prints and editions, while an item is a physical copy of any manifestation that is available in the library to be borrowed.
  • The creators grouping comprises a person, family or corporate body of the library materials.
  • The subjects grouping is a form of object, event, place or concept of the library material.
  • The fuzzy rules grouping stores a list of fuzzy logic rules such as “Soundex”, “edit distance” and “number distance”, which are selected for execution by the material classification engine (10) in the present invention. The result after correction based on the fuzzy logic rules is stored in the database (30).
  • The patternprint comprises material properties such as material type, title and author (if any).
  • The data retrieval module (11) is configured to collect data of library materials from the LMS and store the data in the database (30).
  • The data retrieved from the LMS may be in various formats and is preferably transformed into a standard format before being stored in the database (30).
  • The material classification engine (10) connects to the LMS of the “MIMOS” library, and the data retrieval module (11) retrieves all data in the MIMOS LMS and stores it in the database (30).
  • The data stored in the database (30) is further processed in the data correction module (12).
  • The data correction module (12) performs data cleansing, which includes fixing data duplication.
  • The data correction module (12) extracts the fuzzy logic rules from the library domain knowledge and executes the fuzzy logic rules on the data.
  • The fuzzy logic is applied based on the material properties available to generate the patternprint.
  • Fuzzy logic rules with the “Soundex” function may be selected.
  • For example, ‘JK Rowling’ is treated as the same as ‘J.K. Rowling’, ‘J.K Rowling’, ‘JK. Rowling’ and ‘Rowling J.K.’.
  • Fuzzy logic rules with the “edit distance” function may be selected; for example, ‘Harry Potter’ is treated as the same as ‘Harry Poter’ and ‘Potter Harry’.
  • Fuzzy logic rules with the “number distance” function may be selected, and for date of publication, fuzzy logic rules with the “date distance” function may be selected; for example, ‘1997 Dec 2’ is treated as the same as ‘2 December 1997’.
  • Inconsistencies in how data was entered by different individuals in the LMS are corrected with the fuzzy logic rules, thereby fixing the duplicated data.
  • Results after correction by the data correction module (12) are stored in the database (30) for further processing by the data classification module (13).
  • The data classification module (13) classifies the library materials into domain-specified categories, namely the classified data.
  • The data classification module (13) utilises a domain ontology to extract domain data to perform data classification.
  • Data classification means categorising the data according to the domain of the data, based on the library domain knowledge as illustrated in Figure 2.
  • The data classification module (13) categorises ‘book’ as the domain for ‘book material’.
  • The result generated by the data classification module (13) is subsequently stored in the database (30) as the classified data.
  • The patternprint generator module (14) generates a patternprint, wherein the patternprint is a unique identification (ID) that represents a material grouping to identify the library materials.
  • The patternprint serves as a unique key to match and classify similar materials.
  • The patternprint is generated using material properties such as material type, title and author (if any).
  • The patternprint can also be further configured or extended with other attributes such as ISBN, ISSN, publisher, publication date, language, etc.
  • Each of the classified data is updated with the patternprint and is subsequently stored in the database (30).
  • The present invention also relates to a method of classifying library materials (200), as illustrated in Figure 3.
  • The method (200) comprises the steps of retrieving and extracting data from the collection management system (210) by the data retrieval module, and performing data cleansing on the retrieved data using fuzzy logic rules (220) by the data correction module.
  • The method (200) further comprises the steps of classifying the cleansed data using the domain knowledge (230) by the data classification module, and generating a patternprint for the classified data (240) by the patternprint generator module, wherein the patternprint represents a unique identification (ID) to identify and group the collection materials in the domain knowledge.
  • The method (200) also comprises a step of storing the patternprint in the database (250).
  • The method (200) begins with the material classification engine connecting to at least one library management system (LMS) for retrieving and extracting data from the collection management system (210) by the data retrieval module. Then, either by manually invoking the LMS or by a scheduled trigger for data retrieval, the data retrieval module retrieves and collects data from the LMS (211). In one embodiment, the retrieved data are extracted and transformed from various formats into a standard format. The transformed data are then stored in the database (212).
  • The method (200) proceeds to the step of performing data cleansing on the retrieved data using fuzzy logic rules (220) by the data correction module.
  • Said step (220) further comprises the steps of extracting the transformed data from the database and extracting a set of fuzzy logic rules from the library domain knowledge (221).
  • The list of fuzzy logic rules, such as “Soundex”, “edit distance” and “number distance”, is extracted and executed on the data stored in the database (222).
  • The data correction module performs data correction on the transformed data by applying the fuzzy logic rules to cleanse the data (223).
  • The cleansed data are updated and stored in the database (224) for further classification.
  • Figure 6 illustrates an example of data cleansing by the step (220), wherein Figure 6(a) illustrates inconsistent data for the title and author of a book before the data cleansing, and Figure 6(b) illustrates the cleansed and corrected data after the data cleansing.
  • The method (200) continues with the step of classifying the cleansed data using the domain knowledge (230) by the data classification module, with reference to Figure 7.
  • The step (230) further comprises a step of identifying the domain for each of the cleansed data (231); for example, the data classification module may categorise ‘library’ as the domain for ‘library material’.
  • The step (230) further comprises extracting domain data from the domain knowledge (232), preferably the library domain knowledge in accordance with the present invention.
  • The data classification module classifies all the cleansed data by inferencing with the library domain knowledge on the library materials such as book, audio, video, digital and so on.
  • All library materials are classified into domain-specified categories (233).
  • The domain-specified categories may refer to book, audio, video and digital forms as classified data.
  • The classified data are then stored in the database.
  • The method (200) proceeds with generating the patternprint for the classified data (240) by the patternprint generator module.
  • Said step (240) includes the step of retrieving and extracting the classified data (241) from the database. Then, the patternprint generator module generates the patternprint for all the classified data (242).
  • Figure 9 illustrates an example of the patternprint for book_harrypotterjkrowling.
  • The patternprint comprises, but is not limited to, material properties such as material type, material title and material author (if any). As shown in Figure 10, different unique patternprints differentiate the material categories. For example, with reference to Figure 10, the patternprints for Harry Potter are as follows: book_harrypotterjkrowling for Book Material, audio_harrypotterjkrowling for Audio Material, video_harrypotterjkrowling for Video Material, and digital_harrypotterjkrowling for Digital Material. Each of the classified data is further updated with the patternprint (243) by performing grouping using the patternprint.
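
The data cleansing step (220) names “Soundex”, “edit distance” and “date distance” rules but does not give their implementation. The following is a minimal Python sketch of how such rules might look, not the applicant's actual method. All function names, the token-sorting step (used so that ‘Potter Harry’ matches ‘Harry Potter’), the `max_edits` threshold and the two date formats are assumptions made for illustration; month-name parsing also assumes an English locale.

```python
import re
from datetime import date, datetime
from typing import Optional

# Standard American Soundex letter-to-digit table.
_SOUNDEX = {ch: d for d, letters in
            {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
             "4": "l", "5": "mn", "6": "r"}.items()
            for ch in letters}


def soundex(word: str) -> str:
    """Four-character Soundex code, e.g. for phonetic matching of surnames."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":          # h and w do not break a run of equal codes
            continue
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit            # vowels reset the run (digit is "")
    return (code + "000")[:4]


def normalise(text: str) -> str:
    """Lowercase, drop punctuation and sort tokens (assumption: order is noise)."""
    tokens = re.sub(r"[^\w\s]", "", text.lower()).split()
    return " ".join(sorted(tokens))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def same_text(a: str, b: str, max_edits: int = 1) -> bool:
    """Treat two strings as duplicates if they differ by at most max_edits."""
    return edit_distance(normalise(a), normalise(b)) <= max_edits


# Hypothetical list of accepted input formats; a real LMS may need more.
DATE_FORMATS = ("%Y %b %d", "%d %B %Y")


def parse_date(text: str) -> Optional[date]:
    """Return a canonical date, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None
```

With these helpers, `same_text("JK Rowling", "Rowling J.K.")` and `same_text("Harry Potter", "Harry Poter")` both hold, and `parse_date("1997 Dec 2")` equals `parse_date("2 December 1997")`. A production system would likely tune the threshold per field, since a fixed edit budget is too loose for short strings.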
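
The patternprint generation and grouping steps (241 to 243) can be sketched as follows. This is an illustrative reading of the Figure 10 examples, which suggest a key of the shape `<type>_<title><author>`; the function names, the record field names (`type`, `title`, `author`) and the exact normalisation are assumptions, and the records are assumed to have already passed through data cleansing.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def _squash(text: str) -> str:
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def make_patternprint(material_type: str, title: str, author: str = "") -> str:
    """Build a grouping key from material type, title and optional author."""
    return f"{_squash(material_type)}_{_squash(title)}{_squash(author)}"


def group_by_patternprint(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Attach a patternprint to each record and group records that share it."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        key = make_patternprint(rec["type"], rec["title"], rec.get("author", ""))
        groups[key].append({**rec, "patternprint": key})
    return dict(groups)
```

For instance, `make_patternprint("Book", "Harry Potter", "J.K. Rowling")` yields `book_harrypotterjkrowling`, while the same title as audio yields `audio_harrypotterjkrowling`, so the two land in different groups. Extending the key with ISBN or publication date would only require appending further squashed fields.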

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a system for classifying collection materials (100), in particular for classifying library materials, comprising a material classification engine (10). The material classification engine (10) further comprises a data retrieval module (11) to retrieve data from at least one collection management system (20) and to store the data in a database (30). The material classification engine (10) further comprises a data correction module (12) to perform data cleansing using a fuzzy logic rule, a data classification module (13) to classify the cleansed data into domain-specified categories, and a patternprint generator module (14) to generate a patternprint for the classified data. The invention also relates to a method of classifying collection materials (200).
PCT/MY2020/050146 2019-12-31 2020-11-12 Library material classification system and associated method WO2021137689A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2019007942 2019-12-31
MYPI2019007942 2019-12-31

Publications (1)

Publication Number Publication Date
WO2021137689A1 (fr) 2021-07-08

Family

ID=76686744

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2020/050146 WO2021137689A1 (fr) 2019-12-31 2020-11-12 Library material classification system and associated method

Country Status (1)

Country Link
WO (1) WO2021137689A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114298696A (zh) * 2022-01-24 2022-04-08 嘉应学院 A digital library knowledge management system based on cloud computing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040187075A1 (en) * 2003-01-08 2004-09-23 Maxham Jason G. Document management apparatus, system and method
US20080294616A1 (en) * 2007-05-21 2008-11-27 Data Trace Information Services, Llc System and method for database searching using fuzzy rules
US20120254085A1 (en) * 2008-03-28 2012-10-04 International Business Machines Corporation Information classification system, information processing apparatus, information classification method and program
US20160180245A1 (en) * 2014-12-19 2016-06-23 Medidata Solutions, Inc. Method and system for linking heterogeneous data sources
US20190147103A1 (en) * 2017-11-13 2019-05-16 Accenture Global Solutions Limited Automatic Hierarchical Classification and Metadata Identification of Document Using Machine Learning and Fuzzy Matching

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20910850

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20910850

Country of ref document: EP

Kind code of ref document: A1