WO2015051499A1 - Method and system for processing content information - Google Patents

Method and system for processing content information

Info

Publication number
WO2015051499A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
information
metadata
index
queried
Prior art date
Application number
PCT/CN2013/084854
Other languages
French (fr)
Chinese (zh)
Inventor
施有铸
陈晓峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201380079592.4A (patent CN105531697B)
Priority to PCT/CN2013/084854 (WO2015051499A1)
Publication of WO2015051499A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures

Definitions

  • the present invention relates to the field of database technologies, and in particular, to a content information processing method and system.
  • ECM: Enterprise Content Management
  • enterprise content management is used to create (Create), store (Store), distribute (Distribute), discover (Discover), archive (Archive), and manage (Manage) content, and to deliver (Deliver) relevant content to users when they need it.
  • the types of data contained in content can generally be divided into two types.
  • one is data that can be represented by the same hierarchical structure, that is, structured data, which is usually stored in a database in the form of data tables; the other exists as multimedia content in various forms, such as text content in txt, Word, or PDF files; binary files such as spreadsheets, presentation files, and e-mail; and multimedia-format data such as audio, graphics, images, and video.
  • metadata refers to a type of data that describes data and its environment.
  • content metadata (Content Metadata) refers to data describing the attributes of the content and its environment, including but not limited to: the name of the content, the size of the content, the storage format of the content, the title of the content, a summary of the content, the keywords in the content, and the author of the content.
  • RDB: Relational Database
  • traditional ECM systems generally use a relational database (RDB) as the storage system for content metadata, but this is suitable only when the ECM system manages a small number of content items. When the number of content items managed by the ECM system is large (for example, hundreds of millions), the RDB is limited by its storage capacity and struggles to store such massive content metadata; in particular, when a single content metadata item is large, add (Add), delete (Delete), modify (Modify), and search (Search) operations on that metadata become very slow and inefficient. If content information such as user comments and document body text is also stored in the RDB, an ECM system that uses a relational database can manage even fewer content items.
  • embodiments of the present invention provide a method and system for processing content information, which can effectively enhance the management capability of the content management system for content information containing large amounts of data.
  • a content information processing system including:
  • a content index creation module, configured to capture content and create a content index for the content in the content index database, the content index being a unique identifier of the content in the content information processing system;
  • the content information extraction module is configured to extract first information of the content corresponding to the content index, where the first information of the content includes: metadata of the content, and other related information of the content other than the metadata of the content;
  • the content information storage processing module is configured to compare each metadata of the content in the first information of the content with a threshold of a preset data amount size, store the content index and the metadata of the content not higher than the threshold in the content index database, and store the metadata of the content higher than the threshold, the other related information of the content other than the metadata of the content in the first information of the content, and the content index in the content information database.
  • the content information processing system further includes: a threshold setting module, configured to set a threshold of a data amount size for comparing metadata of the content.
  • the foregoing content information processing system further includes: a content legal verification module, configured to perform legality verification on the other related information of the content other than the content metadata to obtain second information of the content that is verified as legal, and to send the second information of the content to the content information storage processing module.
  • the content information storage processing module is further configured to store the metadata of the content higher than the threshold, the second information of the content, and the content index into the content information database.
  • the content information processing system further includes: a retrievability determining module, configured to perform a retrievability judgment on the other related information of the content other than the content metadata, or on the second information of the content, and to identify the information that passes the retrievability judgment as the third information of the content.
  • the content information processing system further includes: a full-text search library information importing module, configured to import the metadata of the content higher than the threshold, the third information of the content, and the content index into the full-text search library according to a preset configuration template.
  • the content information processing system further includes: a full-text search library information processing module, configured to delete the data of the content from the full-text search library when receiving a notification that the content is temporarily deleted, and, when receiving a notification that the content is restored, to notify the content information retrieval module to re-import the metadata of the content higher than the threshold, the third information of the content, and the content index into the full-text search library according to the preset configuration template.
  • the content information processing system further includes: a full-text search library information processing module, configured to set the "content available" field of the content in the full-text search library to "not available" when receiving a notification that the content is temporarily deleted, and to reset the "content available" field of the content in the full-text search library to "available" when receiving a notification that the content is restored.
  • the content information processing system further includes:
  • the query content obtaining module is configured to receive a content information query request, parse the query request, and obtain the content to be queried;
  • the content information querying module is configured to search for the content to be queried in the content index database and, when information of the content to be queried is retrieved, feed that information back to the query result sending module; when no result is retrieved, it searches for the content to be queried in the full-text search library: if information of the content to be queried is retrieved, that information is sent to the query result sending module; if a content identifier of the content to be queried is retrieved, the content identifier is used to query the content information database to obtain the information of the content to be queried, which is then fed back to the query result sending module;
  • the query result sending module is configured to send the information of the content to be queried to the issuer of the content information query request.
  • the present invention also provides a method of processing content information, the method comprising: capturing content and creating a content index for the content in a content index database, the content index being the unique identifier of the content in the content information processing system;
  • the method further includes: receiving a setting of a data size threshold.
  • the method further includes: performing legality verification on the other related information of the content other than the content metadata to obtain second information of the content that is verified as legal; in that case, storing the metadata of the content higher than the threshold, the other related information of the content other than the content metadata in the first information of the content, and the content index in the content information database is specifically: storing the metadata of the content higher than the threshold, the second information of the content, and the content index in the content information database.
  • the metadata of the content higher than the threshold, the third information of the content, and the content index are imported into the full-text search library according to a preset configuration template.
  • the content information query request is received, and the query request is parsed to obtain the content to be queried;
  • the content to be queried is searched in the full-text search library; if information of the content to be queried is retrieved, that information is sent to the issuer of the content information query request; if a content identifier of the content to be queried is retrieved, the content identifier is used to query the content information database to obtain the information of the content to be queried, and that information is sent to the issuer of the content information query request.
  • the content information processing method and system provided by the embodiment of the present invention manage separately the metadata of content not higher than a set threshold, on the one hand, and the metadata of content higher than the set threshold together with the other related information of the content other than the content metadata, on the other. That is, the content metadata not higher than the set threshold is stored in the content index database, and the metadata higher than the set threshold and the other related information of the content are stored in the content information database. This reduces the storage pressure on the content index database, which is frequently used for retrieval, so that the content index database can store more content, while the content information database can store the other related information of more content, thus solving the problem of storing and managing massive content and effectively improving the content management system's ability to manage content information containing large amounts of data.
  • FIG. 1 is a schematic diagram of a networking structure of Embodiment 1 of a content information processing system according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a networking structure of Embodiment 2 of a content information processing system according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram showing the networking structure of Embodiment 3 of the content information processing system according to the embodiment of the present invention.
  • FIG. 4 is a schematic diagram showing the networking structure of Embodiment 4 of the content information processing system according to the embodiment of the present invention.
  • FIG. 5 is a schematic diagram showing the networking structure of Embodiment 5 of the content information processing system according to the embodiment of the present invention.
  • FIG. 6 is a schematic diagram showing the networking structure of Embodiment 6 of the content information processing system according to the embodiment of the present invention.
  • FIG. 7 is a schematic flowchart diagram of a content information processing method according to an embodiment of the present invention.
  • program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.
  • embodiments can be implemented in other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
  • Embodiments can also be implemented in a distributed computing environment where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules can be located in both local and remote memory storage devices.
  • Embodiments may be implemented as a computer implemented process, a computing system, or a computer storage medium such as a computer program product or a computer program of a computer system for executing the instructions of the example process.
  • a computer readable storage medium can be implemented via one or more of volatile computer memory, nonvolatile memory, a hard drive, a flash drive, a floppy disk or a compact disk and the like.
  • server generally refers to a computing device that typically executes one or more software programs in a networked environment.
  • the server can also be implemented as a virtual server (software program) executing on one or more computing devices that are considered servers on the network.
  • the system includes:
  • a content index creation module 110 for capturing content and creating a content index for the content in a content index database, the content index being a unique identifier of the content in the content information processing system;
  • the content information extraction module 130 is configured to extract first information of the content corresponding to the content index, where the first information of the content includes: metadata of the content, and other related information of the content other than the content metadata;
  • the content information storage processing module 150 is configured to compare each metadata of the content in the first information of the content with a threshold of a preset data amount size, store the content index and the metadata of the content not higher than the threshold in the content index database, and store the metadata of the content higher than the threshold, the other related information of the content other than the content metadata in the first information of the content, and the content index in the content information database.
  • the processing system of the content information described above takes the data size of the metadata of the content into account: metadata whose data size is not higher than a certain threshold is stored in the content index database, and metadata whose data size is higher than the threshold, together with the other related information other than the metadata of the content, is stored in the content information database. This separate storage ensures that the content index database can store the content indexes of massive content and improves the content index database's ability to read data, while the content information database can be used to store the large-volume information of the content, which improves the database access capability for content information.
  • the processing system can be deployed in a server cloud consisting of one server or multiple servers.
  • the foregoing threshold for comparing the data size of the metadata of the content may be built into the system, or may be set in advance by the user through the UI (User Interface); the embodiments of the present invention do not limit this.
  • the content information processing system introduced by the embodiment of the present invention may further include: a threshold setting module 120, configured to set the threshold of the data amount size used for comparing metadata of the content.
  • the content index database 170 is a database for storing the content index and the metadata of the content, and may be a relational database (Relational Database), that is, a database that processes data using concepts and methods such as set algebra, including but not limited to: Oracle, SQL (Structured Query Language), Access, Db2, SQL Server, Sybase, etc.
  • the content information database 190 is used for storing the metadata of content whose data amount is larger than the set threshold, and the other related information of the content other than the metadata of the content (including but not limited to: user-entered tags, classification information, rating level of content, comment information on content, rating, etc.). In a specific implementation, a non-relational database capable of storing and reading massive data may be used, including but not limited to the Apache HBase database.
  • the system can also include the content legal verification module 140, configured to perform legality verification on the other related information of the content other than the metadata of the content to obtain the second information of the content that is verified as legal, and to send the second information of the content to the content information storage processing module 150; the content information storage processing module 150 is further configured to store the metadata of the content higher than the threshold, the second information of the content, and the content index into the content information database 190.
  • the system adds a retrievability judgment module 180, configured to perform a retrievability judgment on the other related information of the content other than the metadata of the content, or on the second information of the content, and to identify the information that passes the retrievability judgment as the third information of the content.
  • a full-text search library 160 is added to provide a full-text search function.
  • the system may further include a full-text search library information importing module 161, configured to import the metadata of the content higher than the threshold, the third information of the content, and the content index into the full-text search library 160 according to the preset configuration template.
  • the content information processing system further includes: a full-text search library information processing module 162, configured to delete the data of the content from the full-text search library 160 when receiving a notification that the content is temporarily deleted, and, when receiving a notification that the content is restored, to notify the content information retrieval module to re-import the metadata of the content higher than the threshold, the third information of the content, and the content index into the full-text search library 160 according to a preset configuration template.
  • the full-text search library information processing module 162 is further configured to: when receiving the notification that the content is temporarily deleted, set the "content available" field of the content in the full-text search library 160 to "not available"; and upon receiving the notification that the content is restored, reset the "content available" field of the content in the full-text search library 160 to "available".
  • the full-text search library information processing module 162 prevents the corresponding content from being retrieved through the full-text search library while the content is temporarily deleted, and allows the related information of the content to be found in the full-text search library again once the content is restored. This improves the data addition and deletion functions of the full-text search library.
  • the content information processing system may further add the following modules: the query content obtaining module 210 is configured to receive the content information query request, parse the query request, and obtain the content to be queried;
  • the content information querying module 230 is configured to retrieve the content to be queried in the content index database 170. When information of the content to be queried is retrieved, that information is fed back to the query result sending module 250; when no result is retrieved, the content to be queried is searched in the full-text search library 160: if information of the content to be queried is retrieved, that information is fed back to the query result sending module 250; if a content identifier of the content to be queried is retrieved, the content identifier is used to query the content information database 190 to obtain the information of the content to be queried, which is then fed back to the query result sending module 250;
  • the query result sending module 250 is configured to send information of the content to be queried to the issuer of the content information query request.
  • the system supports the user's database query operations. During a query, because the content index database 170 added in the embodiment of the present invention stores the metadata of content not higher than the preset threshold, and the content information database 190 stores the metadata of content higher than the preset threshold and the other related information of the content other than the metadata of the content, retrieval is first performed in the content index database 170; when nothing is retrieved from the content index database 170, the content information database 190 is retrieved. On the one hand, this lets the content index database store more content indexes and satisfy the storage requirements of big data; on the other hand, it improves the data reading performance of the content index database.
  • An embodiment of the present invention further provides a method for processing content information. Referring to FIG. 6, the method includes:
  • there is no required order between step 310 and step 330; in a specific implementation, the two may be interchanged.
  • the processing method of the content information takes the data size of the metadata of the content into account: metadata whose data size is not higher than a certain threshold is stored in the content index database, and metadata whose data size is higher than the threshold, together with the other related information other than the metadata of the content, is stored in the content information database. This separate storage ensures that the content index database can store the content indexes of massive content and improves the content index database's ability to read data, while the content information database can be used to store the large-volume information of the content, which improves the database access capability for content information.
  • optionally, the data size threshold can be preset by the user.
  • the user may comment on, classify, or rate the content; to ensure the legality of this information, the above method can optionally add a legality verification step, namely:
  • before storing the metadata of the content higher than the foregoing threshold, the other related information of the content other than the metadata of the content in the first information of the content, and the content index into the content information database, the method further includes:
  • the above method adds a retrievability judgment step, that is, a retrievability judgment is performed on the other related information of the content other than the metadata of the content, or on the second information of the content, and the information that passes the retrievability judgment is identified as the third information of the content.
  • the metadata of the content, the third information of the content, and the content index higher than the threshold are imported into the full-text search library according to the preset configuration template.
  • the above method may further include:
  • the content index is re-imported into the full-text search library according to a preset configuration template.
  • Another method for implementing data deletion and recovery is to set a "content available" field for the content in the full-text search library: when a notification that the content is temporarily deleted is received, the "content available" field of the content in the full-text search library is set to "not available"; when a notification that the content is restored is received, the "content available" field of the content in the full-text search library is reset to "available".
  • the method further includes: receiving a content information query request, parsing the query request, and obtaining the to-be-queried content;
  • the content to be queried is searched in the full-text search library; if information of the content to be queried is retrieved, that information is sent to the issuer of the content information query request; if a content identifier of the content to be queried is retrieved, the content identifier is used to query the content information database to obtain the information of the content to be queried, and that information is sent to the issuer of the content information query request.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, or an electrical, mechanical or other form of connection.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive (U disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
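
The query flow described in the definitions above (content index database first, then the full-text search library, then the content information database), combined with the "content available" flag used for temporary deletion, can be illustrated with a minimal sketch. It is not the patent's implementation: the three stores are plain in-memory dictionaries, matching is a naive substring test, and the names `query_content`, `set_available`, `content_available`, `text`, and `info` are hypothetical.

```python
def set_available(fulltext_lib, content_id, available):
    """Flip the "content available" flag instead of deleting the entry."""
    fulltext_lib[content_id]["content_available"] = available


def query_content(term, index_db, fulltext_lib, info_db):
    """Return {content_id: information} for content matching `term`."""
    # Step 1: search the content index database (small metadata only).
    hits = {
        cid: meta
        for cid, meta in index_db.items()
        if any(term in str(value) for value in meta.values())
    }
    if hits:
        return hits

    # Step 2: nothing found, so fall back to the full-text search library.
    results = {}
    for cid, entry in fulltext_lib.items():
        if not entry.get("content_available", True):
            continue  # temporarily deleted content is skipped
        if term not in entry.get("text", ""):
            continue
        if "info" in entry:
            # The library returned the information itself.
            results[cid] = entry["info"]
        else:
            # Step 3: only an identifier came back; resolve it in the
            # content information database.
            results[cid] = info_db.get(cid, {})
    return results


# Hypothetical sample data standing in for the three stores.
index_db = {"doc-001": {"title": "Q3 report"}}
fulltext_lib = {
    "doc-002": {"text": "budget forecast for next year", "content_available": True}
}
info_db = {"doc-002": {"comments": ["needs review"]}}

print(query_content("report", index_db, fulltext_lib, info_db))
print(query_content("forecast", index_db, fulltext_lib, info_db))
```

Calling `set_available(fulltext_lib, "doc-002", False)` hides the entry from later full-text queries without removing its data, and setting the flag back to `True` makes it searchable again, which corresponds to the flag-based alternative to deleting and re-importing.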

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Disclosed are a method and system for processing content information. The method comprises: capturing content and creating a content index of the content in a content index database, wherein the content index is the unique identification of the content; extracting first information corresponding to the content index, wherein the first information comprises: metadata of the content and other relevant information about the content except the metadata of the content; and comparing each piece of metadata of the content with a preset threshold of data volume respectively, storing the content index and the metadata of the content that is not greater than the threshold in the content index database, and storing the metadata of the content that is greater than the threshold, other relevant information about the content except the metadata of the content, and the content index in a content information database. In this way, the serious problems of storing and managing massive contents are solved, and the capability of a content management system to manage content information containing a huge amount of data is improved effectively.
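
The threshold comparison in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the two databases are in-memory dictionaries, "data volume" is taken to be the UTF-8 encoded size of each metadata value, and the names `ContentStores`, `store_content`, and `threshold_bytes` are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ContentStores:
    """In-memory stand-ins for the two databases (hypothetical)."""
    index_db: dict = field(default_factory=dict)  # content index -> small metadata
    info_db: dict = field(default_factory=dict)   # content index -> large metadata + other info


def store_content(stores, content_index, metadata, other_info, threshold_bytes):
    """Route each metadata item by its encoded size against the threshold."""
    small, large = {}, {}
    for key, value in metadata.items():
        size = len(str(value).encode("utf-8"))  # assumed size measure
        # Metadata not greater than the threshold goes to the index database.
        if size <= threshold_bytes:
            small[key] = value
        else:
            large[key] = value
    # The content index is stored in both databases so records can be joined.
    stores.index_db[content_index] = small
    stores.info_db[content_index] = {"metadata": large, "other": dict(other_info)}
    return small, large


stores = ContentStores()
small, large = store_content(
    stores,
    content_index="doc-001",
    metadata={"title": "Q3 report", "summary": "x" * 500},
    other_info={"comments": ["looks good"]},
    threshold_bytes=64,
)
print(sorted(small), sorted(large))
```

With a 64-byte threshold, the short title lands in the index store while the 500-character summary and the user comments land in the information store, which keeps the frequently searched index store small.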

Description

一种内容信息处理方法和系统 技术领域  Content information processing method and system
本发明涉及数据库技术领域,尤其涉及一种内容信息处理方法和系统。 背景技术  The present invention relates to the field of database technologies, and in particular, to a content information processing method and system. Background technique
企业内容管理( ECM , Enterprise Content Management )技术是一种通 过计算机系统对内容( Content )进行管理的技术,在企事业单位、 政府机关 中被广泛使用 ,有时被简称为内容管理( Content Management )b 企业内容 管理被用来对内容进行创建( Create 存储( Store 分发( Distribute 发现( Discover 归档( Archive )以及管理( Manage ) ,并在用户需要时传 递( Deliver )相关内容给用户。  ECM (Enterprise Content Management) technology is a technology for managing content through computer systems. It is widely used in enterprises, government agencies, and sometimes referred to as content management (b). Enterprise content management is used to create content (Create Store (Distribute Discovery ( Archive ) and Manage ( Manage ) and deliver relevant content to the user when needed).
The data contained in content can generally be divided into two kinds. One is data that can be represented with a uniform hierarchical structure, that is, structured data, which is usually stored in a database in the form of data tables. The other exists as multimedia content in various forms, such as text content in txt, Word, or PDF documents; binary files such as spreadsheets, presentation files, and e-mail; and multimedia-format data such as sound, graphics, images, and video.
In the database field, metadata refers to a class of data that describes data and its environment. Correspondingly, in an ECM system, content metadata refers to data that describes the attributes of content and its environment, including but not limited to: the name of the content, the size of the content, the storage format of the content, the title of the content, an abstract of the content, keywords in the content, and the author of the content.
Besides content metadata, other content information also needs to be managed, including but not limited to: user comments on the content, the body text of documents, the topic classification to which the content belongs, and the like.
Traditional ECM systems generally use a relational database (RDB) as the storage system for content metadata, which is suitable only when the number of content items managed by the ECM system is small. When the ECM system manages a very large number of content items (for example, hundreds of millions), the RDB, limited by its storage capacity, can hardly store such a mass of content metadata. In particular, when the metadata of a single content item is large, operations such as adding, deleting, modifying, and searching that metadata become very slow and inefficient. If content information such as user comments and document body text is also stored in the RDB, the number of content items that an RDB-based ECM system can manage is even smaller.
Summary of the Invention
In view of this, embodiments of the present invention provide a content information processing method and system that can effectively improve the ability of a content management system to manage content information containing large volumes of data.
In one aspect, a content information processing system is provided, including:
a content index creation module, configured to capture content and create a content index for the content in a content index database, where the content index is the unique identifier of the content in the content information processing system;
a content information extraction module, configured to extract first information of the content corresponding to the content index, where the first information of the content includes the metadata of the content and other related information of the content besides the metadata; and
a content information storage processing module, configured to compare each piece of metadata in the first information of the content with a preset data-volume threshold, store the content index and the metadata that does not exceed the threshold in the content index database, and store the metadata that exceeds the threshold, the other related information in the first information of the content besides the metadata, and the content index in a content information database.
Optionally, the content information processing system further includes a threshold setting module, configured to set the data-volume threshold against which the metadata of the content is compared.
Optionally, the content information processing system further includes a content legality verification module, configured to verify the legality of the other related information of the content besides the content metadata, obtain second information of the content, namely the information verified as legal, and send the second information of the content to the content information storage processing module. The content information storage processing module is then specifically further configured to store the metadata that exceeds the threshold, the second information of the content, and the content index in the content information database.
Optionally, the content information processing system further includes a retrievability determining module, configured to perform a retrievability determination on the other related information of the content besides the content metadata, or on the second information of the content, and to mark the information that passes the determination as third information of the content.
Optionally, the content information processing system further includes a full-text search library information import module, configured to import the metadata that exceeds the threshold, the third information of the content, and the content index into a full-text search library according to a preset configuration template.
Optionally, the content information processing system further includes a full-text search library information processing module, configured to delete the data of a content item from the full-text search library when a notification that the content item has been temporarily deleted is received, and, when a notification that the content item has been restored is received, to notify the content information retrieval module to re-import the metadata of that content exceeding the threshold, the third information of the content, and the content index into the full-text search library according to the preset configuration template.
Optionally, the full-text search library information processing module is further configured to set the "content available" field of a content item in the full-text search library to "unavailable" when a notification that the content item has been temporarily deleted is received, and to reset that field to "available" when a notification that the content item has been restored is received.
Optionally, the content information processing system further includes:
a query content obtaining module, configured to receive a content information query request, parse the query request, and obtain the content to be queried;
a content information query module, configured to search the content index database for the content to be queried; when information of the content to be queried is found, feed that information back to a query result sending module; when no result is found, search the full-text search library for the content to be queried; if information of the content to be queried is found there, feed that information back to the query result sending module; and if a content identifier of the content to be queried is found there, use the content identifier to query the content information database for the information of the content to be queried and feed that information back to the query result sending module; and
a query result sending module, configured to send the information of the content to be queried to the sender of the content information query request.
In another aspect, the present invention further provides a content information processing method, including:
capturing content and creating a content index for the content in a content index database, where the content index is the unique identifier of the content in a content information processing system;
extracting first information of the content corresponding to the content index, where the first information of the content includes the metadata of the content and other related information of the content besides the content metadata; and
comparing each piece of metadata in the first information of the content with a preset data-volume threshold, storing the content index and the metadata that does not exceed the threshold in the content index database, and storing the metadata that exceeds the threshold, the other related information in the first information of the content besides the content metadata, and the content index in a content information database.
Optionally, the method further includes receiving a setting of the data-volume threshold.
Optionally, before the metadata that exceeds the threshold, the other related information in the first information of the content besides the content metadata, and the content index are stored in the content information database, the method further includes verifying the legality of the other related information of the content besides the content metadata to obtain second information of the content, namely the information verified as legal. In that case, the storing step specifically includes storing the metadata that exceeds the threshold, the second information of the content, and the content index in the content information database.
Optionally, a retrievability determination is performed on the other related information of the content besides the content metadata, or on the second information of the content, and the information that passes the determination is marked as third information of the content.
Optionally, the metadata that exceeds the threshold, the third information of the content, and the content index are imported into a full-text search library according to a preset configuration template.
Optionally, when a notification that a content item has been temporarily deleted is received, the data of that content is deleted from the full-text search library; and when a notification that the content item has been restored is received, the metadata of that content exceeding the threshold, the third information of the content, and the content index are re-imported into the full-text search library according to the preset configuration template.
Optionally, when a notification that a content item has been temporarily deleted is received, the "content available" field of that content in the full-text search library is set to "unavailable"; and when a notification that the content item has been restored is received, the "content available" field of that content in the full-text search library is reset to "available".
Optionally, a content information query request is received and parsed to obtain the content to be queried;
the content index database is searched for the content to be queried, and when information of the content to be queried is found, that information is sent to the sender of the content information query request; when no result is found, the full-text search library is searched for the content to be queried; if information of the content to be queried is found there, that information is sent to the sender of the content information query request; and if a content identifier of the content to be queried is found there, the content identifier is used to query the content information database for the information of the content to be queried, and that information is sent to the sender of the content information query request.
Based on the foregoing technical solution, the content information processing method and system provided by the embodiments of the present invention manage three kinds of information separately: content metadata that does not exceed the set threshold, content metadata that exceeds the set threshold, and information other than the content metadata. Content metadata that does not exceed the set threshold is stored in the content index database, whereas content metadata that exceeds the threshold and the other related information of the content are stored in the content information database. This reduces the storage pressure on the content index database, which is the one most often used for retrieval, so that it can hold more content, while the content information database can hold as much of the other related information as possible. The difficulty of storing and managing massive amounts of content is thereby addressed, and the ability of the content management system to manage content information containing large volumes of data is effectively improved.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed in the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the networking structure of Embodiment 1 of the content information processing system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the networking structure of Embodiment 2 of the content information processing system according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of the networking structure of Embodiment 3 of the content information processing system according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the networking structure of Embodiment 4 of the content information processing system according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of the networking structure of Embodiment 5 of the content information processing system according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of the networking structure of Embodiment 6 of the content information processing system according to an embodiment of the present invention.
FIG. 7 is a schematic flowchart of the content information processing method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. In addition, those skilled in the art will appreciate that the embodiments may be practiced with other computer system configurations, including handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and similar computing devices. The embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
The embodiments may be implemented as a computer-implemented process, a computing system, or a computer storage medium, such as a computer program product, that carries a computer program of instructions for causing a computer system to perform the example process. For example, the computer-readable storage medium may be implemented via one or more of volatile computer memory, non-volatile memory, a hard disk drive, a flash drive, a floppy disk, a compact disc, and similar media.
Throughout this specification, the term "server" generally refers to a computing device that executes one or more software programs, typically in a networked environment. However, a server may also be implemented as a virtual server (a software program) executing on one or more computing devices that are viewed as a server on the network.
FIG. 1 shows an embodiment of a content information processing system implemented by the present invention. The system includes:
a content index creation module 110, configured to capture content and create a content index for the content in a content index database, where the content index is the unique identifier of the content in the content information processing system;
a content information extraction module 130, configured to extract first information of the content corresponding to the content index, where the first information of the content includes the metadata of the content and other related information of the content besides the content metadata; and
a content information storage processing module 150, configured to compare each piece of metadata in the first information of the content with a preset data-volume threshold, store the content index and the metadata that does not exceed the threshold in the content index database, and store the metadata that exceeds the threshold, the other related information in the first information of the content besides the content metadata, and the content index in a content information database.
By taking the data volume of each piece of content metadata into account, the content information processing system described above stores metadata whose size does not exceed a given threshold in the content index database, and stores metadata whose size exceeds the threshold, together with the related information other than the metadata, in the content information database. This separate storage ensures that the content index database can hold the content indexes of a massive number of content items and improves its data-read capability, while the content information database can be used to store the large-volume information of the content, which improves database access to content information.
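The split-storage rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the two databases are modeled as in-memory dictionaries, the threshold value `SIZE_THRESHOLD_BYTES`, the `Stores` class, and the function name `store_first_information` are all hypothetical, and measuring a metadata field by its UTF-8 encoded length is an assumption (the source does not say how data volume is measured).

```python
from dataclasses import dataclass, field

# Hypothetical threshold in bytes; the source leaves the value to the system or the user.
SIZE_THRESHOLD_BYTES = 64


@dataclass
class Stores:
    """In-memory stand-ins for the content index database and the content information database."""
    content_index_db: dict = field(default_factory=dict)
    content_info_db: dict = field(default_factory=dict)


def store_first_information(stores, content_index, metadata, other_info,
                            threshold=SIZE_THRESHOLD_BYTES):
    """Compare each metadata field with the threshold and route it to one of the two stores."""
    small, large = {}, {}
    for name, value in metadata.items():
        # Assumption: data volume is the UTF-8 length of the field's string form.
        size = len(str(value).encode("utf-8"))
        if size <= threshold:
            small[name] = value   # not higher than the threshold
        else:
            large[name] = value   # higher than the threshold
    # The content index is the key in both stores, so records can be joined later.
    stores.content_index_db[content_index] = small
    stores.content_info_db[content_index] = {"metadata": large, "other": dict(other_info)}
    return small, large
```

For example, a short `title` field would land in `content_index_db`, while a multi-hundred-character `abstract` and all user comments would land in `content_info_db` under the same content index.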
From an implementation perspective, the processing system may be deployed on a single server or in a server cloud made up of multiple servers.
The threshold against which the data volume of the content metadata is compared may be built into the system, or may be set in advance by the user through a UI (User Interface); the embodiments of the present invention impose no limitation on this.
Based on the above description, and referring to FIG. 2, the content information processing system described in the embodiments of the present invention may optionally further include a threshold setting module 120, configured to set the data-volume threshold against which the content metadata is compared. Adding the threshold setting module 120 provides a user-definable threshold and makes the system more flexible.
Referring to FIG. 2, the content index database 170 is a database for storing the content index and the smaller-volume data among the content metadata. In a specific implementation, it may be a relational database or another database that processes data based on concepts and methods such as set algebra, including but not limited to Oracle, SQL (Structured Query Language), Access, Db2, SQL Server, and Sybase.
The content information database 190 is used to store the content metadata whose data volume exceeds the set threshold, as well as other related information of the content besides the metadata (including but not limited to user-entered tags, classification information, content rating levels, comments on the content, and scores). In a specific implementation, a non-relational database capable of storing and reading massive data may be used; examples include, but are not limited to, the Apache HBase database.
To ensure the legality of the data held in the content information database, for example to ensure that the data meets the requirements of the content information database and to avoid storing data that contains objectionable information, the system may optionally, referring to FIG. 2, further include a content legality verification module 140, configured to verify the legality of the other related information of the content besides the metadata, obtain the second information of the content, namely the information verified as legal, and send the second information to the content information storage processing module 150.
The content information storage processing module 150 is further configured to store the metadata that exceeds the threshold, the second information of the content, and the content index in the content information database 190.
To ensure that the other related information of the content besides the metadata is retrievable, and referring to FIG. 3 and FIG. 4, the system may optionally further include a retrievability determining module 180, configured to perform a retrievability determination on the other related information of the content besides the metadata, or on the second information of the content, and to mark the information that passes the determination as the third information of the content.
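The progression from the other related information, to the second information (verified as legal), to the third information (judged retrievable) can be pictured as two successive filters. The sketch below is illustrative only: the source does not define the legality or retrievability criteria, so the banned-term check, the "plain text is searchable" rule, and the names `BANNED_TERMS`, `verify_legality`, and `select_retrievable` are all hypothetical.

```python
# Hypothetical list of objectionable terms; the source does not specify the legality rules.
BANNED_TERMS = {"spam"}


def verify_legality(other_info):
    """Return the 'second information': non-empty entries that contain no banned term."""
    return {
        key: value
        for key, value in other_info.items()
        if value and not any(term in str(value).lower() for term in BANNED_TERMS)
    }


def select_retrievable(info):
    """Return the 'third information': entries assumed searchable, here plain-text values only."""
    return {key: value for key, value in info.items() if isinstance(value, str)}
```

Under these assumptions, a numeric rating passes the legality check and is stored in the content information database, but it is not marked as third information and so would not be imported into the full-text search library.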
Referring to FIG. 5, a full-text search library 160 is added to the content information processing system to provide full-text search.
Optionally, the system may further include a full-text search library information import module 161, configured to import the metadata that exceeds the threshold, the third information of the content, and the content index into the full-text search library 160 according to a preset configuration template.
Optionally, the content information processing system further includes a full-text search library information processing module 162, configured to delete the data of a content item from the full-text search library 160 when a notification that the content item has been temporarily deleted is received, and, when a notification that the content item has been restored is received, to notify the content information retrieval module to re-import the metadata of that content exceeding the threshold, the third information of the content, and the content index into the full-text search library 160 according to the preset configuration template.
The full-text search library information processing module 162 is further configured to set the "content available" field of a content item in the full-text search library to "unavailable" when a notification that the content item has been temporarily deleted is received, and to reset the "content available" field of that content in the full-text search library 160 to "available" when a notification that the content item has been restored is received.
With the full-text search library information processing module 162, content that has been temporarily deleted can no longer be retrieved through the full-text search library, and once the content is restored, its related information can again be found there. This improves the handling of data addition and deletion in the full-text search library.
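The two ways of handling a temporary deletion, removing and later re-importing the entry versus toggling a "content available" field, can be compared in a small sketch. It assumes a toy in-memory full-text library with substring matching; the class `FullTextLibrary` and its method names are hypothetical and do not correspond to any real search engine API.

```python
class FullTextLibrary:
    """Toy full-text library: content index -> text plus a 'content available' flag."""

    def __init__(self):
        self._docs = {}

    def import_content(self, content_index, text):
        # Import (or re-import) an entry; newly imported content is available.
        self._docs[content_index] = {"text": text, "content_available": True}

    def remove(self, content_index):
        # Strategy 1: physically delete the entry on temporary deletion.
        self._docs.pop(content_index, None)

    def set_available(self, content_index, available):
        # Strategy 2: keep the entry and only toggle its availability flag.
        self._docs[content_index]["content_available"] = available

    def search(self, term):
        # Only entries flagged as available are returned.
        term = term.lower()
        return sorted(
            index
            for index, doc in self._docs.items()
            if doc["content_available"] and term in doc["text"].lower()
        )
```

With the flag approach, restoring content is a single field update; with the removal approach, restoring requires the original data to be imported again, which is why the source routes that case back through the import step.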
To serve users' database query requests, the content information processing system may further include the following modules:
a query content obtaining module 210, configured to receive a content information query request, parse the query request, and obtain the content to be queried;
a content information query module 230, configured to search the content index database 170 for the content to be queried; when information of the content to be queried is found, feed that information back to a query result sending module 250; when no result is found, search the full-text search library 160 for the content to be queried; if information of the content to be queried is found there, feed that information back to the query result sending module 250; and if a content identifier of the content to be queried is found there, use the content identifier to query the content information database 190 for the information of the content to be queried and feed that information back to the query result sending module 250; and
查询结果发送模块 250 ,用于将待查询内容的信息发送给内容信息查询 请求的发出者。  The query result sending module 250 is configured to send information of the content to be queried to the issuer of the content information query request.
With the added query content obtaining module 210, content information query module 230, and query result sending module 250, the system supports users' database query operations. In this embodiment of the present invention, the content index database 170 stores the metadata of content that is not higher than the preset threshold, while the content information database 190 stores the metadata of content that is higher than the preset threshold together with other content-related information besides the metadata. A query is therefore searched in the content index database first, and the content information database 190 is searched only when nothing is found in the content index database 170. On the one hand, this ensures that the content index database can hold indexes for more content, meeting the storage requirements of big data; on the other hand, it improves the data reading performance of the content index database.
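The lookup order performed by the content information query module 230 can be sketched as follows. This is only an illustrative sketch, not the implementation described in the application: the three stores are modeled as plain Python dictionaries, and the function name `query_content` and the keys `info` and `content_id` are hypothetical names chosen for the example.

```python
def query_content(term, index_db, fulltext_lib, info_db):
    """Tiered lookup: content index database, then full-text search
    library, then content information database (all dict stand-ins)."""
    # Tier 1: the content index database holds only small metadata,
    # so it is consulted first.
    hit = index_db.get(term)
    if hit is not None:
        return hit

    # Tier 2: fall back to the full-text search library.
    ft_hit = fulltext_lib.get(term)
    if ft_hit is None:
        return None  # nothing found anywhere

    # The full-text hit may already carry the information itself ...
    if "info" in ft_hit:
        return ft_hit["info"]

    # Tier 3: ... or only a content identifier, which is then resolved
    # in the content information database.
    return info_db.get(ft_hit["content_id"])


# Hypothetical sample data for the three stores.
index_db = {"report": {"title": "Q3 report"}}
fulltext_lib = {
    "budget": {"info": {"abstract": "budget overview"}},
    "review": {"content_id": "c-7"},
}
info_db = {"c-7": {"comment": "detailed review text"}}
```

Calling `query_content("review", index_db, fulltext_lib, info_db)` walks all three tiers, while a term present in `index_db` is answered without touching the other two stores.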
An embodiment of the present invention further provides a method for processing content information. Referring to FIG. 6, the method includes:
310: Capture content and create a content index for the content in a content index database, the content index being the unique identifier of the content in the content information processing system;
330: Extract first information of the content corresponding to the content index, where the first information of the content includes the metadata of the content and other related information of the content besides the content metadata;
350: Compare each metadata item of the content in the first information of the content with a preset data size threshold; store the content index and the metadata of the content that is not higher than the threshold in the content index database; and store the metadata of the content that is higher than the threshold, the other related information of the content besides the content metadata in the first information of the content, and the content index in the content information database.
It can be understood that steps 310 and 330 need not be performed in a fixed order; in a specific implementation, their execution order may be swapped.
The above method for processing content information takes the data size of the content's metadata into account. Metadata whose data size is not higher than a certain threshold is stored in the content index database, while metadata whose data size is higher than the threshold, together with other related information besides the metadata, is stored in the content information database. This separate storage ensures that the content index database can hold content indexes for massive amounts of content and improves its data reading capability, while the content information database can be used to store the large-volume information of the content, improving database access to content information.
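The threshold comparison of step 350 can be illustrated with a short sketch. It rests on assumptions that the application does not specify: the data size of a metadata item is measured here as the UTF-8 byte length of its string form, the two databases are represented by the two returned records, and the names `split_by_threshold` and `threshold_bytes` are hypothetical.

```python
def split_by_threshold(content_index, metadata, other_info, threshold_bytes):
    """Partition one content item's first information into a record for
    the content index database and a record for the content information
    database, based on the size of each metadata item."""
    small, large = {}, {}
    for name, value in metadata.items():
        # Assumed size measure: UTF-8 byte length of the string form.
        size = len(str(value).encode("utf-8"))
        if size <= threshold_bytes:
            small[name] = value   # not higher than the threshold
        else:
            large[name] = value   # higher than the threshold

    # Both records carry the content index so they can be joined later.
    index_record = {"content_index": content_index, "metadata": small}
    info_record = {
        "content_index": content_index,
        "metadata": large,
        "other_info": other_info,
    }
    return index_record, info_record


index_record, info_record = split_by_threshold(
    "c-1",
    {"title": "Q3 report", "abstract": "x" * 500},
    {"rating": 4},
    threshold_bytes=64,
)
```

In this example the short `title` lands in the index record, and the 500-byte `abstract` goes to the information record together with the rating.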
To improve the usability of the system, the data size threshold may optionally be preset by the user.
The other related information of the content besides its metadata may be users' evaluation information, classification information, rating information, and so on for the content. To ensure the legality of this information, a legality verification process may optionally be added to the above method, namely:
Before the metadata of the content that is higher than the threshold, the other related information of the content besides its metadata in the first information of the content, and the content index are stored in the content information database, the method further includes:
performing legality verification on the other related information of the content besides its metadata, to obtain second information of the content that has been verified as legal.
Storing the metadata of the content that is higher than the threshold, the other related information of the content besides its metadata in the first information of the content, and the content index in the content information database then specifically includes: storing the metadata of the content that is higher than the threshold, the second information of the content, and the content index in the content information database.
Considering that the other related information of the content besides its metadata needs to be imported into the full-text search library, a searchability judgment may optionally be added to the above method to ensure that this information is searchable. That is, a searchability judgment is performed on the other related information of the content besides its metadata, or on the second information of the content, and the information that passes the searchability judgment is identified as the third information of the content.
Optionally, the metadata of the content that is higher than the threshold, the third information of the content, and the content index are imported into the full-text search library according to a preset configuration template.
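The chain from legality verification through searchability judgment to template-driven import can be sketched as below. The application does not define the verification rules, the searchability criteria, or the format of the configuration template, so everything concrete here is an assumption for illustration: the predicates `is_legal` and `is_searchable` are hypothetical callables supplied by the caller, and the template is assumed to be a list of field names to import.

```python
def build_fulltext_document(content_index, large_metadata, other_info,
                            is_legal, is_searchable, template):
    """Derive the second and third information, then select the fields
    to import into the full-text search library."""
    # Second information: other related information verified as legal.
    second_info = {k: v for k, v in other_info.items() if is_legal(k, v)}

    # Third information: the part of it that passes the searchability judgment.
    third_info = {k: v for k, v in second_info.items() if is_searchable(k, v)}

    # Candidate fields: content index, above-threshold metadata, third information.
    candidate = {"content_index": content_index, **large_metadata, **third_info}

    # Assumed template semantics: a list of field names to import.
    return {field: candidate[field] for field in template if field in candidate}


doc = build_fulltext_document(
    "c-1",
    {"abstract": "long abstract text"},
    {"rating": 4, "comment": "", "thumbnail": b"\x00\x01"},
    is_legal=lambda name, value: value not in ("", None),          # drop empty values
    is_searchable=lambda name, value: isinstance(value, (str, int)),  # drop binary data
    template=["content_index", "abstract", "rating", "comment"],
)
```

Here the empty comment fails the hypothetical legality check and the binary thumbnail fails the hypothetical searchability check, so only the content index, the abstract, and the rating are imported.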
Data may be deleted and restored while the database is in use. Optionally, the above method may further include:
When a notification that a content item has been temporarily deleted is received, deleting the data of the content from the full-text search library; and when a notification that the content has been restored is received, notifying the content information retrieval module to re-import the metadata of the content that is higher than the threshold, the third information of the content, and the content index into the full-text search library according to a preset configuration template.
Another way to implement data deletion and restoration is to set a "content available" field for the content in the full-text search library. When a notification that a content item has been temporarily deleted is received, the "content available" field of that content in the full-text search library is set to "unavailable"; when a notification that the content has been restored is received, the "content available" field of that content in the full-text search library is reset to "available".
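The two approaches to temporary deletion and restoration can be compared in a small sketch. The class `FullTextLibrary`, its method names, and the substring-based `search` are hypothetical stand-ins for a real full-text search library, used only to show the difference between removing and re-importing a document and toggling its "content available" field.

```python
class FullTextLibrary:
    """In-memory stand-in for a full-text search library."""

    def __init__(self):
        self.docs = {}

    def import_doc(self, content_index, fields):
        # Every imported document starts out as available.
        self.docs[content_index] = {**fields, "content_available": True}

    def search(self, word):
        # Only available documents are searchable; the flag itself is not searched.
        return sorted(
            cid for cid, doc in self.docs.items()
            if doc["content_available"]
            and any(word in str(v) for k, v in doc.items()
                    if k != "content_available")
        )

    # Approach 1: delete the data, re-import it on restoration.
    def on_deleted_remove(self, content_index):
        self.docs.pop(content_index, None)

    def on_restored_reimport(self, content_index, fields):
        self.import_doc(content_index, fields)

    # Approach 2: keep the data and toggle the "content available" field.
    def on_deleted_flag(self, content_index):
        self.docs[content_index]["content_available"] = False

    def on_restored_flag(self, content_index):
        self.docs[content_index]["content_available"] = True


lib = FullTextLibrary()
lib.import_doc("c-1", {"abstract": "quarterly budget"})
```

The flag approach avoids rebuilding the document on restoration, at the cost of keeping temporarily deleted data in the library; the remove-and-re-import approach keeps the library smaller but needs the original fields again when the content is restored.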
Optionally, the method further includes: receiving a content information query request, parsing the query request, and obtaining the content to be queried;
searching for the content to be queried in the content index database, and when information on the content to be queried is found, sending that information to the issuer of the content information query request; when no result is found, searching for the content to be queried in the full-text search library: if information on the content to be queried is found, sending that information to the issuer of the content information query request; if a content identifier of the content to be queried is found, using that content identifier to query the content information database for the information on the content to be queried, and sending that information to the issuer of the content information query request.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any equivalent modification or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. A content information processing system, characterized in that the system includes:
a content index creation module, configured to capture content and create a content index for the content in a content index database, the content index being the unique identifier of the content in the content information processing system;
a content information extraction module, configured to extract first information of the content corresponding to the content index, where the first information of the content includes the metadata of the content and other related information of the content besides the content metadata;
a content information storage processing module, configured to compare each metadata item of the content in the first information of the content with a preset data size threshold, store the content index and the metadata of the content that is not higher than the threshold in the content index database, and store the metadata of the content that is higher than the threshold, the other related information of the content besides the content metadata in the first information of the content, and the content index in a content information database.
2. The content information processing system according to claim 1, characterized in that the system further includes:
a threshold setting module, configured to set the data size threshold against which the metadata of the content is compared.
3. The content information processing system according to claim 1 or 2, characterized in that the system further includes:
a content legality verification module, configured to perform legality verification on the other related information of the content besides the content metadata to obtain second information of the content that has been verified as legal, and to send the second information of the content to the content information storage module;
the content information storage processing module being specifically further configured to store the metadata of the content that is higher than the threshold, the second information of the content, and the content index in the content information database.
4. The content information processing system according to any one of claims 1 to 3, characterized in that the system further includes:
a searchability judgment module, configured to perform a searchability judgment on the other related information of the content besides the metadata of the content, or on the second information of the content, and to identify the information that passes the searchability judgment as third information of the content.
5. The content information processing system according to claim 4, characterized in that the content information processing system further includes:
a full-text search library information import module, configured to import the metadata of the content that is higher than the threshold, the third information of the content, and the content index into a full-text search library according to a preset configuration template.
6. The content information processing system according to claim 5, characterized in that the content information processing system further includes:
a full-text search library information processing module, configured to delete the data of a content item from the full-text search library when a notification that the content has been temporarily deleted is received; and, when a notification that the content has been restored is received, to notify the content information retrieval module to re-import the metadata of the content that is higher than the threshold, the third information of the content, and the content index into the full-text search library according to a preset configuration template.
7. The content information processing system according to claim 5, characterized in that the content information processing system further includes:
a full-text search library information processing module, further configured to set the "content available" field of a content item in the full-text search library to "unavailable" when a notification that the content has been temporarily deleted is received; and to reset the "content available" field of the content in the full-text search library to "available" when a notification that the content has been restored is received.
8. The content information processing system according to any one of claims 1 to 7, characterized in that the content information processing system further includes:
a query content obtaining module, configured to receive a content information query request, parse the query request, and obtain the content to be queried;
a content information query module, configured to search for the content to be queried in the content index database, and when information on the content to be queried is found, to feed that information back to a query result sending module; when no result is found, to search for the content to be queried in the full-text search library: if information on the content to be queried is found, to feed that information back to the query result sending module; if a content identifier of the content to be queried is found, to use that content identifier to query the content information database for the information on the content to be queried, and to feed that information back to the query result sending module;
the query result sending module, configured to send the information on the content to be queried to the issuer of the content information query request.
9. A method for processing content information, characterized by including:
capturing content and creating a content index for the content in a content index database, the content index being the unique identifier of the content in the content information processing system;
extracting first information of the content corresponding to the content index, where the first information of the content includes the metadata of the content and other related information of the content besides the content metadata;
comparing each metadata item of the content in the first information of the content with a preset data size threshold, storing the content index and the metadata of the content that is not higher than the threshold in the content index database, and storing the metadata of the content that is higher than the threshold, the other related information of the content besides the content metadata in the first information of the content, and the content index in a content information database.
10. The method according to claim 9, characterized by receiving a setting of the data size threshold.
11. The method according to claim 9 or 10, characterized in that, before the metadata of the content that is higher than the threshold, the other related information of the content besides the content metadata in the first information of the content, and the content index are stored in the content information database, the method further includes:
performing legality verification on the other related information of the content besides the content metadata to obtain second information of the content that has been verified as legal;
where storing the metadata of the content that is higher than the threshold, the other related information of the content besides the content metadata in the first information of the content, and the content index in the content information database specifically includes: storing the metadata of the content that is higher than the threshold, the second information of the content, and the content index in the content information database.
12. The method according to any one of claims 9 to 11, characterized in that the method further includes:
performing a searchability judgment on the other related information of the content besides the metadata of the content, or on the second information of the content, and identifying the information that passes the searchability judgment as third information of the content.
13. The method according to claim 12, characterized in that the method further includes:
importing the metadata of the content that is higher than the threshold, the third information of the content, and the content index into a full-text search library according to a preset configuration template.
14. The method according to claim 13, characterized in that the method further includes:
when a notification that a content item has been temporarily deleted is received, deleting the data of the content from the full-text search library; and when a notification that the content has been restored is received, notifying the content information retrieval module to re-import the metadata of the content that is higher than the threshold, the third information of the content, and the content index into the full-text search library according to a preset configuration template.
15. The method according to claim 13, characterized in that the method further includes:
when a notification that a content item has been temporarily deleted is received, setting the "content available" field of the content in the full-text search library to "unavailable"; and when a notification that the content has been restored is received, resetting the "content available" field of the content in the full-text search library to "available".
16. The method according to any one of claims 9 to 15, characterized in that the method further includes:
receiving a content information query request, parsing the query request, and obtaining the content to be queried;
searching for the content to be queried in the content index database, and when information on the content to be queried is found, sending that information to the issuer of the content information query request; when no result is found, searching for the content to be queried in the full-text search library: if information on the content to be queried is found, sending that information to the issuer of the content information query request; if a content identifier of the content to be queried is found, using that content identifier to query the content information database for the information on the content to be queried, and sending that information to the issuer of the content information query request.
PCT/CN2013/084854 2013-10-08 2013-10-08 Method and system for processing content information WO2015051499A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201380079592.4A CN105531697B (en) 2013-10-08 2013-10-08 A kind of content information processing method and system
PCT/CN2013/084854 WO2015051499A1 (en) 2013-10-08 2013-10-08 Method and system for processing content information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2013/084854 WO2015051499A1 (en) 2013-10-08 2013-10-08 Method and system for processing content information

Publications (1)

Publication Number Publication Date
WO2015051499A1 true WO2015051499A1 (en) 2015-04-16

Family

ID=52812425

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/084854 WO2015051499A1 (en) 2013-10-08 2013-10-08 Method and system for processing content information

Country Status (2)

Country Link
CN (1) CN105531697B (en)
WO (1) WO2015051499A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948440A (en) * 2021-03-09 2021-06-11 北京小米移动软件有限公司 Page data processing method and device, terminal and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1949223A (en) * 2006-12-01 2007-04-18 金蝶软件(中国)有限公司 Multidimensional data reading and writing method and apparatus in on-line analysing processing system
US20100146004A1 (en) * 2005-07-20 2010-06-10 Siew Yong Sim-Tang Method Of Creating Hierarchical Indices For A Distributed Object System
CN102024057A (en) * 2010-12-24 2011-04-20 中兴通讯股份有限公司 Method and device for building index of mass data record
CN102542019A (en) * 2011-12-19 2012-07-04 北京地拓科技发展有限公司 Identification code storage method and identification code storage system as well as identification code indexing method and identification code indexing system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101340036B1 (en) * 2007-07-10 2013-12-10 삼성전자주식회사 Method for generating Electronic Content Guide and apparatus therefor
KR20090025607A (en) * 2007-09-06 2009-03-11 삼성전자주식회사 Method for updating a metadata of contents and apparatus therefor
WO2009146087A1 (en) * 2008-04-01 2009-12-03 Yahoo! Inc. Open framework for integrating, associating and interacting with content objects

Also Published As

Publication number Publication date
CN105531697A (en) 2016-04-27
CN105531697B (en) 2018-12-14


Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 201380079592.4; Country of ref document: CN
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 13895152; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 13895152; Country of ref document: EP; Kind code of ref document: A1